Dissertations / Theses: 'CUDA FRAMEWORK'

1

Dworaczyk, Wiltshire Austin Aaron. "CUDA ENHANCED FILTERING IN A PIPELINED VIDEO PROCESSING FRAMEWORK." DigitalCommons@CalPoly, 2013. https://digitalcommons.calpoly.edu/theses/1072.

Abstract:

The processing of digital video has long been a significant computational task for modern x86 processors. With every video frame composed of one to three planes, each consisting of a two-dimensional array of pixel data, and a video clip comprising thousands of such frames, the sheer volume of data is significant. With the introduction of new high-definition video formats such as 4K or stereoscopic 3D, the volume of uncompressed frame data is growing ever larger. Modern CPUs offer performance enhancements for processing digital video through SIMD instructions such as SSE2 or AVX. However, even with these instruction sets, CPUs are limited by their inherently sequential design and can only operate on a handful of bytes in parallel. Even processors with a multitude of cores exploit only an elementary level of parallelism. GPUs provide an alternative, massively parallel architecture. GPUs differ from CPUs by providing thousands of throughput-oriented cores, instead of a maximum of tens of generalized “good enough at everything” x86 cores. The GPU’s throughput-oriented cores are far more adept at handling large arrays of pixel data, as many video filtering operations can be performed independently. This computational independence allows pixel processing to scale across hundreds or even thousands of device cores. This thesis explores the utilization of GPUs for video processing and evaluates the advantages and caveats of porting the modern video filtering framework Vapoursynth to run entirely on the GPU. Compute-heavy GPU-enabled video processing results in up to a 108% speedup over an SSE2-optimized, multithreaded CPU implementation.
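
The computational independence described in this abstract can be illustrated with a minimal CPU-side sketch in Python. It is not code from the thesis or from Vapoursynth: the names `brighten_rows` and `filter_plane`, the gain filter, and the row-chunking scheme are assumptions made for illustration. Because each output pixel depends only on its own input pixel, the plane can be split into chunks that are filtered by separate workers, and the result matches the sequential one.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_rows(rows, gain):
    """Apply a per-pixel gain to a list of rows, clamping to the 8-bit range."""
    return [[min(255, int(p * gain)) for p in row] for row in rows]


def filter_plane(plane, gain, workers=4):
    """Split a plane into row chunks and filter each chunk independently.

    Hypothetical helper: chunks share no state, which is the property that
    lets a GPU assign one thread per pixel instead of one worker per chunk.
    """
    chunk = max(1, (len(plane) + workers - 1) // workers)
    chunks = [plane[i:i + chunk] for i in range(0, len(plane), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(brighten_rows, chunks, [gain] * len(chunks))
        return [row for part in parts for row in part]


# A tiny 4x3 luma plane standing in for a video frame (made-up values).
plane = [[10, 20, 30], [40, 50, 60], [100, 150, 200], [250, 251, 252]]
parallel = filter_plane(plane, 1.5)
sequential = brighten_rows(plane, 1.5)
```

Here `parallel` and `sequential` hold the same values. A real GPU port would replace the thread pool with a kernel launch, which this sketch does not attempt.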

2

Karlsson, Per. "A GPU-based framework for efficient image processing." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-112093.

Abstract:

This thesis addresses how to design a framework for image processing on the GPU that supports the common environments OpenGL GLSL, OpenCL and CUDA. A generalized view of GPU image processing is presented. The framework is called gpuip; it is implemented in C++ and also wrapped with Python bindings. The framework is cross-platform and works on Windows, Mac OSX and Unix operating systems. The thesis also covers the creation of two executable programs that use the gpuip framework. One program has a graphical user interface and the other is command-line only. Both programs are developed in Python. Performance tests compare the GPU environments against a single-core CPU implementation. All the GPU implementations in the gpuip framework are significantly faster than the CPU when executing the presented test cases. On average, the framework is two orders of magnitude faster than the single-core CPU.

3

Giordano, Andrea. "Sviluppo di una simulazione ad agenti di un modello di infezione virale tramite il framework FLAME GPU." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15755/.

Abstract:

This thesis covers the development of an agent-based simulation in which a virus infects a population. The simulation was developed using FLAME GPU, a framework for creating agent-based simulations in CUDA code so that they run on the GPU. The primary interest of the thesis is to assess the performance that can be obtained with this framework as the population grows. It also examines the difference between two implementations of the same simulation, both using FLAME GPU, in order to compare their run times. Finally, the results are discussed, explaining the differences between the two implementations.

4

Fabian, Xavier. "Precision measurements in the weak interaction framework: development of realistic simulations for the LPCTrap device installed at GANIL." Caen, 2015. http://hal.in2p3.fr/tel-01288412.

Abstract:

This work is part of the ongoing effort to measure the beta-neutrino angular correlation parameter aβν in three nuclear beta decays (6He+, 35Ar+ and 19Ne+). The V-A structure of the weak interaction implies that aβν = +1 for a pure Fermi transition and aβν = -1/3 for a pure Gamow-Teller transition. A precise measurement of this parameter to check for any deviation from these values may reveal the existence of exotic currents. Furthermore, the measurement of aβν in mirror transitions allows the extraction of Vud, the first element of the Cabibbo-Kobayashi-Maskawa (CKM) matrix. The LPCTrap apparatus, installed at GANIL, is designed to prepare a continuous ion beam for injection into a dedicated Paul trap. The trap provides a quasi-pointlike source from which the decay products are detected in coincidence. The value of aβν and, since 2010, the associated shake-off (SO) probabilities are extracted from the recoil-ion time-of-flight (TOF) distribution. This study requires a complete simulation of the LPCTrap experiments. Most of this work is dedicated to such simulations, especially to modeling the dynamics of the trapped ion cloud. The CLOUDA program, which takes advantage of graphics processing units (GPUs), was developed for this purpose, and its full characterization is presented here. Three important aspects are addressed: the electromagnetic trapping field, realistic collisions between the ions and the buffer gas atoms, and the space charge effect. The present work shows the importance of these simulations for improving the control of systematic errors on aβν.

5

Badalov, Alexey Pavlovich. "Coprocessor integration for real-time event processing in particle physics detectors." Doctoral thesis, Universitat Ramon Llull, 2016. http://hdl.handle.net/10803/396128.

Abstract:

High-energy physics experiments today have higher energies, more accurate sensors, and more flexible means of data collection than ever before. Their rapid progress requires ever more computational power, and massively parallel hardware, such as graphics cards, holds the promise of providing this power at a much lower cost than traditional CPUs. Yet using this hardware requires new algorithms and new approaches to organizing data that can be difficult to integrate with existing software. In this work, I explore the problem of using parallel algorithms within existing CPU-oriented frameworks and propose a compromise between the different trade-offs. The solution is a service that communicates with multiple event-processing pipelines, gathers data into batches, and submits them to hardware-accelerated parallel algorithms. I integrate this service with Gaudi, a framework underlying the software environments of two of the four major experiments at the Large Hadron Collider. I examine the overhead the service adds to parallel algorithms. I perform a case study of using the service to run a parallel track reconstruction algorithm for the LHCb experiment's prospective VELO Pixel subdetector and look at the performance characteristics of using different data batch sizes. Finally, I put the findings into perspective within the context of the LHCb trigger's requirements.
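
The batching idea in this abstract (several pipelines submit events, the service groups them, and one call hands the group to a parallel algorithm) can be sketched as follows. This is an illustrative Python sketch, not the Gaudi service from the thesis: the class `BatchingService`, its methods, and the stand-in `fake_accelerator` function are all hypothetical names, and a fixed batch size is an assumption.

```python
from collections import deque


class BatchingService:
    """Collect events from several pipelines and submit them in batches.

    Hypothetical sketch: `process_batch` stands in for a hardware-accelerated
    parallel algorithm and must return one result per input event.
    """

    def __init__(self, process_batch, batch_size):
        self.process_batch = process_batch
        self.batch_size = batch_size
        self.pending = deque()   # (pipeline_id, event) pairs awaiting a batch
        self.results = {}        # pipeline_id -> list of results, in order

    def submit(self, pipeline_id, event):
        """Queue one event; flush automatically once a full batch is ready."""
        self.pending.append((pipeline_id, event))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send all pending events as one batch and route results back."""
        if not self.pending:
            return
        batch = list(self.pending)
        self.pending.clear()
        outputs = self.process_batch([event for _, event in batch])
        for (pipeline_id, _), out in zip(batch, outputs):
            self.results.setdefault(pipeline_id, []).append(out)


batch_sizes = []


def fake_accelerator(events):
    """Stand-in for the parallel algorithm: records batch size, squares inputs."""
    batch_sizes.append(len(events))
    return [e * e for e in events]


service = BatchingService(fake_accelerator, batch_size=3)
for pipeline_id, event in [("A", 1), ("B", 2), ("A", 3), ("B", 4)]:
    service.submit(pipeline_id, event)
service.flush()  # drain the final partial batch
```

Larger batches amortize the per-call overhead over more events but make each pipeline wait longer for its result, which is the trade-off the abstract studies through different batch sizes.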

6

Varadarajan, Aravind Krishnan. "Improving Bio-Inspired Frameworks." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/97506.

Abstract:

In this thesis, we provide improvements to two different bio-inspired algorithms. The first is enhancing the performance of bio-inspired test generation for circuits described in RTL Verilog, specifically for branch coverage. We seek to improve upon an existing framework, BEACON, in terms of performance. BEACON is an Ant Colony Optimization (ACO) based test generation framework. Like other ACO frameworks, BEACON has good scope for improving performance through parallel computing. We try to exploit the available parallelism using both multi-core Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Using our new multithreaded approach, we can reduce test generation time by a factor of 25 compared to the original implementation for a wide variety of circuits. We also provide a 2-dimensional factoring method for BEACON that increases the available parallelism and yields some additional speedup. The second bio-inspired algorithm we address is Deep Neural Networks. With the increasing prevalence of neural nets in artificial intelligence and mission-critical applications such as self-driving cars, questions arise about their reliability and robustness. We have developed a test-generation-based technique and metric to evaluate the robustness of a neural net's outputs based on its sensitivity to its inputs. This is done by generating inputs that the neural net finds difficult to classify but that are relatively apparent to human perception. We measure the degree of difficulty of generating such inputs to calculate our metric.
MS

7

Odeh, Khuloud, Annita Seckinger, and Carina Forsman-Knecht. "Connected Urban Development (CUD) Initiative as an Approach towards Sustainability in Urban Areas." Thesis, Blekinge Tekniska Högskola, Avdelningen för maskinteknik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3127.

Abstract:

With the increasing number of Information and Communication Technology (ICT)-based initiatives addressing sustainability in urban areas, it is important to examine the possible contributions these initiatives can make when transitioning society as a whole towards sustainability. This thesis investigates the potential of the Connected Urban Development (CUD) initiative as a supportive approach to move urban areas towards sustainability, and the adjustment needed in its current strategies for alignment with a goal of global sustainability. The original strategies were written for the original goal of reducing carbon dioxide emissions and needed adjustment to target the new goal. This was accomplished by studying CUD as an organization and by working with its representatives, CUD Pilot Cities, various experts in urban development, ICT authorities and sustainability researchers. A scientific approach to the understanding of sustainability concepts provides the basis of this evaluation of the CUD initiative, its benefits and challenges, including the role of connectivity and the applicability of ICT. Within this context, recommendations were made to further improve the CUD initiative's effectiveness in moving urban areas towards sustainability. An ideal initiative, "CUD Gold", was envisioned in relation to system boundaries and components, strategic guidelines, actions and tools, and steps were suggested for how to make CUD more strategic in its pioneering endeavors of global urban sustainability.

8

Awan, Ammar Ahmad. "Co-designing Communication Middleware and Deep Learning Frameworks for High-Performance DNN Training on HPC Systems." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587433770960088.

9

Oliveira, Danilo Senen Cavallieri de. "Fintechs e inclusão financeira: o caso da implementação de uma plataforma digital de pagamentos em favelas do Rio de Janeiro e São Paulo." reponame:Repositório Institucional do FGV, 2018. http://hdl.handle.net/10438/23940.

Abstract:

The present study aims to answer the research question: 'How does the process of implementing a digital payments platform, developed by a fintech, that aims to promote financial inclusion occur?' To this end, it analyzes the case of CUFA Card, a digital payment platform implemented in the Parque União favela, part of the Complexo da Maré in Rio de Janeiro, and in the community of Heliópolis, in São Paulo. The work comprises a literature review, which shows how fintechs can be an opportunity to promote financial inclusion, and a case study in which interviews were conducted with the social groups involved in the creation or implementation of this platform; the interviews were transcribed, coded and analyzed using the Atlas TI© software. The main contribution is an analysis of how the implementation of a digital payments platform aimed at promoting financial inclusion occurs, in light of the CUFA Card implementation, together with a description of how the different social groups coordinated to make the project feasible and an account of the content resulting from the implementation of this technology, which derives from the partnership between the fintech Conta Um and the FHolding/CUFA organization. The analysis uses the multilevel framework of Pozzebon, Diniz and Jayo (2009), a theory native to the information systems field, which enabled a better understanding of the case by simultaneously analyzing technological and social aspects of the implementation of the platform in the context where it is being introduced. The study also generates insights for future research and practice by studying the relationship between fintechs and financial inclusion, a still incipient topic in the literature, and by analyzing the implementation process of a payment platform, highlighting the crucial issues in this process.

10

Chen, Yu-Wen, and 陳郁文. "Online Derivatives Arbitrage Trading Mechanism Based on CUDA Framework." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/35874995565142123742.

Abstract:

Master's thesis
National Chiao Tung University
Institute of Computer Science and Engineering
Academic year 104
Parallel computing denotes a technique in which multiple processing units simultaneously process a huge amount of data with low dependency. In other words, a complex problem or a huge data set can be divided into many small independent problems or small data chunks, and the overall computational time is reduced by allocating these problems to different processing units at the same time. High-frequency trading is becoming important in financial markets, and the ability to handle a huge amount of financial trading data in real time is thus critical. This thesis applies parallel computing to search for arbitrage opportunities and design trading strategies for TAIEX options and futures. Usually, arbitrage opportunities come from occasionally irrational price quotes. In highly competitive and mature markets, arbitrage opportunities are not only extremely rare but also fleeting. A technique that can process a large amount of data rapidly, such as parallel computing, is therefore well suited to finding them. This research revises the framework of [5]. I implement the following arbitrage strategies: a convexity strategy and a put-call-future parity strategy, and I introduce a spread strategy into the framework to seek arbitrage opportunities in TAIEX futures. Besides the offline framework, which uses a virtual exchange to simulate trading, I add an online real-time trading mode that receives price quotes from a remote server and sends back the encoded strategies through a TCP channel.
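
The two option strategies named in this abstract rest on standard no-arbitrage relations, which a short Python sketch can make concrete. The sketch is not the thesis's CUDA implementation: the function names, the quote layout, the flat `cost` threshold, and the sample prices are assumptions for illustration. It assumes European options on a futures price F, for which parity reads C - P = (F - K)·exp(-rT), and uses the convexity bound that a middle-strike call may not cost more than the strike-weighted average of its neighbors.

```python
import math


def parity_gap(call, put, future, strike, rate, t_years):
    """Return C - P - (F - K) * exp(-r * T); zero when parity holds."""
    return call - put - (future - strike) * math.exp(-rate * t_years)


def convexity_gap(c1, c2, c3, k1, k2, k3):
    """Return how far the middle call price exceeds the convex bound.

    A positive value means a butterfly (buy K1 and K3, sell K2 in the
    matching proportions) can be entered for a net credit before fees.
    """
    w = (k3 - k2) / (k3 - k1)
    return c2 - (w * c1 + (1 - w) * c3)


def find_signals(quotes, future, rate, t_years, cost):
    """Scan quotes {strike: (call, put)} and list gaps larger than `cost`."""
    signals = []
    strikes = sorted(quotes)
    for k in strikes:
        call, put = quotes[k]
        gap = parity_gap(call, put, future, k, rate, t_years)
        if abs(gap) > cost:
            signals.append(("parity", k, round(gap, 2)))
    for k1, k2, k3 in zip(strikes, strikes[1:], strikes[2:]):
        gap = convexity_gap(quotes[k1][0], quotes[k2][0], quotes[k3][0],
                            k1, k2, k3)
        if gap > cost:
            signals.append(("convexity", k2, round(gap, 2)))
    return signals


# Made-up quotes; the interest rate is zero to keep the arithmetic simple.
quotes = {9000: (210.0, 10.0), 9100: (160.0, 50.0), 9200: (60.0, 60.0)}
signals = find_signals(quotes, future=9200.0, rate=0.0, t_years=0.1, cost=5.0)
```

Each strike and each strike triple is checked independently of the others, which is why this kind of scan maps naturally onto one GPU thread per check.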

11

SUN, PO-YUAN, and 孫伯元. "A Nonlinear Dynamic Analysis Acceleration Framework Using CUDA And OpenMP." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/d2cvf8.

Abstract:

Master's thesis
National Taipei University of Technology
Department of Civil Engineering, Master's Program in Civil and Disaster Prevention Engineering
Academic year 107
Verifying the seismic resistance of a structure requires many types of performance tests. One such test, real-time hybrid testing, requires real-time numerical analysis (based on finite element analysis) to run concurrently with the test. However, nonlinear dynamic structural analysis is computationally expensive and time-consuming. This work aims to accelerate real-time numerical analysis. Based on the open-source structural analysis program OpenSees, the framework uses CUDA and OpenMP for parallel computation and is suitable for multicore or SIMD hardware. Expensive runtime routines such as forming the stiffness matrix and updating the numerical model are distributed to multicore CPUs and GPUs to reduce the calculation time in each time step.

12

Stinson, Derek L. "Deep Learning with Go." Thesis, 2020. http://hdl.handle.net/1805/22729.

Abstract:

Indiana University-Purdue University Indianapolis (IUPUI)
Current research in deep learning is primarily focused on using Python as a support language. Go, an emerging language with many benefits, including native support for concurrency, has seen a rise in adoption over the past few years. However, it is not widely used to develop learning models because of the lack of supporting libraries and frameworks for model development. This thesis explores the use of Go for developing neural network models in general and convolutional neural networks in particular. The proposed study is based on a Go-CUDA implementation of neural network models called GoCuNets. This implementation is compared to ConvNetGo, a Go-CPU deep learning implementation that takes advantage of Go's built-in concurrency. The comparison shows a significant performance gain when using GoCuNets rather than ConvNetGo.

13

ARUN. "HUMAN EMOTION RECOGNITION USING DEEP LEARNING TECHNIQUES." Thesis, 2017. http://dspace.dtu.ac.in:8080/jspui/handle/repository/16003.

Abstract:

Human emotion recognition plays an important role in interpersonal relationships. The automatic recognition of emotions has long been an active research topic, and several advances have been made in this field. Emotions are reflected in speech, in hand and body gestures, and in facial expressions. Extracting and understanding emotion is therefore highly important for interaction between humans and machines. The clinical, emotionless computer or robot is a staple of science fiction, but science fact is starting to change: computers are getting much better at understanding emotions. Automated customer service “bots” will be better able to know if a customer is getting the help they need. Robot caregivers involved with telemedicine may be able to detect pain or depression even if the patient doesn’t explicitly talk about it. Insurance companies are even experimenting with call voice analytics that can detect when someone is lying to their claims handlers. This project uses deep learning techniques to detect human emotions from faces, since the face is the prime source for recognizing human emotions. In particular, we use a convolutional neural network (CNN) as the deep learning technique. The network was implemented in Python with the help of TensorFlow, Google's deep learning library, without the CUDA framework.

14

(8812109), Derek Leigh Stinson. "Deep Learning with Go." Thesis, 2020.

Abstract:

Current research in deep learning is primarily focused on using Python as a support language. Go, an emerging language with many benefits, including native support for concurrency, has seen a rise in adoption over the past few years. However, it is not widely used to develop learning models because of the lack of supporting libraries and frameworks for model development. This thesis explores the use of Go for developing neural network models in general and convolutional neural networks in particular. The proposed study is based on a Go-CUDA implementation of neural network models called GoCuNets. This implementation is compared to ConvNetGo, a Go-CPU deep learning implementation that takes advantage of Go's built-in concurrency. The comparison shows a significant performance gain when using GoCuNets rather than ConvNetGo.

15

Fraga, António Fernando Crisóstomo. "Parallel Face Detection." Master's thesis, 2020. http://hdl.handle.net/10316/94026.

Abstract:

Integrated Master's dissertation in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology
Face detection is typically used millions of times per day in many different contexts, and the resolution of the images involved has increased significantly. These high-resolution images can be a difficult challenge for sequential architectures, since the overall performance of such implementations decreases drastically as the number of pixels rises. This thesis describes the implementation of a framework based on the Viola-Jones paper “Rapid Object Detection using a Boosted Cascade of Simple Features” [2] on parallel architectures such as GPUs and low-power GPUs. These emerge as natural candidates for the acceleration sought, offering very high computational power and core counts that enable the processing of such large amounts of data in parallel. The thesis also shows the parallelization and optimization of the implementation, using the advantages offered by these architectures to achieve an overall performance boost and speedup on high-resolution images compared with sequential architectures. An analysis of the results shows the successful implementation and the influence that the available GPU resources (power, CUDA cores, etc.) have on the overall GPU speedup as well as on performance. This parallel face detector implementation obtained a global speedup of up to 33 times on 8K images in comparison with the sequential version.

16

Park, Yubin. "CUDIA : a probabilistic cross-level imputation framework using individual auxiliary information." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-12-4746.

Abstract:

In healthcare-related studies, individual patient or hospital data are often not publicly available due to privacy restrictions, legal issues or reporting norms. However, such measures may be provided at a higher or more aggregated level, such as state-level or county-level summaries, or averages over health zones such as Hospital Referral Regions (HRR) or Hospital Service Areas (HSA). Such levels constitute partitions over the underlying individual-level data, which may not match the groupings that would have been obtained if one clustered the data based on individual-level attributes. Moreover, treating aggregated values as representatives for the individuals can result in the ecological fallacy. How can one run data mining procedures on such data, where different variables are available at different levels of aggregation or granularity? In this thesis, we seek a better utilization of variably aggregated datasets, which are possibly assembled from different sources. We propose a novel "cross-level" imputation technique that models the generative process of such datasets using a Bayesian directed graphical model. The imputation is based on the underlying data distribution and is shown to be unbiased. This imputation can be further utilized in subsequent predictive modeling, yielding improved accuracy. Experimental results using a simulated dataset and the Behavioral Risk Factor Surveillance System (BRFSS) dataset are provided to illustrate the generality and capabilities of the proposed framework.
