Dissertations on the topic "High-performance, graph processing, GPU"
Browse the top 36 dissertations for research on the topic "High-performance, graph processing, GPU".
Segura, Salvador Albert. "High-performance and energy-efficient irregular graph processing on GPU architectures." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671449.
Graph processing is a prominent, well-established domain that underpins emerging applications in areas such as data analytics and machine learning, enabling applications like road navigation, social networks and automatic speech recognition. The large amount of data handled in these domains requires high-performance architectures such as GPGPUs. Although processing large graph-based workloads exhibits a high degree of parallelism, the memory access patterns tend to be irregular, which reduces efficiency due to memory access divergence. To mitigate these issues, GPGPU graph applications perform stream compaction operations that process nodes/edges so that subsequent steps operate on a compacted dataset. We propose to offload this task to the Stream Compaction Unit (SCU), a hardware extension tailored to the requirements of these operations, which in addition performs pre-processing by filtering and reordering the processed elements. We show that memory divergence inefficiencies prevail in irregular graph-based GPGPU applications, yet we find that it is possible to relax the strict relationship between threads and the data they process to enable new optimizations. Accordingly, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension integrated into the GPU pipeline that reorders and filters the data processed by the threads in irregular accesses, improving memory access convergence. Finally, we leverage the strengths of the previous proposals to achieve synergistic improvements. We do so by proposing the IRU-enhanced SCU (ISCU), which employs the efficient pre-processing mechanisms of the IRU to improve the stream compaction efficiency of the SCU and to alleviate the NoC performance limitations caused by the SCU pre-processing operations.
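To make the stream-compaction step mentioned in this abstract concrete, here is a minimal CUDA/Thrust sketch that filters a candidate worklist down to unvisited vertices so that later kernels read a dense array. It only illustrates the software operation that the proposed SCU offloads to hardware; the vertex/flag layout and the predicate are illustrative assumptions, not code from the thesis.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>

// Keep only vertices whose "visited" flag is 0: a typical stream-compaction
// step in GPU graph frontiers, so later kernels operate on a compacted worklist.
struct NotVisited {
    const int* visited;                       // device pointer to per-vertex flags
    __host__ __device__ explicit NotVisited(const int* v) : visited(v) {}
    __host__ __device__ bool operator()(int v) const { return visited[v] == 0; }
};

int main() {
    const int num_candidates = 8;
    int h_candidates[num_candidates] = {0, 3, 5, 7, 2, 3, 9, 5};
    int h_visited[10]                = {1, 0, 0, 1, 0, 0, 0, 0, 0, 1};

    thrust::device_vector<int> d_candidates(h_candidates, h_candidates + num_candidates);
    thrust::device_vector<int> d_visited(h_visited, h_visited + 10);
    thrust::device_vector<int> d_frontier(num_candidates);

    // copy_if performs the compaction on the GPU.
    auto end = thrust::copy_if(d_candidates.begin(), d_candidates.end(),
                               d_frontier.begin(),
                               NotVisited(thrust::raw_pointer_cast(d_visited.data())));
    d_frontier.resize(end - d_frontier.begin());   // d_frontier now holds only unvisited vertices
    return 0;
}
```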
McLaughlin, Adam Thomas. "Power-constrained performance optimization of GPU graph traversal." Thesis, Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50209.
Lee, Dongwon. "High-performance computer system architectures for embedded computing." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42766.
Sedaghati, Mokhtari Naseraddin. "Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686.
Hong, Changwan. "Code Optimization on GPUs." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Hassan, Mohamed Wasfy Abdelfattah. "Using Workload Characterization to Guide High Performance Graph Processing." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103469.
Повний текст джерелаDoctor of Philosophy
Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, looking at the internet, web pages can be represented as the graph vertices while hyperlinks between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern is dependent on the data input and impossible to predict, which renders CPUs' sophisticated caching policies ineffective. With the rise of heterogeneous computing that enabled using hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as addressing some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications.
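As a concrete illustration of the irregular, data-dependent memory accesses described above, the following is a minimal CUDA sketch of a per-vertex gather over a CSR (compressed sparse row) graph; the array names and the "sum the neighbour values" operation are assumptions made for the example, not material from the dissertation.

```cuda
#include <cuda_runtime.h>

// One thread per vertex: each thread walks its CSR adjacency list and gathers
// one value per neighbour. col_idx[e] is only known at run time, so the loads
// from values[] are irregular and hard for caches and prefetchers to predict.
__global__ void gather_neighbours(const int* row_ptr, const int* col_idx,
                                  const float* values, float* out, int num_vertices) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= num_vertices) return;

    float acc = 0.0f;
    for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
        acc += values[col_idx[e]];     // indirect, data-dependent access
    }
    out[v] = acc;
}

// Launch example (device arrays assumed already populated):
//   int threads = 256, blocks = (num_vertices + threads - 1) / threads;
//   gather_neighbours<<<blocks, threads>>>(d_row_ptr, d_col_idx, d_values, d_out, num_vertices);
```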
Smith, Michael Shawn. "Performance Analysis of Hybrid CPU/GPU Environments." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/300.
Cyrus, Sam. "Fast Computation on Processing Data Warehousing Queries on GPU Devices." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6214.
Madduri, Kamesh. "A high-performance framework for analyzing massive complex networks." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24712.
Повний текст джерелаCommittee Chair: Bader, David; Committee Member: Berry, Jonathan; Committee Member: Fujimoto, Richard; Committee Member: Saini, Subhash; Committee Member: Vuduc, Richard
Hordemann, Glen J. "Exploring High Performance SQL Databases with Graphics Processing Units." Bowling Green State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1380125703.
Collet, Julien. "Exploration of parallel graph-processing algorithms on distributed architectures." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2391/document.
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficulty of evaluating the performance behavior of these applications complicates the assessment of the relevance of the architecture used. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two use cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and the cluster nodes used. A further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis work explores how performance studies are helpful in cluster design for graph processing. In particular, studying the throughput performance of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph processing is investigated with a port of GraphLab on an NVIDIA TX2-based architecture.
Ling, Cheng. "High performance bioinformatics and computational biology on general-purpose graphics processing units." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6260.
Cooper, Lee Alex Donald. "High Performance Image Analysis for Large Histological Datasets." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1250004647.
Shanmugam, Sakthivadivel Saravanakumar. "Fast-NetMF: Graph Embedding Generation on Single GPU and Multi-core CPUs with NetMF." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557162076041442.
Abu, Doleh Anas. "High Performance and Scalable Matching and Assembly of Biological Sequences." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1469092998.
Henriksson, Jonas. "Implementation of a real-time Fast Fourier Transform on a Graphics Processing Unit with data streamed from a high-performance digitizer." Thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-113389.
Banihashemi, Seyed Parsa. "Parallel explicit FEM algorithms using GPU's." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54391.
Nedel, Werner Mauricio. "Analise dos efeitos de falhas transientes no conjunto de banco de registradores em unidades gráficas de processamento." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/140441.
Повний текст джерелаGraphic Process Units (GPUs) are specialized massively parallel units that are widely used due to their high computing processing capability with respective lower costs. The ability to rapidly manipulate high amounts of memory simultaneously makes them suitable for solving computer-intensive problems, such as analysis of air traffic control, academic researches, image processing and others. General-Purpose Graphic Processing Units (GPGPUs) designates the use of GPUs in applications commonly handled by Central Processing Units (CPUs). The rapid proliferation of GPUs due to the advent of significant programming support has brought programmers to use such devices in safety critical applications, like automotive, space and medical. This crescent use of GPUs pushed developers to explore its parallel architecture and proposing new implementations of such devices. The FLEXible GRaphics Processor (FlexGrip) is an example of GPGPU optimized for Field Programmable Arrays (FPGAs) implementation, fully compatible with GPU’s compiled programs. The increasing demand for computational has pushed GPUs to be built in cuttingedge technology down to 28nm fabrication process for the latest NVIDIA devices with operating clock frequencies up to 1GHz. The increases in operating frequencies and transistor density combined with the reduction of voltage supplies have made transistors more susceptible to faults caused by radiation. The program model adopted by GPUs makes constant accesses to its memories and registers, making this device sensible to transient perturbations in its stored values. These perturbations are called Single Event Upset (SEU), or just bit-flip, and might cause the system to experience an error. The main goal of this work is to study the behavior of the GPGPU FlexGrip under the presence of SEUs in a range of applications. The distribution of computational resources of the GPUs and its impact in the GPU confiability is also explored, as well as the characterization of the errors observed in the fault injection campaigns. Results can indicate efficient configurations of GPUs in order to avoid perturbations in the system under the presence of SEUs.
Vitor, Giovani Bernardes 1985. "Rastreamento de alvo móvel em mono-visão aplicado no sistema de navegação autônoma utilizando GPU." [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/264975.
Повний текст джерелаDissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica
Abstract: The computer vision system is useful in several applications of autonomous vehicles, such as map generation, obstacle avoidance, positioning tasks and target tracking. Furthermore, computer vision can provide a significant gain in reliability, versatility and accuracy of robotic tasks, which are important concerns in most applications. The present work aims at the development of a visual servo control method for ground robotic vehicles to perform tracking and pursuit of a target. The tracking procedure is based on matching the target region across the image sequence, and the pursuit on generating the navigation motion from the information of the target region. Among the aspects that contribute to the solution of the proposed tracking procedure is the use of image processing techniques such as the KNN filter, Sobel filter, HMIN filter and the watershed transform, which together provide the desired robustness for the solution. However, this combination is not compatible with real-time requirements. Thus, these algorithms were modeled for parallel processing on graphics cards using CUDA. Experiments in real environments were analyzed, presenting several results for the tracking procedure and validating the use of GPUs to accelerate the computer vision system.
Master's degree
Solid Mechanics and Mechanical Design
Master in Mechanical Engineering
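Since the abstract above lists the Sobel filter among the steps ported to CUDA, here is a hedged, self-contained sketch of a Sobel gradient-magnitude kernel; the image layout, border handling and launch configuration are illustrative choices rather than the thesis implementation.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Sobel gradient magnitude for a grayscale image stored row-major.
// Border pixels are simply written as 0 to keep the sketch short.
__global__ void sobel(const unsigned char* in, unsigned char* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    if (x == 0 || y == 0 || x == width - 1 || y == height - 1) { out[y * width + x] = 0; return; }

    int p00 = in[(y-1)*width + (x-1)], p01 = in[(y-1)*width + x], p02 = in[(y-1)*width + (x+1)];
    int p10 = in[ y   *width + (x-1)],                            p12 = in[ y   *width + (x+1)];
    int p20 = in[(y+1)*width + (x-1)], p21 = in[(y+1)*width + x], p22 = in[(y+1)*width + (x+1)];

    int gx = -p00 - 2*p10 - p20 + p02 + 2*p12 + p22;   // horizontal gradient
    int gy = -p00 - 2*p01 - p02 + p20 + 2*p21 + p22;   // vertical gradient
    float mag = sqrtf(float(gx*gx + gy*gy));
    out[y*width + x] = mag > 255.0f ? 255 : (unsigned char)mag;
}

// Launch example:
//   dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
//   sobel<<<grid, block>>>(d_in, d_out, width, height);
```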
PETRINI, ALESSANDRO. "HIGH PERFORMANCE COMPUTING MACHINE LEARNING METHODS FOR PRECISION MEDICINE." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/817104.
Precision Medicine is a new paradigm which is reshaping several aspects of clinical practice, representing a major departure from the "one size fits all" approach in diagnosis and prevention featured in classical medicine. Its main goal is to find personalized prevention measures and treatments, on the basis of the personal history, lifestyle and specific genetic factors of each individual. Three factors contributed to the rapid rise of Precision Medicine approaches: the ability to quickly and cheaply generate a vast amount of biological and omics data, mainly thanks to Next-Generation Sequencing; the ability to efficiently access this vast amount of data, under the Big Data paradigm; the ability to automatically extract relevant information from data, thanks to innovative and highly sophisticated data processing analytical techniques. Machine Learning in recent years revolutionized data analysis and predictive inference, influencing almost every field of research. Moreover, high-throughput bio-technologies posed additional challenges to effectively manage and process Big Data in Medicine, requiring novel specialized Machine Learning methods and High Performance Computing techniques well-tailored to process and extract knowledge from big bio-medical data. In this thesis we present three High Performance Computing Machine Learning techniques that have been designed and developed for tackling three fundamental and still open questions in the context of Precision and Genomic Medicine: i) identification of pathogenic and deleterious genomic variants among the "sea" of neutral variants in the non-coding regions of the DNA; ii) detection of the activity of regulatory regions across different cell lines and tissues; iii) automatic protein function prediction and drug repurposing in the context of biomolecular networks. For the first problem we developed parSMURF, a novel hyper-ensemble method able to deal with the huge data imbalance that characterizes the detection of pathogenic variants in the non-coding regulatory regions of the human genome. We implemented this approach with highly parallel computational techniques using supercomputing resources at CINECA (Marconi – KNL) and HPC Center Stuttgart (HLRS Apollo HAWK), obtaining state-of-the-art results. For the second problem we developed Deep Feed Forward and Deep Convolutional Neural Networks to respectively process epigenetic and DNA sequence data to detect active promoters and enhancers in specific tissues at genome-wide level, using GPU devices to parallelize the computation. Finally we developed scalable semi-supervised graph-based Machine Learning algorithms based on parametrized Hopfield Networks to process large biological graphs in parallel using GPU devices, using a parallel coloring method that improves the classical Luby greedy algorithm. We also present ongoing extensions of parSMURF, very recently awarded by the Partnership for Advanced Computing in Europe (PRACE) consortium, to further develop the algorithm, apply it to huge genomic data and embed its results into Genomiser, a state-of-the-art computational tool for the detection of pathogenic variants associated with Mendelian genetic diseases, in the context of an international collaboration with the Jackson Lab for Genomic Medicine.
Lundgren, Jacob. "Pricing of American Options by Adaptive Tree Methods on GPUs." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-265257.
Prades Gasulla, Javier. "Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA." Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/168081.
Повний текст джерела[CAT] En l'última dècada la utilització de la GPGPU(General Purpose computing in Graphics Processing Units; Computació de Propòsit General en Unitats de Processament Gràfic) s'ha tornat extremadament popular en els centres de dades de tot el món. Les GPUs (Graphics Processing Units; Unitats de Processament Gràfic) s'han establert com a elements acceleradors de còmput que s'utilitzen al costat de les CPUs formant sistemes heterogenis. La naturalesa massivament paral·lela de les GPUs, destinades tradicionalment al còmput de gràfics, permet realitzar operacions numèriques amb matrius de dades a gran velocitat degut al gran nombre de nuclis que integren i al gran ample de banda d'accés a memòria que posseeixen. En conseqüència, les aplicacions de tot tipus de camps, com ara química, física, enginyeria, intel·ligència artificial, ciència de materials, etc. que presenten aquest tipus de patrons de còmput es veuen beneficiades reduint dràsticament el seu temps d'execució. En general, l'ús de l'acceleració del còmput en GPUs ha significat un pas endavant i una revolució, però no està exempt de problemes, com ara poden ser problemes d'eficiència energètica, baixa utilització de les GPUs, alts costos d'adquisició i manteniment, etc. En aquesta tesi pretenem analitzar les principals mancances que presenten aquests sistemes heterogenis i proposar solucions basades en l'ús de la virtualització remota de GPUs. Per a això hem utilitzat l'eina rCUDA, desenvolupada a la Universitat Politècnica de València, ja que multitud de publicacions l'avalen com el framework de virtualització remota de GPUs més avançat de l'actualitat. Els resultats obtinguts en aquesta tesi mostren que l'ús de rCUDA en entorns de Cloud Computing incrementa el grau de llibertat del sistema, ja que permet crear instàncies virtuals de les GPUs físiques totalment a mida de les necessitats de cadascuna de les màquines virtuals. En entorns HPC (High Performance Computing; Computació d'Altes Prestacions), rCUDA també proporciona un major grau de flexibilitat en l'ús de les GPUs de tot el clúster de còmput, ja que permet desacoblar totalment la part CPU de la part GPU de les aplicacions. A més, les GPUs poden estar en qualsevol node del clúster, sense importar el node en el qual s'està executant la part CPU de l'aplicació. En general, tant per a Cloud Computing com en el cas del HPC, aquest major grau de flexibilitat es tradueix en un augment fins 2x de la productivitat de tot el sistema al mateix temps que es redueix el consum energètic en aproximadament un 15%. Finalment, també hem desenvolupat un mecanisme de migració de treballs de la part GPU de les aplicacions que ha estat integrat dins del framework rCUDA. Aquest mecanisme de migració ha estat avaluat i els resultats mostren clarament que, a canvi d'una petita sobrecàrrega, al voltant de 400 mil·lisegons, en el temps d'execució de les aplicacions, és una potent eina amb la qual, de nou, augmentar la productivitat i reduir la despesa energètica de sistema. En resum, en aquesta tesi s'analitzen els principals problemes derivats de l'ús de les GPUs com acceleradors de còmput, tant en entorns HPC com de Cloud Computing, i es demostra com a través de l'ús del framework rCUDA, aquests problemes poden solucionar-se. A més es desenvolupa un potent mecanisme de migració de treballs GPU, que integrat dins del framework rCUDA, esdevé una eina clau per als futurs planificadors de treballs en clústers heterogenis.
[EN] In the last decade the use of GPGPU (General Purpose computing in Graphics Processing Units) has become extremely popular in data centers around the world. GPUs (Graphics Processing Units) have been established as computational accelerators that are used alongside CPUs to form heterogeneous systems. The massively parallel nature of GPUs, traditionally intended for graphics computing, allows numerical operations on data arrays to be performed at high speed. This is achieved thanks to the large number of cores GPUs integrate and the large bandwidth of memory access. Consequently, applications from all kinds of fields, such as chemistry, physics, engineering, artificial intelligence, materials science, and so on, that exhibit this type of computational pattern benefit from drastically reduced execution times. In general, the use of computing acceleration provided by GPUs has meant a step forward and a revolution, but it is not without problems, such as energy efficiency problems, low utilization of GPUs, high acquisition and maintenance costs, etc. In this PhD thesis we aim to analyze the main shortcomings of these heterogeneous systems and propose solutions based on the use of remote GPU virtualization. To that end, we have used the rCUDA middleware, developed at Universitat Politècnica de València. Many publications support rCUDA as the most advanced remote GPU virtualization framework nowadays. The results obtained in this PhD thesis show that the use of rCUDA in Cloud Computing environments increases the degree of freedom of the system, as it makes it possible to create virtual instances of the physical GPUs fully tailored to the needs of each of the virtual machines. In HPC (High Performance Computing) environments, rCUDA also provides a greater degree of flexibility in the use of GPUs throughout the computing cluster, as it allows the CPU part to be completely decoupled from the GPU part of the applications. In addition, GPUs can be on any node in the cluster, regardless of the node on which the CPU part of the application is running. In general, both for Cloud Computing and in the case of HPC, this greater degree of flexibility translates into an up to 2x increase in system-wide throughput while reducing energy consumption by approximately 15%. Finally, we have also developed a job migration mechanism for the GPU part of applications that has been integrated within the rCUDA middleware. This migration mechanism has been evaluated and the results clearly show that, in exchange for a small overhead of about 400 milliseconds in the execution time of the applications, it is a powerful tool with which, again, we can increase productivity and reduce the energy footprint of the computing system. In summary, this PhD thesis analyzes the main problems arising from the use of GPUs as computing accelerators, both in HPC and Cloud Computing environments, and demonstrates how, thanks to the use of the rCUDA middleware, these problems can be addressed. In addition, a powerful GPU job migration mechanism is being developed, which, integrated within the rCUDA framework, becomes a key tool for future job schedulers in heterogeneous clusters.
This work was jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants (20524/PDC/18, 20813/PI/18 and 20988/PI/18) and by the Spanish MEC and European Commission FEDER under grants TIN2015-66972-C5-3-R, TIN2016-78799-P and CTQ2017-87974-R (AEI/FEDER, UE). We also thank NVIDIA for hardware donation under GPU Educational Center 2014-2016 and Research Center 2015-2016. The authors thankfully acknowledge the computer resources at CTE-POWER and the technical support provided by Barcelona Supercomputing Center - Centro Nacional de Supercomputación (RES-BCV-2018-3-0008). Furthermore, researchers from Universitat Politècnica de València are supported by the Generalitat Valenciana under Grant PROMETEO/2017/077. Authors are also grateful for the generous support provided by Mellanox Technologies Inc. Prof. Pradipta Purkayastha, from Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, is acknowledged for kindly providing the initial ligand and DNA structures.
Prades Gasulla, J. (2021). Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/168081
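The abstract's point that the CPU part of an application can be decoupled from its GPU part rests on the middleware intercepting ordinary CUDA calls, so application code like the minimal, purely illustrative example below stays unchanged; which (possibly remote) GPU actually serves the calls is decided by the middleware's configuration, not by the source code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Ordinary CUDA allocations and launches; under a remote-GPU middleware
    // such as rCUDA (per the abstract above) these same calls can be served by
    // a GPU sitting in another cluster node, with no change to this source file.
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    printf("last CUDA status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```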
Zheng, Mai. "Towards Manifesting Reliability Issues In Modern Computer Systems." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436283400.
Obrecht, Christian. "High performance lattice Boltzmann solvers on massively parallel architectures with applications to building aeraulics." Phd thesis, INSA de Lyon, 2012. http://tel.archives-ouvertes.fr/tel-00776986.
Kerr, Andrew. "A model of dynamic compilation for heterogeneous compute platforms." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/47719.
Passerat-Palmbach, Jonathan. "Contributions to parallel stochastic simulation : application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte Carlo simulations." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2013. http://tel.archives-ouvertes.fr/tel-00858735.
Teng, Sin Yong. "Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-433427.
Busato, Federico. "High-Performance and Power-Aware Graph Processing on GPUs." Doctoral thesis, 2018. http://hdl.handle.net/11562/979445.
Mishra, Ashirbad. "Efficient betweenness Centrality Computations on Hybrid CPU-GPU Systems." Thesis, 2016. http://hdl.handle.net/2005/2718.
Lain, Jiang-Siang, and 連江祥. "High-performance Cholesky Factorization using the GPU and CPU parallel processing for band matrix." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/36458624994930511368.
Повний текст джерела國立交通大學
土木工程學系
100
The required memory and processing time grow rapidly when solving linear systems with larger matrices. Hence, the application of parallel computing technology to the solution of linear systems has received considerable interest in the last decade. Most parallel computing work in previous studies has focused on iterative algorithms running on distributed parallel computing platforms. However, iterative algorithms only realize their performance for very large linear systems on supercomputers. This study therefore focuses on developing a more elaborate direct parallel algorithm for multi-core CPU and GPU parallel computing platforms. The work proceeds in three stages. First, direct linear-system solvers are parallelized and implemented on the multi-core platform; their computing time and solution accuracy are investigated and compared to assess the performance of the different algorithms. Next, the blocked Cholesky algorithm is adopted and optimized to develop a novel parallel algorithm. Finally, the optimized blocked Cholesky algorithm is implemented on multi-core CPU and GPU parallel computing platforms. The results show that a speed-up of 2.3 is achieved for band matrices with bandwidth greater than 100 on a four-core platform compared with a single-core platform, and that the speed-up reaches 3.3 when the matrix bandwidth exceeds 1000. Notably, a ten-fold speed-up is obtained when the novel algorithm is implemented on a GPU platform with CUDA. The results also reveal that the larger the matrix bandwidth, the higher the performance achieved on GPU platforms.
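For reference, below is a compact, host-side sketch of the blocked Cholesky factorization that the thesis parallelizes, showing the three block steps (factor the diagonal block, update the panel, update the trailing submatrix). It is a serial illustration under stated assumptions (row-major storage, symmetric positive-definite input, no pivoting), not the thesis code.

```cuda
#include <vector>
#include <cmath>
#include <algorithm>

// Serial blocked Cholesky (lower-triangular factor) of a symmetric positive-definite
// n x n matrix stored row-major. Only the lower triangle is read and written; the
// block structure is what multi-core and GPU implementations distribute.
static void cholesky_blocked(std::vector<double>& A, int n, int nb) {
    auto a = [&](int i, int j) -> double& { return A[i * n + j]; };
    for (int k = 0; k < n; k += nb) {
        int kb = std::min(nb, n - k);
        // 1) Unblocked Cholesky of the diagonal block A[k:k+kb, k:k+kb].
        for (int j = k; j < k + kb; ++j) {
            for (int p = k; p < j; ++p) a(j, j) -= a(j, p) * a(j, p);
            a(j, j) = std::sqrt(a(j, j));
            for (int i = j + 1; i < k + kb; ++i) {
                for (int p = k; p < j; ++p) a(i, j) -= a(i, p) * a(j, p);
                a(i, j) /= a(j, j);
            }
        }
        // 2) Panel update: triangular solve for the blocks below the diagonal block.
        for (int i = k + kb; i < n; ++i)
            for (int j = k; j < k + kb; ++j) {
                for (int p = k; p < j; ++p) a(i, j) -= a(i, p) * a(j, p);
                a(i, j) /= a(j, j);
            }
        // 3) Trailing-submatrix update (the bulk of the parallelizable work).
        for (int i = k + kb; i < n; ++i)
            for (int j = k + kb; j <= i; ++j)
                for (int p = k; p < k + kb; ++p)
                    a(i, j) -= a(i, p) * a(j, p);
    }
}
```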
Azevedo, José Maria Pantoja Mata Vale e. "Image Stream Similarity Search in GPU Clusters." Master's thesis, 2018. http://hdl.handle.net/10362/58447.
CARBONE, Giancarlo. "HPC techniques for large scale data analysis." Doctoral thesis, 2015. http://hdl.handle.net/11573/864567.
CHANG, CHIA-LONG, and 張嘉龍. "The Research on the Benefits of Leading Cloud Computing on Cross Virtual and Real Graph Platform – A "GPU Nomogram Developed a Common Farm Cluster" of National Center for High-Performance Computing as a Research Object." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/u9qf2h.
Повний текст джерела國立臺北教育大學
數位科技設計學系(含玩具與遊戲設計碩士班)
107
The National Center for High-performance Computing (NCHC) recently completed its "GPU Graphics Farm R&D and Sharing Cluster". Its main users come from domestic animation, visual effects, television, film, advertising and communication related academic departments and industries, so the service must provide high efficiency, stability and availability in order to complete rendering and computing projects within a limited time. Previous studies on cloud computing performance have mostly built cloud-based virtual systems and evaluated them from the user's point of view, while few studies have examined the performance of the cloud computing systems operated by the providers themselves. Therefore, this study focuses on the newly built "GPU Graphics Farm R&D and Sharing Cluster" to investigate the benefits of cloud computing. The results of this study are as follows: 1. The compute nodes, server management nodes, GPU acceleration nodes and GPU remote login nodes all passed the functional tests of the remote graphical interface, remote hardware self-diagnosis, remote power control (power on, power off and reset), virtual media, and remote keyboard and mouse control. 2. In the SPEC Int tests, besides the server management node, the compute nodes, GPU acceleration nodes and GPU remote login nodes all scored above 1500 points, which is quite satisfactory. 3. The remote management test report of this graphics system shows that the compute nodes, server management nodes, GPU acceleration nodes and GPU remote login nodes all passed their functional tests, and the SPEC Int scores of the compute, GPU acceleration and GPU remote login nodes all exceed 1500 points. Finally, the graphics farm platform was used to verify GPU rendering performance at both 4K and 1080p quality, comparing a physical NVIDIA Tesla V100 with the GPU of the virtual platform, and moving to the cloud platform showed no noticeable reduction in computing performance. 4. The benefit of the cross virtual-and-physical platform, that is, the virtualized cloud platform, is that graphics computing performance does not differ much from the physical platform while the advantages of cloud computing are gained, so the cross virtual-and-physical approach is helpful.
Ramashekar, Thejas. "Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines." Thesis, 2013. http://etd.iisc.ernet.in/handle/2005/2627.
Patel, Parita. "Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures." Thesis, 2017. http://etd.iisc.ernet.in/2005/3803.
Kriske, Jeffery Edward Jr. "A scalable approach to processing adaptive optics optical coherence tomography data from multiple sensors using multiple graphics processing units." Thesis, 2014. http://hdl.handle.net/1805/6458.
Adaptive optics-optical coherence tomography (AO-OCT) is a non-invasive method of imaging the human retina in vivo. It can be used to visualize microscopic structures, making it incredibly useful for the early detection and diagnosis of retinal disease. The research group at Indiana University has a novel multi-camera AO-OCT system capable of 1 MHz acquisition rates. Until this point, a method has not existed to process data from such a novel system quickly and accurately enough on a CPU, a GPU, or one that can scale to multiple GPUs automatically in an efficient manner. This is a barrier to using a MHz AO-OCT system in a clinical environment. A novel approach to processing AO-OCT data from the unique multi-camera optics system is tested on multiple graphics processing units (GPUs) in parallel with one, two, and four camera combinations. The design and results demonstrate a scalable, reusable, extensible method of computing AO-OCT output. This approach can either achieve real time results with an AO-OCT system capable of 1 MHz acquisition rates or be scaled to a higher accuracy mode with a fast Fourier transform of 16,384 complex values.
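To make the FFT stage concrete, here is a hedged cuFFT sketch of the batched 1-D transform such AO-OCT pipelines apply to their spectral data; the 16,384-point length comes from the abstract, while the batch size and the zeroed placeholder input are illustrative assumptions.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>

// Batched 1-D complex-to-complex FFT: one 16,384-point transform per A-scan,
// the core numerical step in the OCT reconstruction described above.
int main() {
    const int fft_len = 16384;   // transform length mentioned in the abstract
    const int batch   = 512;     // number of A-scans per batch (illustrative)

    cufftComplex* d_data;
    cudaMalloc(&d_data, sizeof(cufftComplex) * fft_len * batch);
    cudaMemset(d_data, 0, sizeof(cufftComplex) * fft_len * batch);  // placeholder input

    cufftHandle plan;
    cufftPlan1d(&plan, fft_len, CUFFT_C2C, batch);      // plan 'batch' transforms of length fft_len
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);  // in-place forward FFTs
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}
```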