Dissertations / Theses on the topic 'Scalable computing'

Consult the top 50 dissertations / theses for your research on the topic 'Scalable computing.'

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Fleming, Kermin Elliott Jr. "Scalable reconfigurable computing leveraging latency-insensitive channels." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/79212.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 190-197).
Traditionally, FPGAs have been confined to the limited role of small, low-volume ASIC replacements and circuit emulators. However, continued Moore's law scaling has given FPGAs new life as accelerators for applications that map well to fine-grained parallel substrates. Examples of such applications include processor modelling, compression, and digital signal processing. Although FPGAs continue to increase in size, some interesting designs still fail to fit into a single FPGA. Many tools exist that partition RTL descriptions across FPGAs. Unfortunately, existing tools have low performance due to the inefficiency of maintaining the cycle-by-cycle behavior of RTL among discrete FPGAs. These tools are unsuitable for use in FPGA program acceleration, as the purpose of an accelerator is to make applications run faster. This thesis presents latency-insensitive channels, a language-level mechanism by which programmers express points in their design at which the cycle-by-cycle behavior of the design may be modified by the compiler. By decoupling the timing of portions of the RTL from the high-level function of the program, designs may be mapped to multiple FPGAs without suffering the performance degradation observed in existing tools. This thesis demonstrates, using a diverse set of large designs, that FPGA programs described in terms of latency-insensitive channels obtain significant gains in design feasibility, compilation time, and run-time when mapped to multiple FPGAs.
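A latency-insensitive channel can be pictured as a bounded FIFO with an explicit ready/valid handshake, so that a compiler is free to insert arbitrary buffering (or a chip-to-chip link) between producer and consumer without changing results. The sketch below is a minimal software illustration of that idea in Python; the class and its methods are invented for this example and are not the thesis's toolchain or API.

from collections import deque

class LIChannel:
    """Bounded FIFO with ready/valid semantics: the producer enqueues only
    when the channel is not full, the consumer dequeues only when it is not
    empty. Extra latency never changes the data stream."""
    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()
    def can_enq(self):
        return len(self.fifo) < self.depth
    def enq(self, value):
        assert self.can_enq()
        self.fifo.append(value)
    def can_deq(self):
        return len(self.fifo) > 0
    def deq(self):
        assert self.can_deq()
        return self.fifo.popleft()

def producer(chan, data):
    # Stalls only when the channel is full.
    for x in data:
        while not chan.can_enq():
            yield "stall"
        chan.enq(x)
        yield "sent"

def consumer(chan, out, n):
    received = 0
    while received < n:
        if chan.can_deq():
            out.append(chan.deq() * 2)   # arbitrary per-item work
            received += 1
        yield "tick"

# The same functional result is obtained for any channel depth >= 1, which is
# what lets a compiler re-time or split the design across FPGAs.
for depth in (1, 4, 16):
    chan, out = LIChannel(depth), []
    p, c = producer(chan, range(8)), consumer(chan, out, 8)
    done_p = False
    while len(out) < 8:
        if not done_p:
            try:
                next(p)
            except StopIteration:
                done_p = True
        next(c)
    assert out == [2 * i for i in range(8)]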
by Kermin Elliott Fleming, Jr.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
2

Spagnuolo, Carmine. "Scalable computational science." Doctoral thesis, Università degli Studi di Salerno, 2017. http://hdl.handle.net/10556/2581.

Full text
Abstract:
2015 - 2016
Computational science, also known as scientific computing, is a rapidly growing field that uses advanced computing to solve complex problems. This new discipline combines technologies, modern computational methods and simulations to address problems too complex to be reliably predicted by theory alone and too dangerous or expensive to be reproduced in laboratories. Successes in computational science over the past twenty years have driven demand for supercomputing, both to improve the performance of the solutions and to allow the growth of the models in terms of size and quality. From a computer scientist's perspective, it is natural to distribute the computation required to study a complex system among multiple machines: it is well known that the speed of single-processor computers is reaching physical limits. For these reasons, parallel and distributed computing has become the dominant paradigm for computational scientists who need the latest developments in computing resources in order to solve their problems, and "Scalability" has been recognized as the central challenge in this science. This dissertation discusses the design and implementation of Frameworks, Parallel Languages and Architectures that improve the state of the art in Scalable Computational Science.
Frameworks. The proposal of D-MASON, a distributed version of MASON, a well-known and popular Java toolkit for writing and running Agent-Based Simulations (ABSs). D-MASON introduces framework-level parallelization so that scientists who use the framework (e.g., a domain expert with limited knowledge of distributed programming) need be only minimally aware of such distribution. Development of D-MASON began in 2011, and the main purpose of the project was to overcome the limits of MASON's sequential computation by using distributed computing. D-MASON enables more than MASON in terms of simulation size (number of agents and complexity of agent behaviors), and it also reduces the simulation time of simulations written in MASON. For this reason, one of the most important features of D-MASON is that it requires only a limited number of changes to MASON code in order to execute simulations on distributed systems. D-MASON, based on the Master-Worker paradigm, was initially designed for heterogeneous computing in order to exploit unused computational resources in labs, but it also provides functionality to be executed on homogeneous systems (such as HPC systems) as well as cloud infrastructures. The architecture of D-MASON is presented in the following three papers, which describe all D-MASON layers:
• Cordasco G., Spagnuolo C. and Scarano V. Toward the new version of D-MASON: Efficiency, Effectiveness and Correctness in Parallel and Distributed Agent-based Simulations. 1st IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems, IEEE International Parallel & Distributed Processing Symposium 2016.
• Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. Bringing together efficiency and effectiveness in distributed simulations: the experience with D-MASON. SIMULATION: Transactions of The Society for Modeling and Simulation International, June 11, 2013.
• Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. A Framework for distributing Agent-based simulations. Ninth International Workshop Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms, Euro-Par 2011 conference.
Much effort has been devoted, in the Communication Layer, to improving communication efficiency in the case of homogeneous systems. D-MASON is based on the Publish/Subscribe (PS) communication paradigm and uses a centralized message broker (based on the Java Message Service standard) to deal with heterogeneous systems. The communication for homogeneous systems uses the Message Passing Interface (MPI) standard and is also based on PS. In order to use MPI within Java, D-MASON uses a Java binding of MPI. Unfortunately, this binding is relatively new and does not provide all MPI functionality. Several communication strategies were designed, implemented and evaluated. These strategies were presented in two papers:
• Cordasco G., Milone F., Spagnuolo C. and Vicidomini L. Exploiting D-MASON on Parallel Platforms: A Novel Communication Strategy. 2nd Workshop on Parallel and Distributed Agent-Based Simulations, Euro-Par 2014 conference.
• Cordasco G., Mancuso A., Milone F. and Spagnuolo C. Communication strategies in Distributed Agent-Based Simulations: the experience with D-MASON. 1st Workshop on Parallel and Distributed Agent-Based Simulations, Euro-Par 2013 conference.
D-MASON also provides mechanisms for the visualization and gathering of data in distributed simulations (available in the Visualization Layer). These solutions are presented in the paper:
• Cordasco G., De Chiara R., Raia F., Scarano V., Spagnuolo C. and Vicidomini L. Designing Computational Steering Facilities for Distributed Agent Based Simulations. Proceedings of the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation 2013.
In distributed ABSs, one of the most complex problems is the partitioning and balancing of the computation. D-MASON provides, in the Distributed Simulation Layer, mechanisms for partitioning and dynamically balancing the computation. D-MASON uses a field partitioning mechanism to divide the computation among the nodes of the distributed system. The field partitioning mechanism provides a nice trade-off between balancing and communication effort. Nevertheless, many ABSs are not based on 2D or 3D fields but on a communication graph that models the relationships among the agents. In this case the field partitioning mechanism does not ensure good simulation performance. Therefore D-MASON also provides specific mechanisms to manage simulations that use a graph to describe agent interactions. These solutions were presented in the following publication:
• Antelmi A., Cordasco G., Spagnuolo C. and Vicidomini L. On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks. 3rd Workshop on Parallel and Distributed Agent-Based Simulations, Euro-Par 2015 conference.
The field partitioning mechanism, intuitively, enables the mono- and bi-dimensional partitioning of a Euclidean space. This approach is also known as uniform partitioning. In some cases, however, e.g. simulations of urban areas using a Geographical Information System (GIS), uniform partitioning degrades simulation performance, due to the unbalanced distribution of the agents on the field and consequently over the computational resources. In such cases, D-MASON provides a non-uniform partitioning mechanism (inspired by the Quad-Tree data structure), presented in the following papers:
• Lettieri N., Spagnuolo C. and Vicidomini L. Distributed Agent-based Simulation and GIS: An Experiment With the dynamics of Social Norms. 3rd Workshop on Parallel and Distributed Agent-Based Simulations, Euro-Par 2015 conference.
• G. Cordasco, C. Spagnuolo and V. Scarano. Work Partitioning on Parallel and Distributed Agent-Based Simulation. IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems, International Parallel & Distributed Processing Symposium, 2017.
The latest version of D-MASON provides a web-based System Management layer, to better use D-MASON on cloud infrastructures. D-MASON on the Amazon EC2 cloud infrastructure was compared, in terms of speed and cost, against D-MASON on an HPC environment. The results obtained, and the new System Management Layer, are presented in the following paper:
• M. Carillo, G. Cordasco, F. Serrapica, C. Spagnuolo, P. Szufel and L. Vicidomini. D-Mason on the Cloud: an Experience with Amazon Web Services. 4th Workshop on Parallel and Distributed Agent-Based Simulations, Euro-Par 2016 conference.
Parallel Languages. The proposal of an architecture that enables code supported by a Java Virtual Machine (JVM) to be invoked from code written in C. Swift/T is a parallel scripting language for programming highly concurrent applications in parallel and distributed environments. Swift/T is the reimplemented version of the Swift language, with a new compiler and runtime. Swift/T improves Swift, allowing scalability over 500 tasks per second, load balancing, distributed data structures, and dataflow-driven concurrent task execution. Swift/T provides an interesting feature: the ability to easily and natively call other languages (such as Python, R, Julia and C) by using special language functions named leaf functions. Considering the current trend of some supercomputing vendors (such as Cray Inc.) to support Java Virtual Machines (JVMs) on their processors, it is desirable to provide methods to call Java code from Swift/T as well. In particular, it is attractive to be able to call scripting languages for the JVM such as Clojure, Scala, Groovy and JavaScript. For this purpose a C binding to instantiate and call a JVM was designed. This binding is used in Swift/T (since version 1.0) to develop leaf functions that call Java code. The code is publicly available on the GitHub project page.
Frameworks. The proposal of two tools that exploit the computing power of parallel systems to improve the effectiveness and the efficiency of Simulation Optimization strategies. Simulation Optimization (SO) refers to the techniques studied for ascertaining the parameters of a complex model that minimize (or maximize) given criteria (one or many), which can only be computed by performing a simulation run. Due to the high dimensionality of the search space, the heterogeneity of parameters, the irregular shape and the stochastic nature of the objective evaluation function, the tuning of such systems is extremely demanding from the computational point of view. The first framework is SOF: Zero Configuration Simulation Optimization Framework on the Cloud, designed to run SO processes in the cloud. SOF is based on the Apache Hadoop infrastructure and is presented in the following paper:
• Carillo M., Cordasco G., Scarano V., Serrapica F., Spagnuolo C. and Szufel P. SOF: Zero Configuration Simulation Optimization Framework on the Cloud. Parallel, Distributed, and Network-Based Processing 2016.
The second framework is EMEWS: Extreme-scale Model Exploration with Swift/T, designed at Argonne National Laboratory (USA). EMEWS, like SOF, allows SO processes to be performed on distributed systems. Both frameworks are mainly designed for ABSs. In particular, EMEWS was tested using the ABS toolkit Repast. Initially, EMEWS was not able to easily execute out-of-the-box simulations written in MASON and NetLogo. This thesis presents new functionality of EMEWS and solutions to easily execute MASON and NetLogo simulations on it. The EMEWS use cases are presented in the following paper:
• J. Ozik, N. T. Collier, J. M. Wozniak and C. Spagnuolo. From Desktop To Large-scale Model Exploration with Swift/T. Winter Simulation Conference 2016.
Architectures. The proposal of an open-source, extensible architecture for the visualization of data in HTML pages, exploiting distributed web computing. Following the Edge-centric Computing paradigm, the data visualization is performed on the edge side, ensuring data trustworthiness, privacy, scalability and dynamic data loading. The architecture has been exploited in the Social Platform for Open Data (SPOD). The proposed architecture has also appeared in the following papers:
• G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini. A Scalable Data Web Visualization Architecture. Parallel, Distributed, and Network-Based Processing 2017.
• G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini. An Architecture for Social Sharing and Collaboration around Open Data Visualisation. In Poster Proc. of the 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing 2016.
• G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini. An extensible architecture for an ecosystem of visualization web-components for Open Data. Maximising Interoperability Workshop - core vocabularies, location-aware data and more, 2015. [edited by author]
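To make the partitioning discussion above concrete, the following sketch contrasts uniform field partitioning with a Quad-Tree-style non-uniform partitioning on a clustered agent population. It is written in Python purely for illustration; the function names, the capacity threshold and the random field are invented here and do not reflect D-MASON's actual Java implementation.

import random

def uniform_partition(agents, width, height, rows, cols):
    """Fixed grid of rows x cols cells; cheap, but unbalanced when agents
    cluster (e.g. GIS/urban simulations)."""
    cells = {(r, c): [] for r in range(rows) for c in range(cols)}
    for x, y in agents:
        r = min(int(y / height * rows), rows - 1)
        c = min(int(x / width * cols), cols - 1)
        cells[(r, c)].append((x, y))
    return cells

def quad_partition(agents, x0, y0, w, h, capacity, depth=0, max_depth=8):
    """Quad-Tree-style subdivision: split a region into four quadrants
    whenever it holds more than `capacity` agents, so dense regions get
    more (smaller) partitions and per-partition load evens out."""
    if len(agents) <= capacity or depth == max_depth:
        return [((x0, y0, w, h), agents)]
    hw, hh = w / 2, h / 2
    quadrants = [(x0, y0), (x0 + hw, y0), (x0, y0 + hh), (x0 + hw, y0 + hh)]
    parts = []
    for qx, qy in quadrants:
        sub = [(x, y) for x, y in agents if qx <= x < qx + hw and qy <= y < qy + hh]
        parts += quad_partition(sub, qx, qy, hw, hh, capacity, depth + 1, max_depth)
    return parts

# Clustered population: most agents in one corner of a 100x100 field.
random.seed(0)
agents = [(random.uniform(0, 20), random.uniform(0, 20)) for _ in range(900)]
agents += [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(100)]

uni = uniform_partition(agents, 100, 100, 4, 4)
quad = quad_partition(agents, 0, 0, 100, 100, capacity=100)
print("uniform: max cell load =", max(len(v) for v in uni.values()))
print("quad-tree: max cell load =", max(len(a) for _, a in quad))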
XV n.s. (XXIX)
APA, Harvard, Vancouver, ISO, and other styles
3

何世全 and Sai-chuen Ho. "Single I/O space for scalable cluster computing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31222614.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Eltony, Amira M. (Amira Madeleine). "Scalable trap technology for quantum computing with ions." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/99822.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages [187]-214).
Quantum computers employ quantum mechanical effects, such as superposition and entanglement, to process information in a distinctive way, with advantages for simulation and for new, in some cases more efficient, algorithms. A quantum bit is a two-level quantum system, such as the electronic or spin state of a trapped atomic ion. Physics experiments with single atomic ions acting as "quantum bits" have demonstrated many of the ingredients for a quantum computer. But to perform useful computations these experimental systems will need to be vastly scaled up. Our goal is to engineer systems for large-scale quantum computation with trapped ions. Building on established techniques of microfabrication, we create ion traps incorporating exotic materials and devices, and we investigate how quantum algorithms can be efficiently mapped onto physical trap hardware. An existing apparatus built around a bath cryostat is modified for characterization of novel ion traps and devices at cryogenic temperatures (4 K and 77 K). We demonstrate an ion trap on a transparent chip with an integrated photodetector, which allows for scalable, efficient state detection of a quantum bit. To understand and better control electric field noise (which limits gate fidelities), we experiment with coating trap electrodes in graphene. We develop traps compatible with standard CMOS manufacturing to leverage the precision and scale of this platform, and we design a Single Instruction Multiple Data (SIMD) algorithm for implementing the quantum Fourier transform (QFT) using a distributed array of ion chains. Lastly, we explore how to bring it all together to create an integrated trap module from which a scalable architecture can be assembled.
by Amira M. Eltony.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
5

Rrustemi, Alban. "Computing surfaces : a platform for scalable interactive displays." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.612533.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Allcock, David Thomas Charles. "Surface-electrode ion traps for scalable quantum computing." Thesis, University of Oxford, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.559722.

Full text
Abstract:
The major challenges in trapped-ion quantum computation are to scale up few-ion experiments to many qubits and to improve control techniques so that quantum logic gates can be carried out with higher fidelities. This thesis reports experimental progress in both of these areas. In the early part of the thesis we describe the fabrication of a surface-electrode ion trap, the development of the apparatus and techniques required to operate it and the successful trapping of 40Ca+ ions. Notably we developed methods to control the orientation of the principal axes and to minimise ion micromotion. We propose a repumping scheme that simplifies heating rate measurements for ions with low-lying D levels, and use it to characterise the electric field noise in the trap. Surface-electrode traps are important because they offer a route to dense integration of electronic and optical control elements using existing microfabrication technology. We explore this scaling route by testing a series of three traps that were microfabricated at Sandia National Laboratories. Investigations of micromotion and charging of the surface by laser beams were carried out and improvements to future traps are suggested. Using one of these traps we also investigated anomalous electrical noise from the electrode surfaces and discovered that it can be reduced by cleaning with a pulsed laser. A factor of two decrease was observed; this represents the first in situ removal of this noise source, an important step towards higher gate fidelities. In the second half of the thesis we describe the design and construction of an experiment for the purpose of replacing laser-driven multi-qubit quantum logic gates with microwave-driven ones. We investigate magnetic-field-independent hyperfine qubits in 40Ca+ as suitable qubits for this scheme. We make a design study of how best to integrate an ion trap with the microwave conductors required to implement the gate and propose a novel integrated resonant structure. The trap was fabricated and ions were successfully loaded. Single-qubit experiments show that the microwave fields above the trap are in excellent agreement with software simulations. There are good prospects for demonstrating a multi-qubit gate in the near future. We conclude by discussing the possibilities for larger-scale quantum computation by combining microfabricated traps and microwave control.
APA, Harvard, Vancouver, ISO, and other styles
7

Ho, Sai-chuen. "Single I/O space for scalable cluster computing /." Hong Kong : University of Hong Kong, 2000. http://sunzi.lib.hku.hk/hkuto/record.jsp?B21841512.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Pang, Xiaolin. "Scalable Algorithms for Outlier Detection." Thesis, The University of Sydney, 2014. http://hdl.handle.net/2123/11743.

Full text
Abstract:
Outlier detection is an important problem for the data mining community as outliers often embody potentially new and valuable information. Nowadays, in the face of exponential growth in data generation, extracting outliers from such massive data sets is a non-trivial task and requires the design and implementation of new scalable algorithms which is the main focus of the thesis. More specifically, we make the following contributions: We propose a new algorithm for detecting emerging outliers in traffic data by extending the Likelihood Ratio Test Statistics (LRT) framework. We also propose a general and efficient pattern mining approach for spatio-temporal outlier detection that is based on our statistical models. We propose a unified parallel approach for LRT computation in GPGPU, multi-core and cloud cluster environments. We also present new algorithmic techniques for computing the Likelihood Ratio Test (LRT) in parallel for a large spatial data grid by utilizing these distributed architectures. As a separate contribution, we present novel approaches which simultaneously perform clustering and outlier detection without specifying the number of clusters. These methods are formulated as an integer programming optimisation task.
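As a rough illustration of LRT-based spatial outlier detection, the sketch below computes, for every cell of a grid of event counts, a Poisson likelihood-ratio statistic against the global rate and flags cells above a chi-squared threshold, distributing the per-cell computation over a process pool. This is a simplified toy (hypothetical function names, a plain two-hypothesis Poisson LRT), not the thesis's algorithms or its GPGPU/cloud implementations.

import math
import random
from multiprocessing import Pool

def poisson_lrt(args):
    """Deviance-style likelihood ratio statistic for one grid cell:
    2 * [ l(lambda_hat; x) - l(lambda0; x) ] under a Poisson model,
    where lambda_hat = x is the cell's MLE and lambda0 the global rate."""
    x, lam0 = args
    term = x * math.log(x / lam0) if x > 0 else 0.0
    return 2.0 * (term - (x - lam0))

def detect_outliers(grid, threshold=10.83):   # ~chi^2 with 1 dof at p = 0.001
    counts = [c for row in grid for c in row]
    lam0 = sum(counts) / len(counts)
    with Pool() as pool:                       # parallel per-cell statistics
        stats = pool.map(poisson_lrt, [(c, lam0) for c in counts])
    n_cols = len(grid[0])
    return [(i // n_cols, i % n_cols, s)
            for i, s in enumerate(stats) if s > threshold]

if __name__ == "__main__":
    random.seed(1)
    # 50x50 grid of event counts with one injected hotspot.
    grid = [[random.randint(0, 6) for _ in range(50)] for _ in range(50)]
    grid[10][42] = 40
    for r, c, s in detect_outliers(grid):
        print(f"outlier at ({r}, {c}), LRT statistic {s:.1f}")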
APA, Harvard, Vancouver, ISO, and other styles
9

Tran, Viet-Trung. "Scalable data-management systems for Big Data." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00920432.

Full text
Abstract:
Big Data can be characterized by 3 V's. * Big Volume refers to the unprecedented growth in the amount of data. * Big Velocity refers to the growth in the speed of moving data into and out of management systems. * Big Variety refers to the growth in the number of different data formats. Managing Big Data requires fundamental changes in the architecture of data management systems. Data storage systems must keep evolving in order to adapt to the growth of data: they need to be scalable while maintaining high performance for data accesses. This thesis focuses on building scalable data management systems for Big Data. Our first and second contributions address the challenge of providing efficient support for Big Volume of data in data-intensive high performance computing (HPC) environments. In particular, we address the shortcoming of existing approaches in handling atomic, non-contiguous I/O operations in a scalable fashion. We propose and implement a versioning-based mechanism that can be leveraged to offer isolation for non-contiguous I/O without the need to perform expensive synchronizations. In the context of parallel array processing in HPC, we introduce Pyramid, a large-scale, array-oriented storage system. It revisits the physical organization of data in distributed storage systems for scalable performance. Pyramid favors multidimensional-aware data chunking, which closely matches the access patterns generated by applications. Pyramid also favors distributed metadata management and versioning concurrency control to eliminate synchronization under concurrency. Our third contribution addresses Big Volume at the scale of geographically distributed environments. We consider BlobSeer, a distributed versioning-oriented data management service, and we propose BlobSeer-WAN, an extension of BlobSeer optimized for such geographically distributed environments. BlobSeer-WAN takes into account the latency hierarchy by favoring local metadata accesses. BlobSeer-WAN features asynchronous metadata replication and a vector-clock implementation for collision resolution. To cope with the Big Velocity characteristic of Big Data, our last contribution features DStore, an in-memory document-oriented store that scales vertically by leveraging the large memory capacity of multicore machines. DStore demonstrates fast and atomic complex transaction processing for data writing, while maintaining high-throughput read access. DStore follows a single-threaded execution model that executes update transactions sequentially, while relying on versioning concurrency control to enable a large number of simultaneous readers.
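A recurring ingredient above is versioning-based concurrency control: writers never modify published data in place, so readers need no locks. The following minimal in-memory sketch (Python, invented class name, vastly simpler than Pyramid, BlobSeer-WAN or DStore) shows the basic single-writer/snapshot-reader pattern.

import threading

class VersionedStore:
    """Single-writer, multi-reader store with snapshot isolation by
    versioning: a write builds a new version and publishes it atomically;
    readers keep using whatever version they obtained, without locks."""
    def __init__(self):
        self._versions = [{}]                 # version 0: empty snapshot
        self._head = 0
        self._write_lock = threading.Lock()   # serializes writers only

    def snapshot(self):
        # Readers pay one attribute read; the returned dict is immutable
        # by convention (writers never mutate published versions).
        return self._versions[self._head]

    def write(self, updates):
        with self._write_lock:
            new_version = dict(self._versions[self._head])  # copy-on-write
            new_version.update(updates)
            self._versions.append(new_version)
            self._head = len(self._versions) - 1            # atomic publish

store = VersionedStore()
store.write({"a": 1, "b": 2})

snap = store.snapshot()        # a reader pins version 1
store.write({"a": 10})         # a concurrent writer publishes version 2

assert snap["a"] == 1                  # reader still sees its snapshot
assert store.snapshot()["a"] == 10     # new readers see the new version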
APA, Harvard, Vancouver, ISO, and other styles
10

Surapaneni, Chandra Sekhar Medhi Deepankar. "Dynamically organized and scalable virtual organizations in Grid computing." Diss., UMK access, 2005.

Find full text
Abstract:
Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2005.
"A thesis in computer science." Typescript. Advisor: Deepankar Medhi. Vita. Title from "catalog record" of the print edition Description based on contents viewed March 12, 2007. Includes bibliographical references (leaves 85-87). Online version of the print edition.
APA, Harvard, Vancouver, ISO, and other styles
11

Lanore, Vincent. "On Scalable Reconfigurable Component Models for High-Performance Computing." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1051/document.

Full text
Abstract:
Component-based programming is a programming paradigm which eases code reuse and separation of concerns. Some component models, which are said to be "reconfigurable", allow the modification at runtime of an application's structure. However, these models are not suited to High-Performance Computing (HPC) as they rely on non-scalable mechanisms. The goal of this thesis is to provide models, algorithms and tools to ease the development of component-based reconfigurable HPC applications. The main contribution of the thesis is the DirectMOD component model, which eases development and reuse of distributed transformations. In order to improve on this core model in other directions, we have also proposed: • the SpecMOD formal component model, which allows automatic specialization of hierarchical component assemblies and provides high-level software engineering features; • mechanisms for efficient fine-grain reconfiguration for AMR applications, an important application class in HPC. An implementation of DirectMOD, called DirectL2C, has been developed so as to implement a series of benchmarks to evaluate our approach. Experiments on HPC architectures show that our approach scales. Moreover, a quantitative analysis of the benchmark code shows that our approach is compact and eases reuse.
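The notion of a reconfigurable component assembly can be illustrated with a small sketch: components plus explicit connections, and a transformation that rewires them at runtime, here in the spirit of refining one coarse piece of an AMR computation into finer ones. The code is Python with invented names and is only a schematic analogy, not DirectMOD or DirectL2C.

class Component:
    """A component exposes named ports; behaviour is irrelevant here."""
    def __init__(self, name):
        self.name = name

class Assembly:
    """An assembly is a set of components plus directed connections between
    (component, port) pairs; reconfiguration = editing this set."""
    def __init__(self):
        self.components = {}
        self.connections = set()
    def add(self, comp):
        self.components[comp.name] = comp
    def connect(self, src, src_port, dst, dst_port):
        self.connections.add((src, src_port, dst, dst_port))

def refine_transformation(assembly, coarse, fine_parts):
    """A transformation sketch: replace one coarse component by several finer
    ones (e.g. after a refinement step), rewiring every connection that
    touched the coarse component."""
    touched = [c for c in assembly.connections if coarse in (c[0], c[2])]
    for c in touched:
        assembly.connections.discard(c)
    del assembly.components[coarse]
    for part in fine_parts:
        assembly.add(Component(part))
    for src, sp, dst, dp in touched:
        for part in fine_parts:
            new = (part, sp, dst, dp) if src == coarse else (src, sp, part, dp)
            assembly.connections.add(new)

app = Assembly()
for name in ("driver", "patch0"):
    app.add(Component(name))
app.connect("driver", "out", "patch0", "in")
refine_transformation(app, "patch0", ["patch0a", "patch0b"])
print(sorted(app.connections))
# [('driver', 'out', 'patch0a', 'in'), ('driver', 'out', 'patch0b', 'in')]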
APA, Harvard, Vancouver, ISO, and other styles
12

Albaiz, Abdulaziz (Abdulaziz Mohammad). "MPI-based scalable computing platform for parallel numerical application." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/95562.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Computation for Design and Optimization Program, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 61).
Developing parallel numerical applications, such as simulators and solvers, involves a variety of challenges in dealing with data partitioning, workload balancing, data dependencies, and synchronization. Many numerical applications share the need for an underlying parallel framework for parallelization on multi-core/multi-machine hardware. In this thesis, a computing platform for parallel numerical applications is designed and implemented. The platform performs parallelization by multiprocessing over the MPI library, and serves as a layer of abstraction that hides the complexities of data distribution and inter-process communication. It also provides the essential functions that most numerical applications use, such as handling data dependencies, workload balancing, and overlapping communication and computation. The performance evaluation of the parallel platform shows that it is highly scalable for large problems.
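The services listed above (partitioning, dependency handling, overlap of communication and computation) are typified by a one-dimensional halo exchange. The sketch below uses mpi4py and NumPy as stand-ins (an assumption; the thesis's platform is its own layer over MPI): each rank updates its interior while non-blocking halo messages are in flight, then finishes the boundary cells.

# Run with e.g.: mpiexec -n 4 python halo_demo.py   (hypothetical file name)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                            # block partition: n_local cells per rank
u = np.full(n_local + 2, float(rank))     # +2 ghost cells at either end
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Post non-blocking halo exchange.
reqs = [
    comm.Isend(u[1:2], dest=left, tag=0),
    comm.Isend(u[n_local:n_local + 1], dest=right, tag=1),
    comm.Irecv(u[0:1], source=left, tag=1),
    comm.Irecv(u[n_local + 1:n_local + 2], source=right, tag=0),
]

# Overlap: update interior cells that do not depend on the ghost cells.
new = u.copy()
new[2:n_local] = 0.5 * (u[1:n_local - 1] + u[3:n_local + 1])

# Finish communication, then update the two boundary cells.
MPI.Request.Waitall(reqs)
new[1] = 0.5 * (u[0] + u[2])
new[n_local] = 0.5 * (u[n_local - 1] + u[n_local + 1])
u = new

print(f"rank {rank}: boundary values {u[1]:.2f}, {u[n_local]:.2f}")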
by Abdulaziz Albaiz.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
13

Helal, Ahmed Elmohamadi Mohamed. "Automated Runtime Analysis and Adaptation for Scalable Heterogeneous Computing." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/96607.

Full text
Abstract:
In the last decade, there have been tectonic shifts in computer hardware because of reaching the physical limits of the sequential CPU performance. As a consequence, current high-performance computing (HPC) systems integrate a wide variety of compute resources with different capabilities and execution models, ranging from multi-core CPUs to many-core accelerators. While such heterogeneous systems can enable dramatic acceleration of user applications, extracting optimal performance via manual analysis and optimization is a complicated and time-consuming process. This dissertation presents graph-structured program representations to reason about the performance bottlenecks on modern HPC systems and to guide novel automation frameworks for performance analysis and modeling and runtime adaptation. The proposed program representations exploit domain knowledge and capture the inherent computation and communication patterns in user applications, at multiple levels of computational granularity, via compiler analysis and dynamic instrumentation. The empirical results demonstrate that the introduced modeling frameworks accurately estimate the realizable parallel performance and scalability of a given sequential code when ported to heterogeneous HPC systems. As a result, these frameworks enable efficient workload distribution schemes that utilize all the available compute resources in a performance-proportional way. In addition, the proposed runtime adaptation frameworks significantly improve the end-to-end performance of important real-world applications which suffer from limited parallelism and fine-grained data dependencies. Specifically, compared to the state-of-the-art methods, such an adaptive parallel execution achieves up to an order-of-magnitude speedup on the target HPC systems while preserving the inherent data dependencies of user applications.
Doctor of Philosophy
Current supercomputers integrate a massive number of heterogeneous compute units with varying speed, computational throughput, memory bandwidth, and memory access latency. This trend represents a major challenge to end users, as their applications have been designed from the ground up to primarily exploit homogeneous CPUs. While heterogeneous systems can deliver several orders of magnitude speedup compared to traditional CPU-based systems, end users need extensive software and hardware expertise as well as significant time and effort to efficiently utilize all the available compute resources. To streamline such a daunting process, this dissertation presents automated frameworks for analyzing and modeling the performance on parallel architectures and for transforming the execution of user applications at runtime. The proposed frameworks incorporate domain knowledge and adapt to the input data and the underlying hardware using novel static and dynamic analyses. The experimental results show the efficacy of the introduced frameworks across many important application domains, such as computational fluid dynamics (CFD), and computer-aided design (CAD). In particular, the adaptive execution approach on heterogeneous systems achieves up to an order-of-magnitude speedup over the optimized parallel implementations.
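One concrete element of such an approach, distributing work across heterogeneous devices in proportion to their modelled performance, can be sketched in a few lines. The device names and throughput numbers below are made up; the code is a schematic Python illustration, not the dissertation's frameworks.

def proportional_split(n_items, throughputs):
    """Split n_items work items across devices in proportion to their
    measured/modelled throughput (items per second), so all devices are
    expected to finish at roughly the same time."""
    total = sum(throughputs.values())
    shares, assigned = {}, 0
    devices = list(throughputs)
    for dev in devices[:-1]:
        shares[dev] = int(round(n_items * throughputs[dev] / total))
        assigned += shares[dev]
    shares[devices[-1]] = n_items - assigned      # remainder to last device
    return shares

# Hypothetical measured throughputs from a profiling run.
throughputs = {"cpu_16core": 4.0e6, "gpu_0": 2.1e7, "gpu_1": 1.9e7}
split = proportional_split(10_000_000, throughputs)
print(split)

# Expected finish time per device is then roughly equal:
for dev, items in split.items():
    print(dev, "expected time:", round(items / throughputs[dev], 3), "s")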
APA, Harvard, Vancouver, ISO, and other styles
14

De, Guzman Ethan Paul Palisoc. "Energy Efficient Computing using Scalable General Purpose Analog Processors." DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2305.

Full text
Abstract:
Due to fundamental physical limitations, conventional digital circuits have not been able to scale at the pace expected from Moore's law. In addition, computationally intensive applications such as neural networks and computer vision demand large amounts of energy from digital circuits. As a result, energy-efficient alternatives are needed in order to provide continued performance scaling. Analog circuits have many well-known benefits: the ability to store more information on a single wire and to efficiently perform mathematical operations such as addition, subtraction, and differential equation solving. However, analog computing also comes with drawbacks such as its sensitivity to process variation and noise, limited scalability, programming difficulty, and poor compatibility with digital circuits and design tools. We propose to leverage the strengths of analog circuits and avoid their weaknesses by using digital circuits and time-encoded computation. Time-encoded circuits also operate on continuous data but are implemented using digital circuits. We propose a novel scalable general-purpose analog processor using time-encoded circuits that is well suited for emerging applications requiring high numeric precision. The processor's datapath, including the time-domain register file and function units, is described. We evaluate our proposed approach using an implementation simulated with a 0.18µm TSMC process and demonstrate that this approach improves the performance of a scientific benchmark by 4x compared with conventional analog implementations and reduces energy consumption by 146x compared with digital implementations.
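Time-encoded computation represents a value by when an edge arrives rather than by a binary word or an analog voltage. The toy model below (Python, assumed 1 ns quantization step, invented helper names) shows the flavour of such primitives: delay elements add, an OR of edges gives a minimum, an AND gives a maximum. Real time-domain datapaths are of course circuits, not software, and support richer operations than this sketch.

T_UNIT = 1e-9      # one quantization step = 1 ns of delay (assumed)

def encode(value):
    """Encode an integer as the arrival time of a rising edge,
    measured from a shared reference edge at t = 0."""
    return value * T_UNIT

def decode(arrival_time):
    return round(arrival_time / T_UNIT)

def delay_element(arrival_time, amount):
    """A programmable delay line: passing a signal through delays of
    a and b produces arrival time a + b, i.e. addition comes for free."""
    return arrival_time + amount * T_UNIT

def or_gate(t_a, t_b):
    # Output rises as soon as either input rises -> min() in the time domain.
    return min(t_a, t_b)

def and_gate(t_a, t_b):
    # Output rises only when both inputs have risen -> max() in the time domain.
    return max(t_a, t_b)

a, b = encode(7), encode(12)
print("a + 5     ->", decode(delay_element(a, 5)))   # 12
print("min(a, b) ->", decode(or_gate(a, b)))         # 7
print("max(a, b) ->", decode(and_gate(a, b)))        # 12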
APA, Harvard, Vancouver, ISO, and other styles
15

Buehrer, Gregory T. "Scalable mining on emerging architectures." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1198866625.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Paolucci, Cristian. "Prototyping a scalable Aggregate Computing cluster with open-source solutions." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15716/.

Full text
Abstract:
The Internet of Things is a concept that has now been pervasively adopted to describe a vast set of devices connected through the Internet. IoT systems are commonly built with a bottom-up approach that focuses mainly on the single device, which is seen as the basic programmable unit. From this method emerges a common behaviour, found in many existing systems, that derives from the interaction of individual devices. However, this often produces distributed applications whose components are tightly coupled to one another. As such applications grow in complexity, they tend to suffer from design problems, lack of modularity and reusability, deployment difficulties, and testing and maintenance issues. Aggregate Programming provides a top-down approach to these systems, in which the basic unit of computation is an aggregate rather than a single device. This thesis consists of the design and deployment of a platform, based on open-source technologies, to support Aggregate Computing in the cloud, in which devices are able to choose dynamically whether the computation takes place on themselves or in the cloud. Even though Aggregate Computing is inherently designed for distributed computation, Cloud Computing introduces a scalable, reliable and highly available alternative as an execution strategy. This work describes how to exploit a Reactive Platform to build a scalable application in the cloud. After the structure, interaction and behaviour of the application have been designed, the deployment of its components is carried out through a containerization approach, with Kubernetes as the orchestrator managing the desired state of the system and a Continuous Delivery strategy.
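One way to picture the "compute locally or in the cloud" choice mentioned above is a per-round executor selection driven by observed latencies. The sketch below is a deliberately naive Python illustration with invented placeholders (sleep calls instead of real aggregate-computing rounds, made-up latency estimates); it is not the platform built in the thesis.

import time

def run_round_locally(state):
    # Placeholder for one aggregate-computing round on the device itself.
    time.sleep(0.002)
    return state + 1

def run_round_on_cloud(state):
    # Placeholder for shipping the round to the cluster (network + compute).
    time.sleep(0.001)
    return state + 1

def choose_executor(recent_local_ms, recent_cloud_ms, budget_ms):
    """Pick where the next round runs: prefer the device, fall back to the
    cloud when the device can no longer meet the per-round latency budget."""
    if recent_local_ms <= budget_ms:
        return run_round_locally
    return run_round_on_cloud if recent_cloud_ms < recent_local_ms else run_round_locally

state = 0
local_ms, cloud_ms = 2.0, 1.5          # rolling latency estimates (made up)
for _ in range(5):
    executor = choose_executor(local_ms, cloud_ms, budget_ms=1.8)
    t0 = time.perf_counter()
    state = executor(state)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    # Update whichever estimate we just observed.
    if executor is run_round_locally:
        local_ms = 0.8 * local_ms + 0.2 * elapsed_ms
    else:
        cloud_ms = 0.8 * cloud_ms + 0.2 * elapsed_ms
print("final state:", state)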
APA, Harvard, Vancouver, ISO, and other styles
17

Austin, Paul Baden. "Towards a file system for a scalable parallel computing engine." Thesis, University of York, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.304159.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Carrillo, Snaider. "Scalable hierarchical networks-on-chip architecture for brain-inspired computing." Thesis, Ulster University, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.633690.

Full text
Abstract:
The brain is highly efficient in how it processes information and tolerates faults. Significant research is therefore focused on harnessing this efficiency to build artificial neural systems that can emulate the key information processing principles of the brain. However, existing software approaches are too slow and cannot provide the dense interconnect for the billions of neurons and synapses that are required. Therefore, it is necessary to look to new custom hardware architectures to address this scalability issue and to enable the deployment of brain-like embedded systems processors. This thesis presents a novel Hierarchical Networks-on-Chip (H-NoC) architecture for spiking neural network (SNN) hardware, which aims to address the scalability issue by creating a modular array of clusters of neurons using a hierarchical structure of low- and high-level routers. The proposed H-NoC architecture can be viewed as a flat 3D structure, which mimics to a degree the hierarchical organisation found in biological neural systems. Furthermore, this H-NoC architecture also incorporates a novel spike traffic compression technique to exploit SNN traffic patterns and locality between neurons, thus reducing traffic overhead and improving throughput on the network. In addition, novel adaptive routing capabilities between clusters balance local and global traffic loads to sustain throughput under bursting activity. The thesis also reports analytical results based on five large-scale scenarios, which demonstrate the scalability of the proposed H-NoC approach under varied traffic load intensities. Simulation and synthesis analysis using 65-nm CMOS technology demonstrates a good trade-off between high throughput and a low area/power footprint per cluster. The thesis concludes with results on the mapping of the IRIS and Wisconsin Breast Cancer data sets onto the proposed H-NoC architecture, and validates the analytical performance in FPGA hardware. Most importantly, the FPGA implementation of both benchmarks demonstrates that the H-NoC architecture can provide up to 100x speedup when compared with biological real-time system equivalents.
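The spike-traffic compression idea, sending one packet per cluster per time step instead of one packet per spike, can be sketched with a simple bitmap encoding. The packet layout below (Python, 2-byte cluster id, 4-byte time step, one bit per neuron) is invented for illustration and is not the H-NoC's actual format.

def compress_spikes(cluster_id, timestep, spiking_neurons, cluster_size):
    """Pack all spikes of one cluster in one time step into a single packet:
    a small header plus a bitmap with one bit per neuron."""
    bitmap = 0
    for n in spiking_neurons:
        bitmap |= 1 << n
    payload = bitmap.to_bytes((cluster_size + 7) // 8, "little")
    header = cluster_id.to_bytes(2, "little") + timestep.to_bytes(4, "little")
    return header + payload

def decompress_spikes(packet, cluster_size):
    cluster_id = int.from_bytes(packet[0:2], "little")
    timestep = int.from_bytes(packet[2:6], "little")
    bitmap = int.from_bytes(packet[6:], "little")
    spikes = [n for n in range(cluster_size) if bitmap & (1 << n)]
    return cluster_id, timestep, spikes

# 10 spiking neurons out of a 64-neuron cluster: one 14-byte packet
# instead of 10 separate address-event packets.
spiking = [1, 2, 5, 8, 13, 21, 34, 55, 60, 63]
pkt = compress_spikes(cluster_id=7, timestep=1024,
                      spiking_neurons=spiking, cluster_size=64)
assert decompress_spikes(pkt, 64) == (7, 1024, spiking)
print(len(pkt), "bytes for", len(spiking), "spikes")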
APA, Harvard, Vancouver, ISO, and other styles
19

Suresh, Visalakshmi. "Scalable and responsive real time event processing using cloud computing." Thesis, University of Newcastle upon Tyne, 2017. http://hdl.handle.net/10443/3917.

Full text
Abstract:
Cloud computing provides the potential for scalability and adaptability in a cost-effective manner. However, when it comes to achieving scalability for real-time applications, response time cannot be high. Many applications require good performance and low response time, which need to be matched with dynamic resource allocation. Real-time processing requirements can also be characterized by unpredictable rates of incoming data streams and dynamic outbursts of data. This raises the issue of processing the data streams across multiple cloud computing nodes. This research analyzes possible methodologies to process real-time data in which applications can be structured as multiple event processing networks and partitioned over the set of available cloud nodes. The approach is based on queuing theory principles to encompass cloud computing. The transformation of the raw data into useful outputs occurs in various stages of processing networks which are distributed across multiple computing nodes in a cloud. A set of valid options is created to understand the response time requirements for each application. Under a given valid set of conditions that meet the response time criteria, multiple instances of event processing networks are distributed across the cloud nodes. A generic methodology to scale up and scale down the event processing networks in accordance with the response time criteria is defined. Real-time applications that support sophisticated decision support mechanisms need to comply with response time criteria consisting of interdependent data flow paradigms, making it harder to improve performance. Consideration is given to ways to reduce latency, improve response time and increase the throughput of real-time applications by distributing the event processing networks across multiple computing nodes.
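The queuing-theoretic reasoning can be made concrete with the textbook M/M/c model: given the event arrival rate and the service rate of one event-processing-network instance, predict the mean response time and add instances until the target is met. The Erlang-C-based sketch below (Python, made-up rates) is a simplification of, not a substitute for, the methodology in the thesis.

import math

def erlang_c(servers, offered_load):
    """Probability that an arriving event has to queue in an M/M/c system
    (offered_load = arrival_rate / service_rate, must be < servers)."""
    rho_term = (offered_load ** servers / math.factorial(servers)) * \
               servers / (servers - offered_load)
    denom = sum(offered_load ** k / math.factorial(k) for k in range(servers)) + rho_term
    return rho_term / denom

def mean_response_time(servers, arrival_rate, service_rate):
    a = arrival_rate / service_rate
    if a >= servers:
        return float("inf")                 # unstable: not enough instances
    wq = erlang_c(servers, a) / (servers * service_rate - arrival_rate)
    return wq + 1.0 / service_rate

def instances_needed(arrival_rate, service_rate, target_response):
    """Scale-up rule: smallest number of event-processing-network instances
    whose predicted mean response time meets the target."""
    c = 1
    while mean_response_time(c, arrival_rate, service_rate) > target_response:
        c += 1
    return c

# Example: 800 events/s arriving, one instance serves 120 events/s,
# response-time target of 25 ms.
lam, mu, target = 800.0, 120.0, 0.025
c = instances_needed(lam, mu, target)
print(c, "instances ->", round(1000 * mean_response_time(c, lam, mu), 2), "ms")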
APA, Harvard, Vancouver, ISO, and other styles
20

Liu, Jiuxing. "Designing high performance and scalable MPI over InfiniBand." The Ohio State University, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=osu1095296555.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Andersson, Filip, and Simon Norberg. "Scalable applications in a distributed environment." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3917.

Full text
Abstract:
As the number of simultaneous users of distributed systems increases, scalability is becoming an important factor to consider during software development. Without sufficient scalability, systems may struggle to manage high loads and may not be able to support a large number of users. We have determined how scalability can best be implemented, and what extra costs this leads to. Our research is based both on a literature review, where we looked at what others in the field of computer engineering think about scalability, and on implementing a highly scalable system of our own. In the end we arrived at a number of general pointers that can help developers determine whether they should focus on scalable development, and what they should consider if they choose to do so.
APA, Harvard, Vancouver, ISO, and other styles
22

Li, Dong. "Scalable and Energy Efficient Execution Methods for Multicore Systems." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/26098.

Full text
Abstract:
Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high-end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management: growth in the number of cores causes continuous growth in power. In this dissertation, we introduce methods and techniques to enable scalable and energy-efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy savings (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact on the execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy savings through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we face the scalability concern of memory registration for high-performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design policies and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to completion under the memory registration limitation.
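The configuration-selection problem behind combining concurrency throttling with DVFS can be sketched as follows; the per-configuration time and power numbers are invented for illustration and are not measurements from this dissertation.

    # Hypothetical profiles: (threads, frequency in GHz) -> (time in s, power in W)
    profiles = {
        (8, 2.4): (100.0, 180.0),
        (8, 1.8): (112.0, 140.0),
        (4, 2.4): (105.0, 150.0),
        (4, 1.8): (118.0, 120.0),
    }

    def pick_configuration(profiles, baseline, max_slowdown=0.05):
        """Return the (threads, frequency) pair with the lowest predicted
        energy among configurations within the allowed performance loss."""
        base_time, base_power = profiles[baseline]
        best, best_energy = baseline, base_time * base_power
        for config, (time_s, power_w) in profiles.items():
            if time_s <= base_time * (1.0 + max_slowdown):
                energy = time_s * power_w
                if energy < best_energy:
                    best, best_energy = config, energy
        return best, best_energy

    print(pick_configuration(profiles, baseline=(8, 2.4)))   # -> ((4, 2.4), 15750.0)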
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
23

Mühll, Johann Rudolf Vonder. "Concept and implementation of a scalable architecture for data-parallel computing /." [S.l.] : [s.n.], 1996. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=11787.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Tashakor, Ghazal. "Scalable agent-based model simulation using distributed computing on system biology." Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671332.

Full text
Abstract:
Agent-based modeling is a very useful computational tool for simulating complex behavior using rules at both micro and macro scales. The complexity of this type of modeling lies in defining the rules the agents follow, which determine the structural elements and the static and/or dynamic behavior patterns. This thesis addresses the definition of complex models of biological networks representing cancer cells, so that behavior under different scenarios can be obtained by simulation and the evolution of the metastatic process can be studied by users who are not experts in computing systems. In addition, a proof of concept is developed showing how to incorporate dynamic network analysis and machine learning techniques into agent-based models through a federated simulation system, in order to improve the decision-making process. From the simulation point of view, the representation of complex graph-based biological networks is analyzed in order to investigate how to integrate the topology and functions of this type of network with an agent-based model. For this purpose, the ABM model is used as the basis for the construction, grouping and classification of the network elements that represent the structure of a complex and scalable biological network. The simulation of a complex, multi-scale, multi-agent model provides a useful tool for a scientist who is not a computing expert to run a complex parametric model and use it to analyze scenarios or predict variations according to the different patient profiles considered. The development focuses on an agent-based tumor model that has evolved from a simple and well-known ABM model, into which the variables and dynamics referenced by the Hallmarks of Cancer have been incorporated, towards a complex graph-based model. This graph-based model is used to represent different levels of interaction and dynamics within cells during the evolution of a tumor, allowing different degrees of representation (at the molecular/cellular level). A simulation environment and a workflow have been created to build a complex, scalable network based on a tumor growth scenario, in which dynamic techniques are applied to study the growth of the tumor network under different patterns. Experiments were carried out in the developed simulation environment, running models for different patient profiles as a demonstration of its functionality, to compute parameters of interest for the non-computing expert such as the evolution of the tumor volume. The environment is designed to discover and classify subgraphs of the agent-based tumor model so that the models can be distributed on a high-performance computing system, making it possible to analyze complex scenarios and/or different patient profiles with tumor patterns containing a large number of cancer cells in a reduced time.
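Purely as an illustration of the subgraph-discovery step mentioned above (the real model, its dynamics and its partitioner are far richer), the sketch below splits a cell-interaction graph, given as an adjacency dictionary, into connected subgraphs that could each be assigned to a different worker.

    def connected_subgraphs(graph):
        """Partition an undirected cell-interaction graph (adjacency dict)
        into connected components, each a candidate unit of distribution."""
        seen, parts = set(), []
        for start in graph:
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(graph[node])
            seen |= component
            parts.append(sorted(component))
        return parts

    # Tiny hypothetical tumor graph: two disconnected clusters of cells.
    graph = {
        "c1": ["c2", "c3"], "c2": ["c1"], "c3": ["c1"],
        "c4": ["c5"], "c5": ["c4"],
    }
    print(connected_subgraphs(graph))   # [['c1', 'c2', 'c3'], ['c4', 'c5']]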
APA, Harvard, Vancouver, ISO, and other styles
25

Benedicto, Kathryn Flores 1977. "Regions : a scalable infrastructure for scoped service location in ubiquitous computing." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80038.

Full text
Abstract:
Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.
Includes bibliographical references (leaves 108-109).
by Kathryn Flores Benedicto.
S.B.and M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
26

Coons, Samuel W. "Virtual thin client a scalable service discovery approach for pervasive computing /." [Gainesville, Fla.] : University of Florida, 2001. http://purl.fcla.edu/fcla/etd/anp4316.

Full text
Abstract:
Thesis (M.E.)--University of Florida, 2001.
Title from first page of PDF file. Document formatted into pages; contains xi, 68 p.; also contains graphics. Vita. Includes bibliographical references (p. 66-67).
APA, Harvard, Vancouver, ISO, and other styles
27

De, Francisci Morales Gianmarco. "Big data and the web: algorithms for data intensive scalable computing." Thesis, IMT Alti Studi Lucca, 2012. http://e-theses.imtlucca.it/34/1/De%20Francisci_phdthesis.pdf.

Full text
Abstract:
This thesis explores the problem of large scale Web mining by using Data Intensive Scalable Computing (DISC) systems. Web mining aims to extract useful information and models from data on the Web, the largest repository ever created. DISC systems are an emerging technology for processing huge datasets in parallel on large computer clusters. Challenges arise from both themes of research. The Web is heterogeneous: data lives in various formats that are best modeled in different ways. Effectively extracting information requires careful design of algorithms for specific categories of data. The Web is huge, but DISC systems offer a platform for building scalable solutions. However, they provide restricted computing primitives for the sake of performance. Efficiently harnessing the power of parallelism offered by DISC systems involves rethinking traditional algorithms. This thesis tackles three classical problems in Web mining. First we propose a novel solution to finding similar items in a bag of Web pages. Second we consider how to effectively distribute content from Web 2.0 to users via graph matching. Third we show how to harness the streams from the real-time Web to suggest news articles. Our main contribution lies in rethinking these problems in the context of massive scale Web mining, and in designing efficient MapReduce and streaming algorithms to solve these problems on DISC systems.
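To give a flavour of rethinking an algorithm as MapReduce, the toy map and reduce functions below group documents by a shared word shingle so that candidate similar pairs meet in the same reducer; this is a generic textbook-style illustration, not the algorithm proposed in the thesis.

    from itertools import combinations

    def shingles(text, k=3):
        words = text.split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def map_phase(doc_id, text):
        # Emit (shingle, doc_id) so documents sharing a shingle reach one reducer.
        return [(s, doc_id) for s in shingles(text)]

    def reduce_phase(shingle, doc_ids):
        # Every pair of documents sharing this shingle is a candidate similar pair.
        return list(combinations(sorted(set(doc_ids)), 2))

    docs = {"d1": "the quick brown fox jumps", "d2": "a quick brown fox sleeps"}
    grouped = {}
    for doc_id, text in docs.items():
        for key, value in map_phase(doc_id, text):
            grouped.setdefault(key, []).append(value)
    candidates = {pair for key, ids in grouped.items() for pair in reduce_phase(key, ids)}
    print(candidates)   # {('d1', 'd2')} -- both contain the shingle "quick brown fox"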
APA, Harvard, Vancouver, ISO, and other styles
28

Jarratt, Marie Claire. "Readout and Control: Scalable Techniques for Quantum Information Processing." Thesis, The University of Sydney, 2019. https://hdl.handle.net/2123/21572.

Full text
Abstract:
Quantum mechanics allows for the processing of information in entirely new ways, surpassing the computational limits set by classical physics. Termed `quantum information processing', scaling this scheme relies on simultaneously increasing the number of qubits -- the fundamental unit of quantum computation -- whilst reducing their error rates. With this comes a variety of challenges, including the ability to read out the quantum state of large numbers of qubits, as well as to control their evolution in order to mitigate errors. This thesis aims to address these challenges by developing techniques for the readout and control of quantum systems. The first series of experiments focuses on the readout of GaAs/AlGaAs semiconductor quantum systems, primarily relating to the technique of dispersive gate sensing (DGS). DGS is used to probe electron transmission in an open system, a quantum point contact, demonstrating an ability to resolve characteristic features of a one-dimensional ballistic channel in the limit where transport is not possible. DGS is also used to observe anomalous signals in the potential landscape of quantum-dot defining gate electrodes. A technique for time domain multiplexing is also presented, which allows for readout resources, in the form of microwave components, to be shared between multiple qubits, increasing the capacity of a single readout line. The second series of experiments validates control techniques using trapped 171Yb+ ions. Classical error models are engineered using high-bandwidth IQ modulation of the microwave source used to drive qubit rotations. Reductions in the coherent lifetime of the quantum system are shown to match well with quantitative models. This segues into developing techniques to understand and suppress noise in the system. This is achieved using the filter-transfer function approach, which casts arbitrary quantum control operations on qubits as noise spectral filters.
APA, Harvard, Vancouver, ISO, and other styles
29

Drolia, Utsav. "Adaptive Distributed Caching for Scalable Machine Learning Services." Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/1004.

Full text
Abstract:
Applications for Internet-enabled devices use machine learning to process captured data to make intelligent decisions or provide information to users. Typically, the computation to process the data is executed in cloud-based backends. The devices are used for sensing data, offloading it to the cloud, receiving responses and acting upon them. However, this approach leads to high end-to-end latency due to communication over the Internet. This dissertation proposes reducing this response time by minimizing offloading, and pushing computation close to the source of the data, i.e. to edge servers and devices themselves. To adapt to the resource-constrained environment at the edge, it presents an approach that leverages spatiotemporal locality to push subparts of the model to the edge. This approach is embodied in a distributed caching framework, Cachier. Cachier is built upon a novel caching model for recognition, and is distributed across edge servers and devices. The analytical caching model for recognition provides a formulation of the expected latency of recognition requests in Cachier. The formulation incorporates the effects of compute time and accuracy. It also incorporates network conditions, thus providing a method to compute expected response times under various conditions. This is utilized as a cost function by Cachier, at edge servers and devices. By analyzing requests at the edge server, Cachier caches relevant parts of the trained model at edge servers, which are used to respond to requests, minimizing the number of requests that go to the cloud. Then, Cachier uses context-aware prediction to prefetch parts of the trained model onto devices. The requests can then be processed on the devices, thus minimizing the number of offloaded requests. Finally, Cachier enables cooperation between nearby devices to allow exchanging prefetched data, reducing the dependence on remote servers even further. The efficacy of Cachier is evaluated by using it with an art recognition application. The application is driven using real world traces gathered at museums. By conducting a large-scale study with different control variables, we show that Cachier can lower latency, increase scalability and decrease infrastructure resource usage, while maintaining high accuracy.
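A stylised version of the kind of expected-latency cost function described here might look like the sketch below; the functional form and the numbers are invented for illustration and are not taken from the dissertation.

    def expected_latency(hit_rate, recall, t_edge, t_cloud_rtt, t_cloud_compute):
        """Expected response time when a request is first tried against the
        cached partial model at the edge and falls back to the cloud on a
        miss or an unrecognised item."""
        p_answered_at_edge = hit_rate * recall
        fallback = t_cloud_rtt + t_cloud_compute
        return t_edge + (1.0 - p_answered_at_edge) * fallback

    def best_cache_fraction(curve, t_edge_per_item, t_cloud_rtt, t_cloud_compute):
        """curve: list of (cache_fraction, hit_rate, recall) triples.
        Edge compute time is assumed to grow with the cached fraction."""
        scored = [
            (expected_latency(h, r, t_edge_per_item * f, t_cloud_rtt, t_cloud_compute), f)
            for f, h, r in curve
        ]
        return min(scored)

    curve = [(0.1, 0.40, 0.90), (0.3, 0.70, 0.88), (0.6, 0.85, 0.85)]
    print(best_cache_fraction(curve, t_edge_per_item=0.20,
                              t_cloud_rtt=0.15, t_cloud_compute=0.05))
    # -> (0.1368, 0.3): caching ~30% of the model minimises expected latency here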
APA, Harvard, Vancouver, ISO, and other styles
30

Lucernati, Romano. "Scalable and Seamless Discovery and Selection of Services in Mobile Cloud Computing." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Find full text
Abstract:
Mobile devices are now capable of supporting a wide range of applications, many of which demand ever-increasing computational power. To this end, mobile cloud computing (MCC) has been proposed to address the limited computation power, memory, storage, and energy of such devices. An important challenge in MCC is to guarantee seamless discovery of services. To address this challenge, this thesis proposes an architecture that provides user-transparent and low-latency service discovery, as well as automated service selection. Experimental results on a real cloud computing testbed demonstrate that the proposed work outperforms state-of-the-art approaches by achieving extremely low discovery delay.
APA, Harvard, Vancouver, ISO, and other styles
31

PIVANTI, Marcello. "A Scalable Parallel Architecture with FPGA-Based Network Processor for Scientific Computing." Doctoral thesis, Università degli studi di Ferrara, 2012. http://hdl.handle.net/11392/2389440.

Full text
Abstract:
This thesis discusses the design and implementation of an FPGA-based Network Processor for scientific computing, such as Lattice Quantum Chromodynamics (LQCD) and fluid-dynamics applications based on Lattice Boltzmann Methods (LBM). State-of-the-art programs in these (and other similar) applications have a large degree of available parallelism that can be easily exploited on massively parallel systems, provided the underlying communication network offers not only high bandwidth but also low latency. I have designed in detail, built, and tested in hardware, firmware and software an implementation of a Network Processor tailored for the most recent families of multi-core processors. The implementation has been developed on an FPGA device to easily interface the logic of the NWP with the CPU I/O sub-system. In this work I have assessed several ways to move data between the main memory of the CPU and the I/O sub-system to obtain high data throughput and low latency, enabling the use of “Programmed Input Output” (PIO), “Direct Memory Access” (DMA) and “Write Combining” memory settings. On the software side, I developed and tested a device driver for the Linux operating system to access the NWP device, as well as a system library to efficiently access the network device from user applications. This thesis demonstrates the feasibility of a network infrastructure that saturates the maximum bandwidth of the I/O sub-systems available on recent CPUs, and reduces communication latencies to values very close to those needed by the processor to move data across the chip boundary.
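As a rough illustration of the PIO-versus-DMA trade-off assessed in this kind of work (the numbers below are invented, not measurements from the thesis), small messages favour a low-setup programmed-I/O path while large messages favour DMA's higher streaming bandwidth.

    def transfer_time(size_bytes, setup_s, bandwidth_bytes_per_s):
        """Simple linear cost model: fixed setup cost plus size / bandwidth."""
        return setup_s + size_bytes / bandwidth_bytes_per_s

    # Hypothetical parameters: PIO has negligible setup but low bandwidth,
    # DMA pays a descriptor/doorbell setup cost but streams much faster.
    pio = dict(setup_s=0.2e-6, bandwidth_bytes_per_s=0.8e9)
    dma = dict(setup_s=1.5e-6, bandwidth_bytes_per_s=5.0e9)

    for size in (64, 512, 4096, 65536):
        t_pio = transfer_time(size, **pio)
        t_dma = transfer_time(size, **dma)
        best = "PIO" if t_pio < t_dma else "DMA"
        print(f"{size:6d} B: PIO {t_pio*1e6:6.2f} us, DMA {t_dma*1e6:6.2f} us -> {best}")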
APA, Harvard, Vancouver, ISO, and other styles
32

Alham, Nasullah Khalid. "Parallelizing support vector machines for scalable image annotation." Thesis, Brunel University, 2011. http://bura.brunel.ac.uk/handle/2438/5452.

Full text
Abstract:
Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is a notably computationally intensive process, especially when the training dataset is large. In this thesis distributed computing paradigms have been investigated to speed up SVM training, by partitioning a large training dataset into small data chunks and processing each chunk in parallel utilizing the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation using a cluster of computers. A genetic algorithm based load balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments. SVM was initially designed for binary classification. However, most classification problems arising in domains such as image annotation usually involve more than two classes. A resource-aware parallel multiclass SVM algorithm for large-scale image annotation using a cluster of computers is introduced. The combination of classifiers leads to a substantial reduction of classification error in a wide range of applications. Among such combinations, SVM ensembles with bagging have been shown to outperform a single SVM in terms of classification accuracy. However, SVM ensemble training is a notably computationally intensive process, especially when the number of replicated samples generated by bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data using bootstrapping and trains an SVM on each sample in parallel using a cluster of computers. The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM algorithm, the distributed multiclass SVM algorithm, and the distributed SVM ensemble algorithm reduce the training time significantly while maintaining a high level of accuracy in classification.
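The core partition-and-train idea can be sketched in a few lines of Python; the sketch below uses scikit-learn and a local process pool purely for illustration, whereas the algorithms in the thesis target clusters of computers and add resource-aware load balancing.

    from multiprocessing import Pool

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    def train_chunk(chunk):
        # Train one SVM on one data chunk.
        X, y = chunk
        return SVC(kernel="rbf", gamma="scale").fit(X, y)

    def majority_vote(models, X):
        # Combine the per-chunk SVMs by majority voting on their predictions.
        votes = np.stack([m.predict(X) for m in models])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

    if __name__ == "__main__":
        X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
        chunks = [(X[i::4], y[i::4]) for i in range(4)]   # 4 data chunks
        with Pool(4) as pool:
            models = pool.map(train_chunk, chunks)        # one SVM per chunk, in parallel
        print((majority_vote(models, X[:200]) == y[:200]).mean())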
APA, Harvard, Vancouver, ISO, and other styles
33

Aguilar, Xavier. "Towards Scalable Performance Analysis of MPI Parallel Applications." Licentiate thesis, KTH, High Performance Computing and Visualization (HPCViz), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-165043.

Full text
Abstract:
A considerable fraction of scientific discovery nowadays relies on computer simulations. High Performance Computing (HPC) provides scientists with the means to simulate processes ranging from climate modeling to protein folding. However, achieving good application performance and making optimal use of HPC resources is a heroic task due to the complexity of parallel software. Therefore, performance tools and runtime systems that help users to execute applications in the most efficient way are of utmost importance in the landscape of HPC. In this thesis, we explore different techniques to tackle the challenges of collecting, storing, and using fine-grained performance data. First, we investigate the automatic use of real-time performance data in order to run applications in an optimal way. To that end, we present a prototype of an adaptive task-based runtime system that uses real-time performance data for task scheduling. This runtime system has a performance monitoring component that provides real-time access to the performance behavior of an application while it runs. The implementation of this monitoring component is presented and evaluated within this thesis. Secondly, we explore lossless compression approaches for MPI monitoring. One of the main problems that performance tools face is the huge amount of fine-grained data that can be generated from an instrumented application. Collecting fine-grained data from a program is the best method to uncover the root causes of performance bottlenecks; however, it is infeasible for extremely parallel applications or applications with long execution times. On the other hand, collecting coarse-grained data is scalable but sometimes not enough to discern the root cause of a performance problem. Thus, we propose a new method for performance monitoring of MPI programs using event flow graphs. Event flow graphs provide very low overhead in terms of execution time and storage size, and can be used to reconstruct fine-grained trace files of application events ordered in time.
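To illustrate what an event flow graph buys over a full trace, the sketch below folds an ordered stream of events into nodes plus weighted transition edges; it is a simplified stand-in for the representation used in the thesis, not its actual data structure.

    from collections import Counter

    def event_flow_graph(events):
        """Fold an ordered event stream into (event -> next_event) transition
        counts; repeated loop iterations collapse onto the same edges."""
        return Counter(zip(events, events[1:]))

    # One MPI rank doing 1000 iterations of the same send/recv/compute loop.
    trace = ["MPI_Send", "MPI_Recv", "compute"] * 1000
    graph = event_flow_graph(trace)
    for (src, dst), count in sorted(graph.items()):
        print(f"{src} -> {dst}: {count}")
    print("graph edges:", len(graph), "vs trace events:", len(trace))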

APA, Harvard, Vancouver, ISO, and other styles
34

Langmead, Benjamin Thomas. "Highly scalable short read alignment with the Burrows-Wheeler Transform and cloud computing." College Park, Md.: University of Maryland, 2009. http://hdl.handle.net/1903/9458.

Full text
Abstract:
Thesis (M.S.) -- University of Maryland, College Park, 2009.
Thesis research directed by: Dept. of Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
35

Klenk, Benjamin [Verfasser], and Holger [Akademischer Betreuer] Fröning. "Communication Architectures for Scalable GPU-centric Computing Systems / Benjamin Klenk ; Betreuer: Holger Fröning." Heidelberg : Universitätsbibliothek Heidelberg, 2018. http://d-nb.info/1177691078/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Cazalas, Jonathan M. "Efficient and Scalable Evaluation of Continuous, Spatio-temporal Queries in Mobile Computing Environments." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5154.

Full text
Abstract:
A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. For this research, we present a two-pronged approach to addressing this problem. Firstly, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit. We examine a naive CPU-based solution for continuous range-monitoring queries, and we then extend this system using the GPU. Additionally, with mobile communication devices becoming a commodity, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view-oriented approach to the location database, thereby reducing computation costs by exploiting computation sharing amongst queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects, while maintaining an acceptable level of performance. Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatio-temporal data streams and the management of moving objects can naturally come together. [IlMI10, ChFr03, MoXA04] For example, the output of a GPS receiver, monitoring the position of a mobile object, is viewed as a data stream of location updates. This data stream of location updates, along with those from the plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time. For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatio-temporal data streams. Specifically, GEDS employs the computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal range queries and continuous, spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and demonstrates the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly outweighs those costs. Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model, detailing the relationship between the CPU and the GPU.
From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications.
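In the spirit of the abstract CPU/GPU performance model mentioned above, a minimal back-of-the-envelope sketch (with invented numbers) is shown below; it only captures the transfer-versus-kernel trade-off, not the full model from the dissertation.

    def gpu_speedup(t_cpu, bytes_moved, pcie_gbps, t_kernel):
        """Speedup of offloading to the GPU once host<->device transfers
        are charged against the kernel time."""
        t_transfer = bytes_moved / (pcie_gbps * 1e9)
        return t_cpu / (t_transfer + t_kernel)

    # 200 MB of query/object state, 8 GB/s effective PCIe bandwidth.
    for t_kernel in (0.005, 0.020, 0.080):
        print(t_kernel, round(gpu_speedup(t_cpu=0.5, bytes_moved=200e6,
                                          pcie_gbps=8.0, t_kernel=t_kernel), 1))
    # Fast kernels amortise the transfer well (~16.7x); slow kernels less so (~4.8x).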
ID: 031001567; System requirements: World Wide Web browser and PDF reader; Mode of access: World Wide Web; Title from PDF title page (viewed August 26, 2013); Thesis (Ph.D.)--University of Central Florida, 2012; Includes bibliographical references (p. 103-112).
Ph.D.
Doctorate
Computer Science
Engineering and Computer Science
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
37

Safranek, Robert J. "Enhancements to the scalable coherent interface cache protocol." PDXScholar, 1999. https://pdxscholar.library.pdx.edu/open_access_etds/3977.

Full text
Abstract:
As the number of NUMA systems with cache coherency protocols based on the IEEE Std. 1596-1992, Standard for Scalable Coherent Interface (SCI) Specification, increases, it is important to review this complex protocol to determine whether it can be enhanced in any way. This research provides two realizable extensions to the standard SCI cache protocol. Both of these extensions lie within the basic confines of the SCI architecture. The first extension is a simplification of the SCI protocol in the area of prepending to a sharing list. Depending on whether the cache line is marked "Fresh" or "Gone", the flow of events is distinctly different. The guaranteed forward progress extension is a simplification of the SCI protocol in this area, making the act of prepending to an existing sharing list independent of whether the line is in the "Fresh" or "Gone" state. In addition, this extension eliminates the need for an SCI command, and distributes the resource requirements of supplying data for a shared line equally among all nodes of the sharing list. The second extension addresses the time to purge (or invalidate) an SCI sharing list. This extension provides a realizable solution that allows the node being invalidated to acknowledge the request prior to the completion of the invalidation while maintaining the memory consistency model of the processors of the system. The resulting cache protocol was developed and implemented for the Sequent Computer Systems Inc. NUMA-Q system. The cache protocol was run on systems ranging from eight to sixty-four processors and provided between a 7% and 20% reduction in the time to invalidate an SCI sharing list.
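For readers unfamiliar with SCI-style sharing lists, the toy model below shows the essence of a state-independent prepend: the requesting node always becomes the new head and links to the previous head, whether memory holds the line "Fresh" or "Gone". This is an illustration of the idea only, not the protocol's actual states or message sequence.

    class Directory:
        """Toy memory directory entry for a single cache line."""
        def __init__(self):
            self.state = "Home"   # no sharers yet; becomes "Fresh" or "Gone" later
            self.head = None      # node id of the current sharing-list head

    def prepend(directory, new_node, caches):
        """State-independent prepend: the requester becomes the new head and
        points at the old head, regardless of the line's Fresh/Gone state."""
        old_head = directory.head
        caches[new_node] = {"forward": old_head, "backward": None}
        if old_head is not None:
            caches[old_head]["backward"] = new_node
        directory.head = new_node
        if directory.state == "Home":
            directory.state = "Fresh"

    caches, directory = {}, Directory()
    for node in ["N0", "N1", "N2"]:
        prepend(directory, node, caches)
    print(directory.state, directory.head)   # Fresh N2
    print(caches["N2"])                      # {'forward': 'N1', 'backward': None}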
APA, Harvard, Vancouver, ISO, and other styles
38

Brzeczko, Albert Walter. "Scalable framework for turn-key honeynet deployment." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51842.

Full text
Abstract:
Enterprise networks present very high value targets in the eyes of malicious actors who seek to exfiltrate sensitive proprietary data, disrupt the operations of a particular organization, or leverage considerable computational and network resources to further their own illicit goals. For this reason, enterprise networks typically attract the most determined of attackers. These attackers are prone to using the most novel and difficult-to-detect approaches so that they may have a high probability of success and continue operating undetected. Many existing network security approaches that fall under the category of intrusion detection systems (IDS) and intrusion prevention systems (IPS) are able to detect classes of attacks that are well-known. While these approaches are effective for filtering out routine attacks in automated fashion, they are ill-suited for detecting the types of novel tactics and zero-day exploits that are increasingly used against the enterprise. In this thesis, a solution is presented that augments existing security measures to provide enhanced coverage of novel attacks in conjunction with what is already provided by traditional IDS and IPS. The approach enables honeypots, a class of technique that observes novel attacks by luring an attacker to perform malicious activity on a system having no production value, to be deployed in a turn-key fashion and at large scale on enterprise networks. In spite of the honeypot’s efficacy against targeted attacks, organizations can seldom afford to devote capital and IT manpower to integrating them into their security posture. Furthermore, misconfigured honeypots can actually weaken an organization’s security posture by giving the attacker a staging ground on which to perform further attacks. A turn-key approach is needed for organizations to use honeypots to trap, observe, and mitigate novel targeted attacks.
APA, Harvard, Vancouver, ISO, and other styles
39

Wu, Fan. "Ubiquitous Scalable Graphics: An End-to-End Framework using Wavelets." Worcester, Mass. : Worcester Polytechnic Institute, 2008. http://www.wpi.edu/Pubs/ETD/Available/etd-111908-165451/.

Full text
Abstract:
Dissertation (Ph.D.)--Worcester Polytechnic Institute.
Keywords: Energy Consumption; Perceptual Error Metric; Multiresolution; Wavelets; Mobile Graphics. Includes bibliographical references (p. 109-124).
APA, Harvard, Vancouver, ISO, and other styles
40

Mohror, Kathryn Marie. "Scalable event tracking on high-end parallel systems." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/2811.

Full text
Abstract:
Accurate performance analysis of high-end systems requires event-based traces to correctly identify the root cause of a number of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, the collection of event-based data presents scalability challenges of its own: the large volume of collected data increases tool overhead and results in data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling that collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data measured and stored by capitalizing on the repeated behavior of programs, and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing, collecting summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that comprised most of the execution time.
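As a rough illustration of the "summarise repeated behaviour" idea behind trace profiling, and not the tool's actual data structures, the sketch below keeps one record per distinct event sequence seen in a timestep loop, with a count and a mean duration, instead of one record per occurrence.

    from collections import defaultdict

    def profile_trace(iterations):
        """iterations: list of (event_sequence_tuple, duration_seconds).
        Returns one summary per distinct behaviour instead of per iteration."""
        summary = defaultdict(lambda: [0, 0.0])     # pattern -> [count, total time]
        for pattern, duration in iterations:
            summary[pattern][0] += 1
            summary[pattern][1] += duration
        return {p: (c, t / c) for p, (c, t) in summary.items()}

    iterations = [(("MPI_Irecv", "compute", "MPI_Wait"), 0.010)] * 990 \
               + [(("MPI_Irecv", "compute", "checkpoint", "MPI_Wait"), 0.120)] * 10
    for pattern, (count, mean_t) in profile_trace(iterations).items():
        print(count, round(mean_t, 3), pattern)
    # 1000 iterations collapse into 2 behaviour records.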
APA, Harvard, Vancouver, ISO, and other styles
41

Hilbrich, Tobias. "Runtime MPI Correctness Checking with a Scalable Tools Infrastructure." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-175472.

Full text
Abstract:
Increasing computational demand of simulations motivates the use of parallel computing systems. At the same time, this parallelism poses challenges to application developers. The Message Passing Interface (MPI) is a de-facto standard for distributed memory programming in high performance computing. However, its use also enables complex parallel programming errors such as races, communication errors, and deadlocks. Automatic tools can assist application developers in the detection and removal of such errors. This thesis considers tools that detect such errors during an application run and advances them towards a combination of precise checks (neither false positives nor false negatives) and scalability. This includes novel hierarchical checks that provide scalability, as well as a formal basis for a distributed deadlock detection approach. At the same time, the development of parallel runtime tools is challenging and time consuming, especially if scalability and portability are key design goals. Current tool development projects often create similar tool components, while component reuse remains low. To provide a perspective towards more efficient tool development, which simplifies scalable implementations, component reuse, and tool integration, this thesis proposes an abstraction for a parallel tools infrastructure along with a prototype implementation. This abstraction overcomes the use of multiple interfaces for different types of tool functionality, which limits flexible component reuse. Thus, this thesis advances runtime error detection tools and uses their redesign and their increased scalability requirements to apply and evaluate a novel tool infrastructure abstraction. The new abstraction ultimately allows developers to focus on their tool functionality, rather than on developing or integrating common tool components. The use of such an abstraction in a wide range of parallel runtime tool development projects could greatly increase component reuse, thus decreasing tool development time and cost. An application study with up to 16,384 application processes demonstrates the applicability of both the proposed runtime correctness concepts and the proposed tools infrastructure.
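A drastically simplified sketch of the wait-for-graph view underlying distributed deadlock detection is given below; it covers only blocking point-to-point waits with a single known peer, whereas wildcard receives and collectives are exactly the hard cases the thesis formalises.

    def find_deadlock(waits):
        """waits: dict rank -> rank it is blocked on (blocking send/recv pairs).
        Returns a cycle of ranks if one exists, else None."""
        for start in waits:
            path, rank = [], start
            while rank in waits:
                if rank in path:
                    return path[path.index(rank):]
                path.append(rank)
                rank = waits[rank]
        return None

    # Rank 0 waits on 1, 1 waits on 2, 2 waits on 0: a classic cycle.
    print(find_deadlock({0: 1, 1: 2, 2: 0}))   # [0, 1, 2]
    print(find_deadlock({0: 1, 1: 2}))         # None -- rank 2 is not blocked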
APA, Harvard, Vancouver, ISO, and other styles
42

Putnam, Patrick P. "Scalable, High-Performance Forward Time Population Genetic Simulation." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1522419645847035.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Raja, Chandrasekar Raghunath. "Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417733721.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Clay, Lenitra M. "Replication techniques for scalable content distribution in the internet." Diss., Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/8491.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Wadhwa, Bharti. "Scalable Data Management for Object-based Storage Systems." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99791.

Full text
Abstract:
Parallel I/O performance is crucial to sustain scientific applications on large-scale High-Performance Computing (HPC) systems. Large-scale distributed storage systems, in particular object-based storage systems, face severe challenges in managing data efficiently. Inefficient data management leads to poor I/O and storage performance in HPC applications and scientific workflows. Some of the main challenges for efficient data management arise from poor resource allocation, load imbalance in object storage targets, and inflexible data sharing between applications in a workflow. In addition, parallel I/O makes it challenging to shoehorn in new interfaces, such as taking advantage of multiple layers of storage and support for analysis in the data path. Solving these challenges to improve the performance and efficiency of object-based storage systems is crucial, especially for the upcoming era of exascale systems. This dissertation is focused on solving these major challenges in object-based storage systems by providing scalable data management strategies. In the first part of the dissertation (Chapter 3), we present a resource contention aware load balancing tool (iez) for large-scale distributed object-based storage systems. In Chapter 4, we extend iez to support Progressive File Layout for the Lustre object-based storage system. In the second part (Chapter 5), we present a technique to facilitate data sharing in scientific workflows using object-based storage, with our proposed tool Workflow Data Communicator. In the last part of this dissertation, we present a solution for transparent data management in the multi-layer storage hierarchy of present and next-generation HPC systems. This dissertation shows that by intelligently employing scalable data management techniques, the flexibility and performance of scientific applications and workflows in object-based storage systems can be enhanced many times over. Our proposed data management strategies can guide the software design of next-generation HPC storage systems to efficiently support data for scientific applications and workflows.
Doctor of Philosophy
Large-scale object-based storage systems face severe challenges in managing data efficiently for HPC applications and workflows. These storage systems often manage and share data inflexibly, without considering the load imbalance and resource contention in the underlying multi-layer storage hierarchy. This dissertation first studies how resource contention and inflexible data sharing mechanisms impact HPC applications' storage and I/O performance, and then presents a series of efficient techniques, tools and algorithms to provide efficient and scalable data management for current and next-generation HPC storage systems.
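The flavour of contention-aware placement can be conveyed with a short sketch: for each new file, choose the storage targets with the least outstanding load. This is a generic illustration with hypothetical load numbers, not the algorithm implemented in iez.

    def place_stripes(ost_load, stripe_count):
        """Pick `stripe_count` distinct object storage targets with the lowest
        current load (e.g. pending I/O or queue depth)."""
        ranked = sorted(ost_load, key=ost_load.get)
        return ranked[:stripe_count]

    # Hypothetical per-target load observed by a monitoring agent.
    ost_load = {"OST0": 7, "OST1": 2, "OST2": 5, "OST3": 2}
    print(place_stripes(ost_load, stripe_count=3))   # ['OST1', 'OST3', 'OST2']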
APA, Harvard, Vancouver, ISO, and other styles
46

Dinan, James S. "Scalable Task Parallel Programming in the Partitioned Global Address Space." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1275418061.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Li, Hengsha. "Real-time Cloudlet PaaS for GreenIoT : Design of a scalable server PaaS and a GreenIoT application." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239004.

Full text
Abstract:
Cloudlet is a recent topic that has attracted much interest in network systems research. It can be characterized as a PaaS (Platform as a Service) layer that allows mobile clients to execute their code in the cloud. A cloudlet can be seen as a layer at the edge of the communication network. In this thesis, we present a cloudlet architecture design which includes cloudlet code as part of the client application itself. We first provide an overview of related work and describe existing challenges which need to be addressed. Next, we present an overall design for a cloudlet-based implementation. We then present the cloudlet architecture, including a prototype of both the client application and the cloudlet server. For the prototype of a CO2 data visualization application, we focus on how to structure the functions on the client side, how to schedule the cloudlet PaaS on the server, and how to make the server scalable. Finally, we conclude with a performance evaluation. Cloudlet technology is likely to be widely applied in IoT projects, such as data visualization of air quality and water quality, fan control and traffic steering, or other use cases. Compared to the traditional centralized cloud architecture, a cloudlet offers high responsiveness, flexibility and scalability.
APA, Harvard, Vancouver, ISO, and other styles
48

Sridhar, Jaidev Krishna. "Scalable Job Startup and Inter-Node Communication in Multi-Core InfiniBand Clusters." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243909406.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Chai, Lei. "High Performance and Scalable MPI Intra-node Communication Middleware for Multi-core Clusters." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1236639834.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Baheri, Betis. "MARS: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler." Kent State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=kent1595039454920637.

Full text
APA, Harvard, Vancouver, ISO, and other styles