Dissertations / Theses on the topic 'Computing clusters'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Computing clusters.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Shum, Kam Hong. "Adaptive parallelism for computing on heterogeneous clusters." Thesis, University of Cambridge, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627563.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Aji, Ashwin M. "Programming High-Performance Clusters with Heterogeneous Computing Devices." Diss., Virginia Tech, 2015. http://hdl.handle.net/10919/52366.

Full text
Abstract:
Today's high-performance computing (HPC) clusters are seeing an increase in the adoption of accelerators like GPUs, FPGAs and co-processors, leading to heterogeneity in the computation and memory subsystems. To program such systems, application developers typically employ a hybrid programming model of MPI across the compute nodes in the cluster and an accelerator-specific library (e.g., CUDA, OpenCL, OpenMP, OpenACC) across the accelerator devices within each compute node. Such explicit management of disjoint computation and memory resources leads to reduced productivity and performance. This dissertation focuses on designing, implementing and evaluating a runtime system for HPC clusters with heterogeneous computing devices. This work also explores extending existing programming models to make use of our runtime system for easier code modernization of existing applications. Specifically, we present MPI-ACC, an extension to the popular MPI programming model and runtime system for efficient data movement and automatic task mapping across the CPUs and accelerators within a cluster, and discuss the lessons learned. MPI-ACC's task-mapping runtime subsystem performs fast and automatic device selection for a given task. MPI-ACC's data-movement subsystem includes careful optimizations for end-to-end communication among CPUs and accelerators, which can be seamlessly leveraged by application developers. MPI-ACC provides a familiar, flexible and natural interface for programmers to choose the right computation or communication targets, while its runtime system achieves efficient cluster utilization.
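For readers unfamiliar with the hybrid model referred to above, the following minimal C sketch shows the conventional MPI+OpenCL pattern that MPI-ACC aims to replace: accelerator data must be staged explicitly through host memory before MPI can move it between nodes. This is standard MPI and OpenCL, not MPI-ACC's interface; the command queue and device buffer are assumed to exist, and error checking is omitted.

/* Conventional hybrid MPI+OpenCL staging (illustrative only; not MPI-ACC's API). */
#include <mpi.h>
#include <CL/cl.h>
#include <stdlib.h>

void send_device_buffer(cl_command_queue queue, cl_mem dev_buf,
                        size_t n, int dest, MPI_Comm comm)
{
    double *host_buf = malloc(n * sizeof(double));

    /* 1. Copy the result from the accelerator into host memory. */
    clEnqueueReadBuffer(queue, dev_buf, CL_TRUE, 0,
                        n * sizeof(double), host_buf, 0, NULL, NULL);

    /* 2. Only now can MPI move the data to another node. */
    MPI_Send(host_buf, (int)n, MPI_DOUBLE, dest, 0, comm);

    free(host_buf);
}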
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
3

Chow, Chi-yin Edward (周志賢). "Adaptive recovery with hierarchical checkpointing on workstation clusters." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B29812914.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Chow, Chi-yin Edward. "Adaptive recovery with hierarchical checkpointing on workstation clusters." Hong Kong : University of Hong Kong, 1999. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20792700.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Melas, Panagiotis. "The performance evaluation of workstation clusters." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.326395.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ribeiro, Tiago Filipe Rodrigues. "Developing and evaluating clOpenCL applications for heterogeneous clusters." Master's thesis, Instituto Politécnico de Bragança, Escola Superior de Tecnologia e Gestão, 2012. http://hdl.handle.net/10198/7948.

Full text
Abstract:
In the last few years, the processing capabilities of computing systems have increased significantly, moving from single-core to multi-core and even many-core systems. Accompanying this evolution, local networks have also become faster, with multi-gigabit technologies like InfiniBand, Myrinet and 10G Ethernet. Parallel/distributed programming tools and standards, like POSIX Threads, OpenMP and MPI, have helped to exploit these technologies and have frequently been combined, giving rise to hybrid programming models. Recently, co-processors like GPUs and FPGAs started to be used as accelerators, requiring specialized frameworks (like CUDA for NVIDIA GPUs). Faced with so much heterogeneity, the industry formulated the OpenCL specification as a standard for exploiting heterogeneous systems. However, in the context of cluster computing, one problem surfaces: OpenCL only enables a developer to use the devices that are present in the local machine. With many processing devices scattered across cluster nodes (CPUs, GPUs and other co-processors), it became important to enable software developers to take full advantage of the complete cluster device set. This dissertation demonstrates and evaluates an OpenCL extension, named clOpenCL, which supports the simple deployment and efficient running of OpenCL-based parallel applications that may span several cluster nodes, thus expanding the original single-node OpenCL model. The main contributions are that clOpenCL i) offers a transparent approach to porting traditional OpenCL applications to cluster environments and ii) provides significant performance increases over classical (non-)hybrid parallel approaches.
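To make the single-node limitation concrete, the sketch below uses plain OpenCL (not clOpenCL's extended API) to enumerate compute devices; only the devices attached to the local machine are ever returned. The array sizes are arbitrary and error handling is omitted.

#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; p++) {
        cl_device_id devices[16];
        cl_uint num_devices = 0;
        /* Only CPUs/GPUs/accelerators plugged into *this* node are visible. */
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16,
                       devices, &num_devices);
        printf("platform %u: %u local device(s)\n", p, num_devices);
    }
    return 0;
}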
APA, Harvard, Vancouver, ISO, and other styles
7

Rough, Justin. "A Platform for reliable computing on clusters using group communications." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20060412.141015.

Full text
Abstract:
Shared clusters represent an excellent platform for the execution of parallel applications given their low price/performance ratio and the presence of cluster infrastructure in many organisations. The focus of recent research efforts is on parallelism management, transport and efficient access to resources, and making clusters easy to use. In this thesis, we examine reliable parallel computing on clusters. The aim of this research is to demonstrate the feasibility of developing an operating system facility providing transport fault tolerance using existing, enhanced and newly built operating system services for supporting parallel applications. In particular, we use existing process duplication and process migration services, and synthesise a group communications facility for use in a transparent checkpointing facility. This research is carried out using the methods of experimental computer science. To provide a foundation for the synthesis of the group communications and checkpointing facilities, we survey and review related work in both fields. For group communications, we examine the V Distributed System, the x-kernel and Psync, the ISIS Toolkit, and Horus. We identify a need for services that consider the placement of processes on computers in the cluster. For checkpointing, we examine Manetho, KeyKOS, libckpt, and Diskless Checkpointing. We observe the use of remote computer memories for storing checkpoints, and the use of copy-on-write mechanisms to reduce the time to create a checkpoint of a process. We propose a group communications facility providing two sets of services: user-oriented services and system-oriented services. User-oriented services provide transparency and target applications. System-oriented services supplement the user-oriented services in supporting other operating system services and do not provide transparency. Additional flexibility is achieved by providing delivery and ordering semantics independently. An operating system facility providing transparent checkpointing is synthesised using coordinated checkpointing. To ensure that a consistent set of checkpoints is generated by the facility, instead of blindly blocking the processes of a parallel application, only non-deterministic events are blocked. This allows the processes of the parallel application to continue execution during the checkpoint operation. Checkpoints are created by adapting process duplication mechanisms, and checkpoint data is transferred to remote computer memories and disk for storage using the mechanisms of process migration. The services of the group communications facility are used to coordinate the checkpoint operation, and to transport checkpoint data to remote computer memories and disk. Both the group communications facility and the checkpointing facility have been implemented in the GENESIS cluster operating system and provide proof-of-concept. GENESIS uses a microkernel and client-server based operating system architecture, and is demonstrated to provide an appropriate environment for the development of these facilities. We design a number of experiments to test the performance of both the group communications facility and the checkpointing facility, and to provide proof-of-performance. We present our approach to testing, the challenges raised in testing the facilities, and how we overcome them. For group communications, we examine the performance of a number of delivery semantics.
Good speed-ups are observed and system-oriented group communication services are shown to provide significant performance advantages over user-oriented semantics in the presence of packet loss. For checkpointing, we examine the scalability of the facility given different levels of resource usage and a variable number of computers. Low overheads are observed for checkpointing a parallel application. It is made clear by this research that the microkernel and client-server based cluster operating system provide an ideal environment for the development of a high performance group communications facility and a transparent checkpointing facility for generating a platform for reliable parallel computing on clusters.
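As a rough illustration of the coordinated checkpointing idea described above (GENESIS uses its own group communication and process duplication services rather than MPI, and blocks only non-deterministic events instead of all processes), a barrier-based C/MPI sketch with a hypothetical write_process_state() helper would look like:

#include <mpi.h>
#include <stdio.h>

/* Hypothetical helper: in a real facility this would duplicate the process
 * image and stream it to a remote memory server or disk, as described above. */
void write_process_state(int rank, int round)
{
    char name[64];
    snprintf(name, sizeof(name), "ckpt_r%d_p%d.img", round, rank);
    FILE *f = fopen(name, "wb");
    if (f) fclose(f);
}

void coordinated_checkpoint(MPI_Comm comm, int round)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Synchronise so that no message sent before the checkpoint can be
     * received after it: a simple way to obtain a consistent global state. */
    MPI_Barrier(comm);
    write_process_state(rank, round);
    MPI_Barrier(comm);   /* all checkpoints written: resume computation */
}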
APA, Harvard, Vancouver, ISO, and other styles
8

Daillidis, Christos. "Establishing Linux Clusters for high-performance computing (HPC) at NPS." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Sept%5FDaillidis.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Nakad, Zahi Samir. "High Performance Applications on Reconfigurable Clusters." Thesis, Virginia Tech, 2000. http://hdl.handle.net/10919/35682.

Full text
Abstract:
Many problems faced in the engineering world are computationally intensive; filtering with FIR (Finite Impulse Response) filters is one such example. This thesis discusses the implementation of a fast, reconfigurable, and scalable FIR digital filter. Constant-coefficient multipliers and a fast FIFO implementation are also discussed in connection with the FIR filter. The filter is used in two of its structures: the direct form and the lattice structure. The thesis describes several configurations that can be created with the different components available and reports the testing results of these configurations.
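For reference, the direct-form structure mentioned above computes each output sample as a dot product of the coefficient vector with the most recent inputs; a minimal software version (the thesis implements the filter in reconfigurable hardware, so this C routine is only an illustration) is:

#include <stddef.h>

/* Direct-form FIR: y[i] = sum_k h[k] * x[i-k], with h holding `taps` coefficients. */
void fir_direct_form(const double *x, double *y, size_t n,
                     const double *h, size_t taps)
{
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}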
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
10

Rafique, Muhammad Mustafa. "An Adaptive Framework for Managing Heterogeneous Many-Core Clusters." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/29119.

Full text
Abstract:
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit an excellent price-to-performance ratio as well as the much needed high energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating the accelerators into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction set architectures, memory hierarchies and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically IBM Cell and programmable GPUs, in distributed settings for data-intensive computing and propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and the capabilities of individual compute nodes. The framework provides efficient data prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore the use of an advanced layered-architecture based software engineering approach and provide mixin-layers based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and optimal resource management as compared to conventional, hand-tuned approaches to programming and managing accelerator-based heterogeneous clusters.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
11

Nielson, Curtis R. "A Descriptive Performance Model of Small, Low Cost, Diskless Beowulf Clusters." Diss., Brigham Young University, 2003. http://contentdm.lib.byu.edu/ETD/image/etd280.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Dantas, Mario A. R. "Efficient scheduling of parallel applications on workstation clusters." Thesis, University of Southampton, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.243462.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Shan, Meijuan. "Distributed object-oriented parallel computing on heterogeneous workstation clusters using Java." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq43403.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Ouyang, Xiangyong. "Efficient Storage Middleware Design in InfiniBand Clusters for High End Computing." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1331108157.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Vyapamakula, Sreeramachandra Sankeerth. "Expedient Modal Decomposition of Massive Datasets Using High Performance Computing Clusters." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu151515633114873.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Albring, Morten. "Towards quantum information processing with Cr3+ based heterometallic clusters." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/towards-quantum-information-processing-with-cr3-based-heterometallic-clusters(6ff7e303-ca75-4632-986d-48bea42d96e3).html.

Full text
Abstract:
An investigation is reported of the electronic structure of transition metal clusters comprising wheel-shaped, anti-ferromagnetically coupled, heterometallic arrays of eight metal ions. The compounds were synthesized and provided by Dr. Grigore Timco of The University of Manchester and are formulated by their metal content as Cr7M, where M is a divalent 3d metal. Two families of wheels are the subject of this research, termed ‘green’ and ‘purple’ after their physical appearance. Within each family, the compounds are all isostructural. From simulation using a single Hamiltonian for Cr7M-purple compounds, where M = Zn, Mn, or Ni, it is shown that with only two exchange parameters, one JCr-Cr and one JCr-M, data from bulk magnetization, neutron scattering, Electron Paramagnetic Resonance (EPR) spectroscopy at multiple frequencies and specific heat measurements can be modelled, and that the parameters are transferable. Preliminary attempts to measure electron spin relaxation times for two of the purple wheels have shown values of T1 and T2 that are comparable with those of the more extensively studied green wheels, and hence further studies in this area are warranted. Variable-temperature Q- and W-band EPR spectra for a series of nine heterodimers comprising one green and one purple wheel, with M = Zn, Mn or Ni in each case, are reported. For Cr7Zn-purple there is no magnetic exchange detected, whereas weak and quantifiable exchange is required to interpret the spectra of the other six dimers. EPR studies of three trimers of the form purple-green-purple are reported, and the presence of magnetic exchange is identified by comparison with the spectra of the component single- and double-wheel compounds, although it is not quantified because of the numerical size of the simulations required. The process of comparing simulated to experimental spectra is a complex problem and one which is central to the work reported in this thesis. The problem of fitting has been investigated and two novel solutions, one based upon pixel mapping and the other based on wavelet transformation, are proposed.
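For readers unfamiliar with the notation, a spin Hamiltonian of the kind described, with a single Cr-Cr and a single Cr-M exchange parameter plus a Zeeman term, usually takes the isotropic Heisenberg form sketched below (the thesis may include further anisotropy terms):

\hat{H} = J_{\mathrm{Cr\text{-}Cr}} \sum_{\langle i,j\rangle \in \mathrm{Cr\text{-}Cr}} \hat{\mathbf{S}}_i \cdot \hat{\mathbf{S}}_j
        + J_{\mathrm{Cr\text{-}M}} \sum_{\langle i,j\rangle \in \mathrm{Cr\text{-}M}} \hat{\mathbf{S}}_i \cdot \hat{\mathbf{S}}_j
        + g\,\mu_B\, \mathbf{B} \cdot \sum_{i} \hat{\mathbf{S}}_i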
APA, Harvard, Vancouver, ISO, and other styles
17

Sajjapongse, Kittisak. "Hierarchical scheduling and uniform access programming frameworks for heterogeneous CPU-GPU computing clusters." Thesis, University of Missouri - Columbia, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10178997.

Full text
Abstract:

The advance of the GPU hardware architecture has made GPUs attractive devices for general-purpose computing. Modern GPUs are equipped with an increasing number of cores, a flexible memory hierarchy, and a large memory capacity. While the computational power of modern GPU devices has allowed their introduction in high-performance computing (HPC) clusters and the efficient processing of ever larger workloads, existing software components for HPC clusters still offer basic support for hardware heterogeneity and often cause performance limitations in the presence of GPU devices. In particular, two kinds of limitations are associated with these software components: runtime support and programmability. We found that these limitations are due to the fact that existing software frameworks for heterogeneous clusters treat GPUs as dedicated coprocessor devices.

In this dissertation, we propose two software frameworks for addressing the performance and hardware underutilization issues found in heterogeneous CPU-GPU clusters as well as increasing their programmability. Our frameworks provide a uniform view of compute resources and treat CPUs and GPUs equally as first-class resources, allowing efficient management of heterogeneous compute resources. First, we propose a hierarchical scheduling framework consisting of a node-level runtime and a cluster-level scheduler that provides abstraction of heterogeneous compute resources at different granularities. This hierarchical framework targets existing applications and does not require their modification. In the node-level runtime, we identify and design mechanisms, such as virtual GPUs, GPU virtual memory, dynamic load balancing and pre-emption, which are necessary to support efficient sharing and load balancing schemes for GPUs within a compute node. In the cluster-level scheduler, we introduce mechanisms to abstract compute nodes and perform load balancing in concert with the node-level runtime. Our hierarchical scheduling framework allows supporting different load balancing policies and does not require additional inputs (such as profiling information) from users. Second, we propose a programming framework based on a novel memory and execution model. Our memory model hides disjoint addressing spaces (corresponding to different CPUs, GPUs and compute nodes) and provides a view of a single virtual memory space that can be accessed by all compute resources in a heterogeneous cluster. Our execution model provides uniform access to compute resources and allows our framework to treat all CPUs and GPUs equally and to access data in the virtual memory space.

APA, Harvard, Vancouver, ISO, and other styles
18

Tunstig, Sebastian. "System modeling for process mapping onto scattered computational nodes in high-performance computing clusters." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-155822.

Full text
Abstract:
The task of assigning a parallel program’s processes to processors in a computer system is referred to as process mapping. It is desired that such a mapping keeps communication as local as possible in the system while also achieving load balance between computational units, as this reduces program execution time. By representing the program and the system as graphs, the problem can be defined and solved using existing graph algorithms. In this thesis we study the process of modeling virtual systems consisting of scattered nodes in a supercomputer, in such a way that process mapping can be performed with these models. Although the supercomputer has a structured interconnection network that forms a 3D torus, the subsets of the system that are modeled and used for program execution are spread out across the system and hence do not themselves form a logical topology. We present and evaluate two methods for model creation, both based on measurements performed on the system.
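As an illustration of the objective such a mapping tries to minimise (the names, the purely additive cost, and equating the number of nodes with the number of processes are assumptions for this sketch, not the thesis's exact formulation), the cost of assigning process i to node map[i] can be evaluated from a process-communication matrix and a measured node-to-node cost matrix:

#define P 4   /* number of processes; for simplicity also the number of nodes */

/* comm[i][j]: traffic between processes i and j;
 * dist[a][b]: measured communication cost between nodes a and b;
 * map[i]:     node assigned to process i. */
double mapping_cost(const double comm[P][P], const double dist[P][P],
                    const int map[P])
{
    double cost = 0.0;
    for (int i = 0; i < P; i++)
        for (int j = i + 1; j < P; j++)
            cost += comm[i][j] * dist[map[i]][map[j]];
    return cost;
}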
APA, Harvard, Vancouver, ISO, and other styles
19

Rosenvinge, Einar Magnus. "Online Task Scheduling on Heterogeneous Clusters : An Experimental Study." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-278.

Full text
Abstract:

We study the problem of scheduling applications composed of a large number of tasks on heterogeneous clusters. Tasks are identical, independent of each other, and can hence be computed in any order. The goal is to execute all the tasks as quickly as possible. We use the Master-Worker paradigm, where tasks are maintained by the master, which hands out batches of a variable number of tasks to requesting workers. We introduce a new scheduling strategy, the Monitor strategy, and compare it to other strategies suggested in the literature. An image filtering application, known as matched filtering, has been used to compare the different strategies. Our implementation involves data-staging techniques in order to circumvent the possible bottleneck incurred by the master, and multi-threading to prevent possible processor idleness.
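A minimal C/MPI sketch of the Master-Worker batching described above (illustrative only; the tags, the total task count and the fixed-chunk batch size are assumptions, whereas the thesis compares several batching strategies, including the proposed Monitor strategy):

#include <mpi.h>

#define TAG_REQUEST 1
#define TAG_BATCH   2
#define TOTAL_TASKS 10000

/* Master loop: answer each worker request with [first task index, batch size];
 * an empty batch (size 0) tells the worker to stop. */
void master(MPI_Comm comm, int nworkers)
{
    int next = 0, finished = 0;
    while (finished < nworkers) {
        MPI_Status st;
        int request;
        MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST, comm, &st);

        int remaining = TOTAL_TASKS - next;
        int batch[2] = { next, remaining > 100 ? 100 : remaining };
        next += batch[1];
        if (batch[1] == 0)
            finished++;
        MPI_Send(batch, 2, MPI_INT, st.MPI_SOURCE, TAG_BATCH, comm);
    }
}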

APA, Harvard, Vancouver, ISO, and other styles
20

Desai, Harit S. "Evaluation and Tuning of Gigabit Ethernet performance on Clusters." Kent State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=kent1185819165.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Zhang, Jie. "Designing and Building Efficient HPC Cloud with Modern Networking Technologies on Heterogeneous HPC Clusters." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1532737201524604.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Upadhayaya, Niraj. "Memory management and optimization using distributed shared memory systems for high performance computing clusters." Thesis, University of the West of England, Bristol, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.421743.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Raja, Chandrasekar Raghunath. "Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417733721.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Mohammed, Awaizulla Shareef. "Investigation of Immersion Cooled ARM-Based Computer Clusters for Low-Cost, High-Performance Computing." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc1011866/.

Full text
Abstract:
This study aimed to investigate the performance of ARM-based computer clusters using a two-phase immersion cooling approach, and to demonstrate its potential benefits over air-based natural and forced convection approaches. ARM-based clusters were created using Raspberry Pi models 2 and 3, commodity-level, single-board computers. The immersion cooling mode utilized two types of dielectric liquids, HFE-7000 and HFE-7100. Experiments involved running the benchmarking tests Sysbench and High Performance Linpack (HPL), and a combination of both, in order to quantify the key parameters of device junction temperature, frequency, execution time, computing performance, and energy consumption. Results indicated that the device core temperature has direct effects on computing performance and energy consumption. In the reference natural convection cooling mode, as the temperature rose, the cluster started to decrease its operating frequency to save the internal cores from damage. This resulted in a decline of computing performance and an increase of execution time, further leading to increased energy consumption. In more extreme cases, the performance of the cluster dropped by 4X, while the energy consumption increased by 220%. This study therefore demonstrated that the two-phase immersion cooling method, with its near-isothermal, high heat transfer capability, would enable fast, energy-efficient, and reliable operation, particularly benefiting high-performance computing applications where conventional air-based cooling methods would fail.
APA, Harvard, Vancouver, ISO, and other styles
25

Koop, Matthew J. "High-Performance Multi-Transport MPI Design for Ultra-Scale InfiniBand Clusters." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243581928.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Ranadive, Adit Uday. "Virtualized resource management in high performance fabric clusters." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54241.

Full text
Abstract:
Providing performance and isolation guarantees for applications running in virtualized datacenter environments requires continuous management of the underlying physical resources. For communication- and I/O-intensive applications running on such platforms, the management methods must adequately deal with the shared use of the high-performance fabrics these applications require. In particular, new classes of latency-sensitive and data-intensive workloads running in virtualized environments rely on emerging fabrics like 40+Gbps Ethernet and InfiniBand/RoCE with support for RDMA, VMM-bypass and hardware-level virtualization (SR-IOV). However, the benefits provided by these technology advances are offset by several management constraints: (i) the inability of the hypervisor to monitor the VMs’ usage of these fabrics can affect the platform’s ability to provide isolation and performance guarantees, (ii) the hypervisor cannot provide fine-grained I/O provisioning or perform management decisions for VMs, thus reducing the degree of consolidation that can be supported on the platforms, and (iii) without such support it is harder to integrate these fabrics into emerging cloud computing platforms and datacenter fabric management solutions. This is made particularly challenging for workloads spanning multiple VMs, utilizing physical resources distributed across multiple server nodes and the interconnection fabric. This thesis addresses the problem of realizing a flexible, dynamic resource management system for virtualized platforms with high performance fabrics. We make the following key contributions: (i) A lightweight monitoring tool, IBMon, integrated with the hypervisor to monitor VMs’ use of RDMA-enabled virtualized interconnects, using memory introspection techniques. (ii) The design and construction of a resource management system that leverages IBMon to provide latency-sensitive applications performance guarantees. This system is built on microeconomic principles of supply and demand and can be deployed on a per-node (Resource Exchange) or a multi-node (Distributed Resource Exchange) basis. Fine-grained resource allocations can be enforced through several mechanisms, including CPU capping or fabric-level congestion control. (iii) Sphinx, a fabric management solution that leverages Resource Exchange to orchestrate network and provide latency proportionality for consolidated workloads, based on user/application-specified policies. (iv) Implementation and experimental evaluation using InfiniBand clusters virtualized with the Xen or KVM hypervisor, managed via the OpenFloodlight SDN controller, and using representative data-intensive and latency-sensitive benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
27

Carver, Eric R. "Reducing Network Latency for Low-cost Beowulf Clusters." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1406880971.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Rosa, Bruno Otto Theodoro. "Análise de sistemas de comunicação para computação paralela em clusters." Universidade de São Paulo, 2002. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-01062009-112839/.

Full text
Abstract:
Despite the constant bandwidth increase in computer networks, parallel processing tasks still require a lower communication latency than is offered. This need has not been addressed by these network technologies because it is related to how operating systems use hardware resources to send user data through the network. In this work we present strategies to lower latency and the requirements to implement such systems, including data transfer mechanisms, address translation techniques, protection, control transfer, reliability and multicasting support. We also present a study of an existing system, M-VIA, comparing its performance with traditional TCP/IP.
APA, Harvard, Vancouver, ISO, and other styles
29

Boettger, Stefan. "Virtual machine scheduling in dedicated computing clusters." Reviewers: Udo Kebschull and Roberto V. Zicari. Frankfurt am Main : Universitätsbibliothek Johann Christian Senckenberg, 2014. http://d-nb.info/1143024214/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Chen, Chong. "Acceleration of Computer Based Simulation, Image Processing, and Data Analysis Using Computer Clusters with Heterogeneous Accelerators." University of Dayton / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=dayton148036732102682.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Sridhar, Jaidev Krishna. "Scalable Job Startup and Inter-Node Communication in Multi-Core InfiniBand Clusters." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243909406.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Chai, Lei. "High Performance and Scalable MPI Intra-node Communication Middleware for Multi-core Clusters." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1236639834.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Zimmermann, Ralf. "Cryptanalysis using reconfigurable hardware clusters for high-performance computing." Reviewers: Christof Paar and Tanja Lange. Bochum : Ruhr-Universität Bochum, 2016. http://d-nb.info/1109051468/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Roderus, Jens, Simon Larson, and Eric Pihl. "Hadoop scalability evaluation for machine learning algorithms on physical machines : Parallel machine learning on computing clusters." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20102.

Full text
Abstract:
The amount of available data has allowed the field of machine learning to flourish. But with growing data set sizes comes an increase in algorithm execution times. Cluster computing frameworks provide tools for distributing data and processing power across several computer nodes and allow algorithms to run in feasible time frames when data sets are large. Different cluster computing frameworks come with different trade-offs. In this thesis, the scalability of the execution time of machine learning algorithms running on the Hadoop cluster computing framework is investigated. A recent version of Hadoop and algorithms relevant to industrial machine learning, namely K-means, latent Dirichlet allocation and naive Bayes, are used in the experiments. This thesis provides valuable information to anyone choosing between different cluster computing frameworks. The results show everything from moderate scalability to no scalability at all. These results indicate that Hadoop as a framework may have serious restrictions in how well tasks are actually parallelized. Possible scalability improvements could be achieved by modifying the machine learning library algorithms or by Hadoop parameter tuning.
APA, Harvard, Vancouver, ISO, and other styles
35

Soares, Thiago Marques. "HCLogP: um modelo computacional para clusters heterogêneos." Universidade Federal de Juiz de Fora (UFJF), 2017. https://repositorio.ufjf.br/jspui/handle/ufjf/4506.

Full text
Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The LogP model was proposed in 1993 to measure the effects of communication latency, processor occupancy and bandwidth in distributed-memory multiprocessors. The idea was to characterize distributed-memory multiprocessors using these key parameters and study their impact on performance. This work proposes a new model, based on LogP, that describes the impact of these parameters on the performance of regular applications executing on a heterogeneous cluster. The model considers that a heterogeneous cluster is composed of distinct types of processors, accelerators and networks. The results show that the worst error in the estimates of the parallel execution time was about 19.2%, and, in many cases, the estimated execution time was equal to or very close to the real one. In addition, based on this model, a scheduler was developed; given the characteristics of the application and of the computational environment, it chooses the subset of processors, accelerators and networks that minimizes the parallel execution time. For applications with different behaviors, the scheduler successfully chose the best configuration.
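For context, the original LogP model that HCLogP builds on charges each small message an overhead o at the sender and the receiver, a network latency L, and a minimum gap g between consecutive messages, so point-to-point transfers between two processors cost roughly the expressions below (HCLogP extends such terms with per-processor, per-accelerator and per-network parameters):

T_{1\ \text{msg}} = o_{\text{send}} + L + o_{\text{recv}} \approx 2o + L ,
\qquad
T_{k\ \text{msgs}} \approx L + 2o + (k-1)\,\max(g, o) .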
APA, Harvard, Vancouver, ISO, and other styles
36

Adam, Constantin. "Scalable Self-Organizing Server Clusters with Quality of Service Objectives." Licentiate thesis, KTH, School of Electrical Engineering (EES), 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-272.

Full text
Abstract:

Advanced architectures for cluster-based services that have been recently proposed allow for service differentiation, server overload control and high utilization of resources. These systems, however, rely on centralized functions, which limit their ability to scale and to tolerate faults. In addition, they do not have built-in architectural support for automatic reconfiguration in case of failures or addition/removal of system components.

Recent research in peer-to-peer systems and distributed management has demonstrated the potential benefits of decentralized over centralized designs: a decentralized design can reduce the configuration complexity of a system and increase its scalability and fault tolerance.

This research focuses on introducing self-management capabilities into the design of cluster-based services. Its intended benefits are to make service platforms dynamically adapt to the needs of customers and to environment changes, while giving the service providers the capability to adjust operational policies at run-time.

We have developed a decentralized design that efficiently allocates resources among multiple services inside a server cluster. The design combines the advantages of both centralized and decentralized architectures. It allows associating a set of QoS objectives with each service. In case of overload or failures, the quality of service degrades in a controllable manner. We have evaluated the performance of our design through extensive simulations. The results have been compared with performance characteristics of ideal systems.

APA, Harvard, Vancouver, ISO, and other styles
37

Kissami, Imad. "High Performance Computational Fluid Dynamics on Clusters and Clouds : the ADAPT Experience." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCD019/document.

Full text
Abstract:
In this thesis, we present our research work in the field of high-performance computing for fluid mechanics (CFD) on cluster and cloud architectures. In general, we propose to develop an efficient solver, called ADAPT, for solving CFD problems, both in a classic view corresponding to developments in MPI and in a view that leads us to represent ADAPT as a graph of tasks intended to be scheduled on a cloud computing platform. As a first contribution, we propose a parallelization of the diffusion-convection equation coupled to a linear system in 2D and 3D using MPI. A two-level parallelization is used in our implementation to take advantage of current distributed multicore machines. A balanced distribution of the computational load is obtained by decomposing the domain using METIS, together with an appropriate resolution of our very large sparse linear system using the parallel solver MUMPS (MUltifrontal Massively Parallel Solver). Our second contribution illustrates how to imagine the ADAPT framework, as depicted in the first contribution, as a service. We transform the framework (in fact, a part of the framework) into a DAG (Directed Acyclic Graph) in order to see it as a scientific workflow. Then we introduce new policies inside the RedisDG workflow engine, in order to schedule the tasks of the DAG in an opportunistic manner. We introduce into RedisDG the possibility to work with dynamic workers (they can leave or enter the computing system as they want) and a multi-criteria approach to decide on the “best” worker to choose to execute a task. Experiments are conducted on the ADAPT workflow to illustrate how effective the scheduling and the scheduling decisions are in the new RedisDG.
APA, Harvard, Vancouver, ISO, and other styles
38

Ozmen, Semih. "Linear Static Analysis of Large Structural Models on PC Clusters." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/2/12610763/index.pdf.

Full text
Abstract:
This research focuses on implementing and improving a parallel solution framework for the linear static analysis of large structural models on PC clusters. The framework consists of two separate programs, where the first one is responsible for preparing data for the parallel solution, which involves partitioning, workload balancing, and equation numbering. The second program is a fully parallel finite element program that utilizes a substructure-based solution approach with direct solvers. The first step of data preparation is partitioning the structure into substructures. After creating the initial substructures, the estimated imbalance of the substructures is adjusted by iteratively transferring nodes from the slower substructures to the faster ones. Once the final substructures are created, the solution phase is initiated. Each processor assembles its substructure's stiffness matrix and condenses it to the interfaces. The interface equations are then solved in parallel with a block-cyclic dense matrix solver. After computing the interface unknowns, each processor calculates the internal displacements and element stresses or forces. Comparative tests were done to demonstrate the performance of the solution framework.
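The interface condensation step described above is standard static condensation: splitting each substructure's degrees of freedom into internal (i) and interface (b) sets gives

\begin{bmatrix} K_{ii} & K_{ib} \\ K_{bi} & K_{bb} \end{bmatrix}
\begin{bmatrix} u_i \\ u_b \end{bmatrix}
=
\begin{bmatrix} f_i \\ f_b \end{bmatrix}
\quad\Rightarrow\quad
\left(K_{bb} - K_{bi} K_{ii}^{-1} K_{ib}\right) u_b = f_b - K_{bi} K_{ii}^{-1} f_i ,

and once the interface unknowns u_b have been solved in parallel, each processor recovers its internal displacements from u_i = K_{ii}^{-1} ( f_i - K_{ib} u_b ).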
APA, Harvard, Vancouver, ISO, and other styles
39

Alfonso Laguna, Carlos de. "Efficient and elastic management of computing infrastructures." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/57187.

Full text
Abstract:
Modern data centers integrate a lot of computer and electronic devices. However, some reports state that the mean usage of a typical data center is around 50% of its peak capacity, and the mean usage of each server is between 10% and 50%. A great deal of energy is spent powering computer hardware that remains idle most of the time. It would therefore be possible to save energy simply by powering off those parts of the data center that are not actually used, and powering them on again as they are needed. Most data centers have computing clusters that are used for intensive computing, recently evolving towards an on-premises Cloud service model. Despite the use of low-power components, higher energy savings can be achieved by dynamically adapting the system to the actual workload. The main approach in this case is the use of energy-saving criteria for scheduling the jobs or the virtual machines onto the working nodes. The aim is to power off idle servers automatically. But it is necessary to schedule the power management of the servers in order to minimize the impact on the end users and their applications. The objective of this thesis is the elastic and efficient management of cluster infrastructures, with the aim of reducing the costs associated with idle components. This objective is addressed by automating the power management of the working nodes in a computing cluster, and also by proactively shaping the load distribution, by means of memory overcommitment and live migration of virtual machines, to obtain idle resources that can be powered off. Moreover, this automation is of interest for virtual clusters, as they suffer from the same problems: while in physical clusters idle working nodes waste energy, in virtual clusters built from virtual machines the idle working nodes waste money in commercial Clouds or computational resources in an on-premises Cloud.
Alfonso Laguna, CD. (2015). Efficient and elastic management of computing infrastructures [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/57187
APA, Harvard, Vancouver, ISO, and other styles
40

Shankar, Dipti. "Designing Fast, Resilient and Heterogeneity-Aware Key-Value Storage on Modern HPC Clusters." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563522337179638.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Cesarini, Daniele. "OpenMP task scheduling strategies to mitigate hardware variability in tightly-coupled shared memory clusters." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/7759/.

Full text
Abstract:
This thesis makes two important contributions in the field of embedded many-core accelerators. We implemented an OpenMP runtime optimized for managing the tasking model on systems whose processors are tightly coupled into clusters and then interconnected through a network-on-chip. We focused on its scalability and on support for fine-grained tasks, as is typical of embedded applications. The second contribution of this thesis is a proposed extension of the OpenMP runtime that tries to anticipate the occurrence of errors caused by hardware variability, by scheduling the workload efficiently.
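For readers unfamiliar with the OpenMP tasking model that the runtime targets, a minimal, generic example (plain OpenMP, not the thesis runtime) is:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < 8; i++) {
            /* One fine-grained task per item, executed by any thread in the team. */
            #pragma omp task firstprivate(i)
            printf("task %d run by thread %d\n", i, omp_get_thread_num());
        }
        #pragma omp taskwait   /* wait for all spawned tasks to complete */
    }
    return 0;
}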
APA, Harvard, Vancouver, ISO, and other styles
42

Hines, Michael R. "Techniques for collective physical memory ubiquity within networked clusters of virtual machines." Diss., online access via UMI, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
43

Bahcecioglu, Tunc. "Parallel Solution of Soil-Structure Interaction Problems on PC Clusters." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612954/index.pdf.

Full text
Abstract:
Numerical assessment of soil-structure interaction problems requires heavy computational effort because of the dynamic and iterative (nonlinear) nature of the problems. Furthermore, modeling soil-structure interaction may require
APA, Harvard, Vancouver, ISO, and other styles
44

Luo, Miao. "Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand, Accelerators and Co-Processors." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374197706.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Muthukrishnan, Gayathri. "Utilizing Hierarchical Clusters in the Design of Effective and Efficient Parallel Simulations of 2-D and 3-D Ising Spin Models." Thesis, Virginia Tech, 2004. http://hdl.handle.net/10919/9944.

Full text
Abstract:
In this work, we design parallel Monte Carlo algorithms for the Ising spin model on a hierarchical cluster. A hierarchical cluster can be considered a cluster of homogeneous nodes which are partitioned into multiple supernodes, such that communication across homogeneous clusters is represented by a supernode topological network. We consider different data layouts and provide equations for choosing the best data layout under such a network paradigm. We show that the data layouts designed for a homogeneous cluster will not yield results as good as layouts designed for a hierarchical cluster. We derive theoretical results on the performance of the algorithms on a modified version of the LogP model that represents such tiered networking, and present simulation results to analyze the utility of the theoretical design and analysis. Furthermore, we consider the 3-D Ising model and design parallel algorithms for sweep spin selection on both homogeneous and hierarchical clusters. We also discuss the simulation of hierarchical clusters on a homogeneous set of machines, and the efficient implementation of the parallel Ising model on such clusters.
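As a concrete reference for what is being parallelised, a serial Metropolis sweep of the 2-D Ising model (the lattice size, inverse temperature and random-number source are illustrative choices, not the thesis's setup) can be written as:

#include <stdlib.h>
#include <math.h>

#define N 64   /* N x N lattice of spins (+1/-1) with periodic boundaries */

void metropolis_sweep(int s[N][N], double beta)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int sum = s[(i + 1) % N][j] + s[(i + N - 1) % N][j]
                    + s[i][(j + 1) % N] + s[i][(j + N - 1) % N];
            double dE = 2.0 * s[i][j] * sum;          /* energy change of a flip */
            if (dE <= 0.0 || drand48() < exp(-beta * dE))
                s[i][j] = -s[i][j];                   /* accept the spin flip */
        }
    }
}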
Master of Science
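For context on the computation being parallelized, the sketch below (my illustration, not the thesis code) shows a serial Metropolis sweep over a 2-D Ising lattice with periodic boundaries; the lattice size and temperature are placeholders, and the thesis distributes such sweeps across the nodes and supernodes of a hierarchical cluster.

    #include <stdlib.h>
    #include <math.h>

    #define L 64                            /* lattice edge (illustrative) */

    static int spin[L][L];

    /* One Metropolis sweep at inverse temperature beta (J = 1). */
    static void sweep(double beta) {
        for (int i = 0; i < L; i++) {
            for (int j = 0; j < L; j++) {
                int up    = spin[(i + L - 1) % L][j];
                int down  = spin[(i + 1) % L][j];
                int left  = spin[i][(j + L - 1) % L];
                int right = spin[i][(j + 1) % L];
                int dE = 2 * spin[i][j] * (up + down + left + right);
                if (dE <= 0 || drand48() < exp(-beta * dE))
                    spin[i][j] = -spin[i][j];   /* accept the spin flip */
            }
        }
    }

    int main(void) {
        for (int i = 0; i < L; i++)
            for (int j = 0; j < L; j++)
                spin[i][j] = (drand48() < 0.5) ? 1 : -1;
        for (int s = 0; s < 1000; s++)
            sweep(0.44);                    /* illustrative, near-critical beta */
        return 0;
    }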
APA, Harvard, Vancouver, ISO, and other styles
46

Wister, Ovando Miguel Antonio. "Arquitectura de descubrimiento de servicios en MANET basada en dispositivos de capacidades superiores liderando clusters." Doctoral thesis, Universidad de Murcia, 2008. http://hdl.handle.net/10803/10925.

Full text
Abstract:
This thesis introduces LIFT, a combination of a cluster-based approach with a cross-layer scheme for discovering services in MANETs. In this proposal, High Capability Devices (HCD) are distinguished from Limited Capability Devices (LCD): the HCD act as cluster leaders in each cluster and perform most of the service discovery activities, so LIFT handles local rather than global traffic. Consequently, messages, energy, computation, and bandwidth are reduced through optimal use of network resources. To assess whether LIFT achieves its goal of minimizing resource usage, we compared it with a well-known alternative (AODV-SD) in terms of control message overhead, energy consumption, PDR, throughput, average hop count, NRL, end-to-end delay, and service acquisition time. After extensive trials and simulations, LIFT improved on previous results in the area.
APA, Harvard, Vancouver, ISO, and other styles
47

Dimitrov, Rossen Petkov. "Overlapping of communication and computation and early binding fundamental mechanisms for improving parallel performance on clusters of workstations /." Diss., Mississippi State : Mississippi State University, 2001. http://library.msstate.edu/etd/show.asp?etd=etd-04092001-231941.

Full text
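The title names a standard technique; as a hedged illustration (not code from the dissertation), the sketch below posts a nonblocking MPI transfer, performs independent computation while the message is in flight, and only then waits for completion, so communication overlaps with useful work.

    #include <mpi.h>

    #define N 1000000

    int main(int argc, char **argv) {
        static double buf[N], local[N];     /* static: zero-initialized */
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Request req;
        if (rank == 0 && size > 1)
            MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        else if (rank == 1)
            MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

        /* Independent computation proceeds while the transfer is in flight. */
        for (int i = 0; i < N; i++)
            local[i] = local[i] * 0.5 + 1.0;

        if (rank < 2 && size > 1)
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete the overlapped transfer */

        MPI_Finalize();
        return 0;
    }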
APA, Harvard, Vancouver, ISO, and other styles
48

Freitas, Henrique Cota de. "Arquitetura de NoC programável baseada em múltiplos clusters de cores para suporte a padrões de comunicação coletiva." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/16656.

Full text
Abstract:
For the next generations of many-core processors, new approaches to processor architecture design must be proposed. In this context, on-chip interconnection networks are essential to sustain program performance: traditional interconnects have physical constraints that limit scalability and performance for parallel applications. The state of the art points to the Network-on-Chip (NoC), composed of routers and other network elements capable of providing scalable, high-performance communication. Workloads, however, produce different communication patterns that can affect network performance. Some research addresses application-specific NoC design methodologies for particular application domains, but although a dedicated NoC achieves high performance, parallel workloads generate collective communication patterns that change dynamically. To increase NoC flexibility, related works apply reconfigurable computing concepts so that the NoC architecture can adapt to the communication pattern, focusing either on FPGA-based reconfiguration or on polymorphic ASICs. The goal of this thesis is to propose a Network-on-Chip architecture that supports multiple clusters of processing cores through programmable routers and reconfigurable topologies. Each router contains a reconfigurable crossbar switch able to implement topologies dynamically through a second reconfiguration level, along with network processors that increase the flexibility and adaptability of the NoC via programs that monitor and manage the network. The contribution of this thesis is therefore the Programmable Multi-Cluster NoC (MCNoC) architecture. Results based on analytical and simulation models, with artificial and natural workloads, show that the architecture achieves high performance and packet throughput thanks to topology adaptation and the reduced influence of the network on communication. FPGA occupancy results show that the programmable routers are similar in size to traditional NoC architectures managing the same number of cores, and the lower use of input buffers yields better power and energy efficiency. The design and evaluation models thus confirm the MCNoC as an alternative architecture for supporting collective communication patterns.
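As a rough illustration of the idea of a reconfigurable crossbar (my own simplification, not from the thesis), the sketch below models a router's crossbar as a programmable input-to-output connection map that a management program could clear and re-map at run time; the port count and names are assumptions.

    #include <stdio.h>

    #define PORTS 5    /* e.g. north, south, east, west, local (illustrative) */

    /* Crossbar state: out_for[i] is the output port driven by input port i,
     * or -1 when that input is not connected. */
    typedef struct {
        int out_for[PORTS];
    } crossbar_t;

    static void xbar_clear(crossbar_t *xb) {
        for (int i = 0; i < PORTS; i++) xb->out_for[i] = -1;
    }

    /* Program one input->output connection; returns 0 on success. */
    static int xbar_connect(crossbar_t *xb, int in, int out) {
        if (in < 0 || in >= PORTS || out < 0 || out >= PORTS) return -1;
        for (int i = 0; i < PORTS; i++)         /* at most one driver per output */
            if (xb->out_for[i] == out) return -1;
        xb->out_for[in] = out;
        return 0;
    }

    int main(void) {
        crossbar_t xb;
        xbar_clear(&xb);
        /* A management program installs one configuration ... */
        xbar_connect(&xb, 0, 1);
        xbar_connect(&xb, 1, 2);
        /* ... and may later clear and re-map it to match a new pattern. */
        for (int i = 0; i < PORTS; i++)
            printf("in %d -> out %d\n", i, xb.out_for[i]);
        return 0;
    }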
APA, Harvard, Vancouver, ISO, and other styles
49

Dickman, Thomas J. "Event List Organization and Management on the Nodes of a Many-Core Beowulf Cluster." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1378196499.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Oliveira, Juliano Amorim de. "Um estudo comparativo de cargas de trabalho e políticas de escalonamento para aplicações paralelas em clusters e grids computacionais." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-12012007-143257/.

Full text
Abstract:
Several scheduling policies for parallel applications in distributed computing environments have been proposed. Although such policies give good results, they are generally evaluated in specific scenarios; when the scenario changes, with different distributed environments and workload conditions, their performance can deteriorate. In this context, this work presents a comparative study involving ten scheduling policies evaluated across different scenarios. Each policy was submitted to a combination of four CPU occupation workloads and three variations of the average interprocess communication rate over the network. Three distinct distributed systems were considered: two clusters, with different numbers of nodes, and one computational grid. Simulation was used with environments close to real ones and with workloads obtained from realistic models. The results show that, although the policies target parallel and distributed environments, when the scenario changes the performance drops and the ranking among the policies changes as well. The results also demonstrate the need to consider interprocess communication during scheduling on computational grids.
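To make the notion of a scheduling policy concrete, the sketch below (illustrative only, not one of the ten policies studied) contrasts a round-robin assignment with a least-loaded assignment of tasks to nodes under a small synthetic workload.

    #include <stdio.h>

    #define NODES 4
    #define TASKS 12

    /* Round-robin: ignore node load, cycle through the nodes. */
    static int pick_round_robin(int task_id) {
        return task_id % NODES;
    }

    /* Least-loaded: choose the node with the smallest accumulated load. */
    static int pick_least_loaded(const double load[NODES]) {
        int best = 0;
        for (int n = 1; n < NODES; n++)
            if (load[n] < load[best]) best = n;
        return best;
    }

    int main(void) {
        double cost[TASKS], load_rr[NODES] = {0}, load_ll[NODES] = {0};
        for (int t = 0; t < TASKS; t++)
            cost[t] = 1.0 + (t % 5);            /* synthetic, uneven task costs */

        for (int t = 0; t < TASKS; t++) {
            load_rr[pick_round_robin(t)]      += cost[t];
            load_ll[pick_least_loaded(load_ll)] += cost[t];
        }
        for (int n = 0; n < NODES; n++)
            printf("node %d: round-robin=%.1f least-loaded=%.1f\n",
                   n, load_rr[n], load_ll[n]);
        return 0;
    }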
APA, Harvard, Vancouver, ISO, and other styles
