Dissertations / Theses on the topic 'Distributed and Parallel Computing'

To see the other types of publications on this topic, follow the link: Distributed and Parallel Computing.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Distributed and Parallel Computing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Lei. "Cellular distributed and parallel computing." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:88ffe124-c2fd-4144-86fe-47b35f4908bd.

Full text
Abstract:
This thesis focuses on novel approaches to distributed and parallel computing that are inspired by the mechanisms and functioning of biological cells. We refer to this concept as cellular distributed and parallel computing, which rests on three important principles: simplicity, parallelism, and locality. We first give a parallel polynomial-time solution to the constraint satisfaction problem (CSP) based on a theoretical model of cellular distributed and parallel computing known as neural-like P systems (or neural-like membrane systems). We then design a class of simple neural-like P systems to solve the fundamental maximal independent set (MIS) selection problem efficiently in a distributed way, drawing inspiration from the way that developing cells in the fruit fly become specialised. Building on this bio-inspired approach to distributed MIS selection, we propose a new simple randomised algorithm for another fundamental distributed computing problem: the distributed greedy colouring (GC) problem. We then propose an improved distributed MIS selection algorithm that incorporates, for the first time, another important feature of the biological system: adapting the probabilities used at each node based on local feedback from neighbouring nodes. The improved distributed MIS selection algorithm is again extended to solve the distributed greedy colouring problem. Both improved algorithms are simple and robust and work under very restrictive conditions; moreover, they both achieve state-of-the-art performance in terms of their worst-case time complexity and message complexity. Given any n-node graph with maximum degree Delta, the expected time complexity of our improved distributed MIS selection algorithm is O(log n) and the message complexity per node is O(1). The expected time complexity of our improved distributed greedy colouring algorithm is O(Delta + log n) and the message complexity per node is again O(1). Finally, we provide some experimental results to illustrate the time and message complexity of our proposed algorithms in practice. In particular, we show experimentally that the number of colours used by our distributed greedy colouring algorithms turns out to be optimal or near-optimal for many standard graph colouring benchmarks, so they provide effective simple heuristic approaches to computing a colouring with a small number of colours.
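To give a flavour of randomised distributed MIS selection, here is a minimal Luby-style round simulation in Python. It is an illustrative sketch only, not the thesis's fly-inspired or feedback-adapting algorithms; the firing probability, graph representation and round structure are assumptions made for the example.

```python
import random

def randomized_mis(adj, rounds=100, p=0.5, seed=0):
    """Simulate a simple synchronous randomized MIS election.

    adj: dict mapping node -> set of neighbour nodes (undirected graph).
    Each round, every undecided node independently 'fires' with probability p;
    a node that fires while none of its undecided neighbours fires joins the
    MIS, and its neighbours then drop out.  Luby-style sketch, not the
    thesis's algorithm.
    """
    rng = random.Random(seed)
    undecided = set(adj)
    mis = set()
    for _ in range(rounds):
        if not undecided:
            break
        fired = {v for v in undecided if rng.random() < p}
        for v in fired:
            if not (fired & adj[v]):          # no firing neighbour: join MIS
                mis.add(v)
        # Remove MIS nodes and their neighbours from the undecided set.
        for v in list(undecided):
            if v in mis or (adj[v] & mis):
                undecided.discard(v)
    return mis

if __name__ == "__main__":
    # 6-cycle: expect an independent set of size 2 or 3.
    cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
    print(randomized_mis(cycle))
```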
APA, Harvard, Vancouver, ISO, and other styles
2

Xiang, Yonghong. "Interconnection networks for parallel and distributed computing." Thesis, Durham University, 2008. http://etheses.dur.ac.uk/2156/.

Full text
Abstract:
Parallel computers are generally either shared-memory machines or distributed-memory machines. There are currently technological limitations on shared-memory architectures, and so parallel computers utilizing a large number of processors tend to be distributed-memory machines. We are concerned solely with distributed-memory multiprocessors. In such machines, the dominant factor inhibiting faster global computations is inter-processor communication. Communication is dependent upon the topology of the interconnection network, the routing mechanism, the flow control policy, and the method of switching. We are concerned with issues relating to the topology of the interconnection network. The choice of how we connect processors in a distributed-memory multiprocessor is a fundamental design decision. There are numerous, often conflicting, considerations to bear in mind. However, there does not exist an interconnection network that is optimal on all counts, and trade-offs have to be made. A multitude of interconnection networks have been proposed, each having some good (topological) properties and some not so good. Existing noteworthy networks include trees, fat-trees, meshes, cube-connected cycles, butterflies, Möbius cubes, hypercubes, augmented cubes, k-ary n-cubes, twisted cubes, n-star graphs, (n, k)-star graphs, alternating group graphs, de Bruijn networks, and bubble-sort graphs, to name but a few. We mainly focus on k-ary n-cubes and (n, k)-star graphs in this thesis. In addition, we propose a new interconnection network called the augmented k-ary n-cube. The following results are given in the thesis.
1. Let k ≥ 4 be even and let n ≥ 2. Consider a faulty k-ary n-cube Q^k_n in which the number of node faults f_n and the number of link faults f_e are such that f_n + f_e ≤ 2n - 2. We prove that given any two healthy nodes s and e of Q^k_n, there is a path from s to e of length at least k^n - 2f_n - 1 (resp. k^n - 2f_n - 2) if the nodes s and e have different (resp. the same) parities (the parity of a node of Q^k_n is the sum modulo 2 of the elements in the n-tuple over 0, 1, ..., k - 1 representing the node). Our result is optimal in the sense that there are pairs of nodes and fault configurations for which these bounds cannot be improved, and it answers questions recently posed by Yang, Tan and Hsu, and by Fu. Furthermore, we extend known results, obtained by Kim and Park, for the case when n = 2.
2. We give precise solutions to problems posed by Wang, An, Pan, Wang and Qu and by Hsieh, Lin and Huang. In particular, we show that Q^k_n is bi-panconnected and edge-bipancyclic, when k ≥ 3 and n ≥ 2, and we also show that when k is odd, Q^k_n is m-panconnected, for m = (n(k - 1) + 2k - 6)/2, and (k - 1)-pancyclic (these bounds are optimal). We introduce a path-shortening technique, called progressive shortening, and strengthen existing results, showing that when paths are formed using progressive shortening then these paths can be efficiently constructed and used to solve a problem relating to the distributed simulation of linear arrays and cycles in a parallel machine whose interconnection network is Q^k_n, even in the presence of a faulty processor.
3. We define an interconnection network AQ^k_n, which we call the augmented k-ary n-cube, by extending a k-ary n-cube in a manner analogous to the existing extension of an n-dimensional hypercube to an n-dimensional augmented cube. We prove that the augmented k-ary n-cube AQ^k_n has a number of attractive properties (in the context of parallel computing). For example, we show that AQ^k_n: is a Cayley graph (and so is vertex-symmetric); has connectivity 4n - 2, and is such that we can build a set of 4n - 2 mutually disjoint paths joining any two distinct vertices so that the path of maximal length has length at most max{(n - 1)k - (n - 2), k + 7}; has diameter ⌊k/3⌋ + ⌊(k - 1)/3⌋ when n = 2; and has diameter at most (k/4)(n + 1), for n ≥ 3 and k even, and at most (k/4)(n + 1) + n/4, for n ≥ 3 and k odd.
4. We present an algorithm which, given a source node and a set of n - 1 target nodes in the (n, k)-star graph S_{n,k}, where all nodes are distinct, builds a collection of n - 1 node-disjoint paths, one from each target node to the source. The collection of paths output from the algorithm is such that each path has length at most 6k - 7, and the algorithm has time complexity O(k^3 n^4).
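For readers unfamiliar with the Q^k_n notation, the following short Python sketch enumerates neighbours and the node parity used above, under the standard definition of the k-ary n-cube (nodes are n-tuples over {0, ..., k-1}, adjacent when they differ by ±1 mod k in exactly one coordinate). It is illustrative only and not taken from the thesis.

```python
from itertools import product

def kary_ncube_neighbours(node, k):
    """Neighbours of a node in the k-ary n-cube Q^k_n."""
    nbrs = []
    for i, x in enumerate(node):
        for d in (+1, -1):
            nbrs.append(node[:i] + ((x + d) % k,) + node[i + 1:])
    return set(nbrs)  # the set removes duplicates when k == 2

def parity(node):
    """Parity of a node, as used in the abstract: sum of entries mod 2."""
    return sum(node) % 2

if __name__ == "__main__":
    k, n = 4, 2
    nodes = list(product(range(k), repeat=n))
    print(len(nodes), "nodes;", (0, 0), "->", sorted(kary_ncube_neighbours((0, 0), k)))
```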
APA, Harvard, Vancouver, ISO, and other styles
3

Kim, Young Man. "Some problems in parallel and distributed computing /." The Ohio State University, 1992. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487776210795651.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Freeh, Vincent William 1959. "Software support for distributed and parallel computing." Diss., The University of Arizona, 1996. http://hdl.handle.net/10150/290588.

Full text
Abstract:
This dissertation addresses creating portable and efficient parallel programs for scientific computing. Both of these aspects are important. Portability means the program can execute on any parallel machine. Efficiency means there is little or no penalty for using our solution instead of hand-coded, architecture-specific programs. Although parallel programming is necessarily more difficult than sequential programming, it is currently more complicated than it has to be. The Filaments package provides fine-grain parallelism and a shared memory programming model. It can be viewed as a "least common denominator" for parallel scientific computing. Fine-grain parallelism supports any number (even thousands) of threads, and shared memory provides a natural programming model. Consequently, the combination allows the programmer to concentrate on the application and not the architecture of the target machine. The Filaments package makes extensive use of run-time decision making. Run-time decision making has several advantages. First, it is often possible to make a better decision because more information is available at run time. Second, run-time decision making can obviate the need for complex, often intractable, static analysis. Moreover, run-time decision making leads to much of the package's efficiency.
APA, Harvard, Vancouver, ISO, and other styles
5

Jin, Xiaoming. "A practical realization of parallel disks for a distributed parallel computing system." [Gainesville, Fla.] : University of Florida, 2000. http://etd.fcla.edu/etd/uf/2000/ane5954/master.PDF.

Full text
Abstract:
Thesis (M.S.)--University of Florida, 2000.
Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).
APA, Harvard, Vancouver, ISO, and other styles
6

馬家駒 and Ka-kui Ma. "Transparent process migration for parallel Java computing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31226474.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Ma, Ka-kui. "Transparent process migration for parallel Java computing /." Hong Kong : University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B23589371.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Dutta, Sourav. "PERFORMANCE ESTIMATION AND SCHEDULING FOR PARALLEL PROGRAMS WITH CRITICAL SECTIONS." OpenSIUC, 2017. https://opensiuc.lib.siu.edu/dissertations/1353.

Full text
Abstract:
A fundamental problem in multithreaded parallel programs is the partial serialization that is imposed by the presence of mutual exclusion variables or critical sections. In this work we investigate a model in which each thread consists of the same number L of functional blocks, where each functional block has the same duration and either accesses a critical section or executes non-critical code. We derive formulas to estimate the average time spent in a critical section, both in the presence of a synchronization barrier and in its absence. We also develop, and establish the optimality of, a fast polynomial-time algorithm to find a schedule with the shortest makespan for any number of threads and any number of critical sections for the case of L = 2. For the general case L > 2, which is NP-complete, we present a competitive heuristic and provide experimental comparisons with the ideal integer linear programming (ILP) formulation.
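A brute-force illustration of the scheduling problem described above (unit-duration blocks, one lock per named critical section, integer start offsets per thread) is sketched below in Python. It is not the thesis's polynomial-time algorithm for L = 2; the model details are assumptions made for the example.

```python
from itertools import product

def feasible(threads, offsets):
    """No two threads may occupy the same critical section in the same slot."""
    usage = {}  # (critical_section, time_slot) -> thread index
    for t, (blocks, off) in enumerate(zip(threads, offsets)):
        for j, cs in enumerate(blocks):
            if cs is None:
                continue
            key = (cs, off + j)
            if key in usage:
                return False
            usage[key] = t
    return True

def min_makespan(threads, max_delay=None):
    """Exhaustively search integer start offsets for the shortest makespan.

    Each thread is a list of unit-duration blocks; a block is either None
    (non-critical code) or the name of the critical section it locks.
    """
    L = len(threads[0])
    if max_delay is None:
        max_delay = L * len(threads)  # safe upper bound on useful delays
    best = None
    for offsets in product(range(max_delay + 1), repeat=len(threads)):
        if feasible(threads, offsets):
            makespan = max(off + L for off in offsets)
            best = makespan if best is None else min(best, makespan)
    return best

if __name__ == "__main__":
    # Three threads, L = 2: each accesses critical section "x" in one block.
    threads = [["x", None], [None, "x"], ["x", None]]
    print(min_makespan(threads))  # expect 3, e.g. offsets (0, 1, 1)
```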
APA, Harvard, Vancouver, ISO, and other styles
9

Winter, Stephen Charles. "A distributed reduction architecture for real-time computing." Thesis, University of Westminster, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238722.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Valente, Fredy Joao. "An integrated parallel/distributed environment for high performance computing." Thesis, University of Southampton, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.362138.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Wong, K. W. (Kwok Wa Dennis) Carleton University Dissertation Computer Science. "Partitioning B+-tree in parallel and distributed computing environment." Ottawa, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
12

Karl, Holger. "Responsive Execution of Parallel Programs in Distributed Computing Environments." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 1999. http://dx.doi.org/10.18452/14455.

Full text
Abstract:
Vernetzte Standardarbeitsplatzrechner (sog. Cluster) sind eine attraktive Umgebung zur Ausführung paralleler Programme; für einige Anwendungsgebiete bestehen jedoch noch immer ungelöste Probleme. Ein solches Problem ist die Verlässlichkeit und Rechtzeitigkeit der Programmausführung: In vielen Anwendungen ist es wichtig, sich auf die rechtzeitige Fertigstellung eines Programms verlassen zu können. Mechanismen zur Kombination dieser Eigenschaften für parallele Programme in verteilten Rechenumgebungen sind das Hauptanliegen dieser Arbeit. Zur Behandlung dieses Anliegens ist eine gemeinsame Metrik für Verlässlichkeit und Rechtzeitigkeit notwendig. Eine solche Metrik ist die Responsivität, die für die Bedürfnisse dieser Arbeit verfeinert wird. Als Fallstudie werden Calypso und Charlotte, zwei Systeme zur parallelen Programmierung, im Hinblick auf Responsivität untersucht und auf mehreren Abstraktionsebenen werden Ansatzpunkte zur Verbesserung ihrer Responsivität identifiziert. Lösungen für diese Ansatzpunkte werden zu allgemeineren Mechanismen für (parallele) responsive Dienste erweitert. Im Einzelnen handelt es sich um 1. eine Analyse der Responsivität von Calypsos "eager scheduling" (ein Verfahren zur Lastbalancierung und Fehlermaskierung), 2. die Behebung eines "single point of failure", zum einen durch eine Responsivitätsanalyse von Checkpointing, zum anderen durch ein auf Standardschnittstellen basierendes System zur Replikation bestehender Software, 3. ein Verfahren zur garantierten Ressourcenzuteilung für parallele Programme und 4. die Einbeziehung semantischer Information über das Kommunikationsmuster eines Programms in dessen Ausführung zur Verbesserung der Leistungsfähigkeit. Die vorgeschlagenen Mechanismen sind kombinierbar und für den Einsatz in Standardsystemen geeignet. Analyse und Experimente zeigen, dass diese Mechanismen die Responsivität passender Anwendungen verbessern.
Clusters of standard workstations have been shown to be an attractive environment for parallel computing. However, there remain unsolved problems to make them suitable to some application scenarios. One of these problems is a dependable and timely program execution: There are many applications in which a program should be successfully completed at a predictable point of time. Mechanisms to combine the properties of both dependable and timely execution of parallel programs in distributed computing environments are the main objective of this dissertation. Addressing these properties requires a joint metric for dependability and timeliness. Responsiveness is such a metric; it is refined for the purposes of this work. As a case study, Calypso and Charlotte, two parallel programming systems, are analyzed and their shortcomings on several abstraction levels with regard to responsiveness are identified. Solutions for them are presented and generalized, resulting in widely applicable mechanisms for (parallel) responsive services. Specifically, these solutions are: 1) a responsiveness analysis of Calypso's eager scheduling (a mechanism for load balancing and fault masking), 2) ameliorating a single point of failure by a responsiveness analysis of checkpointing and by a standard interface-based system for replication of legacy software, 3) managing resources in a way suitable for parallel programs, and 4) using semantical information about the communication pattern of a program to improve its performance. All proposed mechanisms can be combined and are suitable for use in standard environments. It is shown by analysis and experiments that these mechanisms improve the responsiveness of eligible applications.
APA, Harvard, Vancouver, ISO, and other styles
13

Papay, Juraj. "Performance characterisation of distributed memory MIMD computations." Thesis, University of Warwick, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.310021.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Bennett, Sidney Page. "Designing a Compiler for a Distributed Memory Parallel Computing System." Thesis, Virginia Tech, 2003. http://hdl.handle.net/10919/9684.

Full text
Abstract:
The SCMP processor presents a unique approach to processor design: integrating multiple processors, a network, and memory onto a single chip. The benefits of this design include a reduction in overhead incurred by synchronization, communication, and memory accesses. To properly determine its effectiveness, the SCMP architecture must be exercised under a wide variety of workloads, creating the need for a variety of applications. A compiler can reduce the time spent developing these applications by allowing the use of languages such as C and Fortran. However, compiler development is a research area in its own right, requiring extensive knowledge of the architecture to make good use of its resources. This thesis presents the design and implementation of a compiler for the SCMP architecture. The thesis includes an in-depth analysis of SCMP and the necessary design choices for an effective compiler using the SUIF and MachSUIF toolsets. Two optimization passes are included in the discussion: partial redundancy elimination and instruction scheduling. While these optimizations are not specific to parallel computing, architectural considerations must still be made to properly implement the algorithms within the SCMP compiler. These optimizations yield an overall reduction in execution time of 15-36%.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
15

Kore, Anand. "Using idle workstations for distributed computing." Ohio : Ohio University, 1998. http://www.ohiolink.edu/etd/view.cgi?ohiou1176488008.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Stuart, Mary Bernadette. "The Eigensolution of symmetric matrices on distributed memory computers." Thesis, Queen's University Belfast, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.295449.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Pyla, Hari Krishna. "Tempest: A Framework for High Performance Thermal-Aware Distributed Computing." Thesis, Virginia Tech, 2007. http://hdl.handle.net/10919/33198.

Full text
Abstract:
Compute clusters are consuming more power at higher densities than ever before. This results in increased thermal dissipation, the need for powerful cooling systems, and ultimately a reduction in system reliability as temperatures increase. Over the past several years, the research community has reacted to this problem by producing software tools such as HotSpot and Mercury to estimate system thermal characteristics and validate thermal-management techniques. While these tools are flexible and useful, they suffer from several limitations: for the average user such simulation tools can be cumbersome to use, and they may take significant time and expertise to port to different systems. Further, such tools produce significant detail and accuracy at the expense of execution time, often enough to prohibit iterative testing. We propose a fast, easy-to-use, accurate, portable software framework called Tempest (for temperature estimator) that leverages emergent thermal sensors to enable users to profile, evaluate, and reduce the thermal characteristics of systems and applications. In this thesis, we illustrate the use of Tempest to analyze the thermal effects of various parallel benchmarks in clusters. We also show how users can analyze the effects of thermal optimizations on cluster applications. Dynamic Voltage and Frequency Scaling (DVFS) reduces the power consumption of high-performance clusters by reducing processor voltage during periods of low utilization. We designed Tempest to measure the runtime effects of processor frequency on thermals. Our experiments indicate that HPC workload characteristics greatly impact the effects of DVFS on temperature. We propose a thermal-aware DVFS scheduling approach that proactively controls processor voltage across a cluster by evaluating and predicting trends in processor temperature. We identify approaches that can maintain temperature thresholds and reduce temperature with minimal impact on performance. Our results indicate that proactive, temperature-aware scheduling of DVFS can reduce cluster-wide processor thermals by more than 10 degrees Celsius, the threshold for improving electronic reliability by 50%.
Master of Science
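As a toy illustration of the kind of proactive, temperature-aware DVFS described above, the following Python sketch extrapolates a recent temperature trend and steps a frequency ladder down before a high threshold is reached and back up when there is headroom. The thresholds, frequency ladder and linear extrapolation are invented for the example and are not Tempest's model.

```python
class ThermalDVFS:
    """Toy proactive DVFS controller driven by core-temperature readings."""

    def __init__(self, freqs, t_high=70.0, t_low=55.0, horizon=3):
        self.freqs = sorted(freqs)           # frequency ladder in GHz
        self.level = len(self.freqs) - 1     # start at the highest frequency
        self.t_high, self.t_low, self.horizon = t_high, t_low, horizon

    def update(self, temps):
        """Return the frequency to use given the temperature history so far."""
        slope = temps[-1] - temps[-2] if len(temps) >= 2 else 0.0
        projected = temps[-1] + self.horizon * slope   # crude look-ahead
        if projected > self.t_high and self.level > 0:
            self.level -= 1                  # slow down proactively
        elif projected < self.t_low and self.level < len(self.freqs) - 1:
            self.level += 1                  # recover performance
        return self.freqs[self.level]

if __name__ == "__main__":
    ctl = ThermalDVFS([1.0, 1.4, 1.8, 2.2])
    trace = [60, 63, 66, 69, 71, 70, 66, 62, 58, 55]   # degrees Celsius
    for i in range(1, len(trace) + 1):
        print(f"t={i:2d}  temp={trace[i-1]:4.1f} C  freq={ctl.update(trace[:i]):.1f} GHz")
```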
APA, Harvard, Vancouver, ISO, and other styles
18

Sziveri, Janos. "Parallel computational techniques for explicit finite element analysis." Thesis, Heriot-Watt University, 1997. http://hdl.handle.net/10399/1254.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Souza, Paulo Sergio Lopes de. "AMIGO: Uma contribuição para a convergência na área de escalonamento de processos." Universidade de São Paulo, 2000. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-25032014-152613/.

Full text
Abstract:
Este trabalho propõe e descreve em detalhes o projeto do AMIGO (DynAMical FlexIble SchedulinG EnvirOnment), uma nova ferramenta de software capaz de viabilizar a união de diferentes algoritmos de escalonamento, de uma maneira completamente transparente ao usuário. O AMIGO é capaz de flexibilizar o escalonamento (em tempo de execução da aplicação) desde a sua configuração até a sua efetiva aplicação. Além da flexibilidade dinâmica e da transparência, o AMIGO também é modular: o seu projeto está dividido em módulos que, entre outras vantagens, facilitam sua execução em diferentes plataformas. Este trabalho também contribui apresentando uma análise crítica da literatura da área, apontando divergências e propondo pontos de convergência importantes. Assim, o levantamento bibliográfico apresentado atua como um material introdutório precioso para que os pesquisadores iniciantes formem um contexto geral sobre a área e, desse modo, aprofundem mais rapidamente seus estudos em outros trabalhos mais específicos. A avaliação de desempenho feita com o AMIGO demonstra que é possível a obtenção de ganhos de desempenho expressivos, com total transparência para o usuário final. Unindo-se desempenho, flexibilidade e transparência, espera-se contribuir para a redução da lacuna existente entre teoria e prática na área de escalonamento de processos
This thesis proposes and describes in detail the design of AMIGO (DynAMical FlexIble SchedulinG EnvirOnment), a novel software tool that makes the union of different scheduling algorithms possible in a way completely transparent to the user. AMIGO makes the scheduling activity flexible at run time, covering all the steps from its configuration up to its effective application. Besides dynamic flexibility and transparency, AMIGO is also modular: it is split into modules that, among other advantages, facilitate its execution on different platforms. This work also contributes by presenting a critical analysis of the process-scheduling literature, pointing out the existing divergences and proposing important convergence points. Thus, the literature survey presented acts as a precious introductory material, which is able, on one hand, to give beginners a broad view of the process-scheduling area and, on the other hand, to facilitate the development of deeper studies in a quicker fashion when more specific works are needed. The performance evaluation of AMIGO shows that it is possible to obtain significant performance gains with total user transparency. Joining performance, flexibility and transparency, it is hoped to contribute to the reduction of the existing gap between theory and practice in the process-scheduling area.
APA, Harvard, Vancouver, ISO, and other styles
20

McMahon, Adam. "Rendering Animations with Distributed Applets over the Internet." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_theses/189.

Full text
Abstract:
High quality 3D rendering requires massive computing resources. In order to render animations within a reasonable amount of time, the rendering process is often distributed among a cluster of computers, typically called a rendering farm. However, most individuals and small studios do not have the resources to purchase or lease a rendering farm. In the late 1990s, Java technology brought a hope that distributed applets could be utilized as an alternative to traditional network rendering models. Yet, this hope was never realized, nor was it fully implemented. Taking into account new developments in web application technology and the Sunflow renderer, this thesis reexamines the possibility of distributed rendering applets. This thesis suggests that distributed Java applets can effectively render projects across a collection of heterogeneous and geographically dispersed computers over the Internet. Moreover, it presents a prototype web application, called RenderWeb, that uses distributed applets to quickly render projects created in popular animation programs, such as Blender, 3D Studio MAX, and Softimage.
APA, Harvard, Vancouver, ISO, and other styles
21

Jacob, Aju. "Distributed configuration management for reconfigurable cluster computing." [Gainesville, Fla.] : University of Florida, 2004. http://purl.fcla.edu/fcla/etd/UFE0007181.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Shan, Meijuan. "Distributed object-oriented parallel computing on heterogeneous workstation clusters using Java." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq43403.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Stewart, Robert. "Reliable massively parallel symbolic computing : fault tolerance for a distributed Haskell." Thesis, Heriot-Watt University, 2013. http://hdl.handle.net/10399/2834.

Full text
Abstract:
As the number of cores in manycore systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover, new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. Symbolic computation has underpinned key advances in Mathematics and Computer Science, for example in number theory, cryptography, and coding theory. Computer algebra software systems facilitate symbolic mathematics. Developing these at scale has its own distinctive set of challenges, as symbolic algorithms tend to employ complex irregular data and control structures. SymGridParII is a middleware for parallel symbolic computing on massively parallel High Performance Computing platforms. A key element of SymGridParII is a domain specific language (DSL) called Haskell Distributed Parallel Haskell (HdpH). It is explicitly designed for scalable distributed-memory parallelism, and employs work stealing to load balance dynamically generated irregular task sizes. To investigate providing scalable fault tolerant symbolic computation, we design, implement and evaluate a reliable version of HdpH, HdpH-RS. Its reliable scheduler detects and handles faults, using task replication as a key recovery strategy. The scheduler supports load balancing with a fault tolerant work stealing protocol. The reliable scheduler is invoked with two fault tolerance primitives for implicit and explicit work placement, and 10 fault tolerant parallel skeletons that encapsulate common parallel programming patterns. The user is oblivious to many failures; they are instead handled by the scheduler. An operational semantics describes small-step reductions on states. A simple abstract machine for scheduling transitions and task evaluation is presented. It defines the semantics of supervised futures, and the transition rules for recovering tasks in the presence of failure. The transition rules are demonstrated with a fault-free execution, and three executions that recover from faults. The fault tolerant work stealing has been abstracted into a Promela model. The SPIN model checker is used to exhaustively search the intersection of states in this automaton to validate a key resiliency property of the protocol. It asserts that an initially empty supervised future on the supervisor node will eventually be full in the presence of all possible combinations of failures. The performance of HdpH-RS is measured using five benchmarks. Supervised scheduling achieves a speedup of 757 with explicit task placement and 340 with lazy work stealing when executing Summatory Liouville on up to 1400 cores of an HPC architecture. Moreover, supervision overheads are consistently low when scaling up to 1400 cores. Low recovery overheads are observed in the presence of frequent failure when lazy on-demand work stealing is used. A Chaos Monkey mechanism has been developed for stress testing resiliency with random failure combinations. All unit tests pass in the presence of random failure, terminating with the expected results.
APA, Harvard, Vancouver, ISO, and other styles
24

Janjic, Vladimir. "Load balancing of irregular parallel applications on heterogeneous computing environments." Thesis, University of St Andrews, 2012. http://hdl.handle.net/10023/2540.

Full text
Abstract:
Large-scale heterogeneous distributed computing environments (such as Computational Grids and Clouds) offer the promise of access to a vast amount of computing resources at a relatively low cost. In order to ease application development and deployment on such complex environments, high-level parallel programming languages exist that need to be supported by sophisticated runtime systems. One of the main problems that these runtime systems need to address is dynamic load balancing, ensuring that no resources in the environment are underutilised or overloaded with work. This thesis deals with the problem of obtaining good speedups for irregular applications on heterogeneous distributed computing environments. It focuses on work-stealing techniques that can be used for load balancing during the execution of irregular applications. It specifically addresses two problems that arise during work stealing: where thieves should look for work during the application execution and how victims should respond to steal attempts. In particular, we describe and implement a new Feudal Stealing algorithm, and we describe and implement new granularity-driven task selection policies in the SCALES simulator, a work-stealing simulator developed for this thesis. In addition, we present a comprehensive evaluation of the Feudal Stealing algorithm and the granularity-driven task selection policies using simulations of a large class of regular and irregular parallel applications on a wide range of computing environments. We show how the Feudal Stealing algorithm and the granularity-driven task selection policies bring significant improvements in the speedups of irregular applications, compared to the state-of-the-art work-stealing algorithms. Furthermore, we also present the implementation of the task selection policies in the Grid-GUM runtime system [AZ06] for Glasgow Parallel Haskell (GpH) [THLPJ98], in addition to the implementation in SCALES, and we present an evaluation of this implementation on a large set of synthetic applications.
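The following Python sketch shows the baseline work-stealing pattern the abstract refers to: each worker owns a deque, pops tasks from its own end, and, when idle, steals from the opposite end of a randomly chosen victim. It is a generic shared-memory illustration, not the Feudal Stealing algorithm or the SCALES simulator; the task set and worker count are arbitrary.

```python
import random
import threading
from collections import deque

def run_work_stealing(tasks, n_workers=4, seed=0):
    """Run callables with per-worker deques and random-victim stealing."""
    deques = [deque() for _ in range(n_workers)]
    for i, t in enumerate(tasks):
        deques[i % n_workers].append(t)          # round-robin initial placement
    locks = [threading.Lock() for _ in range(n_workers)]
    done, done_lock = [], threading.Lock()

    def worker(wid):
        rng = random.Random(seed + wid)
        while True:
            task = None
            with locks[wid]:
                if deques[wid]:
                    task = deques[wid].pop()     # LIFO from own deque
            if task is None:                     # own deque empty: try to steal
                victim = rng.randrange(n_workers)
                if victim != wid:
                    with locks[victim]:
                        if deques[victim]:
                            task = deques[victim].popleft()   # FIFO steal
            if task is None:
                with done_lock:
                    if len(done) == len(tasks):
                        return                   # all tasks finished everywhere
                continue
            result = task()
            with done_lock:
                done.append(result)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

if __name__ == "__main__":
    # 20 tasks of irregular "size" (busy loops of different lengths).
    tasks = [lambda n=n: sum(range(1000 * (n % 5 + 1))) for n in range(20)]
    print(len(run_work_stealing(tasks)), "tasks completed")
```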
APA, Harvard, Vancouver, ISO, and other styles
25

Djemame, Karim. "Distributed simulation of high-level algebraic Petri nets." Thesis, University of Glasgow, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.301624.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Pulla, Gautam. "High Performance Computing Issues in Large-Scale Molecular Statics Simulations." Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/33206.

Full text
Abstract:
Successful application of parallel high performance computing to practical problems requires overcoming several challenges. These range from the need to make sequential and parallel improvements in programs to the implementation of software tools which create an environment that aids sharing of high performance hardware resources and limits losses caused by hardware and software failures. In this thesis we describe our approach to meeting these challenges in the context of a Molecular Statics code. We describe sequential and parallel optimizations made to the code and also a suite of tools constructed to facilitate the execution of the Molecular Statics program on a network of parallel machines with the aim of increasing resource sharing, fault tolerance and availability.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
27

Kohler, Peter. "Concept and implementation of an efficient communication network for distributed parallel computing /." Zürich, 1997. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=12169.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Chinthamani, Meemakshisundaram R. "Future effectiveness of centralized and distributed memory architectures for parallel computing systems." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0019/NQ37877.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

LUZZI, Cinzia. "Enabling parallel and interactive distributed computing data analysis for the ALICE experiment." Doctoral thesis, Università degli studi di Ferrara, 2014. http://hdl.handle.net/11392/2388941.

Full text
Abstract:
AliEn (ALICE Environment) is the production environment developed by the ALICE collaboration at CERN. It provides a set of Grid tools enabling the full offline computational workflow of the experiment (simulation, reconstruction and data analysis) in a distributed and heterogeneous computing environment. In addition to the analysis on the Grid, ALICE users perform local interactive analysis using ROOT and the Parallel ROOT Facility (PROOF). PROOF enables physicists to analyse medium-sized (200-300 TB) data sets in parallel on a short time scale. The default installation of PROOF is on a static dedicated cluster, typically of 200-300 cores. This well-proven approach is not devoid of limitations, in particular for the analysis of larger datasets or when the installation of a dedicated cluster is not possible. Using a new framework called Proof on Demand (PoD), PROOF can be used directly on Grid-enabled clusters, by dynamically assigning interactive nodes on user request. This thesis presents the PoD on AliEn project. The integration of Proof on Demand in the AliEn framework provides private dynamic PROOF clusters as a Grid service. This functionality is transparent to the user, who submits interactive jobs to the AliEn system. The ROOT framework, among other things, is used by physicists to carry out the Monte Carlo simulation of the detector. The engineers working on the mechanical design of the detector need to collaborate with the physicists. However, the software used by the engineers is not compatible with ROOT. This thesis describes a second result obtained during this PhD project: the implementation of the TGeoCad Interface, which allows the conversion of ROOT geometries to the STEP format, compatible with CAD systems. The interface provides an important communication and collaboration tool between physicists and engineers dealing with the simulation and the design of the detector geometry.
APA, Harvard, Vancouver, ISO, and other styles
30

Kazilas, Panagiotis. "Augmenting MPI Programming Process with Cognitive Computing." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-88913.

Full text
Abstract:
Cognitive Computing is a new and quickly advancing technology. In the last decade, Cognitive Computing has been used to assist researchers in their endeavors in many different scientific fields such as Health & Medicine, Education, Marketing, Psychology and Financial Services. On the other hand, parallel programming is a more complex concept than sequential programming. The additional complexity of parallel programming is introduced by its nature, which requires implementations of more complex algorithms and introduces additional concepts to developers, namely the communication between the processes that execute the parallel program (distributed-memory systems) and their synchronization (shared-memory systems). As a result of this additional complexity, many novice developers are reserved in their attempts to implement parallel programs. The objective of this research project was to investigate whether we can assist the parallel programming process through cognitive computing solutions. In order to achieve our objective, the MPI Assistant, a Q&A system, has been developed, and a case study has been carried out to determine our application's efficiency in our attempt to assist parallel programming developers. The case study showed that our MPI Assistant system indeed helped developers reduce the time they spend to develop their solutions, but not improve the quality of the program or its efficiency, as these improvements require features that are out of this research project's scope. However, the case study had a limited number of participants, which may affect our results' reliability. As a next step in our attempt to determine whether cognitive computing technologies are able to assist developers in their parallel programming development, we investigated whether cognitive solutions can extract better and more complete responses compared to the manually-created responses we wrote for the MPI Assistant. We experimented with two different approaches to the problem: an approach where we manually created responses for the MPI Assistant, and an approach where we investigated whether cognitive solutions can automatically extract better and more complete responses. We compared the quality of the latter automatic responses with the quality of the former, manually created ones.
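For context, the communication and synchronization concepts the abstract mentions (message passing between processes, barriers, collectives) look like this in a minimal mpi4py program. The snippet is illustrative only and is not part of the MPI Assistant; it assumes the mpi4py package and an MPI runtime (e.g. Open MPI) are installed.

```python
# Run with e.g.:  mpiexec -n 4 python mpi_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Point-to-point communication: rank 0 sends a message to rank 1.
if rank == 0 and size > 1:
    comm.send({"hello": "from rank 0"}, dest=1, tag=11)
elif rank == 1:
    msg = comm.recv(source=0, tag=11)
    print("rank 1 received:", msg)

# Synchronization: no process proceeds past this point until all arrive.
comm.Barrier()

# Collective communication: sum each rank's value on every process.
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print("sum of ranks 0..%d = %d" % (size - 1, total))
```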
APA, Harvard, Vancouver, ISO, and other styles
31

Wu, Jiande. "Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications." ScholarWorks@UNO, 2014. http://scholarworks.uno.edu/td/1953.

Full text
Abstract:
Particle filtering has been a very popular method to solve nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found many applications in areas that include nonlinear filtering of noisy signals and data, especially in target tracking. However, implementation of high dimensional PFs in real-time for large-scale problems is a very challenging computational task. Parallel & distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical applications. To reach this goal, a number of parallel PF algorithms are designed and implemented using different parallel hardware architectures such as Computer Cluster, Graphics Processing Unit (GPU), and Field-Programmable Gate Array (FPGA). Proposed is an improved PF implementation for computer clusters - the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and significantly outperforms existing algorithms. Also, a novel GPU PF algorithm implementation is designed which is highly efficient for GPU architectures. The proposed algorithm implementations on different parallel computing environments are applied and tested for target tracking problems, such as space object tracking, ground multitarget tracking using an image sensor, and UAV multisensor tracking. A comprehensive performance evaluation and comparison of the algorithms, covering both tracking and computational capabilities, is performed. The obtained simulation results demonstrate that the proposed implementations help greatly overcome the computational issues of particle filtering for realistic practical problems.
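A minimal sequential bootstrap particle filter for a 1-D toy tracking problem is sketched below with NumPy, to make the predict/weight/resample structure concrete. The dissertation's contribution lies in parallelizing such filters (e.g., particle exchange across cluster nodes, GPU weighting and resampling), which this sketch does not attempt; the state and noise models are assumptions chosen for the example.

```python
import numpy as np

def bootstrap_pf(observations, n_particles=1000, q=0.5, r=1.0, seed=0):
    """1-D bootstrap particle filter for a random-walk state model.

    State model:   x_t = x_{t-1} + N(0, q^2)
    Observation:   y_t = x_t     + N(0, r^2)
    Returns the filtered mean at each step.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        particles = particles + rng.normal(0.0, q, n_particles)     # predict
        weights = np.exp(-0.5 * ((y - particles) / r) ** 2)          # weight
        weights /= weights.sum()
        estimates.append(float(np.dot(weights, particles)))          # estimate
        idx = rng.choice(n_particles, size=n_particles, p=weights)   # resample
        particles = particles[idx]
    return estimates

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = np.cumsum(rng.normal(0, 0.5, 50))        # hidden trajectory
    obs = truth + rng.normal(0, 1.0, 50)             # noisy measurements
    est = bootstrap_pf(obs)
    print("RMSE:", float(np.sqrt(np.mean((np.array(est) - truth) ** 2))))
```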
APA, Harvard, Vancouver, ISO, and other styles
32

Marucci, Evandro Augusto. "Paralelização da ferramenta de alinhamento de sequências MUSCLE para um ambiente distribuído /." São José do Rio Preto : [s.n.], 2009. http://hdl.handle.net/11449/89349.

Full text
Abstract:
Advisor: José Márcio Machado
Committee member: Liria Matsumoto Sato
Committee member: Aleardo Manacero Junior
Resumo: Devido a crescente quantidade de dados genômicos para comparação, a computação paralela está se tornando cada vez mais necessária para realizar uma das operações mais importantes da bioinformática, o alinhamento múltiplo de sequências. Atualmente, muitas ferramentas computacionais são utilizadas para resolver alinhamentos e o uso da computação paralela está se tornando cada vez mais generalizado. Entretanto, embora diferentes algoritmos paralelos tenham sido desenvolvidos para suportar as pesquisas genômicas, muitos deles não consideram aspectos fundamentais da computação paralela. O MUSCLE [1] é uma ferramenta que realiza o alinhamento múltiplo de sequências com um bom desempenho computacional e resultados biológicos significativamente precisos [2]. Embora os métodos utilizados por ele apresentem diferentes versões paralelas propostas na literatura, apenas uma versão paralela do MUSCLE foi proposta [3]. Essa versão, entretanto, foi desenvolvida para sistemas de memória compartilhada. O desenvolvimento de uma versão paralela do MUSCLE para sistemas distribuídos é importante dado o grande uso desses sistemas em laboratórios de pesquisa genômica. Esta paralelização é o foco deste trabalho e ela foi realizada utilizando-se abordagens paralelas existentes e criando-se novas abordagens. Como resultado, diferentes estratégias paralelas foram propostas. Estas estratégias podem ser incorporadas a outras ferramentas de alinhamento que utilizam, em determinadas etapas, a mesma abordagem sequencial. Em cada método paralelizado, considerou-se principalmente a eficiência, a escalabilidade e a capacidade de atender problemas reais da biologia. Os testes realizados mostram que, para cada etapa paralela, ao menos uma estratégia definida atende bem todos esses critérios. Além deste trabalho realizar um paralelismo inédito, ao viabilizar a execução da ferramenta MUSCLE em... (Resumo completo, clicar acesso eletrônico abaixo)
Abstract: Due to the increasing amount of genetic data for comparison, parallel computing is becoming increasingly necessary to perform one of the most important operations in bioinformatics, multiple sequence alignment. Nowadays, many software tools are used to solve sequence alignments and the use of parallel computing is becoming more and more widespread. However, although different parallel algorithms were developed to support genetic research, many of them do not consider fundamental aspects of parallel computing. MUSCLE [1] is a tool that performs multiple sequence alignments with good computational performance and significantly precise biological results [2]. Although the methods used by it have different parallel versions proposed in the literature, only one parallel version of the MUSCLE tool has been proposed [3]. This version, however, was developed for shared-memory systems. The development of a parallel MUSCLE tool for distributed systems is important given the wide use of such systems in genomic research laboratories. This parallelization is the aim of this work and it was done using existing parallel approaches and creating new approaches. Consequently, different parallel strategies have been proposed. These strategies can be incorporated into other alignment tools that use, in a given stage, the same sequential approach. In each parallel method, we considered mainly the efficiency, scalability and ability to address real biological problems. The tests show that, for each parallel step, at least one defined strategy meets all these criteria. In addition to the new MUSCLE parallelization, enabling it to execute on distributed systems, the results show that the defined strategies perform better than the existing strategies.
Mestre
APA, Harvard, Vancouver, ISO, and other styles
33

MAJ, CARLO. "Sensitivity analysis for computational models of biochemical systems." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2014. http://hdl.handle.net/10281/50494.

Full text
Abstract:
Systems biology is an integrated area of science which aims at the analysis of biochemical systems from a holistic perspective. In this context, sensitivity analysis, a technique studying how the output variation of a computational model can be related to its input state, plays a pivotal role. The thesis describes how to properly apply the different sensitivity analysis techniques according to the specific case study (i.e., continuous deterministic rather than discrete stochastic output). Moreover, we explicitly consider aspects that have often been neglected in the analysis of computational biochemical models; among others, we propose an exploratory analysis of spatial effects in diffusion processes in crowded environments. Furthermore, we developed an innovative pipeline for the partitioning of the input factor space according to the different qualitative dynamics that may be attained by a model (focusing on steady state and oscillatory behavior). Finally, we describe different implementation methods for reducing the computational time required to perform sensitivity analysis, evaluating distributed and parallel approaches to model simulation.
APA, Harvard, Vancouver, ISO, and other styles
34

Ni, Ze. "Comparative Evaluation of Spark and Stratosphere." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-118226.

Full text
Abstract:
Nowadays, although MapReduce is applied to parallel processing on big data, it has some limitations: for instance, the lack of generic yet efficient and richly functional primitive parallel methods, the inability to pass multiple input parameters to parallel methods, and inefficiency in the way iterative algorithms are handled. Spark and Stratosphere were developed to deal (in part) with these shortcomings of MapReduce. The goal of this thesis is to evaluate Spark and Stratosphere both from the point of view of the theoretical programming model and of practical execution on specified application algorithms. In the comparison of programming models, we mainly explore and compare the features of Spark and Stratosphere that overcome the limitations of MapReduce. After the comparison of theoretical programming models, we further evaluate their practical performance by running three different classes of applications and assessing the usage of computing resources and execution time. It is concluded that Spark has promising features for iterative algorithms in theory, but it may not achieve the expected performance improvement when running iterative applications if the amount of memory used for cached operations is close to the actual available memory in the cluster environment. In that case, the reason for the poor performance results is that a larger amount of memory participates in the caching operation and, in turn, only a small amount of memory is available for the computing operations of the actual algorithms. Stratosphere shows favorable characteristics as a general parallel computing framework, but it has no support for iterative algorithms and spends more computing resources than Spark for the same amount of work. In another respect, applications based on Stratosphere can achieve benefits by manually setting compiler hints when developing the code, whereas Spark has no corresponding functionality.
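The caching behaviour discussed above can be illustrated with a small PySpark job that reuses a cached RDD across iterations, the mechanism whose memory footprint the thesis evaluates. The dataset and computation are arbitrary and the snippet is not from the thesis; it assumes PySpark is installed and run in local mode.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "iterative-caching-demo")

# An RDD that several iterations will reuse: cache it so it is not
# recomputed from its lineage on every pass.
points = sc.parallelize(range(1_000_000)).map(lambda i: (i % 10, float(i))).cache()

centre = 0.0
for step in range(5):
    # Each iteration reads the cached RDD instead of rebuilding it.
    total, count = points.map(lambda kv: (kv[1] - centre, 1)) \
                         .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    centre += total / count
    print(f"iteration {step}: centre = {centre:.2f}")

sc.stop()
```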
APA, Harvard, Vancouver, ISO, and other styles
35

Rezvoy, Clément. "Large Scale Parallel Inference of Protein and Protein Domain families." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2011. http://tel.archives-ouvertes.fr/tel-00682495.

Full text
Abstract:
Protein domains are recurring independent segments of proteins. The combinatorial arrangement of domains is at the root of the functional and structural diversity of proteins. Several methods have been developed to infer protein domain decomposition and domain family clustering from sequence information alone. MkDom2 is one of those methods. MkDom2 infers domain families in a greedy fashion: families are inferred one after the other in order to create a delineation of domains on proteins and a clustering of those domains into families. MkDom2 is instrumental in the building of the ProDom database. The exponential growth of the number of sequences to process has rendered MkDom2 obsolete; it would now take several years to compute a new release of ProDom. We present a new algorithm, MPI_MkDom2, allowing the computation of several families at once across a distributed computing platform. MPI_MkDom2 is an asynchronous distributed algorithm managing load balancing to ensure efficient platform usage; it ensures the creation of a non-overlapping partitioning of the whole protein set. A new proximity measure is defined to assess the effect of the parallel computation on the result. We also propose a second algorithm, MPI_MkDom3, allowing the simultaneous computation of a clustering of protein domains as well as of full proteins sharing the same domain decomposition.
APA, Harvard, Vancouver, ISO, and other styles
36

Kannan, Vijayasarathy. "A Distributed Approach to EpiFast using Apache Spark." Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/55272.

Full text
Abstract:
EpiFast is a parallel algorithm for large-scale epidemic simulations, based on an interpretation of stochastic disease propagation in a contact network. The original EpiFast implementation is based on a master-slave computation model with a focus on distributed memory using the message passing interface (MPI). However, it suffers from a few shortcomings with respect to the scale of the networks being studied. This thesis addresses these shortcomings and provides two different implementations: Spark-EpiFast, based on the Apache Spark big data processing engine, and Charm-EpiFast, based on the Charm++ parallel programming framework. The study focuses on exploiting features of both systems that we believe could potentially benefit performance and scalability. We present models of EpiFast specific to each system and relate algorithm specifics to several optimization techniques. We also provide a detailed analysis of these optimizations through a range of experiments that consider the scale of the networks and the environment settings we used. Our analysis shows that the Spark-based version is more efficient than the Charm++ and MPI-based counterparts. To the best of our knowledge, ours is one of the first efforts to use Apache Spark for epidemic simulations. We believe that our proposed model could act as a reference for similar large-scale epidemiological simulations exploring non-MPI or MapReduce-like approaches.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Linqian. "A Parallel and Distributed Computing Platform for Neural Networks Using Wireless Sensor Networks." University of Toledo / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1336159164.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Osman, Taha Mohammed. "FADI : a fault-tolerant environment for distributed processing systems." Thesis, Nottingham Trent University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.388867.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Hernandez, Jesus Israel. "Reactive scheduling of DAG applications on heterogeneous and dynamic distributed computing systems." Thesis, University of Edinburgh, 2008. http://hdl.handle.net/1842/2336.

Full text
Abstract:
Emerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to solve a particular parallel application at the same time. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issue is that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that in an execution with relatively frequent migration, it may be that, over time, the results of some task have been copied to several other sites, and so a subsequent migrated task may have several possible sources for each of its inputs. Some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management (CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies, in subsequent migration of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments, as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allows us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.
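As background for the list-scheduling approach described above, here is a generic earliest-finish-time list scheduler for a DAG on heterogeneous processors in Python. It is a HEFT-style baseline for illustration only, not the GTP or GTP/c algorithms; the fixed communication delay and the example costs are assumptions.

```python
def list_schedule(tasks, deps, cost, comm=0.0):
    """Greedy earliest-finish-time list scheduling of a DAG.

    tasks: task ids in a valid topological order.
    deps:  dict task -> list of predecessor tasks.
    cost:  dict task -> list of execution times, one entry per processor.
    comm:  transfer delay charged when a dependency crosses processors.
    """
    n_proc = len(next(iter(cost.values())))
    proc_free = [0.0] * n_proc
    finish, placed_on = {}, {}
    for t in tasks:
        best = None
        for p in range(n_proc):
            ready = max(
                [finish[d] + (comm if placed_on[d] != p else 0.0) for d in deps.get(t, [])],
                default=0.0,
            )
            start = max(ready, proc_free[p])
            f = start + cost[t][p]
            if best is None or f < best[0]:
                best = (f, p)
        f, p = best
        finish[t], placed_on[t] = f, p
        proc_free[p] = f
    return placed_on, finish, max(finish.values())

if __name__ == "__main__":
    tasks = ["a", "b", "c", "d"]
    deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
    cost = {"a": [2, 3], "b": [3, 2], "c": [4, 2], "d": [2, 2]}  # 2 processors
    placement, finish, makespan = list_schedule(tasks, deps, cost, comm=1.0)
    print(placement, "makespan =", makespan)
```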
APA, Harvard, Vancouver, ISO, and other styles
40

Wang, Chen. "Chemistry Inspired Middleware for Flexible Service Composition and Application." Phd thesis, INSA de Rennes, 2013. http://tel.archives-ouvertes.fr/tel-00932085.

Full text
Abstract:
Service-Oriented Architectures (SOA) are adopted today by many companies because they represent a flexible solution for building distributed applications. A Service-Based Application (SBA) can be defined as a workflow that dynamically coordinates the distributed execution of a set of services. Services can be selected and integrated at run time according to their Quality of Service (QoS), and the service composition can be dynamically modified to react to unforeseen failures during execution. The needs of service-oriented architectures show similarities with nature: dynamicity, scalability, self-adaptability, etc. It is therefore not surprising that metaphors inspired by nature are considered appropriate approaches for modelling such systems. We go further by using the chemical programming paradigm as the basis for building a middleware. In this thesis, we present a "chemical" middleware for the dynamic and adaptive execution of SBAs. Service selection, integration, coordination and adaptation are modelled as a series of chemical reactions. First, workflow instantiation is expressed as a series of reactions that can be carried out in a parallel, distributed and autonomous manner. Then, we implemented three coordination models for executing a service composition, and we show that all three models can react to crash failures. Finally, we evaluated and compared these models in terms of efficiency and complexity on two workflows. We thus show in this thesis that the chemical paradigm possesses the qualities needed to introduce dynamicity and adaptability into service-based programming.
APA, Harvard, Vancouver, ISO, and other styles
41

Ghafoor, Sheikh Khaled. "Modeling of an adaptive parallel system with malleable applications in a distributed computing environment." Diss., Mississippi State : Mississippi State University, 2007. http://sun.library.msstate.edu/ETD-db/theses/available/etd-11092007-145420.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Yao, Aixiang I. Song. "An Efficient Parallel Three-Level Preconditioner for Linear Partial Differential Equations." Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/36499.

Full text
Abstract:
The primary motivation of this research is to develop and investigate parallel preconditioners for linear elliptic partial differential equations. Three preconditioners are studied: a block-Jacobi preconditioner (BJ), a two-level tangential preconditioner (D0), and a three-level preconditioner (D1). Performance and scalability on a distributed-memory parallel computer are considered. Communication cost and redundancy are explored as well. After experiments and analysis, we find that the three-level preconditioner D1 is the most efficient and scalable parallel preconditioner, compared to BJ and D0. The D1 preconditioner reduces both the number of iterations and the computational time substantially. A new hybrid preconditioner is suggested which may combine the best features of D0 and D1.
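As a point of reference for the block-Jacobi (BJ) preconditioner mentioned above, here is a minimal serial sketch using SciPy. The Poisson test matrix and block size are illustrative choices only; the two-level (D0) and three-level (D1) preconditioners of the thesis are more involved and are not reproduced here.

# Minimal (serial) block-Jacobi preconditioner: invert the diagonal blocks of A
# and apply them independently inside a preconditioned CG solve.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_jacobi(A, block_size):
    n = A.shape[0]
    A = A.tocsc()
    blocks = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        blocks.append(np.linalg.inv(A[start:stop, start:stop].toarray()))
    def apply(r):
        z = np.empty_like(r)
        for k, start in enumerate(range(0, n, block_size)):
            stop = min(start + block_size, n)
            z[start:stop] = blocks[k] @ r[start:stop]
        return z
    return spla.LinearOperator(A.shape, matvec=apply)

# example: 1-D Poisson matrix solved with preconditioned conjugate gradient
n = 200
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
M = block_jacobi(A, block_size=20)
x, info = spla.cg(A, b, M=M)
print("converged" if info == 0 else "not converged")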
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
43

Gupta, Sounak. "Pending Event Set Management in Parallel Discrete Event Simulation." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535701778479768.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Klein-Halmaghi, Cristian. "Cooperative Resource Management for Parallel and Distributed Systems." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2012. http://tel.archives-ouvertes.fr/tel-00780719.

Full text
Abstract:
High-Performance Computing (HPC) resources, such as Supercomputers, Clusters, Grids and HPC Clouds, are managed by Resource Management Systems (RMSs) that multiplex resources among multiple users and decide how computing nodes are allocated to user applications. As more and more petascale computing resources are built and exascale is to be achieved by 2020, optimizing resource allocation to applications is critical to ensure their efficient execution. However, current RMSs, such as batch schedulers, only offer a limited interface. In most cases, the application has to blindly choose resources at submission, without being able to adapt its choice to the state of the target resources, either before it starts or during execution. The goal of this thesis is to improve resource management, so as to allow applications to allocate resources efficiently. We achieve this by proposing software architectures that promote collaboration between the applications and the RMS, thus allowing applications to negotiate the resources they run on. To this end, we start by analyzing the various types of applications and their unique resource requirements, categorizing them into rigid, moldable, malleable and evolving. For each case, we highlight the opportunities they open up for improving resource management. The first contribution deals with moldable applications, for which resources are only negotiated before they start. We propose CooRMv1, a centralized RMS architecture which delegates resource selection to the application launchers. Simulations show that the solution is both scalable and fair, and the results are validated through a prototype implementation deployed on Grid'5000. Second, we focus on negotiating allocations on geographically distributed resources managed by multiple institutions. We build upon CooRMv1 and propose distCooRM, a distributed RMS architecture which allows moldable applications to efficiently co-allocate resources managed by multiple independent agents. Simulation results show that distCooRM is well behaved and scales well for a reasonable number of applications. Next, attention is shifted to run-time negotiation of resources, so as to improve support for malleable and evolving applications. We propose CooRMv2, a centralized RMS architecture that enables efficient scheduling of evolving applications, especially non-predictable ones. It allows applications to inform the RMS about their maximum expected resource usage through pre-allocations; resources that are pre-allocated but unused can be filled by malleable applications. Simulation results show that considerable gains can be achieved. Last, production-ready software is used as a starting point to illustrate the interest, as well as the difficulty, of improving cooperation between existing systems. GridTLSE is used as an application and DIET as an RMS to study a previously unsupported use-case. We identify the underlying problem of scheduling optional computations and propose an architecture to solve it. Real-life experiments on the Grid'5000 platform show improvements in several metrics, such as user satisfaction, fairness and the number of completed requests; moreover, the solution is shown to be scalable.
APA, Harvard, Vancouver, ISO, and other styles
45

Collet, Julien. "Exploration of parallel graph-processing algorithms on distributed architectures." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2391/document.

Full text
Abstract:
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficulty of evaluating the performance behaviour of these applications complicates the assessment of the relevance of the chosen architecture. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two real use-cases are considered, one from execution-trace analysis and one from genomic data processing; for each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can be leveraged to select a relevant operating point with respect to the datasets processed and the cluster nodes used. A further study compares the performance of commodity cluster architectures and higher-end compute servers on the two use-cases previously developed. It shows that, in this applicative context, clusters of commodity workstations, which are considerably cheaper and simpler with respect to node architecture, deliver higher performance than higher-end systems; the gap widens further when performance is weighted by purchase and operating costs. The thesis then explores how performance studies can inform cluster design for graph processing: studying the throughput of a graph-processing system gives fruitful insights for node-architecture improvements, and a more in-depth analysis leads to guidelines for sizing a cluster appropriately for a given workload, paving the way toward resource allocation for graph processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed to mitigate the significant performance drop observed when the cluster operates with saturated main memory, and the relevance of ARM-based microservers for graph processing is investigated through a port of GraphLab to an NVIDIA TX2-based architecture. The performance measured on such platforms is encouraging and shows in particular that the drop in raw performance relative to existing architectures is offset by a much higher energy efficiency.
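As background on the vertex-centric programming style popularised by GraphLab and similar frameworks, the sketch below runs a toy PageRank loop in a gather/apply fashion on a single machine. It is a generic, hedged illustration; it does not use the GraphLab API and is unrelated to the use-cases studied in the thesis.

# Toy vertex-centric PageRank in the gather/apply spirit of GraphLab-like
# frameworks. Single machine, illustrative only.
def pagerank(edges, n, damping=0.85, iters=30):
    out_deg = [0] * n
    in_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        out_deg[u] += 1
        in_nbrs[v].append(u)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new_rank = []
        for v in range(n):
            # gather: sum the contributions of in-neighbours
            acc = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
            # apply: recompute the vertex value
            new_rank.append((1 - damping) / n + damping * acc)
        rank = new_rank
    return rank

edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
print(pagerank(edges, n=3))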
APA, Harvard, Vancouver, ISO, and other styles
46

Andersson, Filip, and Simon Norberg. "Scalable applications in a distributed environment." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3917.

Full text
Abstract:
As the number of simultaneous users of distributed systems increases, scalability is becoming an important factor to consider during software development. Without sufficient scalability, systems may struggle to manage high loads and may not be able to support a large number of users. We have determined how scalability can best be implemented, and what extra costs this leads to. Our research is based both on a literature review, in which we examined what others in the field of computer engineering think about scalability, and on implementing a highly scalable system of our own. In the end we arrived at a number of general pointers that can help developers determine whether they should focus on scalable development, and what they should consider if they choose to do so.
APA, Harvard, Vancouver, ISO, and other styles
47

Cordeiro, Daniel. "The impact of cooperation on new high performance computing platforms." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00690908.

Full text
Abstract:
Computing has profoundly changed the methodological aspects of the discovery process in the various fields of knowledge. Researchers now have at their disposal new capabilities that make it possible to tackle new problems. Parallel and distributed platforms composed of resources shared among different participants can make these new capabilities accessible to every researcher, offering computing power that until now was reserved for the largest (and richest) scientific projects. In this document, which gathers the results obtained during this thesis, we explore four different facets of how organizations engage in collaboration on parallel and distributed platforms. Using classical tools from combinatorics, multi-objective scheduling and game theory, we show how to compute schedules that offer a good trade-off between the results obtained by the participants and the overall performance of the platform. By ensuring fair results and guaranteeing performance improvements for the different participants, we can build an efficient platform where everyone always feels encouraged to collaborate and to share resources. First, we study collaboration between selfish organizations. We show that selfish behaviour among participants imposes a lower bound on the global makespan, and we present algorithms that cope with the organizations' selfishness while producing fair results. The second study concerns collaboration between organizations that can tolerate a limited degradation of their own performance if it helps improve the global makespan. We improve the known inapproximability bounds for this problem and present new algorithms whose guarantees are close to the Pareto set (which gathers the best possible solutions). The third form of collaboration studied is between rational participants who can choose the best strategy for their tasks. We present a non-cooperative game model for the problem and show how coordination mechanisms allow the creation of approximate equilibria with a bounded price of anarchy. Finally, we study collaboration between users sharing a set of common resources. We present a method that enumerates the frontier of best trade-off solutions for the users and selects the solution that yields the best overall performance.
APA, Harvard, Vancouver, ISO, and other styles
48

McMurtrey, Shannon Dale. "Training and Optimizing Distributed Neural Networks Using a Genetic Algorithm." NSUWorks, 2010. http://nsuworks.nova.edu/gscis_etd/243.

Full text
Abstract:
Parallelizing neural networks is an active area of research. Current approaches centre on parallelizing the widely used back-propagation (BP) algorithm, which has a large amount of communication overhead, making it less than ideal for parallelization. An algorithm that does not depend on the calculation of derivatives and the backward propagation of errors lends itself better to a parallel implementation. One well-known training algorithm for neural networks explicitly incorporates network structure in the objective function to be minimized, which yields simpler neural networks. Prior work implemented this using a modified genetic algorithm in a serial fashion that is not scalable, thus limiting its usefulness. This dissertation created a parallel version of the algorithm. The performance of the proposed algorithm is compared against the existing algorithm using a variety of synthetic and real-world problems. Computational experiments with benchmark datasets indicate that the parallel algorithm proposed in this research outperforms the serial version from prior research in finding better minima in the same time, as well as in identifying a simpler architecture.
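To illustrate the general idea of parallelising a genetic algorithm by distributing fitness evaluation across workers, here is a minimal sketch built on Python's multiprocessing pool. The toy fitness function and GA operators are placeholders; the dissertation's structure-aware objective and neural-network encoding are not reproduced.

# Minimal parallel genetic algorithm: fitness evaluation is farmed out to a
# process pool. Toy objective; not the dissertation's neural-network training.
import random
from multiprocessing import Pool

def fitness(genome):
    # placeholder objective: maximise the negated sum of squares
    return -sum(g * g for g in genome)

def evolve(pop_size=40, genome_len=10, generations=20, workers=4):
    pop = [[random.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    with Pool(workers) as pool:
        for _ in range(generations):
            scores = pool.map(fitness, pop)           # parallel evaluation
            ranked = [g for _, g in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[: pop_size // 2]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, genome_len)
                child = a[:cut] + b[cut:]             # one-point crossover
                i = random.randrange(genome_len)      # point mutation
                child[i] += random.gauss(0, 0.1)
                children.append(child)
            pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(round(fitness(best), 4))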
APA, Harvard, Vancouver, ISO, and other styles
49

Qu, Long. "Méthodes de préconditionnement pour la résolution de systèmes linéaires sur des machines massivement parallèles." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112053.

Full text
Abstract:
This thesis addresses a new class of preconditioners which aims at accelerating the solution of the large sparse systems arising in scientific and engineering problems by preconditioned iterative methods. To apply these preconditioners, the input matrix needs to be reordered with K-way nested dissection. We also introduce an overlapping technique that adapts the idea of overlapping subdomains, taken from domain decomposition methods, to nested-dissection-based methods in order to improve the convergence of these preconditioners. Results show that this overlapping technique improves the convergence rate of the Nested SSOR (NSSOR) and Nested Modified Incomplete LU with Rowsum property (NMILUR) preconditioners that we worked on. We also present the data distribution and parallel algorithms used to implement these preconditioners. Results show that on a 400x400x400 regular grid, the number of iterations with the Nested Filtering Factorization preconditioner (NFF) increases only slightly as the number of subdomains grows to 2048. In terms of runtime performance on the Curie supercomputer, it scales up to 2048 cores and is 2.6 times faster than the Restricted Additive Schwarz (RAS) preconditioner, one of the domain-decomposition preconditioners implemented in PETSc, the scientific computing library well known to the community.
APA, Harvard, Vancouver, ISO, and other styles
50

Tran-The, Hung. "Problème du Consensus dans le Modèle Homonyme." Phd thesis, Université Paris-Diderot - Paris VII, 2013. http://tel.archives-ouvertes.fr/tel-00925941.

Full text
Abstract:
So far, the distributed computing community has either assumed that all the processes of a distributed system have distinct identifiers or, more rarely, that the processes are anonymous and have no identifiers. These are two extremes of the same general model: namely, n processes use l different identifiers, where 1 ≤ l ≤ n. We call this the homonymous model. To determine the power of the homonymous model, as well as the importance of identifiers in distributed computing, this thesis studies the consensus problem, one of the most famous problems in distributed computing. We give necessary and sufficient conditions on the number of identifiers for solving consensus in a distributed system with t faulty processes in the synchronous case. We show that in the crash, send-omission and general-omission failure models, uniform consensus is solvable even if processes are anonymous; thus, identifiers are not useful in that case. However, identifiers become important in the Byzantine failure model: 3t + 1 identifiers are necessary and sufficient for Byzantine agreement. Surprisingly, the number of identifiers must be greater than (n + 3t) / 2 in the presence of three facets of uncertainty: partial synchrony, Byzantine failures and homonyms. This demonstrates two differences from the classical model (which has l = n): there are situations where relaxing synchrony to partial synchrony renders agreement impossible, and, in the partially synchronous case, increasing the number of correct processes can actually make it harder to reach agreement.
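For readers new to the consensus problem, the toy simulation below shows the classical flooding algorithm tolerating crash faults in a synchronous system, where after t + 1 rounds all correct processes decide the same value. It is a textbook illustration only, with made-up inputs, and is unrelated to the homonymous-model protocols analysed in the thesis.

# Toy simulation of flooding consensus with crash faults in a synchronous
# system. Generic illustration; not the thesis's homonymous-model protocols.
def flooding_consensus(initial, crashed_after, t):
    """initial: dict pid -> proposed value
    crashed_after: dict pid -> last round in which that process still sends
    t: maximum number of crash faults tolerated"""
    known = {p: {v} for p, v in initial.items()}
    for rnd in range(1, t + 2):                          # t + 1 rounds
        msgs = {p: set(known[p]) for p in known
                if crashed_after.get(p, t + 2) >= rnd}   # crashed processes stay silent
        for p in known:
            for vals in msgs.values():
                known[p] |= vals                         # receive and merge values
    # each surviving process decides deterministically, e.g. the minimum value
    return {p: min(known[p]) for p in known
            if crashed_after.get(p, t + 2) > t + 1}

# process 1 crashes after round 1; processes 2 and 3 decide the same value
print(flooding_consensus({1: "a", 2: "b", 3: "b"}, crashed_after={1: 1}, t=1))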
APA, Harvard, Vancouver, ISO, and other styles