
Dissertations / Theses on the topic 'Parallel computers'


Consult the top 50 dissertations / theses for your research on the topic 'Parallel computers.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Yousif, Hilal M. "Parallel algorithms for MIMD parallel computers." Thesis, Loughborough University, 1986. https://dspace.lboro.ac.uk/2134/15113.

Full text
Abstract:
This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can be run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. Initially the fundamentals of parallel computer architectures are introduced, with different parallel architectures being described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined; the main characteristics of the 4-processor MIMD NEPTUNE system are presented, and performance indicators, i.e. the speed-up and efficiency factors, are defined for the measurement of parallelism in a given system. Both numerical and non-numerical algorithms are covered in the thesis. In the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. Here, the blocks are organised in such a way that each process contains its own group of 9 points on the network, so the blocks can be run in parallel. The parallel implementations of both the 9-point and 4-point block iterative methods were programmed using natural and red-black ordering with synchronous and asynchronous approaches, and the results obtained for these different implementations were compared and analysed. Next, the parallel version of the A.G.E. (Alternating Group Explicit) method is developed, in which the explicit nature of the difference equation is revealed and exploited when applied to derive the solution of both linear and non-linear 2-point boundary value problems. Two strategies, synchronous and asynchronous, were used in the implementation of the parallel A.G.E. method, and the results from these implementations were compared. For comparison, the results obtained from the parallel A.G.E. method were also compared with the corresponding results obtained from the parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods. Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numeric algorithms, the problems of sorting and searching were studied. The sorting methods investigated were the shell sort and the digit sort methods. With each method, different parallel strategies and approaches were used and compared to find the best results obtainable on the parallel machine. For searching, the sequential search algorithm in an unordered table and the binary search algorithm were investigated and implemented in parallel, with a presentation of the results. Finally, a complexity analysis of these methods is presented. The thesis concludes with a chapter summarising the main results.
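The red-black ordering mentioned in this abstract is what makes such iterative sweeps parallel: grid points of one colour depend only on points of the other colour, so all points of the same colour can be updated simultaneously. As a rough illustration only (a minimal sketch in Python, not taken from the thesis; the Poisson model problem and grid size are this example's assumptions):

```python
import numpy as np

def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -laplace(u) = f.

    Points with (i + j) even ("red") are updated first, then the
    "black" points.  Every point of one colour depends only on the
    other colour, so each half-sweep could run fully in parallel.
    """
    for parity in (0, 1):                     # 0 = red, 1 = black
        for i in range(1, u.shape[0] - 1):
            first = 1 + (i + 1 + parity) % 2  # column where this colour starts
            for j in range(first, u.shape[1] - 1, 2):
                u[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j] +
                                  u[i, j - 1] + u[i, j + 1] +
                                  h * h * f[i, j])
    return u

# Toy usage: 17x17 grid, f = 1, zero boundary values.
n = 17
u, f = np.zeros((n, n)), np.ones((n, n))
for _ in range(200):
    red_black_sweep(u, f, 1.0 / (n - 1))
```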
2

Su, (Philip) Shin-Chen. "Parallel subdomain method for massively parallel computers." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/17376.

Full text
3

Miller, R. Quentin. "Programming bulk-synchronous parallel computers." Thesis, University of Oxford, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.318894.

Full text
4

Kalaiselvi, S. "Checkpointing Algorithms for Parallel Computers." Thesis, Indian Institute of Science, 1997. https://etd.iisc.ac.in/handle/2005/3908.

Full text
Abstract:
Checkpointing is a technique widely used in parallel/distributed computers for rollback error recovery. Checkpointing is defined as the coordinated saving of process state information at specified time instances. Checkpoints help in restoring the computation from the latest saved state in case of failure. In addition to fault recovery, checkpointing has applications in fault detection, distributed debugging and process migration. Checkpointing in uniprocessor systems is easy because there is a single clock and events occur with respect to this clock: there is a clear demarcation between events that happen before a checkpoint and events that happen after it. In a parallel computer, a large number of processors coordinate to solve a single problem. Since there may be multiple streams of execution, checkpoints have to be introduced along all these streams simultaneously, and the absence of a global clock necessitates explicit coordination to obtain a consistent global state. Events occurring in a distributed system can be partially ordered using Lamport's happens-before relation →, which identifies dependent and concurrent events. It is defined as follows:
· If two events a and b happen in the same process, and a happens before b, then a → b.
· If a is the sending event of a message and b is the receiving event of the same message, then a → b.
· If neither a → b nor b → a, then a and b are said to be concurrent.
A consistent global state may have concurrent checkpoints. In the first chapter of the thesis we discuss issues regarding the ordering of events in a parallel computer, the need for coordination among checkpoints and other aspects related to checkpointing. Checkpointing locations can be identified either statically or dynamically. The static approach assumes that a representation of the program to be checkpointed is available, with information that enables a programmer to specify the places where checkpoints are to be taken; the dynamic approach identifies the checkpointing locations at run time. In this thesis, we propose algorithms for both static and dynamic checkpointing. The main contributions of this thesis are as follows:
1. Parallel computers being built now have faster communication, and hence more efficient clock synchronisation, than those built a few years ago. Based on efficient clock synchronisation protocols, the clock drift in current machines can be kept within a few microseconds. We propose a dynamic checkpointing algorithm for parallel computers assuming bounded clock drift.
2. The shared memory paradigm is convenient for programming, while the message passing paradigm is easy to scale. Distributed Shared Memory (DSM) systems combine the advantages of both paradigms and can easily be visualised on top of a network of workstations. IEEE has recently proposed an interconnect standard called the Scalable Coherent Interface (SCI) to configure computers as a Distributed Shared Memory system. A periodic dynamic checkpointing algorithm is proposed in the thesis for a DSM system which uses the SCI standard.
3. When information about a parallel program is available, one can make use of this knowledge to perform efficient checkpointing. A static checkpointing approach based on task graphs is proposed for parallel programs and has been implemented on a Parallel Virtual Machine (PVM) platform.
We now give a gist of the various chapters of the thesis. Chapter 2 gives a classification of existing checkpointing algorithms and surveys the algorithms that have been reported in the literature for checkpointing parallel/distributed systems. Notably, most of the algorithms published for checkpointing message passing systems are based on the seminal article by Chandy & Lamport; a large number of checkpointing algorithms have been published by relaxing the assumptions made in that article and by extending its features to minimise the overheads of coordination and context saving. Checkpointing algorithms for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory, and all of them assume that main memory is safe for storing the context. Recently, algorithms have been published for distributed shared memory systems which extend the cache coherence protocols used in shared memory systems but also include methods for storing the status of distributed memory in stable storage. Chapter 2 concludes with brief comments on the desirable features of a checkpointing algorithm. In Chapter 3, we develop a dynamic checkpointing algorithm for message passing systems, assuming that the clock drift of processors in the system is bounded. Efficient clock synchronisation protocols have been implemented on recent parallel computers because communication between processors is very fast, and based on such protocols the clock skew can be limited to a few microseconds. The proposed algorithm uses clocks for checkpoint coordination and vector counts for identifying messages to be logged; it is a periodic, distributed algorithm. We prove the correctness of the algorithm and compare it with similar clock-based algorithms. Distributed Shared Memory (DSM) systems provide the benefit of ease of programming in a scalable system, and the recently proposed IEEE Scalable Coherent Interface (SCI) standard facilitates the construction of scalable coherent systems. In Chapter 4 we discuss a checkpointing algorithm for an SCI-based DSM system. SCI maintains cache coherence in hardware using a distributed cache directory which scales with the number of processors in the system, and recommends a two-phase transaction protocol for communication. Our algorithm is a two-phase centralised coordinated algorithm: phase one initiates checkpoints and the checkpointing activity is completed in phase two. The correctness of the algorithm is established theoretically, and the chapter concludes with a discussion of the features of SCI exploited by the proposed checkpointing algorithm. In Chapter 5, a static checkpointing algorithm is developed, assuming that the program to be executed on a parallel computer is given as a directed acyclic task graph and that estimates of the time to execute each task are given. Given the times at which checkpoints are to be taken, the algorithm identifies a set of edges where checkpointing tasks can be placed, ensuring that they form a consistent global checkpoint. The proposed algorithm eliminates coordination overhead at run time and significantly reduces the context saving overhead by taking checkpoints along edges of the task graph; it is used as a preprocessing step before scheduling the tasks to processors.
The algorithm's complexity is O(km), where m is the number of edges in the graph and k is the maximum number of global checkpoints to be taken. The static algorithm is implemented on a parallel computer with a PVM environment, as PVM is widely available and portable. The task graph of a program can be constructed manually or through program development tools. Our implementation is a collection of preprocessing and run-time routines: the preprocessing routines operate on the task graph information to generate the set of edges to be checkpointed for each global checkpoint and write this information to disk, while the run-time routines save the context along the marked edges. In case of recovery, the recovery algorithms read the information from stable storage and reconstruct the context. The limitation of our static checkpointing algorithm is that it can operate only on deterministic task graphs. To demonstrate the practical feasibility of the proposed approach, case studies of checkpointing some parallel programs are included in the thesis. We conclude the thesis with a summary of the proposed algorithms and possible directions for continued research in the area of checkpointing.
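The happens-before relation defined in this abstract can be decided mechanically with vector clocks, which is the usual practical device for checking that checkpoints are mutually concurrent. A minimal sketch (our illustration, not the thesis's algorithms): event a happens-before event b exactly when a's vector timestamp is componentwise ≤ b's and differs somewhere.

```python
def happens_before(va, vb):
    """Lamport's happens-before on vector clock timestamps:
    a -> b  iff  va <= vb componentwise and va != vb."""
    return all(x <= y for x, y in zip(va, vb)) and va != vb

def concurrent(va, vb):
    """a and b are concurrent iff neither a -> b nor b -> a."""
    return not happens_before(va, vb) and not happens_before(vb, va)

# Three processes; each timestamp is (c0, c1, c2).
send_on_p0  = (1, 0, 0)   # P0 sends a message
recv_on_p1  = (1, 1, 0)   # P1 receives it and merges P0's clock
local_on_p2 = (0, 0, 1)   # unrelated local event on P2

assert happens_before(send_on_p0, recv_on_p1)   # rule 2: send -> receive
assert concurrent(recv_on_p1, local_on_p2)      # rule 3: no ordering
# Checkpoints taken at mutually concurrent events can form a consistent
# global state: no checkpoint records a message the other has not sent.
```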
5

Kalaiselvi, S. "Checkpointing Algorithms for Parallel Computers." Thesis, Indian Institute of Science, 1997. http://hdl.handle.net/2005/67.

Full text
6

Harrison, Ian. "Locality and parallel optimizations for parallel supercomputing." Diss., Connect to the thesis, 2003. http://hdl.handle.net/10066/1274.

Full text
7

練偉森 and Wai-sum Lin. "Adaptive parallel rendering." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31221415.

Full text
8

Lin, Wai-sum. "Adaptive parallel rendering /." Hong Kong : University of Hong Kong, 1999. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20868236.

Full text
9

Sundar, N. S. "Data access optimizations for parallel computers /." The Ohio State University, 1998. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487950658548697.

Full text
10

Katiker, Rushikesh. "Automatic generation of dynamic parallel architectures." Click here for download, 2007. http://proquest.umi.com/pqdweb?did=1475182071&sid=1&Fmt=2&clientId=3260&RQT=309&VName=PQD.

Full text
11

Elabed, Jamal. "Implementing parallel sorting algorithms." Virtual Press, 1989. http://liblink.bsu.edu/uhtbin/catkey/543997.

Full text
Abstract:
Recently, much work has been devoted to developing algorithms for parallel processors. Parallel algorithms have received a great deal of attention because of the advances in computer hardware technology, and parallel processors and algorithms have been used to improve computational speed, especially in the areas of sorting, evaluation of polynomials and arithmetic expressions, and matrix and graph problems. Sorting is an important operation in business and computer engineering applications. The literature contains many sorting algorithms, both sequential and parallel, which have been developed and used in practical applications: bubble sort, quick sort, insertion sort, enumeration sort, bucket sort and odd-even transposition sort. Ada, an excellent new programming language that offers high-level concurrent processing facilities called tasks, is used in this thesis to introduce, implement, compare and evaluate some of these parallel sorting algorithms. This thesis also shows that parallel sorting algorithms reduce the time required to perform the sorting task.
Department of Computer Science
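Of the algorithms listed, odd-even transposition sort is the most naturally parallel: it alternates compare-exchange phases over disjoint pairs, so every exchange within a phase can run concurrently. The thesis implements this with Ada tasks; purely as an illustration of the algorithm (in Python, with no claim about the thesis's code):

```python
def odd_even_transposition_sort(a):
    """Sort list a in place with n alternating even/odd phases.

    Within one phase every compare-exchange touches a disjoint pair
    (i, i+1), so all of them could execute in parallel -- one task per
    pair in the Ada formulation; here they simply run in a loop.
    """
    n = len(a)
    for phase in range(n):
        start = phase % 2                 # even phase: pairs (0,1),(2,3)...
        for i in range(start, n - 1, 2):  # odd phase:  pairs (1,2),(3,4)...
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```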
12

Hybinette, Maria. "Interactive parallel simulation environments." Diss., Georgia Institute of Technology, 2000. http://hdl.handle.net/1853/9174.

Full text
13

Izadi, Mohammad. "Hierarchical Matrix Techniques on Massively Parallel Computers." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-101164.

Full text
Abstract:
Hierarchical matrix (H-matrix) techniques can be used to treat dense matrices efficiently. With an H-matrix, the storage requirements and all fundamental operations, namely matrix-vector multiplication, matrix-matrix multiplication and matrix inversion, can be handled in almost linear complexity. In this work, we tried to gain even further speedup for the H-matrix arithmetic by utilizing multiple processors. Our approach towards an H-matrix distribution relies on the splitting of the index set. The main results achieved in this work, based on the index-wise H-distribution, are: a highly scalable algorithm for H-matrix truncation and matrix-vector multiplication, a scalable algorithm for H-matrix matrix multiplication, and a limited-scalability algorithm for H-matrix inversion on a large number of processors.
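The almost-linear complexity rests on storing admissible off-diagonal blocks in low-rank factored form. As a minimal illustration (assumed for exposition, not taken from the thesis): an m×n block stored as U·Vᵀ with rank k needs k(m+n) numbers instead of m·n, and a matrix-vector product with it costs O(k(m+n)).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 1000, 1000, 8          # block size and (small) rank

# Low-rank factors of one admissible H-matrix block B = U @ V.T
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, k))
x = rng.standard_normal(n)

# Dense matvec: O(m*n) work and m*n stored entries.
B = U @ V.T
y_dense = B @ x

# Factored matvec: O(k*(m+n)) work and k*(m+n) stored entries.
y_lowrank = U @ (V.T @ x)

print(np.allclose(y_dense, y_lowrank))   # True
```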
14

Pan, Yinfei. "Parallel XML parsing." Diss., Online access via UMI:, 2009.

Find full text
15

Velusamy, Vijay. "Adapting Remote Direct Memory Access based file system to parallel Input-/Output." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-11112003-092209.

Full text
16

邱祖淇 and Cho-ki Joe Yau. "Efficient solutions for the load distribution problem." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31222031.

Full text
17

Yau, Cho-ki Joe. "Efficient solutions for the load distribution problem /." Hong Kong : University of Hong Kong, 1999. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20971953.

Full text
18

Farreras, Esclusa Montse. "Optimizing programming models for massively parallel computers." Doctoral thesis, Universitat Politècnica de Catalunya, 2008. http://hdl.handle.net/10803/31776.

Full text
Abstract:
Since the invention of the transistor, increasing the clock frequency was the primary method of improving computing performance. As the reach of Moore's law came to an end, however, technology-driven performance gains became increasingly harder to achieve, and the research community was forced to come up with innovative system architectures. Today, increasing parallelism is the primary method of improving performance: single processors are being replaced by multiprocessor systems and multicore architectures. The challenge faced by computer architects is to increase performance while limited by cost and power consumption. The appearance of cheap and fast interconnection networks has promoted designs based on distributed-memory computing. Most modern massively parallel computers, as reflected by the Top 500 list, are clusters of workstations using commodity processors connected by high-speed interconnects. Today's massively parallel systems consist of hundreds of thousands of processors, and software technology to program these large systems is still in its infancy. Optimizing communication has become a key to overall system performance. To cope with the increasing burden of communication, the following methods have been explored: (i) scalability in the messaging system, which itself needs to scale up to the 100K-processor range; (ii) scalable algorithms that reduce communication, since as the machine grows in size the amount of communication also increases and the resulting overhead negatively impacts performance, while new programming models and algorithms allow programmers to better exploit locality; and (iii) speeding up communication by reducing and hiding communication latency and improving bandwidth. Following the three items described above, this thesis contributes to the improvement of the communication system (i) by proposing a scalable memory management scheme for the communication system that guarantees the correct reception of data and control-data, (ii) by proposing a language extension that allows programmers to better exploit data locality to reduce inter-node communication, and (iii) by presenting and evaluating a cache of remote addresses that aims to reduce control-data and exploit native RDMA network capabilities, resulting in latency reduction and better overlap of communication and computation. Our contributions are analysed in two different parallel programming models: the Message Passing Interface (MPI) and Unified Parallel C (UPC). Many different programming models exist today, and the programmer usually needs to choose one or another depending on the problem and the machine architecture. MPI was chosen because it is the de facto standard for parallel programming on distributed-memory machines; UPC was considered because it constitutes a promising, easy-to-use approach to parallelism. Since parallelism is everywhere, programmability is becoming important, and languages such as UPC are gaining attention as a potential future of high performance computing. Concerning the communication system, the chosen languages are relevant because, while MPI offers two-sided communication, UPC relies on a one-sided communication model. This difference potentially influences the communication system requirements of the language. These requirements, as well as our contributions, are analysed and discussed for both programming models, and we state whether they apply to both.
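The two-sided versus one-sided distinction is easiest to see in code. A rough sketch only, using the mpi4py bindings (this example's assumption, not something the thesis uses; the thesis works with MPI and UPC directly): in the two-sided model the receiver participates in every transfer, while in the one-sided model a process writes directly into a window of remote memory, which is what RDMA hardware supports natively.

```python
# Run with: mpiexec -n 2 python demo.py   (assumes mpi4py + NumPy)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(4, dtype='d')

# --- Two-sided (MPI style): sender AND receiver make a call. ---
if rank == 0:
    comm.Send(np.arange(4, dtype='d'), dest=1, tag=7)
elif rank == 1:
    comm.Recv(buf, source=0, tag=7)

# --- One-sided (UPC/RDMA style): only the origin makes a call. ---
win = MPI.Win.Create(buf, comm=comm)         # expose buf as a window
win.Fence()                                  # open access epoch
if rank == 0:
    win.Put(np.full(4, 9.0), target_rank=1)  # write into rank 1's buf
win.Fence()                                  # close epoch; data now visible
win.Free()
```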
19

Wang, Diangin. "Solving the algebraic eigenproblem on parallel computers." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/NQ31909.pdf.

Full text
20

Davis, Martin H. Jr. "Optical waveguides in general purpose parallel computers." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/8153.

Full text
21

Wilson, Gregory V. "Structuring and supporting programs on parallel computers." Thesis, University of Edinburgh, 1992. http://hdl.handle.net/1842/12151.

Full text
Abstract:
Distributed-memory multicomputers will not find acceptance outwith the specialised field of high-performance numerical applications until programs written for them, using systems similar to those found on conventional uniprocessors, can be run with an efficiency comparable to that achieved on those uniprocessors. This work argues that the key to constructing upwardly-compatible programming systems for message-passing multicomputers which are both efficient and usable, and which allow effective monitoring, is to require those systems to be structured, in the same way that modern programming languages require programs to be structured. It is further argued that even a well-structured message-passing system is too low-level for most applications programming, and that some more abstract system is required. The merits of one such abstraction, called generative communication, are considered, and suggestions are made for enriching standard implementations in order to improve their usability and efficiency. Finally, it is argued that the performance of any programming system for distributed-memory multicomputers, regardless of its degree of abstraction, is largely determined by the degree to which it eliminates or avoids contention. A technique for doing this, based on opportunistic combining networks, is introduced, and its effect on performance is investigated using simulations.
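Generative communication is the model popularised by Linda: processes coordinate by depositing immutable tuples into a shared space and withdrawing them by pattern matching, rather than by naming message partners. A toy single-process sketch of the three core operations (the class, names and matching rules are this example's assumptions, not anything from the thesis):

```python
class TupleSpace:
    """Toy generative-communication (Linda-style) tuple space."""

    def __init__(self):
        self._tuples = []

    def out(self, *tup):
        """Deposit a tuple into the space."""
        self._tuples.append(tup)

    def _match(self, tup, pattern):
        # None acts as a wildcard; other fields must match exactly.
        return len(tup) == len(pattern) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def inp(self, *pattern):
        """Withdraw (remove) the first tuple matching the pattern."""
        for tup in self._tuples:
            if self._match(tup, pattern):
                self._tuples.remove(tup)
                return tup
        return None   # a real implementation would block instead

    def rd(self, *pattern):
        """Read a matching tuple without removing it."""
        for tup in self._tuples:
            if self._match(tup, pattern):
                return tup
        return None

ts = TupleSpace()
ts.out("task", 42)              # a producer generates work
print(ts.inp("task", None))     # a consumer withdraws it: ('task', 42)
```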
22

Booth, Stephen Peter. "Application of parallel computers to particle physics." Thesis, University of Edinburgh, 1992. http://hdl.handle.net/1842/15213.

Full text
Abstract:
This thesis describes lattice gauge theories and discusses methods used to simulate them stochastically. The use of parallel computers for these simulations is discussed in depth. Various pseudo-random number generator algorithms are reviewed, and the implementation of these algorithms on parallel systems is investigated. The strong-coupling phase transition of non-compact lattice QED is investigated, and the phase diagram of strong-coupling non-compact lattice QED with an additional four-fermion interaction is deduced using a series of dynamical fermion simulations. The mass dependence of the system is investigated for non-compact QED and along the β = 2.0 axis, which is close to a system with only four-fermion interactions. These results are compared with solutions to the gap equation in order to determine whether the data are consistent with a mean-field interpretation. An interpolation technique intended to improve the utilisation of the available data is investigated. The simulation program is also described in detail as a case study of a parallel implementation of a lattice gauge theory. The implementation of QCD on an i860-based parallel computer is described in depth, including a description of how code is optimised for the i860, an analysis of the time-critical portions of the code and a discussion of how these routines were implemented. Timings for these routines are given, and some results from these simulations are also presented.
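A recurring practical point in such simulations, raised in the thesis's review of pseudo-random number generators, is that every processor needs its own statistically independent stream. A modern minimal sketch of one standard solution, stream jumping (NumPy's PCG64 here is this example's assumption, not a generator studied in the thesis):

```python
import numpy as np

def per_processor_rngs(seed, nprocs):
    """One independent stream per processor via PCG64 jumps.

    jumped(i) advances the generator state by i * 2**127 steps, so the
    per-processor substreams come from disjoint, non-overlapping
    stretches of a single underlying sequence.
    """
    root = np.random.PCG64(seed)
    return [np.random.Generator(root.jumped(i)) for i in range(nprocs)]

rngs = per_processor_rngs(seed=1234, nprocs=4)
# e.g. each "processor" draws its own field-update proposals:
samples = [rng.standard_normal(3) for rng in rngs]
print(samples[0], samples[1])   # distinct, reproducible streams
```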
23

Nordström, Tomas. "Highly parallel computers for artificial neural networks." Doctoral thesis, Luleå tekniska universitet, 1995. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-25655.

Full text
Abstract:
During a number of years the two fields of artificial neural networks (ANNs) and highly parallel computing have both evolved rapidly. In this thesis the possibility of combining these fields is explored, investigating the design and usage of highly parallel computers for ANN calculations. A new system architecture, REMAP (Real-time, Embedded, Modular, Adaptive, Parallel processor), is presented as a candidate platform for future action-oriented systems. With this new system architecture, multi-modular networks of cooperating and competing ANNs can be realized. For action-oriented systems, concepts like real-time interaction with the environment, embeddedness, and learning with self-organization are important. In this thesis the requirements for efficient mapping of ANN algorithms onto the suggested architecture are identified. This has been accomplished by studies of ANN implementations on general purpose parallel computers as well as designs of new parallel systems particularly suited to ANN computing. The suggested architecture incorporates highly parallel, communicating processing modules, each constructed as a linear SIMD (Single Instruction stream, Multiple Data stream) array, internally connected using a ring topology but also supporting broadcast and reduction operations. Many of the analyzed ANN models are similar in structure and can be studied in a unified context; a new superclass of ANN models called localized learning systems (LLSs) is therefore suggested and defined. A parallel computer implementation of LLSs is analyzed and the importance of the reduction operations is recognized. The study of various LLS models, and of other commonly used ANN models not contained in the LLS class, like the multilayer perceptron with error back-propagation, establishes REMAP modules as an excellent architecture for many different ANN models, useful in the design of action-oriented systems.
24

Levin, Matthew D. "Parallel algorithms for SIMD and MIMD computers." Thesis, Loughborough University, 1990. https://dspace.lboro.ac.uk/2134/32962.

Full text
Abstract:
The thesis is concerned with the inversion of matrices and the solution of linear systems and eigensystems in a parallel environment. Following an introductory chapter of concepts and definitions in the field of linear algebra, a general survey of parallel machines and algorithms is presented in Chapters 2 and 3, including a detailed description of the Distributed Array Processor (DAP) and the Neptune multiprocessing system. In Chapter 4, a new technique, the double-bordering algorithm, for the solution of linear systems is derived, and its application to the parallel solution of difference systems described. A modified form of the method for the inversion of matrices is derived, implemented on the Neptune multiprocessing system, and its performance compared with that of the Gauss-Jordan and (single) bordering algorithms. The results of the implementation of several other parallel algorithms are also presented. Chapter 5 deals with the class of matrices known as Toeplitz matrices, which arise in the field of signal processing. Trench's algorithm for the inversion of such matrices is implemented on the Neptune multiprocessing system, and, for the solution of banded symmetric Toeplitz systems, the relative efficiencies of three sequential strategies are compared: Levinson's algorithm, the double-bordering algorithm, and a method based on a novel factorisation scheme. Chapter 6 is concerned with the implementation of various iterative methods on the DAP. The solution of several difference systems by the Jacobi, Gauss-Seidel and successive over-relaxation (SOR) algorithms is compared with their solution by a variation (c. 1943) of the algorithms proposed by Hotelling, in which matrix-vector products are replaced by successive matrix squarings. The technique is also applied to the power method for the solution of the eigenvalue problem. The thesis concludes with a summary and recommendations for future work.
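The Hotelling variant mentioned above exploits the identity behind stationary iterations x ← Mx + c: after 2^j steps, x = M^(2^j) x₀ + (I + M + … + M^(2^j − 1)) c, and both factors can be built with j matrix squarings, which suits a machine like the DAP where a matrix product is highly parallel. A small sketch under those assumptions (our illustration, not the thesis code):

```python
import numpy as np

def squared_iteration(M, c, x0, j):
    """Effect of 2**j steps of x <- M x + c using j matrix squarings.

    Maintains Q = M**(2**t) and P = I + M + ... + M**(2**t - 1) via
        P <- P + Q @ P,   Q <- Q @ Q.
    """
    Q = M.copy()
    P = np.eye(len(c))
    for _ in range(j):
        P = P + Q @ P
        Q = Q @ Q
    return Q @ x0 + P @ c

# Toy Jacobi iteration for A x = b:  M = -D^{-1}(A - D), c = D^{-1} b.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
D = np.diag(np.diag(A))
M = -np.linalg.solve(D, A - D)
c = np.linalg.solve(D, b)

x = squared_iteration(M, c, np.zeros(2), j=5)   # = 32 Jacobi steps
print(np.allclose(x, np.linalg.solve(A, b), atol=1e-6))   # True
```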
25

Bhalerao, Rohit Dinesh. "Parallel XML parsing." Diss., Online access via UMI:, 2007.

Find full text
26

Palmer, Joseph McRae. "The Hybrid Architecture Parallel Fast Fourier Transform (HAPFFT) /." Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd855.pdf.

Full text
27

Byrd, Jonathan Michael Robert. "Parallel Markov Chain Monte Carlo." Thesis, University of Warwick, 2010. http://wrap.warwick.ac.uk/3634/.

Full text
Abstract:
The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means of estimating very high-dimensional integrals. As such, MCMC has found a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. This thesis presents a number of new methods for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines and/or clusters. Two of the methods speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. The other methods apply only to problem domains that can be presented as an image, and involve using various means of dividing the image into subimages that can be processed with some degree of independence. Where possible the thesis includes a theoretical analysis of the reduction in runtime that may be achieved using our techniques under perfect conditions, and in all cases the methods are tested and compared on a selection of multi-core and multi-processor architectures. A framework is provided to allow easy construction of MCMC applications that implement these parallelisation methods.
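The speculative idea can be sketched concretely: while the accept/reject decision for the pending proposal is being evaluated, both possible next-step density evaluations are launched in parallel, and the unrealised branch is discarded, so the accepted chain is statistically identical to a sequential run. The sketch below is our depth-1 illustration of that principle, not the thesis's implementation (with a cheap toy target; a genuinely expensive, GIL-releasing target would make the thread pool pay off, otherwise use processes):

```python
import math, random
from concurrent.futures import ThreadPoolExecutor

def log_target(x):
    """Stand-in for an expensive log-density (standard normal here)."""
    return -0.5 * x * x

def speculative_mh(x0, steps, scale=1.0, seed=0):
    """Metropolis-Hastings with depth-1 speculative evaluation."""
    rng = random.Random(seed)
    x, lx = x0, log_target(x0)
    chain = [x]
    with ThreadPoolExecutor(max_workers=2) as pool:
        y = x + rng.gauss(0.0, scale)          # pending proposal
        f_y = pool.submit(log_target, y)
        for _ in range(steps):
            step = rng.gauss(0.0, scale)       # offset for the NEXT proposal
            # Speculate on both outcomes of the pending accept test:
            f_rej = pool.submit(log_target, x + step)
            f_acc = pool.submit(log_target, y + step)
            ly = f_y.result()
            if math.log(rng.random()) < ly - lx:   # accept y
                x, lx = y, ly
                y, f_y = y + step, f_acc           # accepted branch realised
            else:
                y, f_y = x + step, f_rej           # rejected branch realised
            chain.append(x)
    return chain

print(sum(speculative_mh(0.0, 5000)) / 5001)   # near 0 for the N(0,1) target
```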
28

Perks, Oliver F. J. "Addressing parallel application memory consumption." Thesis, University of Warwick, 2013. http://wrap.warwick.ac.uk/58493/.

Full text
Abstract:
Recent trends in computer architecture are widening the gap between CPU capabilities and those of the memory system. The rise of multi-core processors is having a dramatic effect on memory interactions, not just with respect to performance but crucially to capacity. The slow growth of DRAM capacity, coupled with configuration limitations, is driving up the cost of memory systems as a proportion of total HPC platform cost. As a result, scientific institutions are increasingly interested in application memory consumption, and in justifying the cost associated with maintaining high memory-per-core ratios. By studying the scaling behaviour of applications, both in terms of runtime and memory consumption, we are able to demonstrate a decrease in workload efficiency in low-memory environments, resulting from poor memory scalability. Current tools are lacking in performance and analytical capabilities, motivating the development of a new suite of tools for capturing and analysing memory consumption in large-scale parallel applications. By observing and analysing memory allocations we are able to record not only how much but, crucially, where and when an application uses its memory. We use this analysis to examine some of the key principles in application scaling, such as processor decomposition, parallelisation models and runtime libraries, and their associated effects on memory consumption. We demonstrate how the data storage model of OpenMPI implementations inherently prevents scaling due to memory requirements, and investigate the benefits of different solutions. Finally, we show how, by analysing information gathered during application execution, we can automatically generate models to predict application memory consumption at different scales and runtime configurations. In addition we predict, using these models, how implementation changes could affect the memory consumption of an industry-strength benchmark.
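The final point, fitting predictive models of memory consumption from observed runs, can be illustrated with a least-squares fit. Both the model form below (a constant replicated-data term, a 1/p term for the decomposed domain, and a p term for per-rank buffers) and the numbers are this example's assumptions, not data or models from the thesis:

```python
import numpy as np

# Made-up per-process peak memory (MB) observed at several process counts p.
p   = np.array([16, 32, 64, 128, 256], dtype=float)
mem = np.array([1060, 545, 290, 165, 105], dtype=float)

# Model: mem(p) = a + b/p + c*p.  Solve min ||X theta - mem||_2.
X = np.column_stack([np.ones_like(p), 1.0 / p, p])
theta, *_ = np.linalg.lstsq(X, mem, rcond=None)
a, b, c = theta

for target in (512, 1024):      # extrapolate to larger scales
    print(target, round(a + b / target + c * target, 1))
```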
29

Francis, Nicholas David. "Parallel architectures for image analysis." Thesis, University of Warwick, 1991. http://wrap.warwick.ac.uk/108844/.

Full text
Abstract:
This thesis is concerned with the problem of designing an architecture specifically for the application of image analysis and object recognition. Image analysis is a complex subject area that remains only partially defined and only partially solved, which makes the task of designing an architecture aimed at efficiently implementing image analysis and recognition algorithms a difficult one. Within this work a massively parallel heterogeneous architecture, the Warwick Pyramid Machine, is described. This architecture consists of SIMD, MIMD and MSIMD modes of parallelism, each directed at a different part of the problem. The performance of this architecture is analysed with respect to many tasks drawn from very different areas of the image analysis problem, including an efficient straight-line extraction algorithm and a robust and novel geometric model-based recognition system. The straight-line extraction method is based on the local extraction of line segments using a Hough-style algorithm, followed by careful global matching and merging. The recognition system avoids quantising the pose space, hence overcoming many of the problems inherent in this class of methods, and includes an analytical verification stage. Results and detailed implementations of both of these tasks are given.
30

Lewis, E. Christopher. "Achieving robust performance in parallel programming languages /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/6996.

Full text
31

Kondo, Boubacar. "An investigation of parallel algorithms developed for graph problems and their implementation on parallel computers." Virtual Press, 1991. http://liblink.bsu.edu/uhtbin/catkey/770951.

Full text
Abstract:
With the recent development of VLSI (Very Large Scale Integration) technology, research on the development of efficient parallel algorithms for solutions of practical graph problems has increased considerably. A variety of algorithms have already been implemented on different models of parallel computers, but not much is yet known about which model of parallel computer will efficiently and definitively fit every graph problem. This investigation focuses on a comparative analysis of the speedup and efficiency of parallel algorithms on a parallel model of computation, and with respect to some sequential algorithms.
Department of Computer Science
32

Aljabri, Malak Saleh. "GUMSMP : a scalable parallel Haskell implementation." Thesis, University of Glasgow, 2015. http://theses.gla.ac.uk/6822/.

Full text
Abstract:
The most widely available high performance platforms today are hierarchical, with shared memory leaves, e.g. clusters of multi-cores, or NUMA with multiple regions. The Glasgow Haskell Compiler (GHC) provides a number of parallel Haskell implementations targeting different parallel architectures. In particular, GHC-SMP supports shared memory architectures, and GHC-GUM supports distributed memory machines. Both implementations use different, but related, runtime system (RTS) mechanisms and achieve good performance. A specialised RTS for the ubiquitous hierarchical architectures is lacking. This thesis presents the design, implementation, and evaluation of a new parallel Haskell RTS, GUMSMP, that combines shared and distributed memory mechanisms to exploit hierarchical architectures more effectively. The design evaluates a variety of design choices and aims to efficiently combine scalable distributed memory parallelism, using a virtual shared heap over a hierarchical architecture, with low-overhead shared memory parallelism on shared memory nodes. Key design objectives in realising this system are to prefer local work, and to exploit mostly passive load distribution with pre-fetching. Systematic performance evaluation shows that the automatic hierarchical load distribution policies must be carefully tuned to obtain good performance. We investigate the impact of several policies including work pre-fetching, favouring inter-node work distribution, and spark segregation with different export and select policies. We present the performance results for GUMSMP, demonstrating good scalability for a set of benchmarks on up to 300 cores. Moreover, our policies provide performance improvements of up to a factor of 1.5 compared to GHC-GUM. The thesis provides a performance evaluation of distributed and shared heap implementations of parallel Haskell on a state-of-the-art physical shared memory NUMA machine. The evaluation exposes bottlenecks in memory management, which limit scalability beyond 25 cores. We demonstrate that GUMSMP, which combines both distributed and shared heap abstractions, consistently outperforms the shared memory GHC-SMP on seven benchmarks by a factor of 3.3 on average. Specifically, we show that the best results are obtained when sharing memory only within a single NUMA region, and using distributed memory system abstractions across the regions.
33

Grove, Duncan A. "Performance modelling of message-passing parallel programs." Title page, contents and abstract only, 2003. http://web4.library.adelaide.edu.au/theses/09PH/09phg8832.pdf.

Full text
Abstract:
This dissertation describes a new performance modelling system, called the Performance Evaluating Virtual Parallel Machine (PEVPM). It uses a novel bottom-up approach, where submodels of individual computation and communication events are dynamically constructed from data-dependencies, current contention levels and the performance distributions of low-level operations, which define performance variability in the face of contention.
34

Alahmadi, Marwan Ibrahim. "Optimizing data parallelism in applicative languages." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/8457.

Full text
35

Salam, Mohammed Abdul. "A study of hypercube graph and its application to parallel computing." Virtual Press, 1991. http://liblink.bsu.edu/uhtbin/catkey/774739.

Full text
Abstract:
Recent studies have shown increased interest and research in the area of parallel computing. Graphs offer an excellent means for modelling parallel computers, and the hypercube graph is emerging as the preferred topology for parallel processing; it is a subject of intense research and study by both graph theorists and computer scientists. This thesis investigates several graph-theoretic properties of hypercubes and of one of their subgraphs (the middle graph of the cube). These include edge density, diameter, connectivity, the Hamiltonian property, the Eulerian property, cycle structure, and crossing number. The problem of routing, using parallel algorithms for implementing partial permutations, is also described. We also discuss the problem of multiplying matrices on the hypercube, which is helpful in solving graph-theoretic problems like shortest paths and transitive closure. Graph embeddings pertaining to the hypercube graph are also discussed. Lastly, several important applications of hypercubes are presented.
Department of Computer Science
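Several of the properties studied here follow directly from the hypercube's definition: node labels are d-bit strings, edges join labels differing in one bit, and routing can correct differing bits one dimension at a time (e-cube routing), so the diameter equals d. A small sketch of these basics (illustrative only, not from the thesis):

```python
def neighbors(v, d):
    """Neighbours of node v in the d-dimensional hypercube Q_d:
    flip each of the d label bits in turn (so Q_d is d-regular)."""
    return [v ^ (1 << i) for i in range(d)]

def ecube_route(src, dst, d):
    """Dimension-ordered (e-cube) route: fix differing bits in a fixed
    order.  Path length = Hamming distance <= d = the diameter."""
    path, v = [src], src
    for i in range(d):
        if (v ^ dst) & (1 << i):
            v ^= 1 << i
            path.append(v)
    return path

d = 4                                   # Q_4: 16 nodes, 32 edges
print(neighbors(0b0000, d))             # [1, 2, 4, 8]
print(ecube_route(0b0000, 0b1011, d))   # [0, 1, 3, 11]
```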
36

Lehmann, Uwe. "Schedules for Dynamic Bidirectional Simulations on Parallel Computers." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2003. http://nbn-resolving.de/urn:nbn:de:swb:14-1054281056187-31742.

Full text
Abstract:
For adjoint calculations, parameter estimation, and similar purposes one may need to reverse the execution of a computer program. The simplest option is to record a complete execution log and then to read it backwards, but this requires massive amounts of storage. Instead one may generate the execution log piecewise by restarting the ``forward'' calculation repeatedly from suitably placed checkpoints. This thesis extends the theoretical results on parallel reversal schedules. First, an algorithm is constructed which carries out the ``forward'' calculation and distributes checkpoints in such a way that the reversal calculation can be started at any time. This approach provides adaptive parallel reversal schedules for simulations where the number of time steps is not known a priori; the number of checkpoints and processors used is optimal at any time. Further, an algorithm is described which makes it possible to restart the initial computer program during the program reversal, again without any additional computation at any time. Hence, optimal parallel reversal schedules for bidirectional simulation are provided by this thesis.
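The storage/recompute trade-off described here is easy to see in miniature. The sketch below is an illustration under simple uniform-checkpoint assumptions, not the adaptive parallel schedules of the thesis: it reverses a simulation of n steps while storing only every k-th state, recomputing the rest from the nearest checkpoint.

```python
def reverse_with_checkpoints(step, x0, n, k):
    """Yield states x_n, ..., x_0 of x_{i+1} = step(x_i), storing only
    every k-th state and recomputing the rest from checkpoints.

    Memory: O(n/k) checkpoints instead of an O(n) execution log;
    extra work: at most k-1 repeated forward steps per reversed step.
    """
    # Forward sweep: record sparse checkpoints.
    ckpt = {0: x0}
    x = x0
    for i in range(n):
        x = step(x)
        if (i + 1) % k == 0:
            ckpt[i + 1] = x
    # Reverse sweep: rebuild each state from the nearest checkpoint.
    for i in range(n, -1, -1):
        base = (i // k) * k
        x = ckpt[base]
        for _ in range(i - base):
            x = step(x)
        yield x

states = list(reverse_with_checkpoints(lambda x: 0.5 * x + 1, 2.0, n=8, k=3))
print(states[0], states[-1])   # x_8 first, x_0 == 2.0 last
```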
37

Goehlich, Ralph Dietmar. "Design of finite element systems for parallel computers." Diss., Georgia Institute of Technology, 1989. http://hdl.handle.net/1853/16817.

Full text
38

Andersen, Johannes Harder. "Linear programming methods for fine grain parallel computers." Thesis, Brunel University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.241255.

Full text
39

Zhang, Zhao. "Enabling efficient parallel scripting on large-scale computers." Thesis, The University of Chicago, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3627911.

Full text
Abstract:

Many-task computing (MTC) applications assemble existing sequential (or parallel) programs, using POSIX files for intermediate data. The parallelism of such applications often comes from data parallelism. MTC applications can be grouped into stages, and dependencies between tasks in different stages can take the form of file production and consumption. The computation stage of an MTC application can have a large number of tasks, and thus a large amount of highly concurrent I/O traffic (both metadata and data traffic). Some MTC applications are iterative, where the computation iterates over a dataset and exits when some condition(s) are reached. Some MTC applications are interactive, where the application requires human action between computation stages.

In this dissertation we develop a complete parallel scripting framework called AMFORA, which has a shared in-RAM file system and task execution engine. It implements the multi-read single-write consistency model, preserves the POSIX interface for original applications, and provides an interface for collective data movement and functional data transformation. It is interoperable with many existing serial scripting languages (e.g., Bash, Python). AMFORA runs on thousands of compute nodes on an IBM BG/P supercomputer. It also runs on cloud environments such as Amazon EC2 and Google Compute Engine. To understand the baseline MTC application performance on large-scale computers, we define MTC Envelope, which is a file system benchmark to measure the capacity of a given software/hardware stack in the context of MTC applications.

The main contributions of this dissertation are: a system-independent approach to profiling and understanding the concurrency of MTC applications' I/O behavior; a benchmark definition that measures a file system's capacity for MTC applications; a theoretical model to estimate the I/O overhead of MTC applications on large-scale computers; a scalable distributed file system design, with no centralized component; a collective file system management toolkit to enable fast data movement; a functional file system management toolkit to enable fast file content transformation; a new parallel scripting programming model that extends a scripting language (e.g., Bash); a novel file system access interface design that combines POSIX and non-POSIX interfaces to ease programming without loss of efficiency; an automated method for identifying data flow patterns that are amenable to collective optimizations at runtime; and the open source implementation of the entire framework to enable MTC applications on large-scale computers. (Abstract shortened by UMI.)
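The multi-read single-write consistency model mentioned above is simple to state operationally: a file may be produced by exactly one task and then read any number of times. A toy sketch of that rule (the class, names and behaviour are assumptions for illustration, not AMFORA's API):

```python
import pathlib

class WriteOnceStore:
    """Toy multi-read single-write file store: each path may be
    produced by exactly one task, then read by arbitrarily many."""

    def __init__(self, root):
        self.root = pathlib.Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, name, data: bytes):
        path = self.root / name
        if path.exists():
            raise PermissionError(f"{name} already written (single-write)")
        path.write_bytes(data)

    def read(self, name) -> bytes:
        return (self.root / name).read_bytes()   # multi-read: no limit

store = WriteOnceStore("/tmp/mtc-demo")
store.write("stage1.out", b"intermediate data")
print(store.read("stage1.out"))        # any number of readers is fine
# store.write("stage1.out", b"again")  # would raise PermissionError
```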

40

Goldberg, Andrew Vladislav. "Efficient graph algorithms for sequential and parallel computers." Thesis, Massachusetts Institute of Technology, 1987. http://hdl.handle.net/1721.1/14912.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1987.
MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING.
Bibliography: p. 117-123.
by Andrew Vladislav Goldberg.
Ph.D.
41

Baker, James McCall Jr. "Run-time systems for fine-grain message-passing parallel computers." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/15366.

Full text
42

Das, Samir Ranjan. "Performance issues in time warp parallel simulations." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8152.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Wai, Siu-kit, and 衛兆傑. "Virtual links for multicomputers." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B18038050.

Full text
Abstract:
Abstract of thesis entitled 'Virtual Links for Multicomputers', submitted by Siu Kit Wai for the degree of Master of Philosophy at the University of Hong Kong in October 1996. In order to increase computation power, multiple autonomous computers or processors are connected to form a multicomputer. The performance boost is the result of exploiting in parallel the processing power available in the individual processors. Parallel processing, however, requires cooperation among the processors, which implies interprocessor communication. The efficiency of such communication is limited by the bandwidth and number of communication channels between directly connected processors. Multiple processes on a processor share a few hardware communication links/channels to communicate with processes executing on a different processor. Effective and efficient sharing of channels is important for overall system performance; hence it is important that the sharing be properly managed. When the sharing is not provided by the hardware, it can be provided in software at the system level. Without a managing component, processes need to be programmed to fight for and gain exclusive access to the communication links. This is usually ineffective and error-prone, and could reduce the overall performance of processes executing on the processor. Flexibility is a main advantage of providing a channel-sharing mechanism at the system level. Parameters such as packet size and the configuration of the system can be customized and tuned to meet the communication characteristics of different applications. In this project, we investigate how link sharing can be provided at the system level. Our approach is based on the idea of virtual links. The system is designed to be as transparent and easy to use as possible. We discuss how different parameters and configurations affect the system's functionality and performance. We also compare this software solution to other existing solutions, including a hardware solution.
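The thesis's own implementation is hardware- and system-specific and is not shown here; as a rough sketch of the virtual-link idea, the code below multiplexes several logical channels over one shared physical channel by tagging fixed-size packets with a virtual-link id and demultiplexing on the receiving side. The function names and the packet size are assumptions for this example; packet size is exactly the kind of tunable parameter the abstract mentions.

```python
# Illustrative sketch (not the thesis's code): several virtual links
# sharing one physical channel via tagged, fixed-size packets.
import queue

PACKET_SIZE = 64  # payload bytes per packet (a tunable parameter)

def send(physical, vlink_id, message):
    """Split a message into packets tagged with its virtual-link id."""
    for i in range(0, len(message), PACKET_SIZE):
        physical.put((vlink_id, message[i:i + PACKET_SIZE]))
    physical.put((vlink_id, None))     # end-of-message marker

def demux(physical, n_messages):
    """Reassemble per-virtual-link messages from the shared channel."""
    buffers, done = {}, []
    while len(done) < n_messages:
        vlink_id, chunk = physical.get()
        if chunk is None:
            done.append((vlink_id, b"".join(buffers.pop(vlink_id, []))))
        else:
            buffers.setdefault(vlink_id, []).append(chunk)
    return done

link = queue.Queue()             # stands in for one hardware link
send(link, 0, b"A" * 100)        # two processes sharing the same link
send(link, 1, b"B" * 150)
for vid, msg in demux(link, 2):
    print(vid, len(msg))
```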
APA, Harvard, Vancouver, ISO, and other styles
44

Wai, Siu-kit. "Virtual links for multicomputers /." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18038050.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Rajaram, Kumaran. "Principal design criteria influencing the performance of a portable, high performance parallel I/O implementation." Master's thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-04052002-105711.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Huang, Chun-Hsi. "Communication-efficient bulk synchronous parallel algorithms." Buffalo, N.Y. : Dept. of Computer Science, State University of New York at Buffalo, 2001. http://www.cse.buffalo.edu/tech%2Dreports/2001%2D06.ps.Z.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Choi, Jong-Deok. "Parallel program debugging with flowback analysis." Madison, Wis. : University of Wisconsin-Madison, Computer Sciences Dept, 1989. http://catalog.hathitrust.org/api/volumes/oclc/20839575.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Hopper, Michael A. "A compiler framework for multithreaded parallel systems." Diss., Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/15638.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Bissland, Lesley. "Hardware and software aspects of parallel computing." Thesis, University of Glasgow, 1996. http://theses.gla.ac.uk/3953/.

Full text
Abstract:
Part 1 (Chapters 2, 3 and 4) is concerned with the development of hardware for multiprocessor systems. Some of the concepts used in digital hardware design are introduced in Chapter 2. These include the fundamentals of digital electronics such as logic gates and flip-flops, as well as the more complicated topics of ROM and programmable logic. It is often desirable to change the network topology of a multiprocessor machine to suit a particular application. The third chapter describes a circuit switching scheme that allows the user to alter the network topology prior to computation. To achieve this, crossbar switches are connected to the nodes, and the host processor (a PC) programs the crossbar switches to make the desired connections between the nodes. The hardware and software required for this system are described in detail. Whilst this design allows the topology of a multiprocessor system to be altered prior to computation, the topology is still fixed during program run-time. Chapter 4 presents a system that allows the topology to be altered during run-time. The nodes send connection requests to a control processor which programs a crossbar switch connected to the nodes. This system allows every node in a parallel computer to communicate directly with every other node. The hardware interface between the nodes and the control processor is discussed in detail, and the software on the control processor is also described. Part 2 (Chapters 5 and 6) of this thesis is concerned with the parallelisation of a large molecular mechanics program. Chapter 5 describes the fundamentals of molecular mechanics such as the steric energy equation and its components, force field parameterisation and energy minimisation. The implementation of a novel programming environment (COMFORT) and hardware environment (the BB08) in a parallel molecular mechanics (MM) program is presented in Chapter 6. The structure of the sequential version of the MM program is detailed, before discussing the implementation of the parallel version using COMFORT and the BB08.
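As a hedged illustration of the Chapter 3 scheme, in which the host PC programs crossbar switches before computation starts, the sketch below models a crossbar as a simple connection table; the class, port count, and ring topology are invented for the example and do not come from the thesis.

```python
# Toy model (assumed, not the thesis's interface) of host-programmed
# crossbar configuration: connections are fixed before run-time.
class Crossbar:
    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.routes = {}                       # output port -> input port

    def connect(self, src, dst):
        if not (0 <= src < self.n_ports and 0 <= dst < self.n_ports):
            raise ValueError("port out of range")
        if dst in self.routes:                 # one driver per output
            raise ValueError(f"port {dst} already driven")
        self.routes[dst] = src

    def topology(self):
        return sorted((s, d) for d, s in self.routes.items())

# The host processor (the PC in the abstract) sets up a 4-node ring.
xbar = Crossbar(n_ports=4)
for node in range(4):
    xbar.connect(node, (node + 1) % 4)
print(xbar.topology())   # [(0, 1), (1, 2), (2, 3), (3, 0)]
```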
APA, Harvard, Vancouver, ISO, and other styles
50

Child, Christopher H. T. "Approximate dynamic programming with parallel stochastic planning operators." Thesis, City University London, 2011. http://openaccess.city.ac.uk/1109/.

Full text
Abstract:
This thesis presents an approximate dynamic programming (ADP) technique for environment modelling agents. The agent learns a set of parallel stochastic planning operators (P-SPOs) by evaluating changes in its environment in response to actions, using an association rule mining approach. An approximate policy is then derived by iteratively improving state value aggregation estimates attached to the operators, using the P-SPOs as a model in a Dyna-Q-like architecture. Reinforcement learning and dynamic programming are powerful techniques for automated agent decision making in stochastic environments. Dynamic programming is effective when there is a known environment model, while reinforcement learning is effective when a model is not available. The techniques derive a policy: a mapping from each environment state to an action which optimizes the long-term reward the agent receives. The standard methods become less effective as the state space for the environment increases, because they require values to be associated with each state, the storage and processing of which is exponential in the number of state variables. Resolving this “curse of dimensionality” is an important topic of research amongst all communities working on this problem. Two key methods are to: (i) derive an estimate of the value (approximate dynamic programming) using function approximation or state aggregation; or (ii) build a model of the environment from experience. This thesis presents a method of combining these approaches by exploiting structure in the state transition and value functions, captured in a set of planning operators which are learnt through experience in the environment. Standard planning operators define the deterministic changes that occur in an environment in response to an action. This work presents Parallel Stochastic Planning Operators (P-SPOs), a novel form of planning operator providing a structured model of the state transition function in environments which are both non-deterministic and in which changes can occur outside the influence of actions. Next, an automated method for extracting P-SPOs from observations in an environment is explored, using an adaptation of association rule mining. Finally, methods of relating the state transition structure encapsulated in the P-SPOs to state values, using the operators to store state value aggregation estimates, are evaluated. The framework described provides a method by which approximate dynamic programming can be applied by designers of AI agents and AI planning systems for which they have minimal prior knowledge. The framework and P-SPO-based implementations are tested against standard techniques in two benchmark stochastic environments: a “slippery gripper” block-painting robot; and a “predator-prey” agent environment. Experimental results show that an agent using a P-SPO-based approach is able to learn an accurate model of its environment if successor state variables exhibit conditional independence, and an approximate model in the non-independent case. Results also demonstrate that the agent's ability to generalise to previously unseen states using the model allows it to form an improved policy over an agent employing a standard Dyna-Q-based technique. Finally, an approximate policy stored in state aggregation estimates attached to the operators is shown to be optimal in experiments for which the P-SPO set contains sufficient information for effective aggregations to be formed.
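The thesis's P-SPO machinery is not reproduced here; the toy sketch below, with invented probabilities and state variables loosely echoing the "slippery gripper" domain, shows only the general shape of the idea: an operator maps a precondition and action to a probability distribution over effects, and a Dyna-Q-like backup uses the operators as the model.

```python
# Hypothetical sketch of a stochastic planning operator table and one
# expected-value backup; all numbers and names are invented.
import random

# (precondition, action) -> list of (probability, effect)
pspos = {
    (("holding", True), "paint"): [
        (0.8, ("painted", True)),
        (0.2, ("painted", False)),   # slippery gripper: action may fail
    ],
}

def sample_effect(precond, action):
    """Sample a successor effect from the operator's distribution."""
    outcomes = pspos[(precond, action)]
    r, acc = random.random(), 0.0
    for p, effect in outcomes:
        acc += p
        if r <= acc:
            return effect
    return outcomes[-1][1]

def backup(precond, action, value, reward, gamma=0.9):
    """One model-based backup over the operator's outcome distribution."""
    return reward + gamma * sum(p * value.get(e, 0.0)
                                for p, e in pspos[(precond, action)])

value = {("painted", True): 1.0}
print(sample_effect(("holding", True), "paint"))
print(backup(("holding", True), "paint", value, reward=0.0))  # 0.72
```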
APA, Harvard, Vancouver, ISO, and other styles