Log in

Relevant bibliographies by topics / Parallel and distributed algorithms / Dissertations / Theses

Dissertations / Theses on the topic 'Parallel and distributed algorithms'

To see the other types of publications on this topic, follow the link: Parallel and distributed algorithms.

Author: Grafiati

Published: 10 December 2022

Last updated: 28 January 2023

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Parallel and distributed algorithms.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Bolotski, Michael. "Distributed bit-parallel architecture and algorithms for early vision." Thesis, University of British Columbia, 1990. http://hdl.handle.net/2429/29462.

Full text

Abstract:

A new form of parallelism, distributed bit-parallelism, is introduced. A distributed bit-parallel organization distributes each bit of a data item to a different processor. Bit-parallelism allows computation that is sub-linear with word size for such operations as integer addition, arithmetic shifts, and data moves. The implications of bit-parallelism for system architecture are analyzed. An implementation of a bit-parallel architecture based on a mesh with bypass network is presented. The performance of bit-parallel algorithms on this architecture is analyzed and found to be several times faster than bit-serial algorithms. The application of the architecture to low level vision algorithms is discussed.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate

APA, Harvard, Vancouver, ISO, and other styles

2

Collet, Julien. "Exploration of parallel graph-processing algorithms on distributed architectures." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2391/document.

Full text

Abstract:

Avec l'explosion du volume de données produites chaque année, les applications du domaine du traitement de graphes ont de plus en plus besoin d'être parallélisées et déployées sur des architectures distribuées afin d'adresser le besoin en mémoire et en ressource de calcul. Si de telles architectures larges échelles existent, issue notamment du domaine du calcul haute performance (HPC), la complexité de programmation et de déploiement d’algorithmes de traitement de graphes sur de telles cibles est souvent un frein à leur utilisation. De plus, la difficile compréhension, a priori, du comportement en performances de ce type d'applications complexifie également l'évaluation du niveau d'adéquation des architectures matérielles avec de tels algorithmes. Dans ce contexte, ces travaux de thèses portent sur l’exploration d’algorithmes de traitement de graphes sur architectures distribuées en utilisant GraphLab, un Framework de l’état de l’art dédié à la programmation parallèle de tels algorithmes. En particulier, deux cas d'applications réelles ont été étudiées en détails et déployées sur différentes architectures à mémoire distribuée, l’un venant de l’analyse de trace d’exécution et l’autre du domaine du traitement de données génomiques. Ces études ont permis de mettre en évidence l’existence de régimes de fonctionnement permettant d'identifier des points de fonctionnements pertinents dans lesquels on souhaitera placer un système pour maximiser son efficacité. Dans un deuxième temps, une étude a permis de comparer l'efficacité d'architectures généralistes (type commodity cluster) et d'architectures plus spécialisées (type serveur de calcul hautes performances) pour le traitement de graphes distribué. Cette étude a démontré que les architectures composées de grappes de machines de type workstation, moins onéreuses et plus simples, permettaient d'obtenir des performances plus élevées. Cet écart est d'avantage accentué quand les performances sont pondérées par les coûts d'achats et opérationnels. L'étude du comportement en performance de ces architectures a également permis de proposer in fine des règles de dimensionnement et de conception des architectures distribuées, dans ce contexte. En particulier, nous montrons comment l’étude des performances fait apparaitre les axes d’amélioration du matériel et comment il est possible de dimensionner un cluster pour traiter efficacement une instance donnée. Finalement, des propositions matérielles pour la conception de serveurs de calculs plus performants pour le traitement de graphes sont formulées. Premièrement, un mécanisme est proposé afin de tempérer la baisse significative de performance observée quand le cluster opère dans un point de fonctionnement où la mémoire vive est saturée. Enfin, les deux applications développées ont été évaluées sur une architecture à base de processeurs basse-consommation afin d'étudier la pertinence de telles architectures pour le traitement de graphes. Les performances mesurés en utilisant de telles plateformes sont encourageantes et montrent en particulier que la diminution des performances brutes par rapport aux architectures existantes est compensée par une efficacité énergétique bien supérieure
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graphprocessing algorithms, whose parallelization and complexity are highly data-dependent, hamper usability. Moreover, the difficult evaluation of performance behaviors of these applications complexifies the assessment of the relevance of the used architecture. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state of the art graphprocessing framework. Two use-cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and used cluster nodes. Further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use-cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis work explores how performance studies are helpful in cluster design for graph-processing. In particular, studying throughput performances of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph-processing. Finally, hardware improvements for next generations of graph-processing servers areproposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph-processing is investigated with a port of GraphLab on a NVIDIA TX2-based architecture

APA, Harvard, Vancouver, ISO, and other styles

3

Cordova, Gabriel. "A distributed reconstruction of EKG signals." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2008. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Full text

APA, Harvard, Vancouver, ISO, and other styles

4

Xu, Lei. "Cellular distributed and parallel computing." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:88ffe124-c2fd-4144-86fe-47b35f4908bd.

Full text

Abstract:

This thesis focuses on novel approaches to distributed and parallel computing that are inspired by the mechanism and functioning of biological cells. We refer to this concept as cellular distributed and parallel computing which focuses on three important principles: simplicity, parallelism, and locality. We first give a parallel polynomial-time solution to the constraint satisfaction problem (CSP) based on a theoretical model of cellular distributed and parallel computing, which is known as neural-like P systems (or neural-like membrane systems). We then design a class of simple neural-like P systems to solve the fundamental maximal independent set (MIS) selection problem efficiently in a distributed way, by drawing inspiration from the way that developing cells in the fruit fly become specialised. Building on the novel bio-inspired approach to distributed MIS selection, we propose a new simple randomised algorithm for another fundamental distributed computing problem: the distributed greedy colouring (GC) problem. We then propose an improved distributed MIS selection algorithm that incorporates for the first time another important feature of the biological system: adapting the probabilities used at each node based on local feedback from neighbouring nodes. The improved distributed MIS selection algorithm is again extended to solve the distributed greedy colouring problem. Both improved algorithms are simple and robust and work under very restrictive conditions, moreover, they both achieve state-of-the-art performance in terms of their worst-case time complexity and message complexity. Given any n-node graph with maximum degree Delta, the expected time complexity of our improved distributed MIS selection algorithm is O(log n) and the message complexity per node is O(1). The expected time complexity of our improved distributed greedy colouring algorithm is O(Delta + log n) and the message complexity per node is again O(1). Finally, we provide some experimental results to illustrate the time and message complexity of our proposed algorithms in practice. In particular, we show experimentally that the number of colours used by our distributed greedy colouring algorithms turns out to be optimal or near-optimal for many standard graph colouring benchmarks, so they provide effective simple heuristic approaches to computing a colouring with a small number of colours.

APA, Harvard, Vancouver, ISO, and other styles

5

Kim, Jinwoo. "Hierarchical asynchronous genetic algorithms for parallel/distributed simulation-based optimization." Diss., The University of Arizona, 1994. http://hdl.handle.net/10150/186831.

Full text

Abstract:

The objective of this dissertation is to develop a multi-resolution optimization strategy based on the evolution algorithms in the parallel/distributed simulation environment. The system architecture is constructed hierarchically with multiple clusters which consist of an expert system (controller) and set of genetic algorithm optimizers (agents). We propose an asynchronous genetic algorithm (AGA) which continuously updates the population in parallel genetic algorithms. Asynchronous evaluation of population in a parallel computer improves the utilization of the processors and reduces search time when the evaluation time of individuals is highly variable. Further, we have devised a noise assignment scheme which resolves the pre-convergence drawback of the genetic algorithms. In this scheme, binary representation (discrete sampling) of an individual is combined with a random number (analog sampling), so that the genetic algorithm can investigate the entire search space regardless of the bit-size of an individual. Real application problems require the evaluation of a large number of parameters and their search complexity grows beyond the capability of a single level GA-optimizer. In response, we have developed a novel scheme called Hierarchical Genetic Algorithms. This multilevel GA optimization strategy is based on an Intelligent Machine Architecture which supporting non-deterministic computation, intensive and irregular memory access patterns, and large potential for parallel computing. The clusters in the Hierarchical GAs are coordinated hierarchically and creation and deletion of nodes are performed dynamically based upon performance. During the optimization process, the clusters cooperate together to solve different levels of the abstracted problem. A candidate solution at a higher level creates a lower level cluster which utilizes previously optimized parameter information. It can also contribute to the search process of a higher level by sending the feedback information. Hierarchical GAs demonstrate performance with various experiments. We have compared the performance of the Hierarchical GAs and simple GA (single-level). The Hierarchical GAs adaptively changes its structure to allocate more computing resources to the promising nodes. With the same amount of memory size for population, the simulation results shows that the Hierarchical GAs find a solution faster than the simple GA.

APA, Harvard, Vancouver, ISO, and other styles

6

Kalaiselvi, S. "Checkpointing Algorithms for Parallel Computers." Thesis, Indian Institute of Science, 1997. http://hdl.handle.net/2005/67.

Full text

Abstract:

Checkpointing is a technique widely used in parallel/distributed computers for rollback error recovery. Checkpointing is defined as the coordinated saving of process state information at specified time instances. Checkpoints help in restoring the computation from the latest saved state, in case of failure. In addition to fault recovery, checkpointing has applications in fault detection, distributed debugging and process migration. Checkpointing in uniprocessor systems is easy due to the fact that there is a single clock and events occur with respect to this clock. There is a clear demarcation of events that happens before a checkpoint and events that happens after a checkpoint. In parallel computers a large number of computers coordinate to solve a single problem. Since there might be multiple streams of execution, checkpoints have to be introduced along all these streams simultaneously. Absence of a global clock necessitates explicit coordination to obtain a consistent global state. Events occurring in a distributed system, can be ordered partially using Lamport's happens before relation. Lamport's happens before relation ->is a partial ordering relation to identify dependent and concurrent events occurring in a distributed system. It is defined as follows: ·If two events a and b happen in the same process, and if a happens before b, then a->b ·If a is the sending event of a message and b is the receiving event of the same message then a -> b ·If neither a à b nor b -> a, then a and b are said to be concurrent. A consistent global state may have concurrent checkpoints. In the first chapter of the thesis we discuss issues regarding ordering of events in a parallel computer, need for coordination among checkpoints and other aspects related to checkpointing. Checkpointing locations can either be identified statically or dynamically. The static approach assumes that a representation of a program to be checkpointed is available with information that enables a programmer to specify the places where checkpoints are to be taken. The dynamic approach identifies the checkpointing locations at run time. In this thesis, we have proposed algorithms for both static and dynamic checkpointing. The main contributions of this thesis are as follows: 1. Parallel computers that are being built now have faster communication and hence more efficient clock synchronisation compared to those built a few years ago. Based on efficient clock synchronisation protocols, the clock drift in current machines can be maintained within a few microseconds. We have proposed a dynamic checkpointing algorithm for parallel computers assuming bounded clock drifts. 2. The shared memory paradigm is convenient for programming while message passing paradigm is easy to scale. Distributed Shared Memory (DSM) systems combine the advantage of both paradigms and can be visualized easily on top of a network of workstations. IEEE has recently proposed an interconnect standard called Scalable Coherent Interface (SCI), to con6gure computers as a Distributed Shared Memory system. A periodic dynamic checkpointing algorithm has been proposed in the thesis for a DSM system which uses the SCI standard. 3. When information about a parallel program is available one can make use of this knowledge to perform efficient checkpointing. A static checkpointing approach based on task graphs is proposed for parallel programs. The proposed task graph based static checkpointing approach has been implemented on a Parallel Virtual Machine (PVM) platform. We now give a gist of various chapters of the thesis. Chapter 2 of the thesis gives a classification of existing checkpointing algorithms. The chapter surveys algorithm that have been reported in literature for checkpointing parallel/distributed systems. A point to be noted is that most of the algorithms published for checkpointing message passing systems are based on the seminal article by Chandy & Lamport. A large number of checkpointing algorithms have been published by relaxing the assumptions made in the above mentioned article and by extending the features to minimise the overheads of coordination and context saving. Checkpointing for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently algorithms have been published for distributed shared memory systems, which extend the cache coherence protocols used in shared memory systems. They however also include methods for storing the status of distributed memory in stable storage. Chapter 2 concludes with brief comments on the desirable features of a checkpointing algorithm. In Chapter 3, we develop a dynamic checkpointing algorithm for message passing systems assuming that the clock drift of processors in the system is bounded. Efficient clock synchronisation protocols have been implemented on recent parallel computers owing to the fact that communication between processors is very fast. Based on efficient clock synchronisation protocols, clock skew can be limited to a few microseconds. The algorithm proposed in the thesis uses clocks for checkpoint coordination and vector counts for identifying messages to be logged. The algorithm is a periodic, distributed algorithm. We prove correctness of the algorithm and compare it with similar clock based algorithms. Distributed Shared Memory (DSM) systems provide the benefit of ease of programming in a scalable system. The recently proposed IEEE Scalable Coherent Interface (SCI) standard, facilitates the construction of scalable coherent systems. In Chapter 4 we discuss a checkpointing algorithm for an SCI based DSM system. SCI maintains cache coherence in hardware using a distributed cache directory which scales with the number of processors in the system. SCI recommends a two phase transaction protocol for communication. Our algorithm is a two phase centralised coordinated algorithm. Phase one initiates checkpoints and the checkpointing activity is completed in phase two. The correctness of the algorithm is established theoretically. The chapter concludes with the discussion of the features of SCI exploited by the checkpointing algorithm proposed in the thesis. In Chapter 5, a static checkpointing algorithm is developed assuming that the program to be executed on a parallel computer is given as a directed acyclic task graph. We assume that the estimates of the time to execute each task in the task graph is given. Given the timing at which checkpoints are to be taken, the algorithm identifies a set of edges where checkpointing tasks can be placed ensuring that they form a consistent global checkpoint. The proposed algorithm eliminates coordination overhead at run time. It significantly reduces the context saving overhead by taking checkpoints along edges of the task graph. The algorithm is used as a preprocessing step before scheduling the tasks to processors. The algorithm complexity is O(km) where m is the number of edges in the graph and k the maximum number of global checkpoints to be taken. The static algorithm is implemented on a parallel computer with a PVM environment as it is widely available and portable. The task graph of a program can be constructed manually or through program development tools. Our implementation is a collection of preprocessing and run time routines. The preprocessing routines operate on the task graph information to generate a set of edges to be checkpointed for each global checkpoint and write the information on disk. The run time routines save the context along the marked edges. In case of recovery, the recovery algorithms read the information from stable storage and reconstruct the context. The limitation of our static checkpointing algorithm is that it can operate only on deterministic task graphs. To demonstrate the practical feasibility of the proposed approach, case studies of checkpointing some parallel programs are included in the thesis. We conclude the thesis with a summary of proposed algorithms and possible directions to continue research in the area of checkpointing.

APA, Harvard, Vancouver, ISO, and other styles

7

Loo, Alfred. "A statistical approach to parallel sorting and selection algorithms design." Thesis, University of Sunderland, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311295.

Full text

APA, Harvard, Vancouver, ISO, and other styles

8

Gupta, Sandeep K. S. "Synthesizing communication-efficient distributed memory parallel programs for block recursive algorithms /." The Ohio State University, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487861796820607.

Full text

APA, Harvard, Vancouver, ISO, and other styles

9

White, Tennis S. "Distributed control reconfiguration algorithms for 2-dimensional mesh architectures." Diss., Virginia Tech, 1991. http://hdl.handle.net/10919/39919.

Full text

APA, Harvard, Vancouver, ISO, and other styles

10

Narravula, Harsha V. Katsinis Constantine. "Performance of parallel algorithms on a broadcast-based architecture /." Philadelphia, Pa. : Drexel University, 2003. http://dspace.library.drexel.edu/handle/1860/254.

Full text

APA, Harvard, Vancouver, ISO, and other styles

11

Watkins, Andrew. "Exploiting immunological metaphors in the development of serial, parallel and distributed learning algorithms." Thesis, University of Kent, 2005. https://kar.kent.ac.uk/14351/.

Full text

Abstract:

This thesis examines the use of immunological metaphors in building serial, parallel, and distributed learning algorithms. It offers a basic study in the development of biologically-inspired algorithms which merge inspiration from biology with known, standard computing technology to examine robust methods of computing. This thesis begins by detailing key interactions found within the immune system that provide inspiration for the development of a learning system. It then exploits the use of more processing power for the development of faster algorithms. This leads to the exploration of distributed computing resources for the examination of more biologically plausible systems. This thesis offers the following main contributions. The components of the immune system that exhibit the capacity for learning are detailed. A framework for discussing learning algorithms is proposed. Three properties of every learning algorithm-memory, adaptation, and decision-making-are identified for this framework, and traditional learning algorithms are placed in the context of this framework. An investigation into the use of immunological components for learning is provided. This leads to an understanding of these components in terms of the learning framework. A simplification of the Artificial Immune Recognition System (AIRS) immune-inspired learning algorithm is provided by employing affinity-dependent somatic hypermutation. A parallel version of the Clonal Selection Algorithm (CLONALG) immune learning algorithm is developed. It is shown that basic parallel computing techniques can provide computational benefits for this algorithm. Exploring this technology further, a parallel version of AIRS is offered. It is shown that applying these same parallel computing techniques to AIRS, while less scalable than when applied to CLONALG, still provides computational gains. A distributed approach to AIRS is offered, and it is argued that this approach provides a more biologically appealing model. The simple distributed approach is proposed in terms of an initial step toward a more complex, distributed system. Biological immune systems exhibit complex cellular interactions. The mechanisms of these interactions, while often poorly understood, hint at an extremely powerful information processing/problem solving system. This thesis demonstrates how the use of immunological principles coupled with standard computing technology can lead to the development of robust, biologically inspired learning algorithms.

APA, Harvard, Vancouver, ISO, and other styles

12

McMurtrey, Shannon Dale. "Training and Optimizing Distributed Neural Networks Using a Genetic Algorithm." NSUWorks, 2010. http://nsuworks.nova.edu/gscis_etd/243.

Full text

Abstract:

Parallelizing neural networks is an active area of research. Current approaches surround the parallelization of the widely used back-propagation (BP) algorithm, which has a large amount of communication overhead, making it less than ideal for parallelization. An algorithm that does not depend on the calculation of derivatives, and the backward propagation of errors, better lends itself to a parallel implementation. One well known training algorithm for neural networks explicitly incorporates network structure in the objective function to be minimized which yields simpler neural networks. Prior work has implemented this using a modified genetic algorithm in a serial fashion that is not scalable, thus limiting its usefulness. This dissertation created a parallel version of the algorithm. The performance of the proposed algorithm is compared against the existing algorithm using a variety of syn-thetic and real world problems. Computational experiments with benchmark datasets in-dicate that the parallel algorithm proposed in this research outperforms the serial version from prior research in finding better minima in the same time as well as identifying a simpler architecture.

APA, Harvard, Vancouver, ISO, and other styles

13

Das, Sarma Atish. "Algorithms for large graphs." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34709.

Full text

APA, Harvard, Vancouver, ISO, and other styles

14

Izadi, Mohammad. "Hierarchical Matrix Techniques on Massively Parallel Computers." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-101164.

Full text

Abstract:

Hierarchical matrix (H-matrix) techniques can be used to efficiently treat dense matrices. With an H-matrix, the storage requirements and performing all fundamental operations, namely matrix-vector multiplication, matrix-matrix multiplication and matrix inversion can be done in almost linear complexity. In this work, we tried to gain even further speedup for the H-matrix arithmetic by utilizing multiple processors. Our approach towards an H-matrix distribution relies on the splitting of the index set. The main results achieved in this work based on the index-wise H-distribution are: A highly scalable algorithm for the H-matrix truncation and matrix-vector multiplication, a scalable algorithm for the H-matrix matrix multiplication, a limited scalable algorithm for the H-matrix inversion for a large number of processors.

APA, Harvard, Vancouver, ISO, and other styles

15

Çağrıcı, Gökhan Koltuksuz Ahmet. "An analysis of key generation efficiency of rsa cryptos ystem in distributed environments/." [s.l.]: [s.n.], 2005. http://library.iyte.edu.tr/tezler/master/bilgisayaryazilimi/T000406.pdf.

Full text

Abstract:

Thesis (Master)--İzmir Institute of Technology, İzmir, 2005.
Keywords: Cryptosystem, rivest-Shamir-Adleman, parallel computing, parallel algorithms, Random number. Includes bibliographical references (leaves. 68).

APA, Harvard, Vancouver, ISO, and other styles

16

George, Thomas. "A distributed memory implementation of Loci." Master's thesis, Mississippi State : Mississippi State University, 2001. http://library.msstate.edu/etd/show.asp?etd=etd-11082001-140719.

Full text

APA, Harvard, Vancouver, ISO, and other styles

17

Sattar, Naw Safrin. "Scalable Community Detection using Distributed Louvain Algorithm." ScholarWorks@UNO, 2019. https://scholarworks.uno.edu/td/2640.

Full text

Abstract:

Community detection (or clustering) in large-scale graph is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm but fails to scale emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared memory-based algorithm using OpenMP, which shows a 4-fold speedup but is limited to available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load-balancing in our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms, shows around 12-fold speedup scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms for several large real-world networks.

APA, Harvard, Vancouver, ISO, and other styles

18

Cinel, Sertac. "Sequential And Parallel Heuristic Algorithms For The Rectilinear Steiner Tree Problem." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12607896/index.pdf.

Full text

Abstract:

The Steiner Tree problem is one of the most popular graph problems and has many application areas. The rectilinear version of this problem, introduced by Hanan, has taken special attention since it addresses a fundamental matter in &ldquo
Physical Design&rdquo
phase of the Very Large Scale Integrated (VLSI) Computer Aided Design (CAD) process. The Rectilinear Steiner Tree Problem asks for a minimum length tree that interconnects a given set of points by only horizontal and vertical line segments, enabling the use of extra points. There are various exact algorithms. However the problem is NP-complete hence approximation algorithms have to be used especially for large instances. In this thesis work, first a survey on heuristics for the Rectilinear Steiner Tree Problem is conducted and then two recently developed successful algorithms, BGA by Kahng et. al. and RST by Zhou have been studied and investigated deeply. Both algorithms have subproblems, most of which have individual backgrounds in literature. After an analysis of BGA and RST, the thesis proposes a modification on RST, which leads to a faster and non-recursive version. The efficiency of the modified algorithm has been validated by computational tests using both random and VLSI benchmark instances. A partially parallelized version of the modified algorithm is also proposed for distributed computing environments. It is implemented using MPI (message passing interface) middleware and the results of comparative tests conducted on a cluster of workstations are presented. The proposed distributed algorithm has also proved to be promising especially for large problem instances.

APA, Harvard, Vancouver, ISO, and other styles

19

Wu, Jiande. "Parallel Computing of Particle Filtering Algorithms for Target Tracking Applications." ScholarWorks@UNO, 2014. http://scholarworks.uno.edu/td/1953.

Full text

Abstract:

Particle filtering has been a very popular method to solve nonlinear/non-Gaussian state estimation problems for more than twenty years. Particle filters (PFs) have found lots of applications in areas that include nonlinear filtering of noisy signals and data, especially in target tracking. However, implementation of high dimensional PFs in real-time for large-scale problems is a very challenging computational task. Parallel & distributed (P&D) computing is a promising way to deal with the computational challenges of PF methods. The main goal of this dissertation is to develop, implement and evaluate computationally efficient PF algorithms for target tracking, and thereby bring them closer to practical applications. To reach this goal, a number of parallel PF algorithms is designed and implemented using different parallel hardware architectures such as Computer Cluster, Graphics Processing Unit (GPU), and Field-Programmable Gate Array (FPGA). Proposed is an improved PF implementation for computer cluster - the Particle Transfer Algorithm (PTA), which takes advantage of the cluster architecture and outperforms significantly existing algorithms. Also, a novel GPU PF algorithm implementation is designed which is highly efficient for GPU architectures. The proposed algorithm implementations on different parallel computing environments are applied and tested for target tracking problems, such as space object tracking, ground multitarget tracking using image sensor, UAV-multisensor tracking. Comprehensive performance evaluation and comparison of the algorithms for both tracking and computational capabilities is performed. It is demonstrated by the obtained simulation results that the proposed implementations help greatly overcome the computational issues of particle filtering for realistic practical problems.

APA, Harvard, Vancouver, ISO, and other styles

20

Adlerborn, Björn. "Parallel Algorithms and Library Software for the Generalized Eigenvalue Problem on Distributed Memory Computer Systems." Licentiate thesis, Umeå universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-119439.

Full text

Abstract:

We present and discuss algorithms and library software for solving the generalized non-symmetric eigenvalue problem (GNEP) on high performance computing (HPC) platforms with distributed memory. Such problems occur frequently in computational science and engineering, and our contributions make it possible to solve GNEPs fast and accurate in parallel using state-of-the-art HPC systems. A generalized eigenvalue problem corresponds to finding scalars y and vectors x such that Ax = yBx, where A and B are real square matrices. A nonzero x that satisfies the GNEP equation is called an eigenvector of the ordered pair (A,B), and the scalar y is the associated eigenvalue. Our contributions include parallel algorithms for transforming a matrix pair (A,B) to a generalized Schur form (S,T), where S is quasi upper triangular and T is upper triangular. The eigenvalues are revealed from the diagonals of S and T. Moreover, for a specified set of eigenvalues an associated pair of deflating subspaces can be computed, which typically is requested in various applications. In the first stage the matrix pair (A,B) is reduced to a Hessenberg-triangular form (H,T), where H is upper triangular with one nonzero subdiagonal and T is upper triangular, in a finite number of steps. The second stage reduces the matrix pair further to generalized Schur form (S,T) using an iterative QZ-based method. Outgoing from a one-stage method for the reduction from (A,B) to (H,T), a novel parallel algorithm is developed. In brief, a delayed update technique is applied to several partial steps, involving low level operations, before associated accumulated transformations are applied in a blocked fashion which together with a wave-front task scheduler makes the algorithm scale when running in a parallel setting. The potential presence of infinite eigenvalues makes a generalized eigenvalue problem ill-conditioned. Therefore the parallel algorithm for the second stage, reduction to (S,T) form, continuously scan for and robustly deflate infinite eigenvalues. This will reduce the impact so that they do not interfere with other real eigenvalues or are misinterpreted as real eigenvalues. In addition, our parallel iterative QZ-based algorithm makes use of multiple implicit shifts and an aggressive early deflation (AED) technique, which radically speeds up the convergence. The multi-shift strategy is based on independent chains of so called coupled bulges and computational windows which is an important source of making the algorithm scalable. The parallel algorithms have been implemented in state-of-the-art library software. The performance is demonstrated and evaluated using up to 1600 CPU cores for problems with matrices as large as 100000 x 100000. Our library software is described in a User Guide. The software is, optionally, tunable via a set of parameters for various thresholds and buffer sizes etc. These parameters are discussed, and recommended values are specified which should result in reasonable performance on HPC systems similar to the ones we have been running on.

APA, Harvard, Vancouver, ISO, and other styles

21

McMahon, Mathew T. "A Distributed Genetic Algorithm With Migration for the Design of Composite Laminate Structures." Thesis, Virginia Tech, 1998. http://hdl.handle.net/10919/36962.

Full text

Abstract:

This thesis describes the development of a general Fortran 90 framework for the solution of composite laminate design problems using a genetic algorithm (GA). The initial Fortran 90 module and package of operators result in a standard genetic algorithm (sGA). The sGA is extended to operate on a parallel processor, and a migration algorithm is introduced. These extensions result in the distributed genetic algorithm with migration (dGA). The performance of the dGA in terms of cost and reliability is studied and compared to an sGA baseline, using two types of composite laminate design problems. The nondeterminism of GAs and the migration and dynamic load balancing algorithm used in this work result in a changed (diminished) workload, so conventional measures of parallelizability are not meaningful. Thus, a set of experiments is devised to characterize the run time performance of the dGA. The migration algorithm is found to diminish the normalized cost and improve the reliability of a GA optimization run. An effective linear speedup for constant work is achieved, and the dynamic load balancing algorithm with distributed control and token ring termination detection yield improved run time performance.
Master of Science

APA, Harvard, Vancouver, ISO, and other styles

22

Pajak, Dominik. "Algorithms for Deterministic Parallel Graph Exploration." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2014. http://tel.archives-ouvertes.fr/tel-01064992.

Full text

Abstract:

Nous étudions dans cette thèse le problème de l'exploration parallèle d'un graphe à l'aide des multiples, synchronisés et mobiles agents. Chaque agent est une entité individuelle qui peut, indépendamment des autres agents, visitez les sommets du graphe ou parcourir ses arêtes. Le but de ensemble des agents est de visiter tous les sommets de graphe. Nous étudions d'abord l'exploration du graphe dans un modèle où chaque agent est équipé de mémoire interne, mais les nœuds n'ont pas de mémoire. Dans ce modèle les agents sont autorisés à communiquer entre eux en échangeant des messages. Nous présentons des algorithmes qui s'exécutent dans un minimum de temps possible pour polynomiale nombre d'agents (polynomiale en nombre de sommets du graphe). Nous étudions aussi quelle est l'impacte de différent méthodes des communications. Nous étudions des algorithmes où les agents peuvent se communiquer à distance arbitraire, mais aussi où communication est possible seulement entre les agents situés dans le même sommet. Dans les deux cas nous présentons des algorithmes efficaces. Nous avons aussi obtenu des limites inférieures qui correspondent bien à la performance des algorithmes. Nous considérons également l'exploration de graphe en supposant que les mouvements des agents sont déterminés par le soi-disant rotor-router mécanisme. Du point de vue d'un sommet fixé, le rotor- router envoie des agents qui visitent les sommet voisins dans un mode round-robin. Nous étudions l'accélération défini comme la proportion entre le pire des cas de l'exploration d'un agent unique et des plusieurs agents. Pour générales graphes, nous montrerons que le gain de vitesse en cas de multi-agent rotor-router est toujours entre fonction logarithmique et linéaire du nombre d'agents. Nous présentons également des résultats optimaux sur l'accélération de multi-agent rotor-router pour cycles, expanseurs, graphes aléatoires, cliques, tores de dimension fixé et une analyse presque optimale pour hypercubes. Finalement nous considérons l'exploration sans collision, où chaque agent doit explorer le graphe de manière indépendante avec la contrainte supplémentaire que deux agents ne peuvent pas occuper le même sommet. Dans le cas où les agents sont donnés le plan de graphe, on présente un algorithme optimal pour les arbres et un algorithme asymptotiquement optimal pour générales graphes. Nous présentons aussi des algorithmes dans le cas de l'exploration sans collision des arbres et des générales graphes dans la situation où les agents ne connaissent pas le graphe. Nous fermons la thèse par des observations finales et une discussion de problèmes ouverts liés dans le domaine de l'exploration des graphes.

APA, Harvard, Vancouver, ISO, and other styles

23

Garcia, Jesus Luis. "A parallel algorithm for distributed minimum variance distortionless response beamforming." [Florida] : State University System of Florida, 1999. http://etd.fcla.edu/etd/uf/1999/amp7343/garcia.pdf.

Full text

Abstract:

Thesis (M.S.)--University of Florida, 1999.
Title from first page of PDF file. Document formatted into pages; contains 101 p.; also contains graphics. Vita. Includes bibliographical references (p. 99-100).

APA, Harvard, Vancouver, ISO, and other styles

24

Thomas, Wessel Morant. "Parallel Mining of Association Rules Using a Lattice Based Approach." NSUWorks, 2009. http://nsuworks.nova.edu/gscis_etd/361.

Full text

Abstract:

The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in database. One such interesting pattern is the association rules extracted from these transactions. Parallel algorithms are required for the mining of association rules due to the very large databases used to store the transactions. In this paper we present a parallel algorithm for the mining of association rules. We implemented a parallel algorithm that used a lattice approach for mining association rules. The Dynamic Distributed Rule Mining (DDRM) is a lattice-based algorithm that partitions the lattice into sublattices to be assigned to processors for processing and identification of frequent itemsets. Experimental results show that DDRM utilizes the processors efficiently and performed better than the prefix-based and partition algorithms that use a static approach to assign classes to the processors. The DDRM algorithm scales well and shows good speedup.

APA, Harvard, Vancouver, ISO, and other styles

25

Obrovac, Marko. "Chemical Computing for Distributed Systems: Algorithms and Implementation." Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00925257.

Full text

Abstract:

Avec l'émergence de plates-formes distribuées très hétérogènes, dynamiques et à large échelle, la nécessité d'un moyen de les programmer efficacement et de les gérer est apparu. Le concept de l'informatique autonomique propose de créer des systèmes auto-gérés c'est-à-dire des systèmes qui sont conscients de leurs composants et de leur environnement, et peuvent se configurer, s'optimiser, se réparer et se protéger. Dans le cadre de la réalisation de tels systèmes, la programmation déclarative, dont l'objectif est de faciliter la tâche du programmeur en séparant le contrôle de la logique du calcul, a retrouvé beaucoup d'intérêt ces derniers temps. En particulier, la programmation à base de des règles est considérée comme un modèle prometteur dans cette quête d'abstractions de programmation adéquates pour ces plates-formes. Cependant, bien que ces modèles gagnent beaucoup d'attention, ils créent une demande pour des outils génériques capables de les exécuter à large échelle. Le modèle de programmation chimique, qui a été conçu suivant la métaphore chimique, est un modèle de programmation à bas de règles et d'ordre supérieur, avec une exécution non-déterministe, où les règles sont appliquées de façon concurrente sur un multi ensemble de données. Dans cette thèse, nous proposons la conception, le développement et l'expérimentation d'un intergiciel distribué pour l'exécution de programmes chimique sur des plates-formes à large échelle et génériques. L'architecture proposée combine une couche de communication pair-à-pair avec un protocole de capture atomique d'objets sur lesquels les règles doivent être appliquées, et un système efficace de détection de terminaison. Nous décrivons le prototype d'intergiciel mettant en oeuvre cette architecture. En s'appuyant sur son déploiement sur une plate-forme expérimentale à large échelle, nous présentons les résultats de performance, qui confirment les complexités analytiques obtenues et montrons expérimentalement la viabilité d'un tel modèle de programmation.

APA, Harvard, Vancouver, ISO, and other styles

26

Abdullah, Rosni. "Design and analysis of numerical algorithms for the solution of linear systems on parallel and distributed architectures." Thesis, Loughborough University, 1997. https://dspace.lboro.ac.uk/2134/33138.

Full text

Abstract:

The increasing availability of parallel computers is having a very significant impact on all aspects of scientific computation, including algorithm research and software development in numerical linear algebra. In particular, the solution of linear systems, which lies at the heart of most calculations in scientific computing is an important computation found in many engineering and scientific applications. In this thesis, well-known parallel algorithms for the solution of linear systems are compared with implicit parallel algorithms or the Quadrant Interlocking (QI) class of algorithms to solve linear systems. These implicit algorithms are (2x2) block algorithms expressed in explicit point form notation.

APA, Harvard, Vancouver, ISO, and other styles

27

Krajča, Petr. "Advanced algorithms for formal concept analysis." Diss., Online access via UMI:, 2009.

Find full text

Abstract:

Thesis (Ph. D.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Systems Science and Industrial Engineering, 2009.
Includes bibliographical references.

APA, Harvard, Vancouver, ISO, and other styles

28

Lee, Dong Ryeol. "A distributed kernel summation framework for machine learning and scientific applications." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44727.

Full text

Abstract:

The class of computational problems I consider in this thesis share the common trait of requiring consideration of pairs (or higher-order tuples) of data points. I focus on the problem of kernel summation operations ubiquitous in many data mining and scientific algorithms. In machine learning, kernel summations appear in popular kernel methods which can model nonlinear structures in data. Kernel methods include many non-parametric methods such as kernel density estimation, kernel regression, Gaussian process regression, kernel PCA, and kernel support vector machines (SVM). In computational physics, kernel summations occur inside the classical N-body problem for simulating positions of a set of celestial bodies or atoms. This thesis attempts to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations are in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide a unified, efficient parallel kernel summation framework that can utilize: (1) various types of deterministic and probabilistic approximations that may be suitable for both low and high-dimensional problems with a large number of data points; (2) indexing the data using any multi-dimensional binary tree with both distributed memory (MPI) and shared memory (OpenMP/Intel TBB) parallelism; (3) a dynamic load balancing scheme to adjust work imbalances during the computation. I will first summarize my previous research in serial kernel summation algorithms. This work started from Greengard/Rokhlin's earlier work on fast multipole methods for the purpose of approximating potential sums of many particles. The contributions of this part of this thesis include the followings: (1) reinterpretation of Greengard/Rokhlin's work for the computer science community; (2) the extension of the algorithms to use a larger class of approximation strategies, i.e. probabilistic error bounds via Monte Carlo techniques; (3) the multibody series expansion: the generalization of the theory of fast multipole methods to handle interactions of more than two entities; (4) the first O(N) proof of the batch approximate kernel summation using a notion of intrinsic dimensionality. Then I move onto the problem of parallelization of the kernel summations and tackling the scaling of two other kernel methods, Gaussian process regression (kernel matrix inversion) and kernel PCA (kernel matrix eigendecomposition). The artifact of this thesis has contributed to an open-source machine learning package called MLPACK which has been first demonstrated at the NIPS 2008 and subsequently at the NIPS 2011 Big Learning Workshop. Completing a portion of this thesis involved utilization of high performance computing resource at XSEDE (eXtreme Science and Engineering Discovery Environment) and NERSC (National Energy Research Scientific Computing Center).

APA, Harvard, Vancouver, ISO, and other styles

29

Stamatakis, Alexandros. "Distributed and parallel algorithms and systems for inference of huge phylogenetic trees based on the maximum likelihood method." [S.l. : s.n.], 2004. http://deposit.ddb.de/cgi-bin/dokserv?idn=973053380.

Full text

APA, Harvard, Vancouver, ISO, and other styles

30

Talpaert, Arthur. "Direct Numerical Simulation of bubbles with Adaptive Mesh Refinement with Distributed Algorithms." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX016/document.

Full text

Abstract:

Ce travail de thèse présente l'implémentation de la simulation d'écoulements diphasiques dans des conditions de réacteurs nucléaires à caloporteur eau, à l'échelle de bulles individuelles. Pour ce faire, nous étudions plusieurs modèles d'écoulements thermohydrauliques et nous focalisons sur une technique de capture d'interface mince entre phases liquide et vapeur. Nous passons ainsi en revue quelques techniques possibles de maillage adaptatif (AMR) et nous fournissons des outils algorithmiques et informatiques adaptés à l'AMR par patchs dont l'objectif localement la précision dans des régions d'intérêt. Plus précisément, nous introduisons un algorithme de génération de patchs conçu dans l'optique du calcul parallèle équilibré. Cette approche nous permet de capturer finement des changements situés à l'interface, comme nous le montrons pour des cas tests d'advection ainsi que pour des modèles avec couplage hyperbolique-elliptique. Les calculs que nous présentons incluent également la simulation du système de Navier-Stokes incompressible qui modélise la déformation de l'interface entre deux fluides non-miscibles
This PhD work presents the implementation of the simulation of two-phase flows in conditions of water-cooled nuclear reactors, at the scale of individual bubbles. To achieve that, we study several models for Thermal-Hydraulic flows and we focus on a technique for the capture of the thin interface between liquid and vapour phases. We thus review some possible techniques for Adaptive Mesh Refinement (AMR) and provide algorithmic and computational tools adapted to patch-based AMR, which aim is to locally improve the precision in regions of interest. More precisely, we introduce a patch-covering algorithm designed with balanced parallel computing in mind. This approach lets us finely capture changes located at the interface, as we show for advection test cases as well as for models with hyperbolic-elliptic coupling. The computations we present also include the simulation of the incompressible Navier-Stokes system, which models the shape changes of the interface between two non-miscible fluids

APA, Harvard, Vancouver, ISO, and other styles

31

Janjic, Vladimir. "Load balancing of irregular parallel applications on heterogeneous computing environments." Thesis, University of St Andrews, 2012. http://hdl.handle.net/10023/2540.

Full text

Abstract:

Large-scale heterogeneous distributed computing environments (such as Computational Grids and Clouds) offer the promise of access to a vast amount of computing resources at a relatively low cost. In order to ease the application development and deployment on such complex environments, high-level parallel programming languages exist that need to be supported by sophisticated runtime systems. One of the main problems that these runtime systems need to address is dynamic load balancing that ensures that no resources in the environment are underutilised or overloaded with work. This thesis deals with the problem of obtaining good speedups for irregular applications on heterogeneous distributed computing environments. It focuses on workstealing techniques that can be used for load balancing during the execution of irregular applications. It specifically addresses two problems that arise during work-stealing: where thieves should look for work during the application execution and how victims should respond to steal attempts. In particular, we describe and implement a new Feudal Stealing algorithm and also we describe and implement new granularity-driven task selection policies in the SCALES simulator, which is a work-stealing simulator developed for this thesis. In addition, we present the comprehensive evaluation of the Feudal Stealing algorithm and the granularity-driven task selection policies using the simulations of a large class of regular and irregular parallel applications on a wide range of computing environments. We show how the Feudal Stealing algorithm and the granularity-driven task selection policies bring significant improvements in speedups of irregular applications, compared to the state-of-the-art work-stealing algorithms. Furthermore, we also present the implementation of the task selection policies in the Grid-GUM runtime system [AZ06] for Glasgow Parallel Haskell (GpH) [THLPJ98], in addition to the implementation in SCALES, and we also present the evaluation of this implementation on a large set of synthetic applications.

APA, Harvard, Vancouver, ISO, and other styles

32

Helal, Manal Computer Science &amp Engineering Faculty of Engineering UNSW. "Indexing and partitioning schemes for distributed tensor computing with application to multiple sequence alignment." Awarded by:University of New South Wales. Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44781.

Full text

Abstract:

This thesis investigates indexing and partitioning schemes for high dimensional scientific computational problems. Building on the foundation offered by Mathematics of Arrays (MoA) for tensor-based computation, the ultimate contribution of the thesis is a unified partitioning scheme that works invariant of the dataset dimension and shape. Consequently, portability is ensured between different high performance machines, cluster architectures, and potentially computational grids. The Multiple Sequence Alignment (MSA) problem in computational biology has an optimal dynamic programming based solution, but it becomes computationally infeasible as its dimensionality (the number of sequences) increases. Even sub-optimal approximations may be unmanageable for more than eight sequences. Furthermore, no existing MSA algorithms have been formulated in a manner invariant over the number of sequences. This thesis presents an optimal distributed MSA method based on MoA. The latter offers a set of constructs that help represent multidimensional arrays in memory in a linear, concise and efficient way. Using MoA allows the partitioning of the dynamic programming algorithm to be expressed independently of dimension. MSA is the highest dimensional scientific problem considered for MoA-based partitioning to date. Two partitioning schemes are presented: the first is a master/slave approach which is based on both master/slave scheduling and slave/slave coupling. The second approach is a peer-to-peer design, in which the scheduling and dependency communication are calculated independently by each process, with no need for a master scheduler. A search space reduction technique is introduced to cater for the exponential expansion as the problem dimensionality increases. This technique relies on defining a hyper-diagonal through the tensor space, and choosing a band of neighbouring partitions around the diagonal to score. In contrast, other sub-optimal methods in the literature only consider projections on the surface of the hyper-cube. The resulting massively parallel design produces a scalable solution that has been implemented on high performance machines and cluster architectures. Experimental results for these implementations are presented for both simulated and real datasets. Comparisons between the reduced search space technique of this thesis with other sub-optimal methods for the MSA problem are presented.

APA, Harvard, Vancouver, ISO, and other styles

33

Liu, Xing. "High-performance algorithms and software for large-scale molecular simulation." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53487.

Full text

Abstract:

Molecular simulation is an indispensable tool in many different disciplines such as physics, biology, chemical engineering, materials science, drug design, and others. Performing large-scale molecular simulation is of great interest to biologists and chemists, because many important biological and pharmaceutical phenomena can only be observed in very large molecule systems and after sufficiently long time dynamics. On the other hand, molecular simulation methods usually have very steep computational costs, which limits current molecular simulation studies to relatively small systems. The gap between the scale of molecular simulation that existing techniques can handle and the scale of interest has become a major barrier for applying molecular simulation to study real-world problems. In order to study large-scale molecular systems using molecular simulation, it requires developing highly parallel simulation algorithms and constantly adapting the algorithms to rapidly changing high performance computing architectures. However, many existing algorithms and codes for molecular simulation are from more than a decade ago, which were designed for sequential computers or early parallel architectures. They may not scale efficiently and do not fully exploit features of today's hardware. Given the rapid evolution in computer architectures, the time has come to revisit these molecular simulation algorithms and codes. In this thesis, we demonstrate our approach to addressing the computational challenges of large-scale molecular simulation by presenting both the high-performance algorithms and software for two important molecular simulation applications: Hartree-Fock (HF) calculations and hydrodynamics simulations, on highly parallel computer architectures. The algorithms and software presented in this thesis have been used by biologists and chemists to study some problems that were unable to solve using existing codes. The parallel techniques and methods developed in this work can be also applied to other molecular simulation applications.

APA, Harvard, Vancouver, ISO, and other styles

34

Cortés, Fité Ana. "A new distributed diffusion algorithm for dynamic load-balancing in parallel systems." Doctoral thesis, Universitat Autònoma de Barcelona, 2000. http://hdl.handle.net/10803/3054.

Full text

APA, Harvard, Vancouver, ISO, and other styles

35

Sharma, Rahil. "Shared and distributed memory parallel algorithms to solve big data problems in biological, social network and spatial domain applications." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2277.

Full text

Abstract:

Big data refers to information which cannot be processed and analyzed using traditional approaches and tools, due to 4 V's - sheer Volume, Velocity at which data is received and processed, and data Variety and Veracity. Today massive volumes of data originate in domains such as geospatial analysis, biological and social networks, etc. Hence, scalable algorithms for effcient processing of this massive data is a signicant challenge in the field of computer science. One way to achieve such effcient and scalable algorithms is by using shared & distributed memory parallel programming models. In this thesis, we present a variety of such algorithms to solve problems in various above mentioned domains. We solve five problems that fall into two categories. The first group of problems deals with the issue of community detection. Detecting communities in real world networks is of great importance because they consist of patterns that can be viewed as independent components, each of which has distinct features and can be detected based upon network structure. For example, communities in social networks can help target users for marketing purposes, provide user recommendations to connect with and join communities or forums, etc. We develop a novel sequential algorithm to accurately detect community structures in biological protein-protein interaction networks, where a community corresponds with a functional module of proteins. Generally, such sequential algorithms are computationally expensive, which makes them impractical to use for large real world networks. To address this limitation, we develop a new highly scalable Symmetric Multiprocessing (SMP) based parallel algorithm to detect high quality communities in large subsections of social networks like Facebook and Amazon. Due to the SMP architecture, however, our algorithm cannot process networks whose size is greater than the size of the RAM of a single machine. With the increasing size of social networks, community detection has become even more difficult, since network size can reach up to hundreds of millions of vertices and edges. Processing such massive networks requires several hundred gigabytes of RAM, which is only possible by adopting distributed infrastructure. To address this, we develop a novel hybrid (shared + distributed memory) parallel algorithm to efficiently detect high quality communities in massive Twitter and .uk domain networks. The second group of problems deals with the issue of effciently processing spatial Light Detection and Ranging (LiDAR) data. LiDAR data is widely used in forest and agricultural crop studies, landscape classification, 3D urban modeling, etc. Technological advancements in building LiDAR sensors have enabled highly accurate and dense LiDAR point clouds resulting in massive data volumes, which pose computing issues with processing and storage. We develop the first published landscape driven data reduction algorithm, which uses the slope-map of the terrain as a filter to reduce the data without sacrificing its accuracy. Our algorithm is highly scalable and adopts shared memory based parallel architecture. We also develop a parallel interpolation technique that is used to generate highly accurate continuous terrains, i.e. Digital Elevation Models (DEMs), from discrete LiDAR point clouds.

APA, Harvard, Vancouver, ISO, and other styles

36

Devlin, Agatha. "A study of parallel algorithms for the solution of some combinatorial optimization problems on a distributed array of processors." Thesis, Queen's University Belfast, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282129.

Full text

APA, Harvard, Vancouver, ISO, and other styles

37

Bozdag, Doruk. "Graph Coloring and Clustering Algorithms for Science and Engineering Applications." The Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1229459765.

Full text

APA, Harvard, Vancouver, ISO, and other styles

38

Ahmed, Nova. "Parallel Algorithm for Memory Efficient Pairwise and Multiple Genome Alignment in Distributed Environment." Digital Archive @ GSU, 2004. http://digitalarchive.gsu.edu/cs_theses/2.

Full text

Abstract:

The genome sequence alignment problems are very important ones from the computational biology perspective. These problems deal with large amount of data which is memory intensive as well as computation intensive. In the literature, two separate algorithms have been studied and improved – one is a Pairwise sequence alignment algorithm which aligns pairs of genome sequences with memory reduction and parallelism for the computation and the other one is the multiple sequence alignment algorithm that aligns multiple genome sequences and this algorithm is also parallelized efficiently so that the workload of the alignment program is well distributed. The parallel applications can be launched on different environments where shared memory is very well suited for these kinds of applications. But shared memory environment has the limitation of memory usage as well as scalability also these machines are very costly. A better approach is to use the cluster of computers and the cluster environment can be further enhanced to a grid environment so that the scalability can be improved introducing multiple clusters. Here the grid environment is studied as well as the shared memory and cluster environment for the two applications. It can be stated that for carefully designed algorithms the grid environment is comparable for its performance to other distributed environments and it sometimes outperforms the others in terms of the limitations of resources the other distributed environments have.

APA, Harvard, Vancouver, ISO, and other styles

39

Rong, Rong. "Toward Distributed At-scale Hybrid Network Test with Emulation and Simulation Symbiosis." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/3058.

Full text

Abstract:

In the past decade or so, significant advances were made in the field of Future Internet Architecture (FIA) design. Undoubtedly, the size of Future Internet will increase tremendously, and so will the complexity of its users’ behaviors. This advancement means most of future Internet applications and services can only achieve and demonstrate full potential on a large-scale basis. The development of network testbeds that can validate key design decisions and expose operational issues at scale is essential to FIA research. In conjunction with the development and advancement of FIA, cyber-infrastructure testbeds have also achieved remarkable progress. For meaningful network studies, it is indispensable to utilize cyber-infrastructure testbeds appropriately in order to obtain accurate experiment results. That said, existing current network experimentation is intrinsically deficient. The existing testbeds do not offer scalability, flexibility, and realism at the same time. This dissertation aims to construct a hybrid system of conducting at-scale network studies and experiments by exploiting the distributed computing ability of current testbeds. First, this work presents a synchronization of parallel discrete event simulation that offers the simulation with transparent scalability and performance on various high-end computing platforms. The parallel simulator that we implement is configured so that it can self-adapt for the performance while running on supercomputers with disparate architectures. The simulator could be used to handle models of different sizes, varying modeling details, and different complexity levels. Second, this works addresses the issue of researching network design and implementation realistically at scale, through the use of distributed cyber-infrastructure testbeds. An existing symbiotic approach is applied to integrate emulation with simulation so that they can overcome the limitations of physical setup. The symbiotic method is used to improve the capabilities of a specific emulator, Mininet. In this case, Mininet can be used to run applications directly on the virtual machines and software switches, with network connectivity represented by detailed simulation at scale. We also propose a method for using the symbiotic approach to coordinate separate Mininet instances, each representing a different set of the overlapping network flows. This approach provides a significant improvement to the scalability of the network experiments.

APA, Harvard, Vancouver, ISO, and other styles

40

Gubba, Ravikumar Krishnanjan. "Distributed simulation of power systems using real time digital simulator." Master's thesis, Mississippi State : Mississippi State University, 2009. http://library.msstate.edu/etd/show.asp?etd=etd-06152009-222641.

Full text

APA, Harvard, Vancouver, ISO, and other styles

41

Nemanja, Nedić. "Upravljanje tokovima aktivnosti u distributivnom menadžment sistemu." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2016. http://www.cris.uns.ac.rs/record.jsf?recordId=96420&source=NDLTD&language=en.

Full text

Abstract:

U radu je predstavljeno istraživanje vezano za poboljšanje performansi rada velikih nadzorno-upravljačkih sistema poput DMS-a. Ovaj cilj je postignut koordinacijom izvršavanja tokova aktivnosti, što podrazumeva efikasnu raspodelu zadataka na računarske resurse. U te svrhe razvijeni su i testirani različiti algoritmi. Ovakav pristup je obezbedio veći stepen iskorišćenja računarskih resursa, što je rezultiralo boljim performansama.
Thе paper presents an approach how to improve performance of larger scale distributed utility management system such as DMS. This goal is accomplished by using an intelligent workflow management. Workflows are divided into the atomic tasks which are scheduled to computing resources for execution. For these purposes various scheduling algorithms are developed and thoroughly tested. This approach has provided greater utilization of computing resources which further have resulted in better performance.

APA, Harvard, Vancouver, ISO, and other styles

42

Chaiwan, Pramote. "NEW ACCURATE FAULT LOCATION ALGORITHM FOR PARALLEL TRANSMISSION LINES." UKnowledge, 2011. http://uknowledge.uky.edu/gradschool_diss/813.

Full text

Abstract:

Electric power systems have been in existence for over a century. Electric power transmission line systems play an important role in carrying electrical power to customers everywhere. The number of transmission lines in power systems is increasing as global demand for power has increased. Parallel transmission lines are widely used in the modern transmission system for higher reliability. The parallel lines method has economic and environmental advantages over single circuit. A fault that occurs on a power transmission line will cause long outage time if the fault location is not located as quickly as possible. The faster the fault location is found, the sooner the system can be restored and outage time can be reduced. The main focus of this research is to develop a new accurate fault location algorithm for parallel transmission lines to identify the fault location for long double-circuit transmission lines, taking into consideration mutual coupling impedance, mutual coupling admittance, and shunt capacitance of the line. In this research, the equivalent PI circuit based on a distributed parameter line model for positive, negative, and zero sequence networks have been constructed for system analysis during the fault. The new method uses only the voltage and current from one end of parallel lines to calculate the fault distance. This research approaches the problem by derivation all equations from positive sequence, negative sequence, and zero sequence network by using KVL and KCL. Then, the fault location is obtained by solving these equations. EMTP has been utilized to generate fault cases under various fault conditions with different fault locations, fault types and fault resistances. Then the algorithm is evaluated using the simulated data. The results have shown that the developed algorithm can achieve highly accurate estimates and is promising for practical applications.

APA, Harvard, Vancouver, ISO, and other styles

43

Kallala, Haithem. "Massively parallel algorithms for realistic PIC simulations of ultra high intensity laser-plasma interaction, application to attosecond pulses separation of Doppler harmonics." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASS052.

Full text

Abstract:

La complexité des mécanismes physiques mis en jeu lors de l'interaction laser-plasma à ultra-haute intensité nécessite de recourir à des simulations PIC particulièrement lourdes. Au cœur de ces codes de calcul, les solveurs de Maxwell pseudo-spectraux d'ordre élevé présentent de nombreux avantages en termes de précision numérique. Néanmoins, ces solveurs ont un coût élevé en termes de ressources nécessaires. En effet, les techniques de parallélisation existantes pour ces solveurs sont peu performantes au-delà de quelques milliers de coeurs, ou induisent un important usage mémoire, ce qui limite leur scalabilité à large échelle. Dans cette thèse, nous avons développé une toute nouvelle approche de parallélisation qui combine les avantages des méthodes existantes. Cette méthode a été testée à très large échelle et montre un scaling significativement meilleur que les précédentes techniques, tout en garantissant un usage mémoire réduit.En capitalisant sur ce travail numérique, nous avons réalisé une étude numérique/théorique approfondie dans le cadre de la génération d'harmoniques d'ordres élevés sur cible solide. Lorsqu'une impulsion laser ultra-intense (I>10¹⁶W.cm⁻² ) et ultra-courte (de quelques dizaines de femtosecondes) est focalisée sur une cible solide, elle génère un plasma sur-dense, appelé miroir plasma, qui réfléchit non-linéairement le laser incident. La réflexion de l'impulsion laser est accompagnée par l'émission cohérente d'harmoniques d'ordres élevées, sous forme d'impulsions X-UV attosecondes (1 attosecond = 10⁻¹⁸s). Pour des intensités laser relativistes (I>10¹⁹ W.cm⁻²), la surface du plasma est incurvée sous l'effet de la pression de radiation du laser. De ce fait, les harmoniques rayonnées par la surface du plasma sont focalisées. Dans cette thèse, j'ai étudié la possibilité de produire des impulsions attosecondes isolées en régime relativiste sur miroir plasma, grâce au mécanisme de phare attoseconde. Celui-ci consiste à introduire une rotation des fronts d'onde du laser incident de façon à séparer angulairement les différentes impulsions attosecondes produites à chaque cycle optique. En régime relativiste, la courbure du miroir plasma augmente considérablement la divergence du faisceau harmonique, ce qui rend le mécanisme phare attoseconde inefficace. Pour y remédier, j'ai développé deux techniques de réduction de divergence harmonique afin de mitiger l'effet de focalisation induit par la courbure du miroir plasma et permettre de générer des impulsions attosecondes isolées à partir d’harmoniques Doppler. Ces deux techniques sont basées sur la mise en forme en amplitude et en phase du faisceau laser. Par ailleurs, j'ai développé un modèle théorique pour déterminer les régimes optimaux d'interaction afin de maximiser la séparation angulaire des impulsions attosecondes. Ce modèle a été validé par des simulations numériques PIC en géométries 2D et 3D et sur une large gamme de paramètres laser et plasma. Finalement, on montre qu'en ajustant des paramètres laser et plasma réalistes, il est possible de séparer efficacement les impulsions attosecondes en régime relativiste
The complexity of the physical mechanisms involved in ultra-high intensity laser-plasma interaction requires the use of particularly heavy PIC simulations. At the heart of these computational codes, high-order pseudo-spectral Maxwell solvers have many advantages in terms of numerical accuracy. This numerical approach comes however with an expensive computational cost. Indeed, existing parallelization methods for pseudo-spectral solvers are only scalable to few tens of thousands of cores, or induce an important memory footprint, which also hinders the scaling of the method at large scales. In this thesis, we developed a novel, arbitrarily scalable, parallelization strategy for pseudo-spectral Maxwell's equations solvers which combines the advantages of existing parallelization techniques. This method proved to be more scalable than previously proposed approaches, while ensuring a significant drop in the total memory use.By capitalizing on this computational work, we conducted an extensive numerical and theoretical study in the field of high order harmonics generation on solid targets. In this context, when an ultra-intense (I>10¹⁶W.cm⁻²) ultra-short (few tens of femtoseconds) laser pulse irradiates a solid target, a reflective overdense plasma mirror is formed at the target-vacuum interface. The subsequent laser pulse non linear reflection is accompanied with the emission of coherent high order laser harmonics, in the form of attosecond X-UV light pulses (1 attosecond = 10⁻¹⁸s). For relativistic laser intensities (I>10¹⁹ W.cm⁻²), the plasma surface is curved under the laser radiation pressure. And the plasma mirror acts as a focusing optics for the radiated harmonic beam. In this thesis, we investigated feasible ways for producing isolated attosecond light pulses from relativistic plasma-mirror harmonics, with the so called attosecond lighthouse effect. This effect relies introducing a wavefront rotation on the driving laser pulse in order to send attosecond pulses emitted during different laser optical cycles along different directions. In the case of high order harmonics generated in the relativistic regime, the plasma mirror curvature significantly increases the attosecond pulses divergence and prevents their separation with the attosecond lighthouse scheme. For this matter, we developed two harmonic divergence reduction techniques, based on tailoring the laser pulse phase or amplitude profiles in order to significantly inhibit the plasma mirror focusing effect and allow for a clear separation of attosecond light pulses by reducing the harmonic beam divergence. Furthermore, we developed an analytical model to predict optimal interaction conditions favoring attosecond pulses separation. This model was fully validated with 2D and 3D PIC simulations over a broad range of laser and plasma parameters. In the end, we show that under realistic laser and plasma conditions, it is possible to produce isolated attosecond pulses from Doppler harmonics

APA, Harvard, Vancouver, ISO, and other styles

44

He, Jian. "Design and Evaluation of a Data-distributed Massively Parallel Implementation of a Global Optimization Algorithm---DIRECT." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/29786.

Full text

Abstract:

The present work aims at an efficient, portable, and robust design of a data-distributed massively parallel DIRECT, the deterministic global optimization algorithm widely used in multidisciplinary engineering design, biological science, and physical science applications. The original algorithm is modified to adapt to different problem scales and optimization (exploration vs.\ exploitation) goals. Enhanced with a memory reduction technique, dynamic data structures are used to organize local data, handle unpredictable memory requirements, reduce the memory usage, and share the data across multiple processors. The parallel scheme employs a multilevel functional and data parallelism to boost concurrency and mitigate the data dependency, thus improving the load balancing and scalability. In addition, checkpointing features are integrated to provide fault tolerance and hot restarts. Important algorithm modifications and design considerations are discussed regarding data structures, parallel schemes, error handling, and portability. Using several benchmark functions and real-world applications, the present work is evaluated in terms of optimization effectiveness, data structure efficiency, memory usage, parallel performance, and checkpointing overhead. Modeling and analysis techniques are used to investigate the design effectiveness and performance sensitivity under various problem structures, parallel schemes, and system settings. Theoretical and experimental results are compared for two parallel clusters with different system scale and network connectivity. An analytical bounding model is constructed to measure the load balancing performance under different schemes. Additionally, linear regression models are used to characterize two major overhead sources---interprocessor communication and processor idleness, and also applied to the isoefficiency functions in scalability analysis. For a variety of high-dimensional problems and large scale systems, the data-distributed massively parallel design has achieved reasonable performance. The results of the performance study provide guidance for efficient problem and scheme configuration. More importantly, the generalized design considerations and analysis techniques are beneficial for transforming many global search algorithms to become effective large scale parallel optimization tools.
Ph. D.

APA, Harvard, Vancouver, ISO, and other styles

45

Viennot, Laurent. "Quelques algorithmes parallèles et séquentiels de traitement des graphes et applications." Phd thesis, Université Paris-Diderot - Paris VII, 1996. http://tel.archives-ouvertes.fr/tel-00471691.

Full text

Abstract:

Cette présente un point de vue algorithmique parllèle et séquentiel sur le traitement des graphes. Le chapitre~1 est consacré au modèle \lscPRAM qui est le modèle de parallèlisme le plus simple qui soit : plusieurs processeurs ont accès à une mémoire partagée. Même avec la simplification apportée par le modèle, certains problèmes restent difficiles à résoudre. La section~1.1 introduit une représentation adaptée aux traitement algorithmique des ordres de dimension fixée $d$ et permet de calculer une représentation classique de l'ordre, ce calcul est lié aux traitement de requêtes géométriques dans un espace de dimension $d$. La section~1.2 est consacrée à la reconnaissance en parallèle des ordres \lscN-free et la section~1.3 traite de la reconnaissance des graphes de comparabilité. D'une manière générale, l'étude de classes particulières de graphes permet de résoudre des problèmes qui sont difficiles dans le cas général en utilisant une structure algorithmique sous-jacente à la classe considérée. Le problème de la reconnaissance consiste à trouver cette structure. Le chapitre~2 est au consacré au modèle \lscCGM qui est un modèle de machine parallèle dite << à gros grain >> qui priviligie l'étude du placement distribué des données d'un problème, \cad{} sur les différentes mémoires des ordinateurs qui vont travailler ensemble sur le problème. Ce chapitre reprend les problèmes abordés dans le modèle \lscPRAM et en fournit des solutions dans le modèle \lscCGM. Un algorithme de \anglais{list-ranking} est de plus présenté dans la section d'un graphe dans ce modèle. Le chapitre~3 est consacré à un << modèle de calcul >> très particulier issu d'un problème de téléphonie \lscGSM. Ce chapitre regroupe d'une part les différentes idées algorithmiques qui s'appliquent à un tel problème soumis à de multiples contraintes et d'autre part des simulations permettant d'évaluer la pertinence des différentes idées. Ce problème est de nature continue mais on peut néanmoins y apporter des solutions issues de l'algorithmique discrète telles que les techniques liées aux des composantes connexes d'un graphe. Par soucis de continuité, un algorithme de composante connexes est donné dans chacun des trois modèles abordés. Enfin, le chapitre~4 est consacré à une nouvelle technique algorithmique : l'affinage de partition. La section~4.1 tente de cerner cette technique et montre les ressemblances entre différents algorithmes existants. Cette technique nous permettra de généraliser certains de ces algorithmes à la résolution d'autres problèmes proches. L'affinage de partition nous permettra ensuite dans la section~4.2 de donner des algorithmes simples pour résoudre la reconnaissance des graphes d'intervalles et l'orientation transitive, deux problèmes dont les solution algorithmiques efficaces étaient jusque là très difficiles à implanter et reposaient sur des structures de données complexes.

APA, Harvard, Vancouver, ISO, and other styles

46

Mehdi, Malika. "PARALLEL HYBRID OPTIMIZATION METHODS FOR PERMUTATION BASED PROBLEMS." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00841962.

Full text

Abstract:

La résolution efficace de problèmes d'optimisation a permutation de grande taille nécessite le développement de méthodes hybrides complexes combinant différentes classes d'algorithmes d'optimisation. L'hybridation des metaheuristiques avec les méthodes exactes arborescentes, tel que l'algorithme du branch-and-bound (B&B), engendre une nouvelle classe d'algorithmes plus efficace que ces deux classes de méthodes utilisées séparément. Le défi principal dans le développement de telles méthodes consiste a trouver des liens ou connections entre les stratégies de recherches divergentes utilisées dans les deux classes de méthodes. Les Algorithmes Genetiques (AGs) sont des metaheuristiques, a base de population, tr'es populaires bas'es sur des op'erateurs stochastiques inspirés de la théorie de l'évolution. Contrairement aux AGs et aux m'etaheuristiques généralement, les algorithmes de B&B sont basées sur l'énumération implicite de l'espace de recherche représente par le moyen d'un arbre, dit arbre de recherche. Notre approche d'hybridation consiste a définir un codage commun des solutions et de l'espace de recherche ainsi que des opérateurs de recherche ad'equats afin de permettre un couplage efficace de bas niveau entre les deux classes de méthodes AGs et B&B. La représentation de l'espace de recherche par le moyen d'arbres est traditionnellement utilis'ee dans les algorithmes de B&B. Dans cette thèse, cette représentation a été adaptée aux metaheuristiques. L'encodage des permutations au moyen de nombres naturels faisant référence a l'ordre d'énumération lexicographique des permutations dans l'arbre du B&B, est proposé comme une nouvelle manière de représenter l'espace de recherche des problèmes 'a permutations dans les metaheuristiques. Cette méthode de codage est basée sur les propriétés mathématiques des permutations, 'a savoir les codes de Lehmer et les tables d'inversions ainsi que les système d'énumération factoriels. Des fonctions de transformation permettant le passage entre les deux représentations (permutations et nombres) ainsi que des opérateurs de recherche adaptes au codage, sont définis pour les problèmes 'a permutations généralisés. Cette représentation, désormais commune aux metaheuristiques et aux algorithmes de B&B, nous a permis de concevoir des stratégies d'hybridation et de collaboration efficaces entre les AGs et le B&B. En effet, deux approches d'hybridation entre les AGs et les algorithmes de B&B (HGABB et COBBIGA) bas'es sur cette représentation commune ont été proposées dans cette thèse. Pour validation, une implémentation a été réalisée pour le problème d'affectation quadratique 'a trois dimension (Q3AP). Afin de résoudre de larges instances de ce problème, nous avons aussi propose une parallélisation pour les deux algorithme hybrides, basée sur des techniques de décomposition d'espace (décomposition par intervalle) utilisées auparavant pour la parallélisation des algorithmes de B&B. Du point de vue implémentation, afin de faciliter de futurs conceptions et implémentations de méthodes hybrides combinant metaheuristiques et méthodes exacte arborescentes, nous avons développe une plateforme d'hybridation intégrée au logiciel pour metaheuristiques, ParadisEO. La nouvelle plateforme a été utilisée pour réaliser des expérimentations intensives sur la grille de calcul Grid'5000.

APA, Harvard, Vancouver, ISO, and other styles

47

Lancin, Aurélien. "Étude de réseaux complexes et de leurs propriétés pour l’optimisation de modèles de routage." Thesis, Nice, 2014. http://www.theses.fr/2014NICE4117/document.

Full text

Abstract:

Cette thèse s’intéresse aux problématiques de routage dans les réseaux, notamment dans le graphe des systèmes autonomes (AS) d’Internet. Nous cherchons d’une part à mieux comprendre les propriétés du graphe de l’Internet qui sont utiles dans la conception de nouveaux paradigmes de routage. D’autre part, nous cherchons à évaluer par simulation les performances de ces paradigmes. La première partie de mes travaux porte sur l’étude d’une propriété́ métrique, l’hyperbolicité́ selon Gromov, utilisée dans la conception de nouveaux paradigmes de routage. Je présente dans un premier temps une nouvelle approche pour le calcul de l’hyperbolicité́ d’un graphe utilisant une décomposition du graphe par les cliques-séparatrices et la notion de paires éloignées. Je propose ensuite un nouvel algorithme pour le calcul de l’hyperbolicité́ qui, combiné avec la méthode de décomposition par les cliques-séparatrices, permet son calcul sur des graphes composés de 58 000 sommets en quelques heures. La deuxième partie de mes travaux porte sur le développement de DRMSim, une nouvelle plate-forme de simulation de modèles de routage dynamiques. Celle-ci permet l’évaluation des performances des schémas de routage et leur comparaison au protocole de référence, le protocole de routeur frontière, BGP. DRMSim a permis l’étude par simulation de différents schémas de routage compact sur des topologies à O(10k) nœuds. Je détaille l’architecture de DRMSim et quelques exemples d’utilisation. Puis, je présente une étude réalisée en vue de développer une version parallèle et distribuée de DRMSim dans le cadre de la simulation de BGP
This thesis considers routing issues in networks, and particularly the graph of the autonomous systems (AS) of the Internet. Firstly, we aim at better understanding the properties of the Internet that are useful in the design of new routing paradigms. Secondly, we want to evaluate by simulation the performance of these paradigms. The first part of my work concerns the study of the Gromov hyperbolicity, a useful metric property for the design of new routing paradigms. I show how to use a decomposition of the graph by clique-separators as a pre-processing method for the computation of the hyperbolicity. Then, I propose a new algorithm to compute this property. Altogether, these methods allows us for computing the hyperbolicity of graphs up to 58 000 nodes. The second part of my work concerns the development of DRMSim, a new Dynamic Routing Model Simulator. It facilitates the evaluation of the performances of various routing schemes and their comparison to the standard routing scheme of the Internet, the border router protocol BGP. Using DRMSim, we performed simulations of several compact routing schemes on topologies up to O(10k) nodes. I describe its architecture and detail some examples. Then, I present a feasibility study for the design of a parallel/distributed version of DRMSim in order to simulate BGP on larger topologies

APA, Harvard, Vancouver, ISO, and other styles

48

Tesfa, Teklu K. "Distributed Hydrological Modeling Using Soil Depth Estimated from Landscape Variable Derived with Enhanced Terrain Analysis." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/616.

Full text

Abstract:

The spatial patterns of land surface and subsurface characteristics determine the spatial heterogeneity of hydrological processes. Soil depth is one of these characteristics and an important input parameter required by distributed hydrological models that explicitly represent spatial heterogeneity. Soil is related to topography and land cover due to the role played by topography and vegetation in affecting soil-forming processes. The research described in this dissertation addressed the development of statistical models that predict the soil depth pattern over the landscape; derivation of new topographic variables evaluated using both serial and parallel algorithms; and evaluation of the impacts of detailed soil depth representation on simulations of stream flow and soil moisture. The dissertation is comprised of three papers. In paper 1, statistical models were developed to predict soil depth pattern over the watershed based on topographic and land cover variables. Soil depth was surveyed at locations selected to represent the topographic and land cover variation at the Dry Creek Experimental Watershed, near Boise, Idaho. Explanatory variables were derived from a digital elevation model and remote sensing imagery for regression to the field data. Generalized Additive and Random Forests models were developed to predict soil depth over the watershed. The models were able to explain about 50% of the soil depth spatial variation, which is an important improvement over the soil depth extracted from the SSURGO national soil database. In paper 2, definitions of the new topographic variables derived in the effort to model soil depth, and serial and Message Passing Interface parallel implementations of the algorithms for their evaluation are presented. The parallel algorithms enhanced the processing speed of large digital elevation models as compared to the serial recursive algorithms initially developed. In paper 3, the impact of spatially explicit soil depth information on simulations of stream flow and soil moisture as compared to soil depth derived from the SSURGO soil database has been evaluated. The Distributed Hydrology Vegetation Soil Model was applied using automated parameter optimization technique with all input parameters the same except soil depth. Stream flow was less impacted by the detailed soil depth information, while simulation of soil moisture was slightly improved due to the detailed representation of soil depth.

APA, Harvard, Vancouver, ISO, and other styles

49

Afshar, Yaser. "Parallel distributed-memory particle methods for acquisition-rate segmentation and uncertainty quantifications of large fluorescence microscopy images." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-213157.

Full text

Abstract:

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. Another issue is the information loss during image acquisition due to limitations of the optical imaging systems. Analysis of the acquired images may, therefore, find multiple solutions (or no solution) due to imaging noise, blurring, and other uncertainties introduced during image acquisition. In this thesis, we address the computational processing time and memory issues by developing a distributed parallel algorithm for segmentation of large fluorescence-microscopy images. The method is based on the versatile Discrete Region Competition (Cardinale et al., 2012) algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels) but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data inspection and interactive experiments. Second, we estimate the segmentation uncertainty on large images that do not fit the main memory of a single computer. We there- fore develop a distributed parallel algorithm for efficient Markov- chain Monte Carlo Discrete Region Sampling (Cardinale, 2013). The parallel algorithm provides a measure of segmentation uncertainty in a statistically unbiased way. It approximates the posterior probability densities over the high-dimensional space of segmentations around the previously found segmentation
Moderne Fluoreszenzmikroskopie, wie zum Beispiel Lichtblattmikroskopie, erlauben die Aufnahme hochaufgelöster, 3-dimensionaler Bilder. Dies führt zu einen Engpass bei der Bearbeitung und Analyse der aufgenommenen Bilder, da die Aufnahmerate die Datenverarbeitungsrate übersteigt. Zusätzlich können diese Bilder so groß sein, dass sie die Speicherkapazität eines einzelnen Computers überschreiten. Hinzu kommt der aus Limitierungen des optischen Abbildungssystems resultierende Informationsverlust während der Bildaufnahme. Bildrauschen, Unschärfe und andere Messunsicherheiten können dazu führen, dass Analysealgorithmen möglicherweise mehrere oder keine Lösung für Bildverarbeitungsaufgaben finden. Im Rahmen der vorliegenden Arbeit entwickeln wir einen verteilten, parallelen Algorithmus für die Segmentierung von speicherintensiven Fluoreszenzmikroskopie-Bildern. Diese Methode basiert auf dem vielseitigen "Discrete Region Competition" Algorithmus (Cardinale et al., 2012), der sich bereits in anderen Anwendungen als nützlich für die Segmentierung von Mikroskopie-Bildern erwiesen hat. Das hier präsentierte Verfahren unterteilt das Eingangsbild in kleinere Unterbilder, welche auf die Speicher mehrerer Computer verteilt werden. Die Koordinierung des globalen Segmentierungsproblems wird durch die Benutzung von Netzwerkkommunikation erreicht. Dies erlaubt die Segmentierung von sehr großen Bildern, wobei wir die Anwendung des Algorithmus auf Bildern mit bis zu 10^10 Pixeln demonstrieren. Zusätzlich wird die Segmentierungsgeschwindigkeit erhöht und damit vergleichbar mit der Aufnahmerate des Mikroskops. Dies ist eine Grundvoraussetzung für die intelligenten Mikroskope der Zukunft, und es erlaubt die Online-Betrachtung der aufgenommenen Daten, sowie interaktive Experimente. Wir bestimmen die Unsicherheit des Segmentierungsalgorithmus bei der Anwendung auf Bilder, deren Größe den Speicher eines einzelnen Computers übersteigen. Dazu entwickeln wir einen verteilten, parallelen Algorithmus für effizientes Markov-chain Monte Carlo "Discrete Region Sampling" (Cardinale, 2013). Dieser Algorithmus quantifiziert die Segmentierungsunsicherheit statistisch erwartungstreu. Dazu wird die A-posteriori-Wahrscheinlichkeitsdichte über den hochdimensionalen Raum der Segmentierungen in der Umgebung der zuvor gefundenen Segmentierung approximiert

APA, Harvard, Vancouver, ISO, and other styles

50

Kosowski, Adrian. "Time and Space-Efficient Algorithms for Mobile Agents in an Anonymous Network." Habilitation à diriger des recherches, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00867765.

Full text

Abstract:

Computing with mobile agents is rapidly becoming a topic of mainstream research in the theory of distributed computing. The main research questions undertaken in this study concern the feasibility of solving fundamental tasks in an anonymous network, subject to limitations on the resources available to the agent. The considered challenges include: exploring a graph by means of an agent with limited memory, discovery of the network topology, and attempting to meet with another agent in another network (rendezvous). The constraints imposed on the agent include the number of moves which the agent is allowed to perform in the network, the amount of state memory available to the agent, the ability of the agent to communicate with other agents, as well as its a priori knowledge of the network topology or of global parameters.

APA, Harvard, Vancouver, ISO, and other styles

We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!