Dissertations / Theses on the topic 'Efficient parallel algorithms'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Efficient parallel algorithms.'
Next to every source in the list of references there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Stadtherr, Hans. "Work efficient parallel scheduling algorithms." [S.l. : s.n.], 1998. http://deposit.ddb.de/cgi-bin/dokserv?idn=962681369.
Full text
Huang, Chun-Hsi. "Communication-efficient bulk synchronous parallel algorithms." Buffalo, N.Y. : Dept. of Computer Science, State University of New York at Buffalo, 2001. http://www.cse.buffalo.edu/tech%2Dreports/2001%2D06.ps.Z.
Full text
Liang, Weifa, and wliang@cs.anu.edu.au. "Designing Efficient Parallel Algorithms for Graph Problems." The Australian National University. Department of Computer Science, 1997. http://thesis.anu.edu.au./public/adt-ANU20010829.114536.
Full text
Yang, Yong. "Efficient parallel genetic algorithms applied to numerical optimisation." Thesis, Southampton Solent University, 2008. http://ssudl.solent.ac.uk/631/.
Full text
Goldberg, Andrew Vladislav. "Efficient graph algorithms for sequential and parallel computers." Thesis, Massachusetts Institute of Technology, 1987. http://hdl.handle.net/1721.1/14912.
Full textMICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING.
Bibliography: p. 117-123.
by Andrew Vladislav Goldberg.
Ph.D.
He, Xin. "On efficient parallel algorithms for solving graph problems /." The Ohio State University, 1987. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487331541710947.
Full text
Chen, Calvin Ching-Yuen. "Efficient Parallel Algorithms and Data Structures Related to Trees." Thesis, University of North Texas, 1991. https://digital.library.unt.edu/ark:/67531/metadc332626/.
Full text
Halverson, Ranette Hudson. "Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design." Thesis, University of North Texas, 1993. https://digital.library.unt.edu/ark:/67531/metadc278153/.
Full text
邱祖淇, and Cho-ki Joe Yau. "Efficient solutions for the load distribution problem." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31222031.
Full text
Yau, Cho-ki Joe. "Efficient solutions for the load distribution problem /." Hong Kong : University of Hong Kong, 1999. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20971953.
Full text
Bangalore, Lakshminarayana Nagesh. "Efficient graph algorithm execution on data-parallel architectures." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53058.
Full text
Muhammad, Ali. "Parallel genetic algorithms : an efficient model and applications in control systems." Thesis, Southampton Solent University, 1997. http://ssudl.solent.ac.uk/1266/.
Full text
Gupta, Sandeep K. S. "Synthesizing communication-efficient distributed memory parallel programs for block recursive algorithms /." The Ohio State University, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487861796820607.
Full text
Bischof, Stefan. "Efficient algorithms for on-line scheduling and load distribution in parallel systems." [S.l. : s.n.], 1999. http://deposit.ddb.de/cgi-bin/dokserv?idn=959771409.
Full text
Banks, Nicola E. "Insights from the parallel implementation of efficient algorithms for the fractional calculus." Thesis, University of Chester, 2015. http://hdl.handle.net/10034/613841.
Full text
Ramasamy, Kandasamy Manimozhian. "Efficient state space exploration for parallel test generation." Thesis, [Austin, Tex. : University of Texas, 2009]. http://hdl.handle.net/2152/ETD-UT-2009-05-131.
Full textTitle from PDF title page (University of Texas Digital Repository, viewed on August 10, 2009). Includes bibliographical references.
Nelissen, Geoffrey. "Efficient optimal multiprocessor scheduling algorithms for real-time systems." Doctoral thesis, Universite Libre de Bruxelles, 2013. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209528.
Full text
In recent years, we have witnessed a paradigm shift in computing platform architectures: uniprocessor platforms have given way to multiprocessor architectures. While real-time scheduling theory can be considered mature for uniprocessor systems, it is still an evolving research field for multiprocessor architectures. One of the main difficulties with multiprocessor platforms is to provide an optimal scheduling algorithm (i.e., a scheduling algorithm that constructs a schedule respecting all task deadlines, for any task set for which a solution exists). Although optimal multiprocessor real-time scheduling algorithms exist, they usually cause an excessive number of task preemptions and migrations during the schedule. These preemptions and migrations cause overheads that must be added to the task execution times. Therefore, task sets that would have been schedulable if preemptions and migrations had no cost become unschedulable in practice. An efficient scheduling algorithm is therefore one that either minimizes the number of preemptions and migrations, or reduces their cost.
In this dissertation, we present the following results:
- We show that reducing the "fairness" in the schedule advantageously impacts the number of preemptions and migrations. Hence, all the scheduling algorithms proposed in this thesis tend to reduce or even eliminate fairness in the computed schedule.
- We propose three new online scheduling algorithms. One of them --- namely, BF2 --- is optimal for the scheduling of sporadic tasks in discrete-time environments, and reduces the number of task preemptions and migrations in comparison with the state-of-the-art in discrete-time systems. The second one is optimal for the scheduling of periodic tasks in a continuous-time environment. Because this second algorithm is based on a semi-partitioned scheme, it should favorably impact the preemption overheads. The third algorithm --- named U-EDF --- is optimal for the scheduling of sporadic and dynamic task sets in a continuous-time environment. It is the first real-time scheduling algorithm which is not based on the notion of "fairness" and nevertheless remains optimal for the scheduling of sporadic (and dynamic) systems. This important result was achieved by extending the uniprocessor algorithm EDF to the multiprocessor scheduling problem.
- Because the coding techniques are also evolving as the degree of parallelism increases in computing platforms, we provide solutions enabling the scheduling of parallel tasks with the currently existing scheduling algorithms, which were initially designed for the scheduling of sequential independent tasks.
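The optimality notion above (meet every deadline on an m-processor platform whenever a feasible schedule exists) is easiest to see on the EDF principle that U-EDF extends. As a toy, hedged illustration — this is not the thesis's BF2 or U-EDF algorithm, and all names are made up — a global-EDF dispatcher simply runs, at each decision instant, the m ready jobs with the earliest absolute deadlines:

```python
import heapq

def global_edf_step(ready_jobs, m):
    """Pick the m jobs with the earliest absolute deadlines to run on an
    m-processor platform (the global-EDF rule). ready_jobs is a list of
    (deadline, job_id) pairs; returns the ids chosen to run, earliest
    deadline first."""
    return [job_id for _, job_id in heapq.nsmallest(m, ready_jobs)]

# Toy example: 4 ready jobs, 2 processors.
jobs = [(10, "A"), (4, "B"), (7, "C"), (12, "D")]
running = global_edf_step(jobs, 2)  # jobs B and C get the processors
```

Plain global EDF is not optimal on multiprocessors; the thesis's contribution is precisely an extension of EDF that regains optimality without relying on fairness.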
Doctorat en Sciences de l'ingénieur
Witkowski, Thomas. "Software concepts and algorithms for an efficient and scalable parallel finite element method." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-141651.
Full text
Software packages for the numerical solution of partial differential equations using the finite element method are an important tool in many areas of research. The underlying data structures and algorithms are subject to constant redevelopment in order to meet the ever-growing demands of the user community and to make efficient use of new, highly parallel computer architectures. This is also the core of this work. To use parallel computer architectures with an ever-increasing number of independent compute units, e.g. processors, efficiently, data structures and algorithms from different subfields of mathematics and computer science must be developed and combined. At the core, these are two areas: distributed meshes and parallel solvers for linear systems of equations. Numerous approaches exist independently for each of the two subfields. This work argues that, for highly scalable applications of the finite element method, only a combination of both subfields and a linkage of the underlying data structures enables an efficient and scalable implementation. First, we present concepts that enable parallel distributed meshes with corresponding adaptation strategies. The central point here is the preparation of information for arbitrary solvers of linear systems. When solving partial differential equations with the finite element method, a large part of the computing time is spent solving the resulting linear systems, so their parallelization is of central importance. Based on the presented concept for distributed meshes, which can prepare arbitrary geometric information for the linear solvers, we present several different solver methods.
Particular weight is given to general solvers that make as few assumptions as possible about the system to be solved. To this end, the FETI-DP (Finite Element Tearing and Interconnect - Dual Primal) method is further developed. Although the FETI-DP method is considered quasi-optimal with respect to parallel scalability from a mathematical point of view, it can no longer be implemented efficiently for large numbers of processors (> 10,000). This is mainly due to a relatively small but globally distributed coarse-grid problem. We present a multilevel FETI-DP method that solves this problem through a hierarchical composition of the coarse-grid problem. This localizes the communication along the coarse-grid problem and ensures the scalability of the FETI-DP method even for large numbers of processors. Besides the parallelization of the finite element method, this work also deals with exploiting certain preconditions to increase the sequential efficiency of existing finite element implementations. In many cases, partial differential equations with several variables must be solved. Very often, especially when modelling several coupled physical phenomena, the solution structure of the different variables is either weakly or completely decoupled. In most implementations, only one mesh is used to discretize all variables of the system. We present a finite element method in which two independently refined meshes can be used to solve a system of partial differential equations.
Melot, Nicolas. "Algorithms and Framework for Energy Efficient Parallel Stream Computing on Many-Core Architectures." Doctoral thesis, Linköpings universitet, Programvara och system, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132308.
Full text
This thesis has also been funded by CUGS (Graduate School in Computer Science) and the FP7 project EXCESS.
The electronic version has been corrected. See the published errata list.
Ojiaku, Jude-Thaddeus. "A study of time and energy efficient algorithms for parallel and heterogeneous computing." Thesis, University of Liverpool, 2016. http://livrepository.liverpool.ac.uk/3004892/.
Full text
Leung, Kwong-Keung, and 梁光強. "Fast and efficient video coding based on communication and computation scheduling on multiprocessors." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B29750945.
Full textMor, Stefano Drimon Kurz. "Analysis of synchronizations in greedy-scheduled executions and applications to efficient generation of pseudorandom numbers in parallel." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/130529.
Full text
We present two contributions to the field of parallel programming. The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimating the number of synchronizations performed during the execution of a parallel algorithm. Based on the concept of logical clocks, it allows us, on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity. The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation. As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS. Its main characteristic is the use of a sequential generator that can perform a "jump-ahead" directly from one number to another at an arbitrary distance within the pseudorandom sequence. Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a "very parallel" program (whose depth or critical path is small compared to the work or number of operations), these operations are rare. Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform. The theoretical overhead of Par-R compares favorably to DotMix's overhead, which is confirmed experimentally, while not requiring a fixed underlying generator.
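The "jump-ahead" capability that Par-R assumes of the underlying generator R can be sketched with a linear congruential generator, where k steps collapse into a single affine map computed by binary exponentiation. This is an illustrative stand-in, not Par-R's actual API; the constants are a common textbook LCG:

```python
# Linear congruential generator x -> (a*x + c) mod m with O(log k) jump-ahead.
A_, C_, M_ = 1664525, 1013904223, 2**32

def lcg_stream(seed, n):
    """First n outputs of the LCG, generated one step at a time."""
    x, out = seed, []
    for _ in range(n):
        x = (A_ * x + C_) % M_
        out.append(x)
    return out

def jump(seed, k):
    """State after k steps, in O(log k): compose the one-step affine map
    x -> a*x + c with itself by binary exponentiation."""
    a, c = 1, 0          # identity map
    ba, bc = A_, C_      # the one-step map
    while k:
        if k & 1:
            a, c = (ba * a) % M_, (ba * c + bc) % M_
        ba, bc = (ba * ba) % M_, (ba * bc + bc) % M_
        k >>= 1
    return (a * seed + c) % M_

# A worker can seed its private substream without generating the numbers
# that precede it, e.g. worker 2 with blocks of one million draws:
substream_start = jump(12345, 2 * 10**6)
```

The point mirrored from the abstract: because work stealing makes such jumps rare in expectation, the O(log k) jump cost is amortized away.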
Schwambach, Vítor. "Methods and tools for rapid and efficient parallel implementation of computer vision algorithms on embedded multiprocessors." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM022/document.
Full text
Embedded computer vision applications demand high system computational power and constitute one of the key drivers for application-specific multi- and many-core systems. A number of early system design choices can impact the system's parallel performance, among which are the parallel granularity, the number of processors, and the balance between computation and communication. Their impact on the final system performance is difficult to assess in early design stages, and there is a lack of tools that support designers in this task. The contributions of this thesis consist of two methods and associated tools that facilitate the selection of embedded multiprocessor architectural parameters and computer vision application parallelization strategies. The first is a Design Space Exploration (DSE) methodology that relies on Parana, a fast and accurate parallel performance estimation tool. Parana enables the evaluation of what-if parallelization scenarios and can determine their maximum achievable performance limits. The second contribution is a method for optimal 2D image tile sizing using constraint programming within the Tilana tool. The proposed method integrates non-linear DMA data transfer times and parallel scheduling overheads for increased accuracy.
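The tile-sizing decision Tilana makes with constraint programming can be illustrated by a brute-force search over a toy cost model. The constants and the cost formula below are invented for the example; the thesis's actual model is non-linear and far more detailed:

```python
import math

def best_tile(img_w, img_h, local_mem, bpp=1,
              dma_setup=1.0, dma_per_byte=0.01, cycles_per_pixel=0.5):
    """Exhaustively pick the 2D tile (tw, th) that fits in a core's local
    memory and minimises a toy per-image cost: number of tiles times
    (DMA setup + DMA transfer + compute). Returns (cost, tw, th)."""
    best = None
    for tw in range(1, img_w + 1):
        for th in range(1, img_h + 1):
            if tw * th * bpp > local_mem:
                continue  # tile would not fit in local RAM
            ntiles = math.ceil(img_w / tw) * math.ceil(img_h / th)
            cost = ntiles * (dma_setup
                             + dma_per_byte * tw * th * bpp
                             + cycles_per_pixel * tw * th)
            if best is None or cost < best[0]:
                best = (cost, tw, th)
    return best
```

The per-tile DMA setup term is what pushes the optimum away from tiny tiles; a constraint solver replaces this exhaustive loop when the model and the search space grow.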
Leung, Kwong-Keung. "Fast and efficient video coding based on communication and computation scheduling on multiprocessors." Hong Kong : University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B23272867.
Full text
Kosowski, Adrian. "Time and Space-Efficient Algorithms for Mobile Agents in an Anonymous Network." Habilitation à diriger des recherches, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00867765.
Full text
Zönnchen, Benedikt Sebastian [Verfasser], Hans-Joachim [Akademischer Betreuer] Bungartz, Hans-Joachim [Gutachter] Bungartz, and Gerta [Gutachter] Köster. "Efficient parallel algorithms for large-scale pedestrian simulation / Benedikt Sebastian Zönnchen ; Gutachter: Hans-Joachim Bungartz, Gerta Köster ; Betreuer: Hans-Joachim Bungartz." München : Universitätsbibliothek der TU München, 2021. http://d-nb.info/1237048850/34.
Full textChenini, Hanen. "A rapid design methodology for generating of parallel image processing applications and parallel architectures for smart camera." Thesis, Clermont-Ferrand 2, 2014. http://www.theses.fr/2014CLF22459.
Full text
Due to the complexity of image processing algorithms and the restrictions imposed by MPSoC designs to reach their full potential, automatic design methodologies are needed to guide the programmer in generating efficient parallel programs. In this dissertation, we present an MPSoC-based design methodology supporting automatic design space exploration, automatic performance evaluation, and automatic hardware/software system generation. To facilitate parallel programming, the presented MPSoC approach is based on a CubeGen framework that permits the expression of different scenarios for architectural and algorithmic design exploration to reach the desired level of performance, resulting in short development time. The generated design can be implemented in FPGA technology with an expected improvement in application performance and power consumption. Starting from the application, we have evolved our methodology to provide several parameterizable algorithmic skeletons in the face of varying application characteristics, in order to exploit all types of parallelism in real algorithms. Implementing such applications on our parallel embedded system shows that our methods achieve increased efficiency with respect to the computational and communication requirements. The experimental results demonstrate that the designed multiprocessing architecture can be programmed efficiently, can match the performance of more powerful designs based on hard-core processors, and improves on traditional ASIC solutions, which are too slow and too expensive.
Witkowski, Thomas [Verfasser], Axel [Akademischer Betreuer] Voigt, and Wolfgang [Akademischer Betreuer] Nagel. "Software concepts and algorithms for an efficient and scalable parallel finite element method / Thomas Witkowski. Gutachter: Axel Voigt ; Wolfgang Nagel. Betreuer: Axel Voigt." Dresden : Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://d-nb.info/1068446455/34.
Full textMorrison, Adrian Franklin. "An Efficient Method for Computing Excited State Properties of Extended Molecular Aggregates Based on an Ab-Initio Exciton Model." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1509730158943602.
Full textKlein, Philip N. (Philip Nathan). "An efficient parallel algorithm for planarity." Thesis, Massachusetts Institute of Technology, 1986. http://hdl.handle.net/1721.1/34303.
Full textMICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING
Bibliography: leaves 56-57.
by Philip Nathan Klein.
M.S.
Jain, Ankita. "Efficient Parallel Algorithm for Overlaying Surface Meshes." Thesis, Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/16315.
Full textLu, Xin. "An efficient parallel optimization algorithm for the token bucket control mechanism." Thesis, University of Ottawa (Canada), 2004. http://hdl.handle.net/10393/26704.
Full textRawald, Tobias. "Scalable and Efficient Analysis of Large High-Dimensional Data Sets in the Context of Recurrence Analysis." Doctoral thesis, Humboldt-Universität zu Berlin, 2018. http://dx.doi.org/10.18452/18797.
Full text
Recurrence quantification analysis (RQA) is a method from nonlinear time series analysis. It relies on the identification of line structures within so-called recurrence matrices and comprises a set of scalar measures. Existing computing approaches to RQA are either not capable of processing recurrence matrices exceeding a certain size or suffer from long runtimes for time series that contain hundreds of thousands of data points. This thesis introduces scalable recurrence analysis (SRA), an alternative computing approach that subdivides a recurrence matrix into multiple sub-matrices. Each sub-matrix is processed individually, in a massively parallel manner, by a single compute device. This is implemented exemplarily using the OpenCL framework. It is shown that this approach delivers considerable performance improvements in comparison to state-of-the-art RQA software by exploiting the computing capabilities of many-core hardware architectures, in particular graphics cards. The usage of OpenCL allows identical SRA implementations to be executed on a variety of hardware platforms with different architectural properties. An extensive evaluation analyses the impact of applying concepts from database technology, such as memory storage layouts, to the RQA processing pipeline. It is investigated how different realisations of these concepts affect the performance of the computations on different types of compute devices. Finally, an approach based on automatic performance tuning is introduced that automatically selects well-performing RQA implementations for a given analytical scenario on specific computing hardware. Among other things, it is demonstrated that the customised auto-tuning approach can considerably increase the efficiency of the processing by adapting the implementation selection.
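SRA's sub-matrix decomposition can be sketched in a few lines for the simplest RQA measure, the recurrence rate. The N x N recurrence matrix is never materialised; its blocks are enumerated one by one, and each block could in principle be handed to a different OpenCL device (here they are processed sequentially for brevity):

```python
def recurrence_rate(series, eps, block=256):
    """Recurrence rate of a scalar time series: the fraction of index
    pairs (i, j) with |x_i - x_j| < eps. The full recurrence matrix is
    never stored; it is visited block by block, mirroring SRA's
    sub-matrix scheme."""
    n = len(series)
    hits = 0
    for bi in range(0, n, block):        # loop over sub-matrices
        for bj in range(0, n, block):
            for i in range(bi, min(bi + block, n)):
                xi = series[i]
                for j in range(bj, min(bj + block, n)):
                    if abs(xi - series[j]) < eps:
                        hits += 1
    return hits / (n * n)
```

Line-based RQA measures (determinism, laminarity) additionally need care at block borders, which is part of what makes the thesis's decomposition non-trivial.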
McLaughlin, Adam. "Mapping parallel graph algorithms to throughput-oriented architectures." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54374.
Full textDubreuil, Christophe. "A quad-tree algorithm for efficient polygon comparison, and its parallel implementation." Thesis, Heriot-Watt University, 1997. http://hdl.handle.net/10399/677.
Full textAhmed, Nova. "Parallel Algorithm for Memory Efficient Pairwise and Multiple Genome Alignment in Distributed Environment." Digital Archive @ GSU, 2004. http://digitalarchive.gsu.edu/cs_theses/2.
Full textZlateski, Aleksandar. "A design and implementation of an efficient, parallel watershed algorithm for affinity graphs." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66820.
Full textCataloged from PDF version of thesis.
Includes bibliographical references (p. 43).
In this thesis, I designed and implemented an efficient, parallel, generalized watershed algorithm for hierarchical segmentation of affinity graphs. By introducing four variable parameters, the algorithm enables us to use prior knowledge about the input graph in order to achieve better results. The algorithm is very suitable for hierarchical segmentation of large-scale 3D images of brain tissue obtained by electron microscopy, making it an essential tool for reconstructing the brain's neural networks, called connectomes. The algorithm was fully implemented in C++ and tested on the largest currently available affinity graph, of size 90 GB, to which no existing watershed implementation could be applied.
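A drastically simplified, single-threaded cousin of affinity-graph segmentation is single-linkage merging with a union-find structure: scan edges in decreasing affinity order and merge endpoints above a threshold. The actual thesis algorithm is hierarchical, parameterized, and parallel, so treat this only as a sketch of the underlying data structure:

```python
class DSU:
    """Union-find (disjoint-set) with path halving."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.p[ra] = rb

def segment(n_nodes, edges, theta):
    """Greedy single-linkage segmentation of an affinity graph: merge
    the endpoints of every edge whose affinity exceeds theta, visiting
    edges from strongest to weakest. edges is a list of
    (affinity, u, v) triples; returns one root label per node."""
    dsu = DSU(n_nodes)
    for affinity, u, v in sorted(edges, reverse=True):
        if affinity > theta:
            dsu.union(u, v)
    return [dsu.find(i) for i in range(n_nodes)]
```

Sweeping theta from high to low yields the hierarchy of segmentations that a watershed-style merge tree encodes.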
by Aleksandar Zlateski.
M.Eng.
Çağrıcı, Gökhan, and Ahmet Koltuksuz. "An analysis of key generation efficiency of RSA cryptosystem in distributed environments." [s.l.]: [s.n.], 2005. http://library.iyte.edu.tr/tezler/master/bilgisayaryazilimi/T000406.pdf.
Full text
Keywords: cryptosystem, Rivest-Shamir-Adleman, parallel computing, parallel algorithms, random number. Includes bibliographical references (leaves 68).
Yamamoto, Yusaku. "Efficient Parallel Implementation of a Weather Derivatives Pricing Algorithm based on the Fast Gauss Transform." IEEE, 2006. http://hdl.handle.net/2237/9478.
Full textJanzén, Johan. "Evaluation of Energy-Optimizing Scheduling Algorithms for Streaming Computations on Massively Parallel Multicore Architectures." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-111385.
Full textLambert, Thomas. "On the Effect of Replication of Input Files on the Efficiency and the Robustness of a Set of Computations." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0656/document.
Full text
The increasing importance of High Performance Computing (HPC) and Big Data applications creates new issues in parallel computing. One of them is communication, the data transferred from one processor to another. Such data movements have an impact on computational time, inducing delays and increased energy consumption. While replication, of either tasks or files, generates communication, it is also an important tool to improve resiliency and parallelism. In this thesis, we focus on the impact of the replication of input files on the overall amount of communication. For this purpose, we concentrate on two practical problems. The first one is parallel matrix multiplication. In this problem, the goal is to induce as few replications as possible in order to decrease the amount of communication. The second problem is the scheduling of the "Map" phase in the MapReduce framework. In this case, replication is an input of the problem, and this time the goal is to use it in the best possible way. In addition to the replication issue, this thesis also considers the comparison between static and dynamic approaches for scheduling. To fix terminology, static approaches compute schedules before starting the computation, while dynamic approaches compute the schedules during the computation itself. We design hybrid strategies in order to take advantage of the strengths of both. First, we relate communication-avoiding matrix multiplication to a square partitioning problem, where load-balancing is given as an input. In this problem, the goal is to split a square into zones (whose areas depend on the relative speed of resources) while minimizing the sum of their half-perimeters. We improve the existing results in the literature for this problem with two additional approximation algorithms. In addition, we also propose an alternative model using a cube partitioning problem.
We prove the NP-completeness of the associated decision problem and design two approximation algorithms. Finally, we implement the algorithms for both problems, relying on the StarPU library, in order to compare the resulting schedules for matrix multiplication. Second, in the Map phase of the MapReduce scheduling case, the input files are replicated and distributed among the processors. For this problem we propose two metrics. In the first one, we forbid non-local tasks (a task that is processed on a processor that does not own its input files) and, under this constraint, we aim at minimizing the makespan. In the second problem, we allow non-local tasks and aim at minimizing their number while minimizing makespan. For the theoretical study, we focus on tasks with homogeneous computation times. First, we relate a greedy algorithm on the makespan metric to a "balls-into-bins" process, proving that this algorithm produces solutions with expected overhead (the difference between the number of tasks on the most loaded processor and the number of tasks in a perfect distribution) in O(m log m), where m denotes the number of processors. Second, we relate this scheduling problem (with forbidden non-local tasks) to a problem of graph orientation and therefore prove, using results from the literature, that with high probability there exists a near-perfect assignment (whose overhead is at most 1); in addition, there are polynomial-time optimal algorithms. For the communication metric, we provide new algorithms based on a graph model close to matching problems in bipartite graphs. We prove that these algorithms are optimal for both the communication and makespan metrics. Finally, we provide simulations based on traces from a MapReduce cluster to test our strategies in realistic settings, and show that the proposed algorithms perform very well in the case of low or medium variance of the computation times of the different tasks of a job.
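The greedy policy analysed above, assign each Map task to the least-loaded processor among those owning a replica of its input file, can be sketched directly. The `replicas` encoding below is hypothetical, not the thesis's interface:

```python
def greedy_map_schedule(replicas, m):
    """Locality-preserving greedy Map scheduling: each task goes to the
    least-loaded processor that holds a replica of its input file, so no
    non-local task is ever created. replicas has one entry per task: the
    list of processor ids owning that task's file. Returns the
    per-task assignment and the per-processor load."""
    load = [0] * m
    assignment = []
    for owners in replicas:
        p = min(owners, key=lambda q: load[q])
        load[p] += 1
        assignment.append(p)
    return assignment, load
```

The overhead of the produced schedule is `max(load)` minus the perfectly balanced load; the balls-into-bins argument bounds this quantity in expectation.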
Coutinho, Demetrios Araújo Magalhães. "Implementação paralela escalável e eficiente do algoritmo simplex padrão em arquitetura multicore." Universidade Federal do Rio Grande do Norte, 2014. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15502.
Full text
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This work presents a scalable and efficient parallel implementation of the standard Simplex algorithm on the multicore architecture to solve large-scale linear programming problems. We present a general scheme explaining how each step of the standard Simplex algorithm was parallelized, indicating some important points of the parallel implementation. Performance analyses were conducted by comparing the sequential time of the Simplex tableau with that of IBM's CPLEX Simplex. The experiments were executed on a shared-memory machine with 24 cores. The scalability analysis was performed with problems of different dimensions, finding evidence that our parallel standard Simplex algorithm has better parallel efficiency for problems with more variables than constraints. In comparison with CPLEX, the proposed parallel algorithm achieved an efficiency up to 16 times higher.
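The parallelization opportunity in the standard Simplex lies in the tableau row updates of each pivot, which are independent of one another. A minimal sequential tableau Simplex makes that loop visible; this is an illustrative sketch, not the thesis's implementation, and it assumes maximization with constraints A x <= b, x >= 0 and b >= 0:

```python
def simplex_max(c, A, b):
    """Tiny dense tableau Simplex for: max c.x s.t. A x <= b, x >= 0
    (all b[i] >= 0 assumed). Returns the optimal objective value."""
    m, n = len(A), len(c)
    # Tableau rows: constraints with slack variables, then the objective.
    T = [A[i] + [1.0 if j == i else 0.0 for j in range(m)] + [b[i]]
         for i in range(m)]
    T.append([-ci for ci in c] + [0.0] * (m + 1))
    while True:
        piv_col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][piv_col] >= -1e-9:
            break                      # no negative reduced cost: optimal
        ratios = [(T[i][-1] / T[i][piv_col], i)
                  for i in range(m) if T[i][piv_col] > 1e-9]
        _, piv_row = min(ratios)       # minimum-ratio test
        pv = T[piv_row][piv_col]
        T[piv_row] = [x / pv for x in T[piv_row]]
        for i in range(m + 1):         # independent row updates:
            if i != piv_row:           # this loop is the parallel part
                f = T[i][piv_col]
                T[i] = [x - f * y for x, y in zip(T[i], T[piv_row])]
    return T[-1][-1]
```

Splitting the row-update loop across cores (each core owning a band of rows) is, in spirit, the decomposition the abstract describes.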
Colombet, Laurent. "Parallélisation d'applications pour des réseaux de processeurs homogènes ou hétérogènes." Grenoble INPG, 1994. http://tel.archives-ouvertes.fr/tel-00005084.
Full text
The aim of this thesis is to study and develop efficient methods for parallelizing scientific applications on distributed-memory parallel computers. The first part presents two communication libraries, PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). They allow programs to be implemented on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance we modified and adapted the concepts of speed-up and efficiency to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication-masking techniques. The general concept is based on anticipating communication, in particular by pipelining message-sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on the basic algorithms of these libraries
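One common way to adapt speed-up and efficiency to heterogeneous processors is to take the fastest processor as the speed-up baseline and to normalize efficiency by the aggregate computing power of all processors. This sketch shows that adaptation; the thesis's exact definitions may differ:

```python
def hetero_speedup(t_par, seq_times):
    # Classical speed-up divides one sequential time by the parallel
    # time; on a heterogeneous network a fair baseline is the fastest
    # single processor's sequential time.
    return min(seq_times) / t_par

def hetero_efficiency(t_par, seq_times):
    # Processor i delivers 1/seq_times[i] "problems per second", so a
    # perfectly load-balanced run over all processors would finish in
    # 1 / sum_i(1 / seq_times[i]); efficiency compares t_par to that.
    ideal = 1.0 / sum(1.0 / t for t in seq_times)
    return ideal / t_par
```

With two identical processors (sequential times 10 and 10) and a parallel time of 5, both definitions reduce to the classical ones: speed-up 2 and efficiency 1.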
Li, Lei. "Fast Algorithms for Mining Co-evolving Time Series." Research Showcase @ CMU, 2011. http://repository.cmu.edu/dissertations/112.
Full text
Lee, Young-Jun. "Routing and Efficient Evaluation Techniques for Multi-hop Mobile Wireless Networks." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/7455.
Full text
Vu, Chinh Trung. "Distributed Energy-Efficient Solutions for Area Coverage Problems in Wireless Sensor Networks." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/cs_diss/37.
Full text
Fassi, Imen. "XFOR (Multifor) : A new programming structure to ease the formulation of efficient loop optimizations." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD043/document.
Full text
We propose a new programming structure named XFOR (Multifor), dedicated to data-reuse-aware programming. It allows several for-loops to be handled simultaneously and their respective iteration domains to be mapped onto each other. Additionally, XFOR eases the application and composition of loop transformations. Experiments show that XFOR codes provide significant speed-ups compared to the original code versions, but also to the Pluto-optimized versions. We implemented the XFOR structure through the development of three software tools: (1) a source-to-source compiler named IBB (for Iterate-But-Better!), which automatically translates any C/C++ code containing XFOR-loops into an equivalent code where the XFOR-loops have been translated into for-loops; IBB also benefits from optimizations implemented in the polyhedral code generator CLooG, which it invokes to generate for-loops from an OpenScop specification; (2) an XFOR programming environment named XFOR-WIZARD, which assists the programmer in rewriting a program with classical for-loops into an equivalent but more efficient program using XFOR-loops; (3) a tool named XFORGEN, which automatically generates XFOR-loops from any OpenScop representation of transformed loop nests produced by an automatic optimizer
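The data-reuse benefit of mapping two iteration domains onto each other is essentially loop fusion: a value produced by the first loop body is consumed by the second while still cache-hot. This sketch contrasts the two schedules in plain code (it only illustrates the transformation XFOR expresses; the XFOR syntax itself is not shown):

```python
def separate(a):
    # Two classical loop nests: the second re-traverses b after the
    # first has finished, so b is read back from memory.
    n = len(a)
    b = [0] * n
    for i in range(1, n - 1):
        b[i] = a[i - 1] + a[i] + a[i + 1]
    c = [0] * n
    for i in range(1, n - 1):
        c[i] = 2 * b[i]
    return c

def fused(a):
    # The schedule an XFOR-style mapping can produce: both iteration
    # domains run under one loop, so b[i] is consumed as soon as it is
    # produced. The result is identical; only the schedule changes.
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(1, n - 1):
        b[i] = a[i - 1] + a[i] + a[i + 1]
        c[i] = 2 * b[i]
    return c
```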
Kindap, Nihal. "On An Architecture For A Parallel Finite Field Multiplier With Low Complexity Based On Composite Fields." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/12605347/index.pdf.
Full text
A bit-parallel architecture for a finite field multiplier with low complexity over composite fields GF((2^n)^m) with k = n·m (k ≤ 32) is investigated. The architecture has lower complexity when the Karatsuba-Ofman algorithm is applied for certain k, and using particular primitive polynomials for the composite fields improves the complexities further; this is demonstrated in detail for the values m = 2, 4, 8. The thesis is based on the paper "A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields" by Christof Paar, and its purpose is to understand and present a detailed description of the results of that paper.
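The complexity saving behind the architecture is the Karatsuba-Ofman step: a product of two k-bit GF(2) polynomials is assembled from three half-size products instead of four, since the cross term equals (a0+a1)(b0+b1) + a0·b0 + a1·b1 over GF(2). A minimal software sketch of that recursion, with polynomials stored as integer bit masks (illustrative only; the thesis concerns a hardware architecture, and the base-case threshold here is arbitrary):

```python
def gf2_mul(a, b):
    # Schoolbook carry-less multiplication in GF(2)[x]:
    # addition is XOR, so partial products are XORed together.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba_gf2(a, b, k):
    # One Karatsuba-Ofman step on k-bit operands (k a power of two):
    # split a = a1*x^h + a0, b = b1*x^h + b0 and form 3 half products.
    if k <= 8:
        return gf2_mul(a, b)
    h = k // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_gf2(a0, b0, h)
    hi = karatsuba_gf2(a1, b1, h)
    # Cross term: (a0+a1)(b0+b1) - lo - hi, where +/- are XOR in GF(2).
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, h) ^ lo ^ hi
    return (hi << (2 * h)) ^ (mid << h) ^ lo
```

In the composite-field setting GF((2^n)^m), the same three-instead-of-four trick is applied to the m coefficient products over the subfield GF(2^n), which is where the gate-count reduction comes from.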
Luo, Jia. "Algorithmes génétiques parallèles pour résoudre des problèmes d'ordonnancement de tâches dynamiques de manière efficace en prenant en compte l'énergie." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30001.
Full text
Due to new government legislation, customers' environmental concerns, and the continuously rising cost of energy, energy efficiency has become an essential parameter of industrial manufacturing processes in recent years. Most efforts considering energy issues in scheduling problems have focused on static scheduling; in the real world, however, scheduling problems are dynamic, with new jobs arriving unpredictably during execution. In this thesis, two energy-efficient dynamic scheduling problems are studied. Model I analyzes total tardiness and makespan under a peak power limitation in a flexible flow shop with newly arriving jobs; a periodic complete-rescheduling approach is adopted to represent the optimization problem. Model II investigates minimizing total tardiness and total energy consumption in a job shop with urgent newly arriving jobs; an event-driven schedule-repair approach is used to update the schedule. Since an adequate renewed scheduling plan must be obtained within a short response time in a dynamic environment, two parallel Genetic Algorithms (GAs) are proposed to solve these two models. Parallel GA I is a CUDA-based hybrid model consisting of an island GA at the upper level and a fine-grained GA at the lower level; it combines the metrics of the two hierarchical layers and takes full advantage of CUDA's compute capability. Parallel GA II is a dual heterogeneous design composed of a cellular GA and a pseudo GA; islands with these two different structures increase population diversity and can be parallelized well on GPUs simultaneously with a multi-core CPU. Finally, numerical experiments show that our approaches not only solve the problems flexibly, but also obtain competitive results and reduce time requirements
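The GA machinery underneath both models follows the usual loop: rank a population of job sequences by the objective, keep the best, and rebuild the rest by crossover and mutation. A minimal single-population sketch for one-machine total tardiness (a deliberately simplified stand-in for the thesis's parallel, multi-objective designs; population size, generation count, and operators are illustrative assumptions):

```python
import random

def total_tardiness(seq, proc, due):
    # Sum of max(0, completion - due_date) over jobs in sequence order.
    t, tard = 0, 0
    for j in seq:
        t += proc[j]
        tard += max(0, t - due[j])
    return tard

def ga_sequence(proc, due, pop_size=30, gens=100, seed=0):
    rng = random.Random(seed)
    n = len(proc)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda s: total_tardiness(s, proc, due))
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)            # one-point order crossover
            child = p1[:cut] + [j for j in p2 if j not in p1[:cut]]
            i, k = rng.randrange(n), rng.randrange(n)
            child[i], child[k] = child[k], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda s: total_tardiness(s, proc, due))
```

The parallel designs above distribute exactly this evaluation/variation loop: islands evolve such populations independently and exchange migrants, while fine-grained and cellular layouts map one individual per GPU thread or cell.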
Poggi, Giovanni. "Implementazione CUDA su GPU di un algoritmo sort-based per la distribuzione efficiente di dati in simulazioni distribuite." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24988/.
Full text