Dissertations / Theses: 'Algoritmi paralleli'

1

Retico, Alessandro. "Algoritmi paralleli per l'allineamento di immagini." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19101/.

Full text

Abstract:

In questa tesi si propone un algoritmo che, date una serie di immagini astronomiche raffiguranti uno stesso soggetto, cerca di ricavarne una rappresentazione migliore. Scattare foto astronomiche non è affatto facile: bisogna seguire vari accorgimenti e regolare opportunamente le impostazioni della propria fotocamera. In genere gli astrofili, che non sempre dispongono di sofisticatissime attrezzature per fotografare lo spazio, scattano una serie di foto o registrano un video (da cui estrarranno i fotogrammi) del soggetto che vogliono immortalare, attraverso la lente di un telescopio: in questa maniera riescono a catturare dettagli invisibili all'occhio umano, ma le immagini ottenute presentano parecchio rumore; per questo motivo le astrofotografie sono quasi sempre sottoposte ad una fase di elaborazione. Una semplice procedura per ridurre il rumore consiste nel produrre una nuova immagine i cui pixel abbiano valore pari alla media dei corrispettivi pixel di tutte le immagini scattate. Dato che la tecnica di cattura precedentemente descritta è enormemente sensibile allo spostamento, prima di calcolare la media dei pixel, è necessario allineare le immagini, rimuovendo lo scostamento presente tra esse. Per effettuare questo procedimento (noto anche come image registration) esistono varie tecniche più o meno complicate: in questa tesi ne verranno presentate brevemente alcune, ma ci si limiterà ad implementarne una versione banale, basata su una tecnica di brute force. La tematica affrontata si presta molto bene alla programmazione parallela, perciò l'implementazione dell'algoritmo che verrà proposto sfrutterà le clausole OpenMP e le istruzioni SIMD di un Raspberry Pi 2 B; è stata scelta tale architettura date le notevoli capacità di parallelizzazione che è in grado di offrire nonostante i ridotti costi, le contenute dimensioni e la minima energia richiesta.

APA, Harvard, Vancouver, ISO, and other styles

2

Tentarelli, Edoardo. "Architetture serverless per algoritmi massicciamente paralleli in ambito Industria 4.0." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20286/.

Full text

Abstract:

L’edge computing permette di distribuire l’elaborazione di servizi direttamente sulle macchine di produzione, evitando di inviare la richiesta verso data center esterni all’organizzazione, con vantaggi evidenti in termini di latenza e sicurezza. Questo modello di esecuzione, molto diffuso in industria manufatturiera, sta portando ad una migrazione dei servizi verso ambienti edge, ma la quantità limitata di risorse rende difficile il deployment di servizi computazionalmente onerosi verso questo modello. Ultimamente, sono state rilasciate sul mercato piattaforme che garantiscono la completa gestione dell’ambiente di esecuzione, sollevando lo sviluppatore da qualsiasi pratica operazionale. Ogni allocazione di risorse è ottimizzata, trasparentemente, dalla piattaforma, garantendo elevati gradi di disponibilità e tolleranza dei servizi. Questo modello di esecuzione viene definito serverless, e molte organizzazioni stanno migrando i propri servizi verso queste soluzioni. L’obiettivo di questo studio è stato quello di valutare le prestazioni del serverless nell’elaborazione di funzioni di image processing in ambienti edge. In particolare, lo studio è stato effettuato su algoritmi massicciamente paralleli, per cui è stato possibile parallelizzare il carico in task indipendenti. Le sperimentazioni hanno confrontato una soluzione serverless, in cui parti di immagini sono state ruotate in parallelo, ed una soluzione sequenziale, in cui la rotazione è stata effettuata sull’intera immagine. I risultati ottenuti mostrano evidenti benefici verso la soluzione serverless, in quanto offre parametri di scalabilità maggiori. Inoltre, i consumi di risorse sono decisamente più limitati, garantendo una soluzione più idonea ad ambienti edge e adatta al caso d’uso applicativo preso in esame. Per queste considerazioni, è consigliata la migrazione di servizi CPU intensive verso architetture serverless, per poter beneficiare dei risparmi e dei vantaggi offerti da questo tipo di soluzioni.

APA, Harvard, Vancouver, ISO, and other styles

3

Montanari, Sofia. "Algoritmi paralleli per la trasformata di hough nell’individuazione di linee rette nelle immagini digitali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text

Abstract:

Nella mia tesi è presentato lo studio dell'algoritmo di Hough, utilizzato per l'identificazione di linee rette nelle immagini digitali. Nella tesi viene proposta l'implementazione della versione seriale dell'algoritmo di Hough; successivamente vengono analizzate diverse soluzioni che permettono la parallelizzazione del problema e ne vengono studiate le prestazioni.

APA, Harvard, Vancouver, ISO, and other styles

4

Barocci, Federico. "Valutazione delle prestazioni di processori a basso consumo energetico in applicazioni parallele." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10506/.

Full text

Abstract:

In questo lavoro di tesi sono state impiegate le librerie grafiche OpenGL ES 2 per eseguire calcoli paralleli sulla GPU del Raspberry Pi. Sono stati affrontati e discussi concetti riguanrdati il calcolo parallelo, stream processing, GPGPU e le metriche di valutazione di algoritmi paralleli. Sono inoltre descritte le potenzialita e le limitazioni derivanti dall'impiego di OpenGL per implementare algoritmi paralleli. In particolare si e fatto riferimento all'algoritmo Seam Carving per il restringimento di immagini, realizzando e valutando una implementazione parallela di questo sul Raspberry Pi.

APA, Harvard, Vancouver, ISO, and other styles

5

Dogan, Can. "Implementazione e valutazione delle prestazioni di algoritmi per il calcolo di cammini minimi su grafi." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text

Abstract:

Lo scopo di questa tesi è di implementare e analizzare algoritmi paralleli su grafi. Gli algoritmi in oggetto sono quelli di Bellman-Ford per i cammini minimi da singola sorgente (Single-Source Shortest Paths), e di Floyd Warshall per i cammini minimi tra tutte le coppie di nodi (All-Pairs Shortest Paths). Analizzeremo a fondo entrambi gli algoritmi e vedremo come variano i tempi di esecuzione rispetto alle versioni seriali, e ne valuteremo la scalabilità. Per avere un’idea concreta di come e dove vengono utilizzati, vedremo alcune delle principali applicazioni che fanno uso di questi algoritmi. I capitoli saranno suddivisi nel seguente modo: la prima parte farà da introduzione e tratterà l’argomento sui grafi e i cammini, nella seconda parte si illustreranno gli algoritmi in dettaglio, nella terza parte si parlerà della piattaforma di parallelizzazione e delle implementazioni parallele degli algoritmi, infine nell’ultima parte verranno analizzate le prestazioni delle implementazioni proposte.

APA, Harvard, Vancouver, ISO, and other styles

6

GALVAN, Stefano. "Perception-motivated parallel algorithms for haptics." Doctoral thesis, Università degli Studi di Verona, 2010. http://hdl.handle.net/11562/343948.

Full text

Abstract:

Negli ultimi anni l’utilizzo di dispositivi aptici, atti cioè a riprodurre l’interazione fisica con l’ambiente remoto o virtuale, si sta diffondendo in vari ambiti della robotica e dell’informatica, dai videogiochi alla chirurgia robotizzata eseguita in teleoperazione, dai cellulari alla riabilitazione. In questo lavoro di tesi abbiamo voluto considerare nuovi punti di vista sull’argomento, allo scopo di comprendere meglio come riportare l’essere umano, che è l’unico fruitore del ritorno di forza, tattile e di telepresenza, al centro della ricerca sui dispositivi aptici. Allo scopo ci siamo focalizzati su due aspetti: una manipolazione del segnale di forza mutuata dalla percezione umana e l’utilizzo di architetture multicore per l’implementazione di algoritmi aptici e robotici. Con l’aiuto di un setup sperimentale creato ad hoc e attraverso l’utilizzo di un joystick con ritorno di forza a 6 gradi di libertà, abbiamo progettato degli esperimenti psicofisici atti all’identificazione di soglie differenziali di forze/coppie nel sistema mano-braccio. Sulla base dei risultati ottenuti abbiamo determinato una serie di funzioni di scalatura del segnale di forza, una per ogni grado di libertà, che permettono di aumentare l’abilità umana nel discriminare stimoli differenti. L’utilizzo di tali funzioni, ad esempio in teleoperazione, richiede la possibilità di variare il segnale di feedback e il controllo del dispositivo sia in relazione al lavoro da svolgere, sia alle peculiari capacità dell’utilizzatore. La gestione del dispositivo deve quindi essere in grado di soddisfare due obbiettivi tendenzialmente in contrasto, e cioè il raggiungimento di alte prestazioni in termini di velocità, stabilità e precisione, abbinato alla flessibilità tipica del software. Una soluzione consiste nell’affidare il controllo del dispositivo ai nuovi sistemi multicore che si stanno sempre più prepotentemente affacciando sul panorama informatico. Per far ciò una serie di algoritmi consolidati deve essere portata su sistemi paralleli. In questo lavoro abbiamo dimostrato che è possibile convertire facilmente vecchi algoritmi già implementati in hardware, e quindi intrinsecamente paralleli. Un punto da definire rimane però quanto costa portare degli algoritmi solitamente descritti in VLSI e schemi in un linguaggio di programmazione ad alto livello. Focalizzando la nostra attenzione su un problema specifico, la pseudoinversione di matrici che è presente in molti algoritmi di dinamica e cinematica, abbiamo mostrato che un’attenta progettazione e decomposizione del problema permette una mappatura diretta sulle unità di calcolo disponibili. In aggiunta, l’uso di parallelismo a livello di dati su macchine SIMD permette di ottenere buone prestazioni utilizzando semplici operazioni vettoriali come addizioni e shift. Dato che di solito tali istruzioni fanno parte delle implementazioni hardware la migrazione del codice risulta agevole. Abbiamo testato il nostro approccio su una Sony PlayStation 3 equipaggiata con un processore IBM Cell Broadband Engine.
In the last years the use of haptic feedback has been used in several applications, from mobile phones to rehabilitation, from video games to robotic aided surgery. The haptic devices, that are the interfaces that create the stimulation and reproduce the physical interaction with virtual or remote environments, have been studied, analyzed and developed in many ways. Every innovation in the mechanics, electronics and technical design of the device it is valuable, however it is important to maintain the focus of the haptic interaction on the human being, who is the only user of force feedback. In this thesis we worked on two main topics that are relevant to this aim: a perception based force signal manipulation and the use of modern multicore architectures for the implementation of the haptic controller. With the help of a specific experimental setup and using a 6 dof haptic device we designed a psychophysical experiment aimed at identifying of the force/torque differential thresholds applied to the hand-arm system. On the basis of the results obtained we determined a set of task dependent scaling functions, one for each degree of freedom of the three-dimensional space, that can be used to enhance the human abilities in discriminating different stimuli. The perception based manipulation of the force feedback requires a fast, stable and configurable controller of the haptic interface. Thus a solution is to use new available multicore architectures for the implementation of the controller, but many consolidated algorithms have to be ported to these parallel systems. Focusing on specific problem, i.e. the matrix pseudoinversion, that is part of the robotics dynamic and kinematic computation, we showed that it is possible to migrate code that was already implemented in hardware, and in particular old algorithms that were inherently parallel and thus not competitive on sequential processors. The main question that still lies open is how much effort is required in order to write these algorithms, usually described in VLSI or schematics, in a modern programming language. We show that a careful task decomposition and design permit a mapping of the code on the available cores. In addition, the use of data parallelism on SIMD machines can give good performance when simple vector instructions such as add and shift operations are used. Since these instructions are present also in hardware implementations the migration can be easily performed. We tested our approach on a Sony PlayStation 3 game console equipped with IBM Cell Broadband Engine processor.

APA, Harvard, Vancouver, ISO, and other styles

7

Kang, Seunghwa. "On the design of architecture-aware algorithms for emerging applications." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39503.

Full text

Abstract:

This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also find several limitations of current system software and architectures and directions to improve those. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in the efforts by providing benchmarks and suggestions to improve system software and architectures.

APA, Harvard, Vancouver, ISO, and other styles

8

Thomin, Philippe. "Algorithmes parallèles pour la synthèse d'image par radiosité sur calculateur à mémoire distribuée." Valenciennes, 1993. https://ged.uphf.fr/nuxeo/site/esupversions/a9efdd76-820d-4008-ab0e-23f35d428cdf.

Full text

Abstract:

Pour répondre à la demande d'utilisateurs de plus en plus nombreux, l'image de synthèse doit concilier deux aspects souvent contradictoires : le réalisme et l'interactivité. En termes de réalisme, les algorithmes de radiosité permettent, à l'heure actuelle, d'obtenir des résultats spectaculaires. Toutefois, en dépit d'optimisations algorithmiques drastiques, la production d'une image est encore au mieux une affaire de minutes et les ressources mémoire mobilisées sont considérables. Les limites actuelles doivent donc être franchies en augmentant la puissance des moyens matériels utilisés. Le travail présenté dans ce mémoire porte sur l'étude des algorithmes parallèles de radiosité sur calculateurs à mémoire distribuée. Les solutions proposées privilégient l'utilisation optimale des ressources mémoire réparties, ce qui leur permet de traiter des scènes très complexes. La réduction des coûts de communication et le travail en synchronisme des différents processeurs permettent alors de concilier les comportements temporels et dimensionnels des algorithmes définis. Une maquette, réalisée sur un réseau de transputers, a permis de valider cette approche et de préciser ses limites d'utilisation. Deux directions peuvent alors être explorées, l'une concernant l'amélioration du comportement temporel, l'autre visant à étendre les algorithmes proposés au traitement des surfaces spéculaires.

APA, Harvard, Vancouver, ISO, and other styles

9

Brown, Naïma. "Vérification et mise en œuvre distribuée des programmes unity." Nancy 1, 1994. http://www.theses.fr/1994NAN10384.

Full text

Abstract:

L'affinement successif de programmes parallèles à partir de spécifications formelles permet de s'assurer de la correction du programme obtenu par rapport a la specification initiale, puisque la transformation est fondée sur la préservation de propriétés telles que celles d'invariance et celles de fatalité. Cependant, les programmes obtenus sont écrits dans une notation abstraite éloignée des langages de programmation parallèles classiques et il convient de développer des techniques de transformation de programmes (ensemble d'actions) en des programmes écrits dans des langages de programmation parallèles (linda, occam). Bien entendu, cela conduit à analyser les techniques possibles de parallélisation de systèmes d'actions ou de distribution de systèmes d'actions. Notre travail de thèse s'appuie sur le formalisme unity de Chandy et Misra pour lequel nous avons construit un outil d'aide à la preuve dans l'outil b et nous avons examiné plusieurs types de transformations de programmes unity en d'autres langages ainsi que des stratégies de répartition. Nous proposons un langage intermédiaire que nous dénommons UCL (unity communication language) qui facilite l'implantation d'unity sur une architecture parallèle, et qui assure la correction formelle de cette implantation vis-à-vis de la spécification initiale. Le langage UCL est ensuite utilisé comme nouveau point de départ de la transformation et permet de produire soit un programme Clinda, soit un programme Occam. Cette approche favorise la réutilisation de cette première étape de transformation avant de cibler une architecture particulière

APA, Harvard, Vancouver, ISO, and other styles

10

Halverson, Ranette Hudson. "Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design." Thesis, University of North Texas, 1993. https://digital.library.unt.edu/ark:/67531/metadc278153/.

Full text

Abstract:

The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching.

APA, Harvard, Vancouver, ISO, and other styles

11

Yousif, Hilal M. "Parallel algorithms for MIMD parallel computers." Thesis, Loughborough University, 1986. https://dspace.lboro.ac.uk/2134/15113.

Full text

Abstract:

This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can be run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. Initially the fundamentals of parallel computer architectures are introduced with different parallel architectures being described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined. Also the main characteristics of the 4 processor MIMD NEPTUNE system are presented, and performance indicators, i.e. the speed-up and the efficiency factors are defined for the measurement of parallelism in a given system. Both numerical and non-numerical algorithms are covered in the thesis. In the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. Here, the organization of the blocks is done in such a way that each process contains its own group of 9 points on the network, therefore, they can be run in parallel. The parallel implementation of both 9-point and 4- point block iterative methods were programmed using natural and redblack ordering with synchronous and asynchronous approaches. The results obtained for these different implementations were compared and analysed. Next the parallel version of the A.G.E. (Alternating Group Explicit) method is developed in which the explicit nature of the difference equation is revealed and exploited when applied to derive the solution of both linear and non-linear 2-point boundary value problems. Two strategies have been used in the implementation of the parallel A.G.E. method using the synchronous and asynchronous approaches. The results from these implementations were compared. Also for comparison reasons the results obtained from the parallel A.G.E. were compared with the ~ corresponding results obtained from the parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods. Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numeric algorithms, the problems of sorting and searching were studied. The sorting methods which were investigated was the shell and the digit sort methods. with each method different parallel strategies and approaches were used and compared to find the best results which can be obtained on the parallel machine. In the searching methods, the sequential search algorithm in an unordered table and the binary search algorithms were investigated and implemented in parallel with a presentation of the results. Finally, a complexity analysis of these methods is presented. The thesis concludes with a chapter summarizing the main results.

APA, Harvard, Vancouver, ISO, and other styles

12

Rozoy, Brigitte. "Un modele de parallelisme : le monoide distribue." Caen, 1987. http://www.theses.fr/1987CAEN2039.

Full text

Abstract:

La mise en evidence d'equivalences des modeles mathematiques du parallelisme asynchrone nous amene a creer un nouveau modele dit monoide distribue qui constitue les differents modeles existants. Les problemes de cette etude sont la reconnaissabilite dans ce monoide et la terminaison distribuee dans les reseaux repartis asynchrones

APA, Harvard, Vancouver, ISO, and other styles

13

Vladimir, Lončar. "Hybrid parallel algorithms for solving nonlinear Schrödinger equation." Phd thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2017. https://www.cris.uns.ac.rs/record.jsf?recordId=104931&source=NDLTD&language=en.

Full text

Abstract:

Numerical methods and algorithms for solving of partial differential equations, especially parallel algorithms, are an important research topic, given the very broad applicability range in all areas of science. Rapid advances of computer technology open up new possibilities for development of faster algorithms and numerical simulations of higher resolution. This is achieved through paralleliza-tion at different levels that practically all current computers support.In this thesis we develop parallel algorithms for solving one kind of partial differential equations known as nonlinear Schrödinger equation (NLSE) with a convolution integral kernel. Equations of this type arise in many fields of physics such as nonlinear optics, plasma physics and physics of ultracold atoms, as well as economics and quantitative finance. We focus on a special type of NLSE, the dipolar Gross-Pitaevskii equation (GPE), which characterizes the behavior of ultracold atoms in the state of Bose-Einstein condensation.We present novel parallel algorithms for numerically solving GPE for a wide range of modern parallel computing platforms, from shared memory systems and dedicated hardware accelerators in the form of graphics processing units (GPUs), to heterogeneous computer clusters. For shared memory systems, we provide an algorithm and implementation targeting multi-core processors us-ing OpenMP. We also extend the algorithm to GPUs using CUDA toolkit and combine the OpenMP and CUDA approaches into a hybrid, heterogeneous al-gorithm that is capable of utilizing all available resources on a single computer. Given the inherent memory limitation a single computer has, we develop a distributed memory algorithm based on Message Passing Interface (MPI) and previous shared memory approaches. To maximize the performance of hybrid implementations, we optimize the parameters governing the distribution of data and workload using a genetic algorithm. Visualization of the increased volume of output data, enabled by the efficiency of newly developed algorithms, represents a challenge in itself. To address this, we integrate the implementations with the state-of-the-art visualization tool (VisIt), and use it to study two use-cases which demonstrate how the developed programs can be applied to simulate real-world systems.
Numerički metodi i algoritmi za rešavanje parcijalnih diferencijalnih jednačina, naročito paralelni algoritmi, predstavljaju izuzetno značajnu oblast istraživanja, uzimajući u obzir veoma široku primenljivost u svim oblastima nauke. Veliki napredak informacione tehnologije otvara nove mogućnosti za razvoj bržih al-goritama i numeričkih simulacija visoke rezolucije. Ovo se ostvaruje kroz para-lelizaciju na različitim nivoima koju poseduju praktično svi moderni računari. U ovoj tezi razvijeni su paralelni algoritmi za rešavanje jedne vrste parcijalnih diferencijalnih jednačina poznate kao nelinearna Šredingerova jednačina sa inte-gralnim konvolucionim kernelom. Jednačine ovog tipa se javljaju u raznim oblas-tima fizike poput nelinearne optike, fizike plazme i fizike ultrahladnih atoma, kao i u ekonomiji i kvantitativnim finansijama. Teza se bavi posebnim oblikom nelinearne Šredingerove jednačine, Gros-Pitaevski jednačinom sa dipol-dipol in-terakcionim članom, koja karakteriše ponašanje ultrahladnih atoma u stanju Boze-Ajnštajn kondenzacije.U tezi su predstavljeni novi paralelni algoritmi za numeričko rešavanje Gros-Pitaevski jednačine za širok spektar modernih računarskih platformi, od sis-tema sa deljenom memorijom i specijalizovanih hardverskih akceleratora u ob-liku grafičkih procesora, do heterogenih računarskih klastera. Za sisteme sa deljenom memorijom, razvijen je algoritam i implementacija namenjena više-jezgarnim centralnim procesorima korišćenjem OpenMP tehnologije. Ovaj al-goritam je proširen tako da radi i u okruženju grafičkih procesora korišćenjem CUDA alata, a takođe je razvijen i predstavljen hibridni, heterogeni algoritam koji kombinuje OpenMP i CUDA pristupe i koji je u stanju da iskoristi sve raspoložive resurse jednog računara.Imajući u vidu inherentna ograničenja raspoložive memorije koju pojedinačan računar poseduje, razvijen je i algoritam za sisteme sa distribuiranom memorijom zasnovan na Message Passing Interface tehnologiji i prethodnim algoritmima za sisteme sa deljenom memorijom. Da bi se maksimalizovale performanse razvijenih hibridnih implementacija, parametri koji određuju raspodelu podataka i računskog opterećenja su optimizovani korišćenjem genetskog algoritma. Poseban izazov je vizualizacija povećane količine izlaznih podataka, koji nastaju kao rezultat efikasnosti novorazvijenih algoritama. Ovo je u tezi rešeno kroz inte-graciju implementacija sa najsavremenijim alatom za vizualizaciju (VisIt), što je omogućilo proučavanje dva primera koji pokazuju kako razvijeni programi mogu da se iskoriste za simulacije realnih sistema.

APA, Harvard, Vancouver, ISO, and other styles

14

Suleman, Hussein. "Genetic Programming in Mathematica." Thesis, University of Durban-Westville, 1997. http://hdl.handle.net/10919/71533.

Full text

Abstract:

GP has traditionally been implemented in LISP but there is a slow migration towards faster languages like C++. Any implementation language is dictated not only by the speed of the platform but also by the desirability of such an implementation. With a large number of scientists migrating to scientifically-biased programming languages like Mathematica, such provides an ideal testbed for GP.In this study it was attempted to implement GP on a Mathematica platform, exploiting the advantages of Mathematica's unique capabilities. Wherever possible, optimizations have been applied to drive the GP algorithm towards realistic goals. At an early stage it was noted that the standard GP algorithm could be significantly speeded up by parallelisation and the distribution of processing. This was incorporated into the algorithm, using known techniques and Mathematica-specific knowledge.

APA, Harvard, Vancouver, ISO, and other styles

15

Elabed, Jamal. "Implementing parallel sorting algorithms." Virtual Press, 1989. http://liblink.bsu.edu/uhtbin/catkey/543997.

Full text

Abstract:

The Republic of Guinea is located on the west coast of Africa at about 11° North latitude. A large portion of Guinea's supply of protein is dried fish. The actual drying method operates under open air, the foodstuff being unprotected from unexpected rains, windborne dirt and dust, and from infestation by insects, rodents, and other animals. More, the deforestation rate is increasing year after year, depleting the source of fuel for drying. Practical ways of drying fish cheaply and sanitarily would be welcome.Recently, much work has been devoted to developing algorithms for parallel processors. Parallel algorithms have received a great deal of attention because of the advances in computer hardware technology. These parallel processors and algorithms have been used to improve computational speed, especially in the areas of sorting, evaluation of polynomials, arithmetic expressions, matrix and graphic problems.Sorting is an important operation in business and computer engineering applications. The literature contains many sorting algorithms, both sequential and parallel, which have been developed and used in practical applications. bubble sort, quick sort, insertion sort, enumeration sort, bucket and odd-even transposition sort. Ada, a new excellent programming language that offers high-level concurrent processing facilities called tasks, is used in this thesis to introduce, implement, compare and evaluate some of the parallel sorting algorithms. This thesis will also show that parallel sorting algorithms reduce the time requirement to perform the tasks.
Department of Computer Science

APA, Harvard, Vancouver, ISO, and other styles

16

Sultan, Ziad. "Algèbre linéaire exacte, parallèle, adaptative et générique." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM030/document.

Full text

Abstract:

Les décompositions en matrices triangulaires sont une brique de base fondamentale en calcul algébrique. Ils sont utilisés pour résoudre des systèmes linéaires et calculer le rang, le déterminant, l'espace nul ou les profiles de rang en ligne et en colonne d'une matrix. Le projet de cette thèse est de développer des implantations hautes performances parallèles de l'élimination de Gauss exact sur des machines à mémoire partagée.Dans le but d'abstraire le code de l'environnement de calcul parallèle utilisé, un langage dédié PALADIn (Parallel Algebraic Linear Algebra Dedicated Interface) a été implanté et est basé essentiellement sur des macros C/C++. Ce langage permet à l'utilisateur d'écrire un code C++ et tirer partie d’exécutions séquentielles et parallèles sur des architectures à mémoires partagées en utilisant le standard OpenMP et les environnements parallel KAAPI et TBB, ce qui lui permet de bénéficier d'un parallélisme de données et de taches.Plusieurs aspects de l'algèbre linéaire exacte parallèle ont été étudiés. Nous avons construit de façon incrémentale des noyaux parallèles efficaces pour les multiplication de matrice, la résolution de systèmes triangulaires au dessus duquel plusieurs variantes de l'algorithme de décomposition PLUQ sont construites. Nous étudions la parallélisation de ces noyaux en utilisant plusieurs variantes algorithmiques itératives ou récursives et en utilisant des stratégies de découpes variées.Nous proposons un nouvel algorithme récursive de l'élimination de Gauss qui peut calculer simultanément les profiles de rang en ligne et en colonne d'une matrice et de toutes ses sous-matrices principales, tout en étant un algorithme état de l'art de l'élimination de Gauss. Nous étudions aussi les conditions pour qu'un algorithme de l'élimination de Gauss révèle cette information en définissant un nouvel invariant matriciel, la matrice de profil de rang
Triangular matrix decompositions are fundamental building blocks in computational linear algebra. They are used to solve linear systems, compute the rank, the determinant, the null-space or the row and column rank profiles of a matrix. The project of my PhD thesis is to develop high performance shared memory parallel implementations of exact Gaussian elimination.In order to abstract the computational code from the parallel programming environment, we developed a domain specific language, PALADIn: Parallel Algebraic Linear Algebra Dedicated Interface, that is based on C/C + + macros. This domain specific language allows the user to write C + + code and benefit from sequential and parallel executions on shared memory architectures using the standard OpenMP, TBB and Kaapi parallel runtime systems and thus providing data and task parallelism.Several aspects of parallel exact linear algebra were studied. We incrementally build efficient parallel kernels, for matrix multiplication, triangular system solving, on top of which several variants of PLUQ decomposition algorithm are built. We study the parallelization of these kernels using several algorithmic variants: either iterative or recursive and using different splitting strategies.We propose a recursive Gaussian elimination that can compute simultaneously therow and column rank profiles of a matrix as well as those of all of its leading submatrices, in the same time as state of the art Gaussian elimination algorithms. We also study the conditions making a Gaussian elimination algorithm reveal this information by defining a new matrix invariant, the rank profile matrix

APA, Harvard, Vancouver, ISO, and other styles

17

Hedenström, Felix. "Trial Division : Improvements and Implementations." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-211090.

Full text

Abstract:

Trial division is possibly the simplest algorithm for factoring numbers.The problem with Trial division is that it is slow and wastes computationaltime on unnecessary tests of division. How can this simple algorithms besped up while still being serial? How does this algorithm behave whenparallelized? Can a superior serial and a parallel version be combined intoan even more powerful algorithm?To answer these questions the basics of trial divisions where researchedand improvements where suggested. These improvements where later im-plemented and tested by measuring the time it took to factorize a givennumber.A version using a list of primes and multiple threads turned out to bethe fastest for numbers larger than 10 10 , but was beaten when factoringlower numbers by its serial counterpart. A problem was detected thatcaused the parallel versions to have long allocation times which slowedthem down, but this did not hinder them much.
Trial division är en av de enklaste algoritmerna när det kommer till attfaktorisera tal. Problemet med trial division är att det är relativt långsamtoch att det gör onödiga beräkningar. Hur kan man göra denna algoritmsnabbare utan att inte göra den seriell? Hur beter sig algoritmen när denär parallelliserad? Kan en förbättrad seriell sedan bli parallelliserad?För att besvara dessa frågor studerades trial division och dess möjligaförbättringar. Dessa olika förbättringar implementerades i form av flerafunktioner som sedan och testades mot varandra.Den snabbaste versionen byggde på att använda en lista utav primtaloch trådar för att minimera antal ’trials’ samt att dela upp arbetet. Denvar dock inte alltid snabbast, då den seriella versionen som också användeen lista av primtal var snabbare för siffror under 10 10 . Sent upptäck-tes ett re-allokeringsproblem med de parallella implementationerna, meneftersom de ändå var snabbare fixades inte detta problemet.

APA, Harvard, Vancouver, ISO, and other styles

18

Ghanemi, Salim. "Non-numerical parallel algorithms for asynchronous parallel computer systems." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/28016.

Full text

Abstract:

The work in this thesis covers mainly the design and analysis of many important Non-Numerical Parallel Algorithms that run on MIMD type Parallel Computer Systems (PCSs), in particular the NEPTUNE and the SEQUENT BALANCE 8000 PCSs available at Loughborough University of Technology.

APA, Harvard, Vancouver, ISO, and other styles

19

Su, (Philip) Shin-Chen. "Parallel subdomain method for massively parallel computers." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/17376.

Full text

APA, Harvard, Vancouver, ISO, and other styles

20

Stadtherr, Hans. "Work efficient parallel scheduling algorithms." [S.l. : s.n.], 1998. http://deposit.ddb.de/cgi-bin/dokserv?idn=962681369.

Full text

APA, Harvard, Vancouver, ISO, and other styles

21

Kalaiselvi, S. "Checkpointing Algorithms for Parallel Computers." Thesis, Indian Institute of Science, 1997. http://hdl.handle.net/2005/67.

Full text

Abstract:

Checkpointing is a technique widely used in parallel/distributed computers for rollback error recovery. Checkpointing is defined as the coordinated saving of process state information at specified time instances. Checkpoints help in restoring the computation from the latest saved state, in case of failure. In addition to fault recovery, checkpointing has applications in fault detection, distributed debugging and process migration. Checkpointing in uniprocessor systems is easy due to the fact that there is a single clock and events occur with respect to this clock. There is a clear demarcation of events that happens before a checkpoint and events that happens after a checkpoint. In parallel computers a large number of computers coordinate to solve a single problem. Since there might be multiple streams of execution, checkpoints have to be introduced along all these streams simultaneously. Absence of a global clock necessitates explicit coordination to obtain a consistent global state. Events occurring in a distributed system, can be ordered partially using Lamport's happens before relation. Lamport's happens before relation ->is a partial ordering relation to identify dependent and concurrent events occurring in a distributed system. It is defined as follows: ·If two events a and b happen in the same process, and if a happens before b, then a->b ·If a is the sending event of a message and b is the receiving event of the same message then a -> b ·If neither a à b nor b -> a, then a and b are said to be concurrent. A consistent global state may have concurrent checkpoints. In the first chapter of the thesis we discuss issues regarding ordering of events in a parallel computer, need for coordination among checkpoints and other aspects related to checkpointing. Checkpointing locations can either be identified statically or dynamically. The static approach assumes that a representation of a program to be checkpointed is available with information that enables a programmer to specify the places where checkpoints are to be taken. The dynamic approach identifies the checkpointing locations at run time. In this thesis, we have proposed algorithms for both static and dynamic checkpointing. The main contributions of this thesis are as follows: 1. Parallel computers that are being built now have faster communication and hence more efficient clock synchronisation compared to those built a few years ago. Based on efficient clock synchronisation protocols, the clock drift in current machines can be maintained within a few microseconds. We have proposed a dynamic checkpointing algorithm for parallel computers assuming bounded clock drifts. 2. The shared memory paradigm is convenient for programming while message passing paradigm is easy to scale. Distributed Shared Memory (DSM) systems combine the advantage of both paradigms and can be visualized easily on top of a network of workstations. IEEE has recently proposed an interconnect standard called Scalable Coherent Interface (SCI), to con6gure computers as a Distributed Shared Memory system. A periodic dynamic checkpointing algorithm has been proposed in the thesis for a DSM system which uses the SCI standard. 3. When information about a parallel program is available one can make use of this knowledge to perform efficient checkpointing. A static checkpointing approach based on task graphs is proposed for parallel programs. The proposed task graph based static checkpointing approach has been implemented on a Parallel Virtual Machine (PVM) platform. We now give a gist of various chapters of the thesis. Chapter 2 of the thesis gives a classification of existing checkpointing algorithms. The chapter surveys algorithm that have been reported in literature for checkpointing parallel/distributed systems. A point to be noted is that most of the algorithms published for checkpointing message passing systems are based on the seminal article by Chandy & Lamport. A large number of checkpointing algorithms have been published by relaxing the assumptions made in the above mentioned article and by extending the features to minimise the overheads of coordination and context saving. Checkpointing for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently algorithms have been published for distributed shared memory systems, which extend the cache coherence protocols used in shared memory systems. They however also include methods for storing the status of distributed memory in stable storage. Chapter 2 concludes with brief comments on the desirable features of a checkpointing algorithm. In Chapter 3, we develop a dynamic checkpointing algorithm for message passing systems assuming that the clock drift of processors in the system is bounded. Efficient clock synchronisation protocols have been implemented on recent parallel computers owing to the fact that communication between processors is very fast. Based on efficient clock synchronisation protocols, clock skew can be limited to a few microseconds. The algorithm proposed in the thesis uses clocks for checkpoint coordination and vector counts for identifying messages to be logged. The algorithm is a periodic, distributed algorithm. We prove correctness of the algorithm and compare it with similar clock based algorithms. Distributed Shared Memory (DSM) systems provide the benefit of ease of programming in a scalable system. The recently proposed IEEE Scalable Coherent Interface (SCI) standard, facilitates the construction of scalable coherent systems. In Chapter 4 we discuss a checkpointing algorithm for an SCI based DSM system. SCI maintains cache coherence in hardware using a distributed cache directory which scales with the number of processors in the system. SCI recommends a two phase transaction protocol for communication. Our algorithm is a two phase centralised coordinated algorithm. Phase one initiates checkpoints and the checkpointing activity is completed in phase two. The correctness of the algorithm is established theoretically. The chapter concludes with the discussion of the features of SCI exploited by the checkpointing algorithm proposed in the thesis. In Chapter 5, a static checkpointing algorithm is developed assuming that the program to be executed on a parallel computer is given as a directed acyclic task graph. We assume that the estimates of the time to execute each task in the task graph is given. Given the timing at which checkpoints are to be taken, the algorithm identifies a set of edges where checkpointing tasks can be placed ensuring that they form a consistent global checkpoint. The proposed algorithm eliminates coordination overhead at run time. It significantly reduces the context saving overhead by taking checkpoints along edges of the task graph. The algorithm is used as a preprocessing step before scheduling the tasks to processors. The algorithm complexity is O(km) where m is the number of edges in the graph and k the maximum number of global checkpoints to be taken. The static algorithm is implemented on a parallel computer with a PVM environment as it is widely available and portable. The task graph of a program can be constructed manually or through program development tools. Our implementation is a collection of preprocessing and run time routines. The preprocessing routines operate on the task graph information to generate a set of edges to be checkpointed for each global checkpoint and write the information on disk. The run time routines save the context along the marked edges. In case of recovery, the recovery algorithms read the information from stable storage and reconstruct the context. The limitation of our static checkpointing algorithm is that it can operate only on deterministic task graphs. To demonstrate the practical feasibility of the proposed approach, case studies of checkpointing some parallel programs are included in the thesis. We conclude the thesis with a summary of proposed algorithms and possible directions to continue research in the area of checkpointing.

APA, Harvard, Vancouver, ISO, and other styles

22

Mahawar, Hemant. "Parallel algorithms for inductance extraction." Texas A&M University, 2003. http://hdl.handle.net/1969.1/5930.

Full text

Abstract:

In VLSI circuits, signal delays play an important role in design, timing verification and signal integrity checks. These delays are attributed to the presence of parasitic resistance, capacitance and inductance. With increasing clock speed and reducing feature sizes, these delays will be dominated by parasitic inductance. In the next generation VLSI circuits, with more than millions of components and interconnect segments, fast and accurate inductance estimation becomes a crucial step. A generalized approach for inductance extraction requires the solution of a large, dense, complex linear system that models mutual inductive effects among circuit elements. Iterative methods are used to solve the system without explicit computation of the system matrix itself. Fast hierarchical techniques are used to compute approximate matrix-vector products with the dense system matrix in a matrix-free way. Due to unavailability of system matrix, constructing a preconditioner to accelerate the convergence of the iterative method becomes a challenging task. This work presents a class of parallel algorithms for fast and accurate inductance extraction of VLSI circuits. We use the solenoidal basis approach that converts the linear system into a reduced system. The reduced system of equations is solved by a preconditioned iterative solver that uses fast hierarchical methods to compute products with the dense coefficient matrix. A GreenÃÂ¢ÃÂÃÂs function based preconditioner is proposed that achieves near-optimal convergence rates in several cases. By formulating the preconditioner as a dense matrix similar to the coefficient matrix, we are able to use fast hierarchical methods for the preconditioning step as well. Experiments on a number of benchmark problems highlight the efficient preconditioning scheme and its advantages over FastHenry. To further reduce the solution time of the software, we have developed a parallel implementation. The parallel software package is capable of analyzing interconnects con- figurations involving several conductors within reasonable time. A two-tier parallelization scheme enables mixed mode parallelization, which uses both OpenMP and MPI directives. The parallel performance of the software is demonstrated through experiments on the IBM p690 and AMD Linux clusters. These experiments highlight the portability and efficiency of the software on multiprocessors with shared, distributed, and distributed-shared memory architectures.

APA, Harvard, Vancouver, ISO, and other styles

23

Austad, Haakon Michael. "Parallel Multiple Proposal MCMC Algorithms." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for matematiske fag, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-12857.

Full text

Abstract:

We explore the variance reduction achievable through parallel implementation of multi-proposal MCMC algorithms and use of control variates. Implemented sequentially multi-proposal MCMC algorithms are of limited value, but they are very well suited for parallelization. Further, discarding the rejected states in an MCMC sampler can intuitively be interpreted as a waste of information. This becomes even more true for a multi-proposal algorithm where we discard several states in each iteration. By creating an alternative estimator consisting of a linear combination of the traditional sample mean and zero mean random variables called control variates we can improve on the traditional estimator. We present a setting for the multi-proposal MCMC algorithm and study it in two examples. The first example considers sampling from a simple Gaussian distribution, while for the second we design the framework for a multi-proposal mode jumping algorithm for sampling from a distribution with several separated modes. We find that the variance reduction achieved from our control variate estimator in general increases as the number of proposals in our sampler increase. For our Gaussian example we find that the benefit from parallelization is small, and that little is gained from increasing the number of proposals. The mode jumping example however is very well suited for parallelization and we get a relative variance reduction pr time of roughly 80% with 16 proposals in each iteration.

APA, Harvard, Vancouver, ISO, and other styles

24

Antkowiak, Łukasz. "Parallel algorithms of timetable generation." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-6083.

Full text

Abstract:

Context: Most of the problem of generating timetable for a school belongs to the class of NP-hard problems. Complexity and practical value makes this kind of problem interesting for parallel computing. Objectives: This paper focuses on Class-Teacher problem with weighted time slots and proofs that it is NP-complete problem. Branch and bound scheme and two approaches to distribute the simulated annealing are proposed. Empirical evaluation of described methods is conducted in an elementary school computer laboratory. Methods: Simulated annealing implementation described in literature are adapted for the problem, and prepared for execution in distributed systems. Empirical evaluation is conducted with the real data from Polish elementary school. Results: Proposed branch and bound scheme scales nearly logarithmically with the number of nodes in computing cluster. Proposed parallel simulated annealing models tends to increase solution quality. Conclusions: Despite a significant increase in computing power, computer laboratories are still unprepared for heavy computation. Proposed branch and bound method is infeasible with the real instances. Parallel Moves approach tends to provide better solution at the beginning of execution, but the Multiple Independent Runs approach outruns it after some time.
Sammanhang: De flesta problem med att generera scheman för en skola tillhör klassen av NP-svårt problemen. Komplexitet och praktiskt värde gör att den här typen av problemen forskas med särskild uppmärksamhet på en parallell bearbetning. Syfte: Detta dokument fokusarar på Klass-Lärare problem med vikter för enskilda tidsluckor och på att visa var ett NP-svårt problem är fullständigt. Branch and bound scheman och två metoder för att distribuera en simulerad glödgning algoritm presenterades. En empirisk analys av beskrivna metoder gjordes i datorlaboratorium i en grundskola. Metod: Implementering av en simulerad glödgning algoritm som beskrivs i litteraturen blev anpassad till ett utvalt problem och distribuerade system. Empirisk utvärdering genomförs med verkliga data från polska grundskolan Resultat: Föreslagit Branch and bound system graderar nästan logaritmiskt antal noder i ett datorkluster. Den simulerade glödgning algoritmen som föreslagits förbättrar lösningarnas kvalitet. Slutsatser: Trots att en betydande ökning med beräkningskraft är inte datasalar i skolor anpassad till avancerade beräkningar. Användning av den Branch and Bound föreslagna metoden till praktiska problem är omöjlig i praktiken. En annan föreslagen metod Parallel Moves ger bättre resultat i början av utförandet men Multiple Independent Runs hittar bättre lösningar efter en viss tid.

APA, Harvard, Vancouver, ISO, and other styles

25

Hutchinson, David Alexander. "Parallel algorithms in external memory." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0002/NQ42793.pdf.

Full text

APA, Harvard, Vancouver, ISO, and other styles

26

Trifunovic, Aleksandar. "Parallel algorithms for hypergraph partitioning." Thesis, Imperial College London, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.430537.

Full text

APA, Harvard, Vancouver, ISO, and other styles

27

Abdallah, Ali Ezzat. "Transformational derivation of parallel algorithms." Thesis, University of Oxford, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.320624.

Full text

APA, Harvard, Vancouver, ISO, and other styles

28

Tett, Simon F. B. "Parallel algorithms for atmospheric modelling." Thesis, University of Edinburgh, 1992. http://hdl.handle.net/1842/11452.

Full text

Abstract:

In this thesis, the usefulness of massively parallel computers of the MIMD class to satisfy atmospheric modellers' demands for increased computing power is examined. Algorithms to use these computers, for the dynamics, are developed for both grid-point and spectral methods. Scaling formulae for these algorithms are developed and the algorithms are implemented on the Edinburgh Concurrent Supercomputer (ECS). Another component of atmospheric models is parameterization; in which the effects of unresolved phenomenan on the mean-flow are modelled. Two parameterization schemes are implemented on the ECS and a study of the effects of load-balancing is made. Furthermore it is concluded that implementation of parameterization schemes on data-parallel computers is likely to be difficult, unlike MIMD machines where the implementation is straightforward.

APA, Harvard, Vancouver, ISO, and other styles

29

Roweth, Duncan. "Parallel algorithms for lattice QCD." Thesis, University of Edinburgh, 1987. http://hdl.handle.net/1842/12886.

Full text

APA, Harvard, Vancouver, ISO, and other styles

30

White, William Warren. "Mapping parallel algorithms into hypercubes /." The Ohio State University, 1989. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487676261010809.

Full text

APA, Harvard, Vancouver, ISO, and other styles

31

Moon, Gordon Euhyun. "Parallel Algorithms for Machine Learning." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1561980674706558.

Full text

APA, Harvard, Vancouver, ISO, and other styles

32

Hutchinson, David Alexander Carleton University Dissertation Computer Science. "Parallel algorithms in external memory." Ottawa, 1999.

Find full text

APA, Harvard, Vancouver, ISO, and other styles

33

Severini, Lorenzo. "A parallel community detection algorithm." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5948/.

Full text

Abstract:

Complex networks analysis is a very popular topic in computer science. Unfortunately this networks, extracted from different contexts, are usually very large and the analysis may be very complicated: computation of metrics on these structures could be very complex. Among all metrics we analyse the extraction of subnetworks called communities: they are groups of nodes that probably play the same role within the whole structure. Communities extraction is an interesting operation in many different fields (biology, economics,...). In this work we present a parallel community detection algorithm that can operate on networks with huge number of nodes and edges. After an introduction to graph theory and high performance computing, we will explain our design strategies and our implementation. Then, we will show some performance evaluation made on a distributed memory architectures i.e. the supercomputer IBM-BlueGene/Q "Fermi" at the CINECA supercomputing center, Italy, and we will comment our results.

APA, Harvard, Vancouver, ISO, and other styles

34

Bajcar, Martin. "Použití OpenCl v AVG na platformě Windows." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236431.

Full text

Abstract:

The main topic of this thesis is the practical use of OpenCL at AVG company. AVG is looking for ways to decrease hardware requirement of their security product and also to decrease computation time of some algorithms. Using OpenCL is one way to achieve this requirement. Significant part of this thesis deals with optimization strategies for AMD and NVIDIA graphics cards as they are most common cards among users. Practical part of the thesis describes parallelization of two algorithms, their analysis and implementation. After that, the obtained results are presented and cases in which the use of OpenCL is beneficial are identified. As a part of implementation, library containing various utility functions which can aid programmers to implement OpenCL based code was developed.

APA, Harvard, Vancouver, ISO, and other styles

35

Cledat, Romain. "Programming models for speculative and optimistic parallelism based on algorithmic properties." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42749.

Full text

Abstract:

Today's hardware is becoming more and more parallel. While embarrassingly parallel codes, such as high-performance computing ones, can readily take advantage of this increased number of cores, most other types of code cannot easily scale using traditional data and/or task parallelism and cores are therefore left idling resulting in lost opportunities to improve performance. The opportunistic computing paradigm, on which this thesis rests, is the idea that computations should dynamically adapt to and exploit the opportunities that arise due to idling resources to enhance their performance or quality. In this thesis, I propose to utilize algorithmic properties to develop programming models that leverage this idea thereby providing models that increase and improve the parallelism that can be exploited. I exploit three distinct algorithmic properties: i) algorithmic diversity, ii) the semantic content of data-structures, and iii) the variable nature of results in certain applications. This thesis presents three main contributions: i) the N-way model which leverages algorithmic diversity to speed up hitherto sequential code, ii) an extension to the N-way model which opportunistically improves the quality of computations and iii) a framework allowing the programmer to specify the semantics of data-structures to improve the performance of optimistic parallelism.

APA, Harvard, Vancouver, ISO, and other styles

36

Bae, Sung Eun. "Sequential and Parallel Algorithms for the Generalized Maximum Subarray Problem." Thesis, University of Canterbury. Computer Science and Software Engineering, 2007. http://hdl.handle.net/10092/1202.

Full text

Abstract:

The maximum subarray problem (MSP) involves selection of a segment of consecutive array elements that has the largest possible sum over all other segments in a given array. The efficient algorithms for the MSP and related problems are expected to contribute to various applications in genomic sequence analysis, data mining or in computer vision etc. The MSP is a conceptually simple problem, and several linear time optimal algorithms for 1D version of the problem are already known. For 2D version, the currently known upper bounds are cubic or near-cubic time. For the wider applications, it would be interesting if multiple maximum subarrays are computed instead of just one, which motivates the work in the first half of the thesis. The generalized problem of K-maximum subarray involves finding K segments of the largest sum in sorted order. Two subcategories of the problem can be defined, which are K-overlapping maximum subarray problem (K-OMSP), and K-disjoint maximum subarray problem (K-DMSP). Studies on the K-OMSP have not been undertaken previously, hence the thesis explores various techniques to speed up the computation, and several new algorithms. The first algorithm for the 1D problem is of O(Kn) time, and increasingly efficient algorithms of O(K² + n logK) time, O((n+K) logK) time and O(n+K logmin(K, n)) time are presented. Considerations on extending these results to higher dimensions are made, which contributes to establishing O(n³) time for 2D version of the problem where K is bounded by a certain range. Ruzzo and Tompa studied the problem of all maximal scoring subsequences, whose definition is almost identical to that of the K-DMSP with a few subtle differences. Despite slight differences, their linear time algorithm is readily capable of computing the 1D K-DMSP, but it is not easily extended to higher dimensions. This observation motivates a new algorithm based on the tournament data structure, which is of O(n+K logmin(K, n)) worst-case time. The extended version of the new algorithm is capable of processing a 2D problem in O(n³ + min(K, n) · n² logmin(K, n)) time, that is O(n³) for K ≤ n/log n For the 2D MSP, the cubic time sequential computation is still expensive for practical purposes considering potential applications in computer vision and data mining. The second half of the thesis investigates a speed-up option through parallel computation. Previous parallel algorithms for the 2D MSP have huge demand for hardware resources, or their target parallel computation models are in the realm of pure theoretics. A nice compromise between speed and cost can be realized through utilizing a mesh topology. Two mesh algorithms for the 2D MSP with O(n) running time that require a network of size O(n²) are designed and analyzed, and various techniques are considered to maximize the practicality to their full potential.

APA, Harvard, Vancouver, ISO, and other styles

37

Pajak, Dominik. "Algorithms for Deterministic Parallel Graph Exploration." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2014. http://tel.archives-ouvertes.fr/tel-01064992.

Full text

Abstract:

Nous étudions dans cette thèse le problème de l'exploration parallèle d'un graphe à l'aide des multiples, synchronisés et mobiles agents. Chaque agent est une entité individuelle qui peut, indépendamment des autres agents, visitez les sommets du graphe ou parcourir ses arêtes. Le but de ensemble des agents est de visiter tous les sommets de graphe. Nous étudions d'abord l'exploration du graphe dans un modèle où chaque agent est équipé de mémoire interne, mais les nœuds n'ont pas de mémoire. Dans ce modèle les agents sont autorisés à communiquer entre eux en échangeant des messages. Nous présentons des algorithmes qui s'exécutent dans un minimum de temps possible pour polynomiale nombre d'agents (polynomiale en nombre de sommets du graphe). Nous étudions aussi quelle est l'impacte de différent méthodes des communications. Nous étudions des algorithmes où les agents peuvent se communiquer à distance arbitraire, mais aussi où communication est possible seulement entre les agents situés dans le même sommet. Dans les deux cas nous présentons des algorithmes efficaces. Nous avons aussi obtenu des limites inférieures qui correspondent bien à la performance des algorithmes. Nous considérons également l'exploration de graphe en supposant que les mouvements des agents sont déterminés par le soi-disant rotor-router mécanisme. Du point de vue d'un sommet fixé, le rotor- router envoie des agents qui visitent les sommet voisins dans un mode round-robin. Nous étudions l'accélération défini comme la proportion entre le pire des cas de l'exploration d'un agent unique et des plusieurs agents. Pour générales graphes, nous montrerons que le gain de vitesse en cas de multi-agent rotor-router est toujours entre fonction logarithmique et linéaire du nombre d'agents. Nous présentons également des résultats optimaux sur l'accélération de multi-agent rotor-router pour cycles, expanseurs, graphes aléatoires, cliques, tores de dimension fixé et une analyse presque optimale pour hypercubes. Finalement nous considérons l'exploration sans collision, où chaque agent doit explorer le graphe de manière indépendante avec la contrainte supplémentaire que deux agents ne peuvent pas occuper le même sommet. Dans le cas où les agents sont donnés le plan de graphe, on présente un algorithme optimal pour les arbres et un algorithme asymptotiquement optimal pour générales graphes. Nous présentons aussi des algorithmes dans le cas de l'exploration sans collision des arbres et des générales graphes dans la situation où les agents ne connaissent pas le graphe. Nous fermons la thèse par des observations finales et une discussion de problèmes ouverts liés dans le domaine de l'exploration des graphes.

APA, Harvard, Vancouver, ISO, and other styles

38

Schmollinger, Martin. "Designing parallel algorithms for SMP clusters." [S.l. : s.n.], 2003. http://deposit.ddb.de/cgi-bin/dokserv?idn=969343841.

Full text

APA, Harvard, Vancouver, ISO, and other styles

39

Bergheim, Thomas Stian, and Arve Aleksander Nymo Skogvold. "Parallel Algorithms for Neuronal Spike Sorting." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-14199.

Full text

Abstract:

Neurons communicate through electrophysiological signals, which may be recorded using electrodes inserted into living tissue.When a neuron emits a signal, it is referred to as a spike, and an electrode can detect these from multiple neurons.Neuronal spike sorting is the process of classifying the spike activity based on which neuron each spike signal is emitted from.Advances in technology have introduced better recording equipment, which allows the recording of many neurons at the same time.However, clustering software is lagging behind.Currently, spike sorting is often performed semi-manually by experts, with computer assistance, in a drastically reduced feature space.This makes the clustering prone to subjectivity.Automating the process will make classification much more efficient, and may produce better results.Implementing accurate and efficient spike sorting algorithms is therefore increasingly important.We have developed parallel implementations of superparamagnetic clustering, a novel clustering algorithm, as well as k-means clustering, serving as a useful comparison.Several feature extraction methods have been implemented to test various input distributions with the clustering algorithms. To assess the quality of the results from the algorithms, we have also implemented different cluster quality algorithms.Our implementations have been benchmarked, and found to scale well both with increased problem sizes and when run on multi-core processors.The results from our cluster quality measurements are inconclusive, and we identify this as a problem related to the subjectivity in the manually classified datasets.To better assess the utility of the algorithms, comparisons with intracellular recordings should be performed.

APA, Harvard, Vancouver, ISO, and other styles

40

Zheng, Lu. "Parallel Junction Tree Algorithm on GPU." Thesis, Carnegie Mellon University, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3575068.

Full text

Abstract:

Bayesian networks (BNs) are an effective tool in a diverse range of applications that require representation and reasoning with uncertain knowledge and data. Mathematically, a Bayesian network (BN) is a type of statistical model that can compactly represent complex probability distributions of random variables. Inference over BNs can be either exact or approximate. The junction tree algorithm is perhaps the most popular exact inference algorithm. The junction tree algorithm propagates beliefs (or posteriors) over a junction tree when a new piece of evidence comes in so that the factors over the junction tree stay consistent with each other.

However, belief propagation over junction trees is known to be computationally hard. The cornputational complexity hinders the application of BNs in cases where real-time inference is required. This thesis accelerates the junction tree belief propagation algorithm through parallelizing and using a state of the art manycore computing platform. Recent manycore computing platforms, like the recent Graphical Processing Units (GPUs) from NVIDIA and Intel's Knights Ferry, employ a Single Instruction Multiple Data (SIMD) architecture. They can provide massive computational power to address various computational problems. However, it is not trivial to map a problem instance to the manycore platform. The problem has to be carefully parallelized and the implementation has to be carefully optimized in order to make full use of the computation resources on a GPU.

In this thesis, we will thoroughly investigate the junction tree algorithm and identify the possible parallel opportunities. Our work mainly focuses on node-level parallelizati on, i.e., we parallelize each message passing of junction tree belief propagation. We first identify two kinds of parallelism in the message passing, namely, element-wise parallelism and arithmetic parallelism. Then, based on these two kinds parallelism, we propose a two-dimensional parallel message passing algorithm. In case of message passings that do not contain enough parallel opportunity, we also propose a junction tree pruning techniques called clique merging. Clique merging eliminates extremely small nodes in a junction tree so that the junction tree is better adjusted to the GPU platform.

Performance tuning for the parallel junction tree algorithm is necessary. Yet the diversity in size typically seen in the junction tree cliques makes the GPU input workload vary; the two dimensions of parallelism further add to the complexity of this tuning issue. Therefore it is hard to set the GPU configuration parameters for optimal performance. In this thesis, we parameterize the GPU tuning process and propose a machine learning framework to automatically optimize the performance. This framework essentially trains a machine learning model to capture causal relationships between a GPUs workload, its configuration, and resulting performance characteristics. The machine learning model is then used to optimize the GPU configuration. Experiments show that this auto-tuning framework can improve the performance significantly more than extensive manual tuning.

We implemented the parallel junction tree algorithm on a NVIDIA GTX 460 GPU card, and employed the clique merging and auto-tuning techniques. Compared with a benchmark sequential code that runs on an Intel Core 2 Quad CPU, the fully tuned code show up to 19.90x speedup. On average over all data-sets, we get a speedup of 10.70x (arithmetic average) or 8.68x (geometric average).

APA, Harvard, Vancouver, ISO, and other styles

41

Burgess, David A. "Parallel computing for unstructured mesh algorithms." Thesis, University of Oxford, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.318758.

Full text

APA, Harvard, Vancouver, ISO, and other styles

42

Wesson, Paul John. "Parallel algorithms for systems of equations." Thesis, University of Oxford, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.315776.

Full text

APA, Harvard, Vancouver, ISO, and other styles

43

Gwilliam, Catherine Sarah. "Parallel algorithms for Navier-Stokes modelling." Thesis, University of Oxford, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.357478.

Full text

APA, Harvard, Vancouver, ISO, and other styles

44

MACHADO, LUCAS EUZEBIO. "PARALLEL ALGORITHMS FOR MULTICORE GAME ENGINES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2010. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=16309@1.

Full text

Abstract:

COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
Esse tese apresenta diversas técnicas sobre tecnologia paralela em jogos eletrônicos. A tese inicia apresentando diversas arquiteturas possíveis para um motor de jogos. Uma nova arquitetura é proposta, mais flexível e adequada para processadores do futuro que terão um grau maior de paralelismo. Em seguida, uma nova técnica para processar uma octree, uma estrutura de dados clássica da computação gráfica, é apresentada. As últimas técnicas apresentadas são relacionadas a detecção de colisão. Novas ténicas para processamento de grids hieráquicos e balanceamento de detecção colisãom um conjunto de objetos são apresentadas.
This thesis presents several techniques about parallel technology on electronic games. The thesis begins presenting several possible architectures for a game engine. A new architecture is presented, more flexible and adequate for the processors of the future that will have a higher level of parallelism. Following, a new technique for processing an octree, a classic data structure for computer graphics, is presented. The last techniques presented are related to collision detection. New techniques for processing hierarquical grids and balancing collision detection on a set of objets are presented.

APA, Harvard, Vancouver, ISO, and other styles

45

Frank, Matthew I. "LoPC-- modeling contention in parallel algorithms." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/47439.

Full text

APA, Harvard, Vancouver, ISO, and other styles

46

Klein, Philip N. (Philip Nathan). "An efficient parallel algorithm for planarity." Thesis, Massachusetts Institute of Technology, 1986. http://hdl.handle.net/1721.1/34303.

Full text

Abstract:

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1986.
MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING
Bibliography: leaves 56-57.
by Philip Nathan Klein.
M.S.

APA, Harvard, Vancouver, ISO, and other styles

47

Banihashemi, Seyed Parsa. "Parallel explicit FEM algorithms using GPU's." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54391.

Full text

Abstract:

The Explicit Finite Element Method is a powerful tool in nonlinear dynamic finite element analysis. Recent major developments in computational devices, in particular, General Purpose Graphical Processing Units (GPGPU's) now make it possible to increase the performance of the explicit FEM. This dissertation investigates existing explicit finite element method algorithms which are then redesigned for GPU's and implemented. The performance of these algorithms is assessed and a new asynchronous variational integrator spatial decomposition (AVISD) algorithm is developed which is flexible and encompasses all other methods and can be tuned based for a user-defined problem and the performance of the user's computer. The mesh-aware performance of the proposed explicit finite element algorithm is studied and verified by implementation. The current research also introduces the use of a Particle Swarm Optimization method to tune the performance of the proposed algorithm automatically given a finite element mesh and the performance characteristics of a user's computer. For this purpose, a time performance model is developed which depends on the finite element mesh and the machine performance. This time performance model is then used as an objective function to minimize the run-time cost. Also, based on the performance model provided in this research and predictions about the changes in GPU's in the near future, the performance of the AVISD method is predicted for future machines. Finally, suggestions and insights based on these results are proposed to help facilitate future explicit FEM development.

APA, Harvard, Vancouver, ISO, and other styles

48

Abali, Bulent. "Parallel sorting algorithms for hypercube multiprocessors /." The Ohio State University, 1989. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487672631602736.

Full text

APA, Harvard, Vancouver, ISO, and other styles

49

Huang, Chun-Hsi. "Communication-efficient bulk synchronous parallel algorithms." Buffalo, N.Y. : Dept. of Computer Science, State University of New York at Buffalo, 2001. http://www.cse.buffalo.edu/tech%2Dreports/2001%2D06.ps.Z.

Full text

APA, Harvard, Vancouver, ISO, and other styles

50

Mebrotra, Mala. "Tree-Searching Algorithms on Parallel Architectures." W&M ScholarWorks, 1985. https://scholarworks.wm.edu/etd/1539625306.

Full text

APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic 'Algoritmi paralleli'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles