Dissertations / Theses: 'Grammatical automatic parallel programming'

1

Tregidgo, R. W. S. "Parallel processing and automatic postal address recognition." Thesis, University of Essex, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.304946.

2

Li, Li. "Model-based automatic performance diagnosis of parallel computations." Thesis (Ph. D.), University of Oregon, 2007. http://proquest.umi.com/pqdweb?did=1335366371&sid=1&Fmt=2&clientId=11238&RQT=309&VName=PQD.

Abstract:

Thesis (Ph. D.)--University of Oregon, 2007. Typescript. Includes vita and abstract. Includes bibliographical references (leaves 119-123). Also available for download via the World Wide Web; free to University of Oregon users.

3

Goddard, Alan John. "An automatic approach to implementing DSP algorithms on parallel processors." Thesis, City University London, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.254871.

4

Galparsoro, Miguel Angel Maiza. "Automatic scheduling and parallel code generation for high performance real-time systems." Thesis, University of York, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288061.

5

Doddapaneni, Srinivas P. "Automatic dynamic decomposition of programs on distributed memory machines." Diss., Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/8158.

6

Saà-Garriga, Albert. "Automatic source code adaptation for heterogeneous platforms." Doctoral thesis, Universitat Autònoma de Barcelona, 2016. http://hdl.handle.net/10803/399986.

7

Zarei, Behrouz. "Performance analysis of automatic lookahead generation in parallel discrete event simulation using control flow graphs." Thesis, Lancaster University, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.274230.

8

Fjeld, Hans Erik. "Application of Parallel Programming in a Automatic Detector for a Pulsed MTD Radar system : Automatic Detection and Fast Ordered Selection Algorithms." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for elektronikk og telekommunikasjon, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-18490.

Abstract:

Automatic CFAR detection is to be implemented in a real-time pulsed MTD radar system used in a maritime application. The CFAR should have good detection properties in bad weather conditions, where rough sea states, heavy downpour and high winds are expected. Sufficient detection properties may be achieved using an Ordered-Statistics-based CFAR to generate the detection threshold for the MTD radar video signal. The MTD video is the coherent raw video of the signal filtered in a bandpass filter bank, separating the Doppler frequency space of the video into a number of individual Doppler channels. The Doppler frequency shift relates to a velocity, implying that every Doppler channel represents a velocity space, so that targets and clutter may be further resolved by their relative Doppler velocity. CFAR algorithms are applied to all the test cells in the MTD video signal. These algorithms have to estimate a threshold that is used to discriminate real targets from clutter in all the velocity channels of the MTD video. A good threshold estimate should have a low probability of false detections and a high probability of declaring actual targets. This must hold in all clutter conditions, even when one or multiple targets are closely spaced and surrounded by non-stationary clutter. The Ordered Statistics algorithm uses the k'th largest value of the test window as a mean clutter power estimate for its corresponding test cell. The ordered statistics model makes a threshold selection based on the rank of the samples, a task whose complexity increases as a function of the window length and the k parameter. This task is to be performed on a large number of test cells in a system running in real time. In a real-time radar system, all processing has to be done before the next scan becomes available.

Radian AS works on developing a PC-based MTD radar system for a pulsed Doppler radar. The radar interfaces with the PC through a PCI Express radar receiver card developed by Radian AS. This thesis investigates the application of parallel programming in C/C++ in order to achieve real-time automatic detection in a PC-based MTD radar. Two means of parallel programming are considered, involving exploitation of the multi-core CPU architecture as well as the use of a dedicated GPU as a co-processor. OpenMP is an open standard API of compiler directives and runtime library routines for running tasks in parallel over multiple cores of a CPU. It is easily incorporated into C/C++ code and may be used with most multi-core CPUs. NVIDIA has made GP-GPU computing available to the public through CUDA, selling CUDA-enabled graphics cards and providing the tools and documentation a programmer needs to use the GPU as a co-processor. CUDA C integrates the SIMT abstractions of CUDA, so that a programmer may write C code that is compiled and executed on the GPU. Different implementations of the OS-CFAR algorithm for threshold estimation are implemented using CUDA and OpenMP. The different implementations are evaluated and compared to each other in terms of the results gathered from executing them on MTD video. The conclusions drawn from this work concern the application of parallel programming, together with further recommendations for the future of the project of building a PC-based pulsed MTD radar signal processor. This thesis introduces a CUDA algorithm for high-throughput ordered selection using short window lengths on a large number of cells under test. An algorithm developed in C for the project assignment leading up to this thesis is extended with OpenMP, along with a C++ STL algorithm, for performing ordered-statistics ranked selection on the CPU. In addition, the CUDA OS-CFAR algorithm is ported to C with OpenMP. The three implementations in C/C++ are compared to the CUDA C implementation.
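
The ordered-statistics thresholding described above can be illustrated with a small sequential sketch. It assumes a 1-D list of power samples along range, a reference window split evenly on both sides of the cell under test, one guard cell per side, and a scaling factor `alpha`; the guard cells, `alpha`, the default window sizes and the name `os_cfar` are illustrative assumptions, not values from the thesis, which implements the selection in CUDA C and C/OpenMP rather than Python.

```python
import heapq


def os_cfar(power, num_ref=8, num_guard=1, k=6, alpha=3.0):
    """Ordered-statistics CFAR over a 1-D list of power samples (illustrative sketch).

    Returns one boolean per cell under test (CUT): True if the CUT exceeds
    alpha times the k-th largest value in its reference window.
    """
    half = num_ref // 2
    detections = []
    for cut in range(len(power)):
        # Reference cells on each side of the CUT, skipping the guard cells.
        left = power[max(0, cut - num_guard - half):max(0, cut - num_guard)]
        right = power[cut + num_guard + 1:cut + num_guard + 1 + half]
        window = left + right
        if len(window) < k:
            # Too few reference cells near the ends of the range: no decision.
            detections.append(False)
            continue
        # Rank-based clutter estimate: the k-th largest reference sample.
        clutter = heapq.nlargest(k, window)[-1]
        detections.append(power[cut] > alpha * clutter)
    return detections


samples = [1.0] * 20
samples[10] = samples[12] = 20.0  # two closely spaced targets
hits = [i for i, hit in enumerate(os_cfar(samples)) if hit]
print(hits)  # [10, 12]
```

Because the estimate is a rank rather than an average, a single strong neighbour inside the reference window does not inflate the clutter estimate, which is why both closely spaced targets are declared here. The per-cell loop is independent across cells, which is what makes the task a natural fit for the GPU and OpenMP parallelization the thesis studies.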

9

Gebremedhin, Mahder. "Automatic and Explicit Parallelization Approaches for Mathematical Simulation Models." Licentiate thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-117346.

Abstract:

The move from single-core and single-processor systems to multi-core and many-processor systems comes with the requirement of implementing computations in a way that can utilize these multiple units efficiently. This task of writing efficient multi-threaded algorithms will not be possible without improving programming languages and compilers to provide the mechanisms to do so. Computer-aided mathematical modeling and simulation is one of the most computationally intensive areas of computer science. Even simplified models of physical systems can impose a considerable computational load on the processors at hand. Being able to take advantage of the potential computation power provided by multi-core systems is vital in this area of application. This thesis tries to address how we can take advantage of the potential computation power provided by these modern processors to improve the performance of simulations. The work presents improvements for the Modelica modeling language and the OpenModelica compiler. Two approaches to utilizing the computational power provided by modern multi-core architectures are presented in this thesis: automatic and explicit parallelization. The first approach presents the process of extracting and utilizing potential parallelism from equation systems in an automatic way, without any need for extra effort from the modelers or programmers. The thesis explains improvements made to the OpenModelica compiler and presents the accompanying task systems library for efficient representation, clustering, scheduling, profiling and executing complex equation/task systems with heavy dependencies. The explicit parallelization approach explains the process of utilizing parallelism with the help of the modeler or programmer. New programming constructs have been introduced to the Modelica language in order to enable modelers to write parallelized code. The OpenModelica compiler has been improved accordingly to recognize and utilize the information from these new algorithmic constructs and generate parallel code to improve the performance of computations. The series name Linköping Studies in Science and Technology Licentiate Thesis is incorrect. The correct series name is Linköping Studies in Science and Technology Thesis.
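
The abstract mentions scheduling equation/task systems with heavy dependencies. One simple, widely used way to expose parallelism in such a dependency graph is level scheduling, sketched below as an assumption rather than as the OpenModelica task-system algorithm: each task is placed one level after its deepest predecessor, so the tasks within a level are mutually independent and could run concurrently, with a barrier between levels. The function name `level_schedule` and the dictionary representation of dependencies are hypothetical.

```python
from collections import deque


def level_schedule(deps):
    """Group the tasks of a dependency DAG into levels of mutually independent tasks.

    `deps` maps every task to the list of tasks it depends on; every
    predecessor must itself be a key of `deps`.
    """
    indegree = {task: len(preds) for task, preds in deps.items()}
    successors = {task: [] for task in deps}
    for task, preds in deps.items():
        for pred in preds:
            successors[pred].append(task)

    level = {task: 0 for task in deps}
    ready = deque(task for task, n in indegree.items() if n == 0)
    processed = 0
    while ready:  # Kahn's topological sort
        task = ready.popleft()
        processed += 1
        for succ in successors[task]:
            # A task runs one level after its deepest predecessor.
            level[succ] = max(level[succ], level[task] + 1)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if processed != len(deps):
        raise ValueError("dependency graph contains a cycle")

    levels = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for task in deps:  # keeps the input order within each level
        levels[level[task]].append(task)
    return levels


deps = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
print(level_schedule(deps))  # [['a', 'b'], ['c', 'd'], ['e']]
```

This sketch ignores task costs, so it can leave levels badly balanced; clustering small tasks using measured costs is a common refinement.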

10

Leao, Ruth Pastora Saraiva. "A study of automatic contingency selection algorithms for steady-state security assessment of power systems and the application of parallel processing." Thesis, Loughborough University, 1995. https://dspace.lboro.ac.uk/2134/32911.

Abstract:

The performance of various Contingency Selection methods has been investigated with respect to their accuracy for steady-state power system security assessment and their suitability for execution in a real-time environment. In the study the following requirements have been considered: (a) Effectiveness: in identifying contingencies which may cause limit violations and discarding all others; (b) Adaptability: to model both permanent and temporary changes in the system; (c) Flexibility: to model any number and type of contingencies; (d) Computational efficiency: in terms of speed in selecting the sub-set of contingencies as well as in terms of storage requirements; (e) Ability: to update and augment the list of contingencies on-line, given the actual system operating data.

11

Diarra, Rokiatou. "Automatic Parallelization for Heterogeneous Embedded Systems." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS485.

Abstract:

Recent years have seen an increase in heterogeneous architectures combining multi-core CPUs with accelerators such as GPUs, FPGAs, and the Intel Xeon Phi. GPUs can achieve significant performance for certain categories of applications. Nevertheless, achieving this performance with low-level APIs (e.g. CUDA, OpenCL) requires rewriting the sequential code, having a good knowledge of the GPU architecture, and applying complex optimizations that are sometimes not portable. On the other hand, directive-based programming models (e.g. OpenACC, OpenMP) offer a high-level abstraction of the underlying hardware, thus simplifying code maintenance and improving productivity. They allow users to accelerate their sequential codes on GPUs by simply inserting directives. OpenACC/OpenMP compilers have the daunting task of applying the necessary optimizations from the user-provided directives and generating efficient code that takes advantage of the GPU architecture. Although OpenACC/OpenMP compilers are mature and able to apply some optimizations automatically, the generated code may not achieve the expected speedup, as the compilers do not have a full view of the whole application. Thus, there is generally a significant performance gap between codes accelerated with OpenACC/OpenMP and those hand-optimized with CUDA/OpenCL. To help programmers efficiently speed up their legacy sequential codes on GPUs with directive-based models, and to broaden the impact of OpenMP/OpenACC in both academia and industry, several research issues are discussed in this dissertation. We investigated the OpenACC and OpenMP programming models and proposed an effective application parallelization methodology based on directive-based programming approaches. Our application porting experience revealed that it is insufficient to simply insert OpenMP/OpenACC offloading directives to inform the compiler that a particular code region must be compiled for GPU execution; it is essential to combine offloading directives with loop parallelization constructs. Although current compilers are mature and perform several optimizations, the user may provide them with more information through the clauses of loop parallelization constructs in order to obtain better-optimized code. We have also revealed the challenge of choosing good loop schedules. The default loop schedule chosen by the compiler may not produce the best performance, so the user has to manually try different loop schedules to improve performance. We demonstrate that the OpenMP and OpenACC programming models can achieve the best performance with less programming effort, but OpenMP/OpenACC compilers quickly reach their limit when the offloaded region is compute- or memory-bound and contains several nested loops. In such cases, low-level languages may be used. We also discuss the pointer aliasing problem in GPU codes and propose two static analysis tools that automatically perform type qualifier insertion and scalar promotion at source level to solve aliasing issues.

12

Engman, Jimmy. "Model Predictive Control for Series-Parallel Plug-In Hybrid Electrical Vehicle." Thesis, Linköpings universitet, Fordonssystem, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-69608.

Abstract:

The automotive industry is required to deal with increasingly stringent legislation on greenhouse gases. Hybrid Electric Vehicles (HEVs) are gaining acceptance as the future path to lower emissions and fuel consumption. The increased complexity of multiple prime movers demands more advanced control systems, for which future driving conditions also become relevant. For a Plug-in Hybrid Electric Vehicle (PIHEV), it is important to use the comparatively inexpensive electric energy before the driving cycle is complete in order to minimize the cost of the driving cycle, since the battery in a PIHEV can be charged from the grid. A strategy that uses information about the length of the driving cycle from a Global Positioning System (GPS) could reduce the cost of driving by starting to blend the electric energy with fuel earlier. A strategy called blended driving accomplishes this by distributing the externally charged electric energy, together with fuel, over the driving cycle, while ensuring that the battery reaches its minimum level before the driving cycle is finished. A strategy called Charge Depleting Charge Sustaining (CDCS) does not need length information. This strategy first depletes the battery to a minimum State of Charge (SOC) and then engages the engine to maintain the SOC at this level. In this thesis, a variable SOC reference is developed, which depends on knowledge of the cycle's length and of the distance the vehicle has driven so far in the cycle. With the assistance of this variable SOC reference, a blended strategy is realized and used to minimize the cost of a driving cycle. The blended strategy was compared with the CDCS strategy, which uses a fixed SOC reference. Fuel usage is minimized during simulation, and the blended strategy decreases the cost of the driving missions compared to the CDCS strategy. Model predictive control is used to solve the energy management problem. The designed control system follows the driving cycles, is charge sustaining, and solves the energy management problem during simulation. The system also handles moderate model errors.
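
The difference between the two reference signals can be made concrete with a small sketch. It assumes a blended reference that falls linearly with the fraction of the cycle already driven, and hypothetical start and minimum SOC levels of 0.9 and 0.3; the thesis develops its own variable reference and tracks it with model predictive control, neither of which is reproduced here.

```python
def soc_reference_blended(distance_driven, cycle_length, soc_start=0.9, soc_min=0.3):
    """Blended driving: SOC reference decreases with distance driven (linear shape assumed)."""
    if cycle_length <= 0:
        raise ValueError("cycle_length must be positive")
    # Fraction of the cycle completed, clamped so the reference never drops below soc_min.
    fraction = min(max(distance_driven / cycle_length, 0.0), 1.0)
    return soc_start - (soc_start - soc_min) * fraction


def soc_reference_cdcs(soc_min=0.3):
    """CDCS: a fixed reference at the minimum SOC; no cycle-length information is needed."""
    return soc_min


for km in (0, 25, 50, 100):
    print(km, round(soc_reference_blended(km, 100), 3), soc_reference_cdcs())
```

The blended reference only reaches the minimum SOC at the end of the cycle, so a controller tracking it spreads electric energy over the whole trip, whereas the CDCS reference lets the battery be depleted as early as possible.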

13

Zhao, Jie. "Une approche combinée langage-polyédrique pour la programmation parallèle hétérogène." [A combined language-polyhedral approach to heterogeneous parallel programming.] Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE062.

Abstract:

Nowadays, optimizing compilers are increasingly challenged by the diversity of programming languages and the heterogeneity of architectures. The polyhedral model is a powerful mathematical framework for programs to exploit automatic parallelization and locality optimization, playing an important role in the field of optimizing compilers. A long-standing limitation of the model has been its restriction to static-control affine programs, resulting in an emerging demand for the support of non-affine extensions. This is particularly acute in the context of heterogeneous architectures, where a variety of computation kernels need to be analyzed and transformed to match the constraints of hardware accelerators and to manage data transfers across memory spaces. We explore multiple non-affine extensions of the polyhedral model, in the context of a well-defined intermediate language combining affine and syntactic elements. On the one hand, we explain how transformations and code generation for loops with non-affine, data-dependent and dynamic loop bounds are integrated into a polyhedral framework, extending the applicable domain of polyhedral compilation to the realm of non-affine applications. On the other hand, we describe the integration of overlapped tiling for stencil computations into a general polyhedral framework, automating non-affine transformations in polyhedral compilation. We evaluate our techniques on both CPU and GPU architectures, validating the effectiveness of the optimizations by conducting an in-depth performance comparison with state-of-the-art frameworks and manually written libraries.
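
Overlapped tiling for stencil computations can be illustrated on a 1-D three-point averaging stencil. In this plain-Python sketch (not the polyhedral code generator of the thesis), each tile loads a halo as wide as the number of time steps and recomputes the halo points redundantly, so tiles need no communication during the time steps and still reproduce the untiled result exactly. The stencil, the fixed end points, and all function names are illustrative assumptions; the grid is assumed to have at least three points.

```python
def jacobi_step(u):
    """One step of a 3-point averaging stencil; the two end points stay fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def run_untiled(u, steps):
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def run_overlapped_tiles(u, steps, tile):
    """Advance `steps` time steps tile by tile, recomputing halo cells redundantly."""
    n = len(u)
    out = [0.0] * n
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        # Load the tile plus a halo of `steps` cells per side, clipped at the array ends.
        a, b = max(0, lo - steps), min(n, hi + steps)
        local = u[a:b]
        # The local copy's end values are never updated: exact at the true array ends,
        # stale inside the halo. The stale region grows one cell per step, so after
        # `steps` steps it has not reached [lo, hi).
        for _ in range(steps):
            local = jacobi_step(local)
        out[lo:hi] = local[lo - a:hi - a]
    return out


grid = [float(i % 7) for i in range(30)]
print(run_overlapped_tiles(grid, steps=4, tile=8) == run_untiled(grid, 4))  # True
```

The trade-off is extra arithmetic in the halos in exchange for independent tiles, which suits accelerators where synchronizing between tiles after every time step is expensive.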

14

Passerat-Palmbach, Jonathan. "Contributions to parallel stochastic simulation : application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte Carlo simulations." PhD thesis, Université Blaise Pascal - Clermont-Ferrand II, 2013. http://tel.archives-ouvertes.fr/tel-00858735.

Abstract:

The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all the tools must be reimplemented as well. In the particular case of stochastic simulations, one of the major elements of the implementation is the pseudorandom number source. Employing pseudorandom numbers in parallel applications is not a straightforward task, and it has to be done with caution in order not to introduce biases into the results of the simulation. This problem, called pseudorandom stream distribution, has been studied ever since parallel architectures became available. While the literature is full of solutions to handle pseudorandom stream distribution on CPU-based parallel platforms, the young GPU programming community cannot yet display the same experience. In this thesis, we study how to correctly distribute pseudorandom streams on GPUs. From the existing solutions, we identified a need for good software engineering solutions, coupled with sound theoretical choices in the implementation. We propose a set of guidelines to follow when a PRNG has to be ported to GPU, and put this advice into practice in a software library called ShoveRand. This library is used in a stochastic Polymer Folding model that we have implemented in C++/CUDA. Pseudorandom stream distribution on manycore architectures is also one of our concerns. It resulted in a contribution named TaskLocalRandom, which targets parallel Java applications using pseudorandom numbers and task frameworks. Finally, we share a reflection on methods for choosing the right parallel platform for a given application. In this way, we propose to automatically build prototypes of the parallel application running on a wide set of architectures. This approach relies on existing software engineering tools from the Java and Scala community, most of them generating OpenCL source code from a high-level abstraction layer.
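
One classic way to distribute a single pseudorandom stream among parallel workers without overlap is leapfrogging: with P workers, worker p takes elements p, p + P, p + 2P, ... of the sequential sequence. The sketch below does this for a linear congruential generator (LCG) by precomputing P-step jump constants. The LCG constants are the well-known Numerical Recipes ones, chosen only for brevity; LCGs have weak statistical properties, and this sketch is not the approach of ShoveRand or TaskLocalRandom, whose internals the abstract does not describe.

```python
A, C, M = 1664525, 1013904223, 2**32  # Numerical Recipes LCG constants (illustration only)


def lcg_sequence(seed, count):
    """Sequential reference stream: x_{n+1} = (A * x_n + C) mod M."""
    values, x = [], seed
    for _ in range(count):
        x = (A * x + C) % M
        values.append(x)
    return values


def jump_constants(steps):
    """(A_k, C_k) such that applying the LCG `steps` times equals x -> (A_k * x + C_k) mod M."""
    a_k, c_k = 1, 0  # identity map
    for _ in range(steps):
        # Compose one more step: A * (a_k * x + c_k) + C.
        a_k, c_k = (A * a_k) % M, (A * c_k + C) % M
    return a_k, c_k


def leapfrog_stream(seed, stream_id, num_streams, count):
    """Elements stream_id, stream_id + P, stream_id + 2P, ... of the sequential stream."""
    a_first, c_first = jump_constants(stream_id + 1)  # output i is i + 1 steps from the seed
    a_jump, c_jump = jump_constants(num_streams)
    x = (a_first * seed + c_first) % M
    values = []
    for _ in range(count):
        values.append(x)
        x = (a_jump * x + c_jump) % M
    return values


sequential = lcg_sequence(seed=42, count=12)
streams = [leapfrog_stream(42, p, num_streams=3, count=4) for p in range(3)]
print(all(streams[p] == sequential[p::3] for p in range(3)))  # True
```

Each worker derives its sub-stream arithmetically from the seed, so no generator state is shared and the interleaved sub-streams never overlap. `jump_constants` is linear in the jump length for clarity; composing the affine map by repeated squaring would make it logarithmic.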

15

Cohen, Albert. "Contributions à la conception de systèmes à hautes performances, programmables et sûrs: principes, interfaces, algorithmes et outils." [Contributions to the design of high-performance, programmable and safe systems: principles, interfaces, algorithms and tools.] Habilitation à diriger des recherches, Université Paris Sud - Paris XI, 2007. http://tel.archives-ouvertes.fr/tel-00550830.

Abstract:

Moore's law for semiconductors is approaching its end. The evolution of the von Neumann architecture over the 40-year history of the microprocessor has led to circuits of unsustainable complexity, a very low computational yield per transistor, and high energy consumption. On the other hand, the world of parallel computing does not bear comparison with the levels of portability, accessibility, productivity and reliability of sequential software engineering. This dangerous gap translates into exciting challenges for research in compilation and programming languages for high-performance computing, whether general-purpose or embedded. This thesis motivates our approach to meeting these challenges, introduces our main directions of work, and sets out research perspectives.

16

Hamidouche, Khaled. "Programmation des architectures hiérarchiques et hétérogènes." [Programming hierarchical and heterogeneous architectures.] PhD thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00653203.

Abstract:

Today's high-performance computing architectures are hierarchical and heterogeneous. They are hierarchical because they comprise a memory hierarchy, with memory distributed across nodes and memory shared among the cores of a single node. They are heterogeneous due to the use of specialized processors called accelerators, such as IBM's Cell BE processor and NVIDIA GPUs. Mastering these architectures is doubly complex. On the one hand, there is the programmability problem: programming must remain simple, as close as possible to classical sequential programming, and independent of the target architecture. On the other hand, there is the efficiency problem: performance must be close to what an expert would obtain by writing the code by hand with low-level tools. In this thesis, we propose a development platform to address these problems. To this end, we propose two tools: BSP++, a generic library using C++ templates, and BSPGen, a framework for the automatic generation of hybrid code at several levels of the hierarchy (MPI+OpenMP or MPI+Cell BE). Based on a hierarchical model, the BSP++ library treats hybrid architectures as native targets. Using a small set of primitives and intuitive concepts, BSP++ offers ease of use and a high level of abstraction of the target machine. Using the BSP++ cost model, BSPGen estimates and generates the appropriate hierarchical hybrid code for a given application on a target architecture. BSPGen generates hybrid code from a list of sequential functions and a description of the parallel algorithm. Our tools have been validated on various applications from different domains, ranging from verification and scientific computing to image processing and bioinformatics, on a wide selection of target architectures, from simple shared-memory machines to petascale machines, including heterogeneous architectures equipped with Cell BE accelerators.
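
BSPGen is described as using the BSP++ cost model to select suitable hybrid code. The classic BSP cost model, which the sketch below assumes because the abstract does not give BSP++'s exact formulas, charges each superstep `w + h * g + latency`: the largest local work `w`, the largest h-relation `h` (words sent or received by one processor) times the per-word communication cost `g`, plus the barrier latency. The class and function names, the machine parameters and the workload numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[float]        # local computation cost on each processor
    h_relation: List[float]  # words sent or received by each processor


def bsp_cost(supersteps: List[Superstep], g: float, latency: float) -> float:
    """Classic BSP cost: sum over supersteps of max work + max h-relation * g + latency."""
    total = 0.0
    for step in supersteps:
        total += max(step.work) + max(step.h_relation) * g + latency
    return total


program = [
    Superstep(work=[100, 120, 90], h_relation=[10, 5, 8]),  # compute, then exchange data
    Superstep(work=[50, 50, 60], h_relation=[0, 0, 0]),     # compute only
]
print(bsp_cost(program, g=4, latency=20))  # 260.0
```

Evaluating such a prediction for alternative decompositions of the same algorithm, with `g` and `latency` measured for each level of the machine hierarchy, is one way a generator could rank candidate hybrid codes before emitting one.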

17

Jimborean, Alexandra. "Adapting the polytope model for dynamic and speculative parallelization." PhD thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00733850.

Abstract:

In this thesis, we present a Thread-Level Speculation (TLS) framework whose main feature is to speculatively parallelize a sequential loop nest in various ways, to maximize performance. We perform code transformations by applying the polyhedral model that we adapted for speculative and runtime code parallelization. For this purpose, we designed a parallel code pattern which is patched by our runtime system according to the profiling information collected on some execution samples. We show on several benchmarks that our framework yields good performance on codes which could not be handled efficiently by previously proposed TLS systems.

18

Lourenço, Nuno António Marques. "Enhancing Grammar-Based Approaches for the Automatic Design of Algorithms." Doctoral thesis, 2016. http://hdl.handle.net/10316/29450.

Abstract:

Doctoral thesis in the Doctoral Program in Information Science and Technology, presented to the Department of Informatics Engineering of the Faculty of Sciences and Technology of the University of Coimbra. Evolutionary Algorithms (EA) are stochastic computational methods loosely inspired by the principles of natural selection and genetics. They have been successfully used to solve complex problems in the domains of learning, design and optimization. When using an EA, practitioners have to define its main components, such as the variation operators and the parent selection and survivor replacement mechanisms. The performance of an EA can be greatly enhanced if these components are tailored to the specific problem being addressed. Such modifications are usually made manually and require a considerable degree of expertise. To ease the use of EAs, some researchers have developed methods to design this type of algorithm automatically. These methods usually rely on a (meta-)algorithm that combines components and parameters in order to learn the strategy best suited to the problem at hand. The area of Hyper-Heuristics (HH) emerges in this context, focusing on the development of efficient meta-algorithms.
Genetic Programming (GP), and specifically its grammar-based variants, is commonly used as the search engine in HH. In this work, we study and analyze the conditions under which Grammatical Evolution (GE) can be enhanced to automatically design EAs. The main contributions can be divided into three aspects. First, we propose an HH framework that relies on GE as the search algorithm. The framework is divided into two complementary phases: Learning and Validation. In Learning, the GE engine combines low-level components that are specified in a Context-Free Grammar. In Validation, the best algorithms learned are applied to scenarios different from those used in learning, in order to evaluate their generalization capacity. Second, we study the impact that the learning conditions have on the final structure of the learned algorithms and, consequently, on their optimization ability. Moreover, we analyze the relationship between the quality exhibited by the algorithms during learning and their effective optimization ability in unseen scenarios; specifically, we analyze whether the best strategies discovered in learning keep their good behavior in validation. Our final contribution addresses some of the limitations exhibited by GE, in particular problems related to the exploration of the search space. The result is a novel representation with enhanced performance. FCT - SFRH/BD/79649/2011
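
The Learning phase above uses a GE engine to combine components specified in a Context-Free Grammar. The sketch below shows the standard GE genotype-to-phenotype mapping. An integer genome drives a leftmost derivation: each non-terminal is expanded with the production at index codon mod number-of-choices, and the genome is reused from the start ("wrapping") up to a fixed limit. The toy grammar of EA components, `ge_map`, and its parameters are hypothetical illustrations. They are not the thesis's grammar or framework, and the sketch does not implement the novel representation the thesis proposes.

```python
from typing import Optional

# Hypothetical toy grammar of EA components (illustration only, not the thesis's grammar).
GRAMMAR = {
    "<ea>": [["<selection>", "; ", "<variation>"]],
    "<selection>": [["tournament(k=", "<k>", ")"], ["roulette()"]],
    "<variation>": [["crossover(p=", "<rate>", ")"], ["mutation(p=", "<rate>", ")"]],
    "<k>": [["2"], ["3"], ["5"]],
    "<rate>": [["0.1"], ["0.5"], ["0.9"]],
}


def ge_map(genome: list[int], grammar: dict, start: str = "<ea>", max_wraps: int = 2) -> Optional[str]:
    """GE genotype-to-phenotype mapping via leftmost derivation; None if the genome runs out."""
    pending = [start]          # symbols still to process, leftmost first
    output = []
    used = 0                   # codons consumed so far
    limit = len(genome) * (max_wraps + 1)
    while pending:
        symbol = pending.pop(0)
        if symbol not in grammar:      # terminal: copy it to the phenotype
            output.append(symbol)
            continue
        if used >= limit:              # genome exhausted even after wrapping: invalid individual
            return None
        choices = grammar[symbol]
        codon = genome[used % len(genome)]   # wrapping reuses codons from the start
        used += 1
        pending = list(choices[codon % len(choices)]) + pending  # rule = codon mod #choices
    return "".join(output)


print(ge_map([7, 4, 8, 3, 5], GRAMMAR))  # tournament(k=5); mutation(p=0.9)
```

Because the grammar alone determines which programs are reachable, it is where a designer encodes the EA components the search is allowed to combine. The search engine itself only manipulates integer genomes.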

19

Faraj, Ahmad A. "Automatic empirical techniques for developing efficient MPI collective communication routines." 2006. http://etd.lib.fsu.edu/theses/available/07072006-162046.

Abstract:

Thesis (Ph. D.)--Florida State University, 2006. Advisor: Xin Yuan, Florida State University, College of Arts and Sciences, Dept. of Computer Science. Title and description from dissertation home page (viewed Sept. 19, 2006). Document formatted into pages; contains xiii, 162 pages. Includes bibliographical references.
