Dissertations / Theses on the topic 'Calcolo HTC'
The top 50 dissertations / theses for research on the topic 'Calcolo HTC', with the abstract of each work where available.
Vinot, Emmanuel. "Modélisation des supraconducteurs HTC : applications au calcul des pertes AC." PhD thesis, Grenoble INPG, 2000. http://tel.archives-ouvertes.fr/tel-00689985.
GIROTTO, IVAN. "Studio della Fisica delle Emulsioni tramite l'utilizzo di Calcolo ad Alte Prestazioni." Doctoral thesis, Università degli studi di Modena e Reggio Emilia, 2021. http://hdl.handle.net/11380/1251098.
In this project we employed highly optimized codes, based on the multicomponent lattice Boltzmann model (LBM), to explore the physics of complex fluids in three dimensions. We first implemented an LBM-based application that delivers good scaling performance on distributed systems while optimising memory access through a data organisation that enables high computing efficiency. In particular, we introduced and then analysed in depth two new clustered data layouts which, by enhancing compiler vectorization, deliver high performance on modern x86-64 CPUs compared with the legacy data layouts typically adopted for LBM-based codes, such as arrays of structures (AoS) or structures of arrays (SoA). This work supported the award of two PRACE projects totalling hundreds of millions of core-hours distributed across two major European Tier-0 HPC systems: Marconi at CINECA and MareNostrum at the Barcelona Supercomputing Centre (BSC). We performed a detailed analysis of computing performance and energy efficiency on the CPUs equipping those supercomputers: the Intel KNL and the more recent Intel Skylake processor, respectively. In the final stage of the project we also extended the implemented model to run on multi-GPU distributed systems such as Marconi-100 at CINECA. We implemented and validated the well-established Shan-Chen multicomponent LBM with second-neighbour coupling, which models the dynamics of two immiscible fluids characterized by a surface tension as well as a disjoining pressure between them. The emulsion is stirred via a large-scale forcing mimicking the classical stirring often used in spectral simulations of turbulent flows. With the implemented numerical models, we started to explore the physics of complex fluid emulsions: from the turbulent-stirring phase in which the emulsion is produced, to the resting phase in which the resulting emulsion is in a jammed state. In particular, we performed several simulations to obtain first qualitative measurements of the morphology of the system (i.e., number of droplets, average droplet volume, average surface, PDFs of volume and surface) as well as initial estimates of the energy. We carried out the analysis at different volume fractions, pushing the dispersed phase up to about 80%, the limit reported by experiments. We observed that the resulting highly packed emulsions exhibit a rich phenomenology, showing non-spherical droplets and presenting features of a solid at rest while still flowing as a fluid when subjected to forcing. We analysed the behaviour of the system looking at both the influence of the flow on the morphology, by stirring at different forcing amplitudes, and the influence of the morphology on the flow, by performing Kolmogorov rheology tests on jammed emulsions at different volume fractions. Emulsions are remarkable systems presenting an extremely interesting phenomenology while at the same time being very fragile: we experienced the difficulty of finding the equilibrium between pushing towards higher volume fractions and choosing a stirring amplitude that achieves turbulence without triggering catastrophic phase inversion.
In the second part of the project we engineered and added to the LBM-based code a method for tracking all droplets present in a 3-dimensional emulsion at high resolution, obtaining a Lagrangian profile of all droplets in the dispersed phase of the emulsion, both when exposed to large-scale stirring and when the forcing is turned off.
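As a minimal illustration of the data-layout question discussed in this abstract, the C sketch below contrasts AoS, SoA and a clustered (CSoA-like) layout for LBM populations; the D3Q19 population count, lattice size and vector length are illustrative assumptions, not values from the thesis.

```c
#include <stddef.h>

#define Q 19          /* D3Q19: 19 populations per lattice site (illustrative) */
#define NSITES 4096   /* number of lattice sites (illustrative) */

/* Array of Structures (AoS): the 19 populations of one site are contiguous.
   Convenient for site-local updates, poor for vectorization across sites. */
typedef struct { double f[Q]; } site_aos;
site_aos lattice_aos[NSITES];

/* Structure of Arrays (SoA): population i of all sites is contiguous.
   Unit-stride loads across sites enable compiler vectorization. */
typedef struct { double f[Q][NSITES]; } lattice_soa_t;
lattice_soa_t lattice_soa;

/* A clustered layout: SoA blocks of VL sites, AoS across blocks, keeping
   vector-friendly strides while improving locality between populations. */
#define VL 8  /* vector length, e.g. one AVX-512 register of doubles */
typedef struct { double f[Q][VL]; } cluster_t;
cluster_t lattice_csoa[NSITES / VL];

static inline double *csoa_addr(size_t site, int pop) {
    /* locate population `pop` of lattice site `site` in the clustered layout */
    return &lattice_csoa[site / VL].f[pop][site % VL];
}
```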
Masini, Filippo. "Coca cola hbc italia: Modello per il calcolo di inventory stock target e production cycles ottimali." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/8060/.
Capra, Antoine. "Virtualisation en contexte HPC." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0436/document.
To meet the growing needs of digital simulation and remain at the forefront of technology, supercomputers must be constantly improved. These improvements can be of a hardware or software order, forcing applications to adapt to a new programming environment throughout their development. It then becomes necessary to raise the question of the sustainability of applications and of their portability from one machine to another. The use of virtual machines may be a first response to this need, by stabilizing programming environments: with virtualization, applications can be developed in a fixed environment, without being directly impacted by the environment currently deployed on a physical machine. However, the additional abstraction introduced by virtual machines leads in practice to a loss of performance. We propose in this thesis a set of tools and techniques to enable the use of virtual machines in an HPC context. First we show that it is possible to optimize the operation of a hypervisor to respond accurately to the constraints of HPC, namely the placement of execution threads and memory data locality. Building on this, we propose a service that partitions the resources of a compute node through virtual machines. Finally, to extend our work to MPI applications, we study network solutions and the network performance of virtual machines.
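The thread-placement constraint mentioned above can be illustrated with a small generic sketch (not code from the thesis): on Linux, pinning an execution thread to a fixed core is the basic building block of such placement policies, and pinning a VM's virtual CPUs to host cores follows the same principle.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one physical core. */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    if (pin_to_core(2) != 0)    /* core id 2 is arbitrary for the example */
        perror("pin_to_core");
    else
        printf("thread pinned to core 2\n");
    return 0;
}
```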
Chatelain, Yohan. "Outils de débogage et d'optimisation des calculs flottants dans le contexte HPC." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLV096.
High Performance Computing (HPC) is a dynamic ecosystem where scientific computing architectures and codes are in permanent co-evolution (parallelism, specialized accelerators, new memories). This dynamism requires developers to adapt their software regularly to exploit all the new technological innovations. For this purpose, co-design, which consists of developing software and hardware simultaneously, is an interesting approach. Nevertheless, co-design efforts have mainly focused on application performance without necessarily taking numerical quality into account. Yet this quality is becoming increasingly difficult to maintain from one generation of supercomputer to the next, due to the increased complexity of the hardware and of the parallel programming models. In addition, new floating-point computation formats (bfloat16, binary16) should be harnessed during the modernization process. These findings raise two issues: 1) How to check the numerical quality of codes during the modernization process? This requires tools that both quickly identify sources of numerical error and remain user-friendly for non-expert users. 2) How to take advantage of the new possibilities offered by the hardware? The application possibilities are manifold and therefore lead to a considerable space of possible solutions; the solutions found are the result of a compromise between the performance of the application, the numerical quality of the computations, and the reproducibility of the results. In this thesis, we contributed to the Verificarlo software, which helps to detect numerical errors by injecting various noise models into floating-point computations. More precisely, we developed an approach to study the evolution of numerical errors over time. This tool is based on the generation of numerical traces that allow the numerical quality of variables to be tracked over time; these traces are enriched with context information retrieved during compilation and can then be visualized in an elegant way. We also contributed to VPREC, a computation model simulating formats of varying sizes. This tool has been used to address the problem of format optimization in iterative schemes; the proposed optimization is temporal, since it optimizes the computation precision for each time step. Finally, a major constraint in the development of tools for HPC is scaling up: the size of the codes and the number of computations involved drastically increase the complexity of the analyses and limit conventional approaches. We demonstrated that the techniques developed in this thesis are applicable to industrial codes, since they made it possible, first, to detect and correct a numerical error in the ABINIT code (an ab initio code for quantum chemistry developed by the CEA et al.), and second, to reduce the computation precision of YALES2 (a fluid mechanics code developed by CORIA), improving performance by reducing communication volumes by 28% and accelerating execution up to 1.30 times.
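To give the flavor of the noise-injection idea behind Verificarlo, here is a deliberately simplified sketch of Monte Carlo arithmetic-style perturbation at a virtual precision of t bits; the actual Verificarlo backends are considerably more careful (about exponents, rounding modes and reproducible random streams).

```c
#include <math.h>
#include <stdlib.h>

/* Perturb x with a random relative error of magnitude ~2^-t, mimicking a
   virtual precision of t bits (simplified sketch, not Verificarlo's code). */
static double mca_noise(double x, int t)
{
    if (x == 0.0) return 0.0;
    double u = (double)rand() / RAND_MAX - 0.5;  /* uniform in [-0.5, 0.5] */
    return x * (1.0 + u * ldexp(1.0, 1 - t));    /* x * (1 + u * 2^(1-t)) */
}

/* Example: observe how a naive summation loses digits at low precision. */
double noisy_sum(const double *a, int n, int t)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s = mca_noise(s + a[i], t);  /* perturb each intermediate result */
    return s;
}
```

Running such a noisy summation many times and looking at the spread of the results reveals how many significant digits actually survive, which is the kind of information the thesis's temporal traces track per variable.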
Magnani, Simone. "analisi delle prestazioni del sistema grafico videocore iv applicato al calcolo generico." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19100/.
Pourroy, Jean. "Calcul Haute Performance : Caractérisation d’architectures et optimisation d’applications pour les futures générations de supercalculateurs." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM028.
Information systems and High-Performance Computing (HPC) infrastructures play an active role in the improvement of scientific knowledge and the evolution of our societies. The field of HPC is expanding rapidly, and users need increasingly powerful architectures to analyze the tsunami of data (numerical simulations, IoT), to make more complex decisions (artificial intelligence), and to make them faster (connected cars, weather). In this thesis work, we discuss several challenges (power consumption, cost, complexity) for the development of new generations of Exascale supercomputers. As industrial applications do not manage to achieve more than 10% of the theoretical performance, we show the need to rethink the architecture of platforms, in particular by using energy-optimized architectures. We then present some of the emerging technologies that will allow their development: 3D memories (HBM), Storage Class Memory (SCM), and photonic interconnection technologies. These new technologies, associated with a new communication protocol (Gen-Z), will help to execute the different parts of an application optimally. However, in the absence of a method for the fine characterization of code performance, these emerging architectures are potentially condemned, since few experts know how to exploit them. Our contribution consists in the development of benchmarks and performance analysis tools. The first family aims to finely characterize specific parts of the microarchitecture: two microbenchmarks were developed to characterize the memory system and the floating-point unit (FPU). The second family of tools is used to study the performance of applications: a first tool monitors traffic on the memory bus, a critical resource of modern architectures, while a second profiles applications by extracting and characterizing critical loops (hot spots). To take advantage of the heterogeneity of platforms, we propose a 5-step methodology to identify and characterize these new platforms, to model the performance of an application, and finally to port its code to the selected architecture. Finally, we show how these tools can help developers extract the maximum performance from an architecture. By providing our tools as open source, we want to raise users' awareness of this approach and develop a community around performance characterization and analysis.
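As an example of the kind of microbenchmark used to characterize a memory system, the following STREAM-like triad (a generic sketch, not one of the thesis's own tools) measures sustainable memory bandwidth:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 24)  /* ~16M doubles per array; size is illustrative */

/* STREAM-like triad: a bandwidth-bound kernel moving 24 bytes per
   iteration (two loads, one store), a common memory-system probe. */
int main(void)
{
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
           *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];   /* triad */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("triad: %.2f GB/s\n", 3.0 * N * sizeof(double) / sec / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```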
Bruned, Vianney. "Analyse statistique et interprétation automatique de données diagraphiques pétrolières différées à l’aide du calcul haute performance." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS064.
In this thesis, we investigate the automation of the identification and characterization of geological strata using well logs. For a single well, geological strata are determined through the segmentation of the logs, which are comparable to multivariate time series. The identification of strata across different wells of the same field requires correlation methods for time series; we propose a new global method of well correlation using multiple sequence alignment algorithms from bioinformatics. The determination of the mineralogical composition and of the percentage of fluids inside a geological stratum leads to an ill-posed inverse problem, and current methods are based on experts' choices: the selection of a subset of minerals for a given stratum. Because the model has an intractable likelihood, an approximate Bayesian computation (ABC) method, assisted by a density-based clustering algorithm, is used to characterize the mineral composition of the geological layer; the clustering step is necessary to deal with the identifiability issue of the minerals. Finally, the workflow is tested on a case study.
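The ABC idea used here can be sketched generically: draw parameters from the prior, simulate, and keep the draws whose simulated output is close to the observation. In the sketch below the prior, forward model and distance are toy placeholders, not the petrophysical model of the thesis.

```c
#include <math.h>
#include <stdlib.h>

static double uniform(double lo, double hi) {
    return lo + (hi - lo) * rand() / RAND_MAX;
}
static double simulate(double theta) {          /* toy forward model */
    return theta + uniform(-0.1, 0.1);
}
static double distance(double x, double y) {
    return fabs(x - y);
}

/* ABC rejection sampling: accepted draws approximate the posterior of
   theta given the observation, without ever evaluating a likelihood. */
int abc_rejection(double observed, double eps,
                  int n_draws, double *accepted)
{
    int kept = 0;
    for (int i = 0; i < n_draws; i++) {
        double theta = uniform(0.0, 1.0);       /* draw from the prior */
        if (distance(simulate(theta), observed) < eps)
            accepted[kept++] = theta;
    }
    return kept;  /* number of accepted samples */
}
```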
Honore, Valentin. "Convergence HPC - Big Data : Gestion de différentes catégories d'applications sur des infrastructures HPC." Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0145.
Numerical simulations are complex programs that allow scientists to solve, simulate and model complex phenomena. High Performance Computing (HPC) is the domain in which these complex and heavy computations are performed on large-scale computers, also called supercomputers. Nowadays, most scientific fields need supercomputers to undertake their research: it is the case of cosmology, physics, biology and chemistry. Recently, we observe a convergence between Big Data/Machine Learning and HPC. Applications coming from these emerging fields (for example, using Deep Learning frameworks) are becoming highly compute-intensive, hence HPC facilities have emerged as an appropriate solution to run such applications. From the large variety of existing applications has risen a necessity for all supercomputers: they must be generic and compatible with all kinds of applications. Computing nodes also come in a wide variety, going from CPUs to GPUs, with specific nodes designed to perform dedicated computations; each category of node is designed to perform very fast operations of a given type (for example vector or matrix computation). Supercomputers are used in a competitive environment: multiple users simultaneously connect and request a set of computing resources to run their applications. This competition for resources is managed by the machine itself via a specific program called the scheduler, which reviews, assigns and maps the different user requests. Each user asks for (that is, pays for the use of) access to the resources of the supercomputer in order to run his application, and is granted access to some resources for a limited amount of time. This means that users need to estimate how many compute nodes they want to request and for how long, which is often difficult to decide. In this thesis, we provide solutions and strategies to tackle these issues. We propose mathematical models, scheduling algorithms, and resource partitioning strategies in order to optimize high-throughput applications running on supercomputers. In this work, we focus on two types of applications in the context of the HPC/Big Data convergence: data-intensive and irregular (or stochastic) applications. Data-intensive applications represent typical HPC frameworks, made up of two main components. The first one is called simulation, a very compute-intensive code that generates a tremendous amount of data by simulating a physical or biological phenomenon. The second component is called analytics, during which sub-routines post-process the simulation output to extract, generate and save the final result of the application. We propose to optimize these applications by designing automatic resource partitioning and scheduling strategies for both of their components. To do so, we use the well-known in situ paradigm, which consists in scheduling both components together in order to reduce the huge cost of saving all simulation data on disks; we propose automatic resource partitioning models and scheduling heuristics to improve the overall performance of in situ applications. Stochastic applications are applications whose execution time depends on their input, whereas in usual data-intensive applications the makespan of simulation and analytics is not affected by such parameters. Stochastic jobs originate from Big Data or Machine Learning workloads, whose performance is highly dependent on the characteristics of the input data; these applications have recently appeared on HPC platforms.
However, the uncertainty of their execution time remains a strong limitation when using supercomputers: the user needs to estimate how long his job will run on the machine, and enters this estimation as a first reservation value. If the job does not complete successfully within this first reservation, the user has to resubmit the job, this time requesting a longer reservation.
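The reservation problem described at the end of this abstract admits a simple cost model: with a sequence of increasing reservation requests, each failed attempt pays its full reservation before the job is resubmitted. The sketch below computes the expected cost of such a sequence for a discretized walltime distribution, under a simplified pay-what-you-reserve assumption with no queueing delays (an illustration, not the thesis's model).

```c
#include <stdio.h>

/* Expected cost of a sequence of increasing reservation requests for a
   job with a known discrete walltime distribution: attempt k is reached
   only if every shorter reservation failed, and then pays resv[k]. */
double expected_cost(const double *resv, int n_resv,
                     const double *runtime, const double *prob, int n_rt)
{
    double cost = 0.0, prev = 0.0;
    for (int k = 0; k < n_resv; k++) {
        double p_reach = 0.0;   /* P(walltime exceeds previous reservation) */
        for (int i = 0; i < n_rt; i++)
            if (runtime[i] > prev) p_reach += prob[i];
        cost += p_reach * resv[k];
        prev = resv[k];
    }
    return cost;
}

int main(void)
{
    double resv[] = {2.0, 4.0, 8.0};   /* reservation values in hours */
    double rt[]   = {1.0, 3.0, 6.0};   /* possible walltimes          */
    double p[]    = {0.5, 0.3, 0.2};   /* P(walltime = rt[i])         */
    printf("expected cost: %.2f h\n", expected_cost(resv, 3, rt, p, 3));
    return 0;
}
```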
Colin de Verdière, Guillaume. "A la recherche de la haute performance pour les codes de calcul et la visualisation scientifique." Thesis, Reims, 2019. http://www.theses.fr/2019REIMS012/document.
This thesis aims to demonstrate that algorithms and coding, in a high performance computing (HPC) context, cannot be envisioned without taking into account the hardware at the core of supercomputers, since those machines evolve dramatically over time. After setting a few definitions relating to scientific codes and parallelism, we show that the analysis of the different generations of supercomputers used at CEA over the past 30 years allows us to exhibit a number of attention points and best practices for code developers. Based on several experiments, we show how to aim at code performance suited to the usage of supercomputers, and how to try to obtain portable, and possibly extreme, performance in the world of massive parallelism, potentially using GPUs. We explain that graphical post-processing software and hardware follow the same parallelism principles as large scientific codes, requiring one to master a global view of the simulation chain. Last, we describe tendencies and constraints that will be imposed on the new generations of exaflopic-class supercomputers. These evolutions will, yet again, impact the development of the next generations of scientific codes.
Albert, Jérémie. "Modèle de calcul, primitives, et applications de référence, pour le domaine des réseaux ad hoc fortement mobiles." Thesis, Bordeaux 1, 2010. http://www.theses.fr/2010BOR14169/document.
Mobile ad hoc networks that evolve in an unplanned and unpredictable manner are often studied assuming that their composition and topology evolve relatively slowly. In this context of weak mobility, it is possible to propose mechanisms (such as routing, Public Key Infrastructure, etc.) which keep applications designed for a static context operational. In contrast, the work presented in this thesis focuses on highly mobile ad hoc networks (iMANets). The nodes of these networks are extremely mobile, bringing ceaseless and fast changes to the network topology. The main contributions of this thesis are (i) the definition of an algebra called CiMAN (Calculus for highly Mobile Ad hoc Networks), which makes it possible to model communicating processes in these highly mobile ad hoc networks, (ii) the use of this algebra to prove the correctness of algorithms dedicated to these networks, and (iii) a middleware and reference applications specifically designed for this context.
Casteigts, Arnaud. "Contribution à l'algorithmique distribuée dans les réseaux mobiles ad hoc - Calculs locaux et réétiquetages de graphes dynamiques." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2007. http://tel.archives-ouvertes.fr/tel-00193181.
Saillard, Emmanuelle. "Static/Dynamic Analyses for Validation and Improvements of Multi-Model HPC Applications." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0176/document.
Supercomputing plays an important role in several innovative fields, speeding up prototyping and validating scientific theories. However, supercomputers are evolving rapidly, now reaching millions of processing units, which poses the question of their programmability. Despite the emergence of more widespread and functional parallel programming models, developing correct and effective parallel applications remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions, and programming-model evolutions stress the need for a convenient validation tool able to handle hybrid applications. Indeed, as current scientific applications mainly rely on the Message Passing Interface (MPI) parallel programming model, new hardware designed for Exascale with higher node-level parallelism clearly advocates for MPI+X solutions, with X a thread-based model such as OpenMP. But integrating two different programming models inside the same application can be error-prone, leading to complex bugs, unfortunately mostly detected at runtime. In an MPI+X program, not only must the correctness of MPI be ensured, but also its interactions with the multi-threaded model; for example, identical MPI collective operations cannot be performed by multiple non-synchronized threads. This thesis aims at developing a combination of static and dynamic analyses to enable an early verification of hybrid HPC applications. The first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Thanks to this analysis, the code is selectively instrumented, displaying an error and synchronously interrupting all processes if the actual scheduling leads to a deadlock situation.
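The thread-level question described above can be illustrated at the runtime side: a hybrid code must check the threading level actually granted by the MPI library, and collectives must not be issued concurrently by several threads of the same rank. A minimal generic sketch (not the thesis's tool):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Ask for the level the hybrid code needs; the runtime may grant less. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not supported, aborting\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    double local = 1.0, sum;
    #pragma omp parallel
    {
        /* HAZARD flagged by the static analysis: the same collective issued
           by several non-synchronized threads of one rank is erroneous. */
        #pragma omp master   /* correct: restrict it to a single thread */
        MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```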
Mena, morales Valentin. "Approche de conception haut-niveau pour l'accélération matérielle de calcul haute performance en finance." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0018/document.
The need for resources in High Performance Computing (HPC) is generally met by scaling up server farms, to the detriment of the energy consumption of such a solution. Accelerating HPC applications on heterogeneous platforms, such as FPGAs or GPUs, offers a better architectural compromise, as these platforms can reduce the energy consumption of a deployed system. A change of programming paradigm is therefore needed to support this heterogeneous acceleration, which translates into an increased level of programming complexity for software experts. This is most notably the case for developers in quantitative finance, where applications are constantly evolving and increasing in complexity to stay competitive and comply with legislative changes, putting even more pressure on the programmability of acceleration solutions. In this context, the use of high-level development and design flows, such as High-Level Synthesis (HLS) for programming FPGAs, is not enough; a domain-specific approach can help to reach performance requirements without impairing the programmability of accelerated applications. We propose in this thesis a high-level design approach that relies on OpenCL as a heterogeneous programming standard; more precisely, a recent implementation of OpenCL for Altera FPGAs is used. In this context, four main contributions are proposed: (1) an initial study of the integration of hardware computing cores into a software library for quantitative finance (QuantLib); (2) an exploration of different architectures and their respective performances, as well as the design of a dedicated architecture for the pricing of American options and their implied volatility, based on a high-level design flow; (3) a detailed characterization of an Altera OpenCL platform, from elementary operators, memory accesses and control overlays up to the communication links it is made of; (4) a proposed compilation flow specific to the quantitative finance domain, relying on the aforementioned characterization and on the description of the considered financial applications (option pricing).
Christodoulis, Georgios. "Adaption d'un système HPC pour intégrer des FPGAs." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM061.
Along with traditional CPU cores, processing units of different architectures have been employed by the HPC community in order to obtain improved efficiency and performance. A Field Programmable Gate Array (FPGA) is a hardware fabric composed of interconnected re-programmable logic and memory blocks. This type of processing unit constitutes a promising candidate to amplify the computational power of heterogeneous HPC platforms, since the reduced number of abstraction layers between the programming level and the actual hardware lets it satisfy the aforementioned objectives. However, exploiting FPGAs requires in-depth knowledge of low-level hardware design and high expertise in vendor-provided tools, which is not aligned with the expertise of HPC application programmers. In the scope of this thesis, we have designed a framework that allows the straightforward development of scientific applications over heterogeneous platforms enhanced with FPGAs. The work is oriented towards a programming environment that requires minimal knowledge of the underlying architecture, in which an FPGA can be used in the same way as any other accelerator. At the core of the environment is the StarPU heterogeneous runtime system, which we extended to support FPGAs, hiding from the programmer the complex operations deriving from the underlying architecture while allowing fine control of performance through different scheduling strategies. For communication with the FPGA device, we created Conor, a communication library based on RIFFA, which ensures the consistency of the accelerator in scenarios where software threads interact with it concurrently. Our approach is evaluated along two dimensions: the programmability of the framework, and the performance overhead imposed by the additional components attached to the FPGA. The programmability was evaluated using a basic blocked version of matrix multiplication, which also demonstrates that our development does not impose any additional overhead on the rest of the platform. On top of this first example, we created an efficient hardware design of GEMM, which will allow the execution of more complex and interesting applications, such as the Cholesky decomposition.
Putigny, Bertrand. "Benchmark-driven Approaches to Performance Modeling of Multi-Core Architectures." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2014. http://tel.archives-ouvertes.fr/tel-00984791.
Full textWanza, Weloli Joël. "Modélisation, simulation de différents types d’architectures de noeuds de calcul basés sur l’architecture ARM et optimisés pour le calcul haute-performance." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4042.
This work is part of a family of European projects called Mont-Blanc, whose objective is to develop the next generation of Exascale systems. It specifically addresses the issue of energy efficiency, first at the micro-architectural level by considering the use of 64-bit Armv8-A based compute nodes and an associated relevant SoC topology, and then at the runtime level, notably through the study of power-management strategies better suited to the constraints of highly parallel HPC processing. A design space exploration methodology capable of supporting the simulation of large manycore computing clusters is developed, leading to the proposal, design and evaluation of multi-SoC configurations and their associated SoC Coherent Interconnect (SCI) models. This approach is then used to define a pre-exascale architecture that globally reduces the complexity and cost of chip development without sacrificing performance. The resulting partitioning scheme introduces interesting perspectives at the technology level, such as the integration of more compute nodes directly on an interposer-based System-in-Package (SiP), possibly using 3D Through-Silicon Vias (TSVs) and High Bandwidth Memory (HBM). Energy efficiency is addressed more directly in a second phase by studying current power-management policies and proposing two strategies to help reduce power while preserving performance. The first exploits finer knowledge of application execution to adjust the frequency of extensive parallel threads and better balance their execution time. The second reduces core frequencies at the synchronisation points of jobs, to avoid running the cores at full speed when it is not necessary. Experimental results with these strategies, both in simulation and on real hardware, show the possibilities offered by this approach to address the strong requirements of Exascale platforms.
Chehaimi, Omar. "Parallelizzazione dell'algoritmo di ricostruzione di Feldkamp-Davis-Kress per architetture Low-Power di tipo System-On-Chip." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13918/.
Lanore, Vincent. "On Scalable Reconfigurable Component Models for High-Performance Computing." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1051/document.
Component-based programming is a programming paradigm which eases code reuse and separation of concerns. Some component models, which are said to be "reconfigurable", allow the modification at runtime of an application's structure. However, these models are not suited to High-Performance Computing (HPC) as they rely on non-scalable mechanisms. The goal of this thesis is to provide models, algorithms and tools to ease the development of component-based reconfigurable HPC applications. The main contribution of the thesis is the DirectMOD component model, which eases the development and reuse of distributed transformations. In order to improve on this core model in other directions, we have also proposed: the SpecMOD formal component model, which allows automatic specialization of hierarchical component assemblies and provides high-level software engineering features; and mechanisms for efficient fine-grain reconfiguration for AMR applications, an important application class in HPC. An implementation of DirectMOD, called DirectL2C, has been developed so as to implement a series of benchmarks to evaluate our approach. Experiments on HPC architectures show that our approach scales. Moreover, a quantitative analysis of the benchmark codes shows that our approach is compact and eases reuse.
Glesser, David. "Road to exascale : improving scheduling performances and reducing energy consumption with the help of end-users." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM044/document.
The field of High Performance Computing (HPC) is characterized by the continuous evolution of computing architectures, the proliferation of computing resources and the increasing complexity of the applications users wish to solve. One of the most important pieces of software in the HPC stack is the Resource and Job Management System (RJMS), which stands between the user workloads and the platform, between the applications and the resources. This specialized software provides functions for building, submitting, scheduling and monitoring jobs in a dynamic and complex computing environment. In order to reach exaflops HPC systems, new constraints and objectives have been introduced. This thesis develops and tests the idea that the users of such systems can help reach the exaflopic scale; specifically, we introduce new techniques that employ user behavior to improve energy consumption and overall cluster performance. To test the proposed techniques, we need tools and methodologies that scale up to large HPC clusters, so we designed adequate tools to assess new RJMS scheduling algorithms for such large systems; these tools are able to run on small clusters by emulating or simulating bigger platforms. After evaluating different techniques for measuring the energy consumption of HPC clusters, we propose a new heuristic, based on the popular Easy Backfilling algorithm, to control the power consumption of such huge systems. Using the same idea, we also demonstrate how to control the energy consumption over a time period; the proposed mechanism is able to limit the energy consumption while keeping satisfactory performance. If energy is a limited resource, it has to be shared fairly, so we also present a mechanism which shares energy consumption among users; we argue that sharing energy fairly among users should motivate them to reduce the energy consumption of their applications. Finally, we analyze past and present user behavior using learning algorithms in order to improve the performance of parallel platforms. This approach not only outperforms state-of-the-art methods, it also gives promising insight into how such methods can improve other aspects of the RJMS.
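A power-aware variant of Easy Backfilling, of the kind proposed here, can be sketched as an extra admission condition on backfilled jobs; the following simplified test (an assumption-laden sketch, not the thesis's algorithm) adds a cluster-wide power cap to the classic conditions:

```c
#include <stdbool.h>

typedef struct {
    int    nodes;        /* nodes requested            */
    double walltime;     /* requested duration (s)     */
    double est_power;    /* estimated power draw (W)   */
} job_t;

/* Power-aware backfilling test (simplified): a candidate job may be
   backfilled only if, in addition to the classic EASY conditions, the
   cluster stays under a global power cap. The classic scheme also allows
   running past the shadow time on extra nodes; that case is omitted. */
bool can_backfill(const job_t *cand,
                  int free_nodes, double shadow_time,   /* from EASY */
                  double used_power, double power_cap)
{
    bool fits_nodes = cand->nodes <= free_nodes;
    /* classic condition: do not delay the first queued job */
    bool fits_time  = cand->walltime <= shadow_time;
    /* extra condition: respect the cluster-wide power budget */
    bool fits_power = used_power + cand->est_power <= power_cap;
    return fits_nodes && fits_time && fits_power;
}
```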
Bleuse, Raphaël. "Appréhender l'hétérogénéité à (très) grande échelle." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM053/document.
The demand for computation power is steadily increasing, driven by the need to simulate more and more complex phenomena with an increasing amount of consumed/produced data. To meet this demand, High Performance Computing platforms grow in both size and heterogeneity. Indeed, heterogeneity allows splitting problems for a more efficient resolution of sub-problems with ad hoc hardware or algorithms. This heterogeneity arises in the platforms' architecture and in the variety of processed applications; consequently, performance becomes more sensitive to the execution context. We study in this thesis how to qualitatively bring, at a reasonable cost, context-awareness/obliviousness into allocation and scheduling policies. This study is conducted from two standpoints: within single applications, and at the whole platform scale from an inter-application perspective. We first study the minimization of the makespan of sequential tasks on platforms with a mixed architecture composed of multiple CPUs and GPUs. We integrate context-awareness into schedulers with an affinity mechanism that improves local behavior. This mechanism has been implemented in a parallel runtime, and experiments show that it is able to reduce memory transfers while maintaining a low makespan. We then extend the model to implicitly consider parallelism on the CPUs with the moldable-task model, and propose an efficient algorithm formulated as an integer linear program with a constant performance guarantee of 3/2+ε. Second, we devise a new modeling framework where constraints are a first-class tool: rather than extending existing models to consider all possible interactions, we reduce the set of feasible schedules by further constraining existing models. We propose a set of reasonable constraints to model application spreading and I/O traffic. We then instantiate this framework for unidimensional topologies, and propose a comprehensive case study of makespan minimization under convex and local constraints.
Diakhaté, François. "Contribution à l'élaboration de supports exécutifs exploitant la virtualisation pour le calcul hautes performances." PhD thesis, Université Sciences et Technologies - Bordeaux I, 2010. http://tel.archives-ouvertes.fr/tel-00798832.
Full textAugonnet, Cédric. "Scheduling Tasks over Multicore machines enhanced with Accelerators : a Runtime System’s Perspective." Thesis, Bordeaux 1, 2011. http://www.theses.fr/2011BOR14460/document.
Multicore machines equipped with accelerators are becoming increasingly popular in the High Performance Computing ecosystem. Hybrid architectures provide significantly improved energy efficiency, so that they are likely to generalize in the Manycore era. However, the complexity introduced by these architectures has a direct impact on programmability, so that it is crucial to provide portable abstractions in order to fully tap into the potential of these machines. Pure offloading approaches, which consist in running an application on regular processors while offloading predetermined parts of the code on accelerators, are not sufficient. The real challenge is to build systems where the application would be spread across the entire machine, that is, where computation would be dynamically scheduled over the full set of available processing units. In this thesis, we thus propose a new task-based model of runtime system specifically designed to address the numerous challenges introduced by hybrid architectures, especially in terms of task scheduling and of data management. In order to demonstrate the relevance of this model, we designed the StarPU platform. It provides an expressive interface along with flexible task scheduling capabilities tightly coupled to an efficient data management. Using these facilities, together with a database of auto-tuned per-task performance models, it for instance becomes straightforward to develop efficient scheduling policies that take into account both computation and communication costs. We show that our task-based model is not only powerful enough to provide support for clusters, but also to scale on hybrid manycore architectures. We analyze the performance of our approach on both synthetic and real-life workloads, and show that we obtain significant speedups and a very high efficiency on various types of multicore platforms enhanced with accelerators.
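The scheduling policies mentioned above can be illustrated by a minimum-completion-time rule: pick the processing unit that minimizes predicted transfer plus computation time. The sketch below is written in the spirit of StarPU's performance-model-based policies, not with the actual StarPU API:

```c
#include <float.h>

typedef struct {
    double ready_at;                 /* when the worker becomes free (s)   */
    double (*exec_model)(int kind);  /* predicted kernel time on this unit */
    double transfer_cost;            /* time to move missing input data    */
} worker_t;

/* Pick the processing unit minimizing the predicted completion time,
   accounting for both computation and data-transfer costs. */
int select_worker(const worker_t *w, int n_workers, int task_kind)
{
    int best = -1;
    double best_end = DBL_MAX;
    for (int i = 0; i < n_workers; i++) {
        double end = w[i].ready_at
                   + w[i].transfer_cost
                   + w[i].exec_model(task_kind);
        if (end < best_end) { best_end = end; best = i; }
    }
    return best;
}
```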
Didelot, Sylvain. "Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications." Thesis, Versailles-St Quentin en Yvelines, 2014. http://www.theses.fr/2014VERS0029/document.
A recent trend in high performance computing shows a rising number of cores per compute node, while the total amount of memory per compute node remains constant. To scale parallel applications on such large machines, one of the major challenges is to keep a low memory consumption. This thesis develops a multi-threaded communication layer over InfiniBand which provides both good communication performance and a low memory consumption. We target scientific applications parallelized using the MPI standard in pure mode or combined with a shared memory programming model. Starting with the observation that network endpoints and communication buffers are critical for the scalability of MPI runtimes, the first contribution proposes three approaches to control their usage. We introduce a scalable and fully-connected virtual topology for connection-oriented high-speed networks. In the context of multirail configurations, we then detail a runtime technique which reduces the number of network connections. We finally present a protocol for dynamically resizing network buffers over the RDMA technology. The second contribution proposes a runtime optimization to enforce the overlap potential of MPI communications, showing a 2x improvement factor on communications. The third contribution evaluates the performance of several MPI runtimes running a seismic modeling application in a hybrid context. On large compute nodes of up to 128 cores, the introduction of OpenMP in the MPI application saves up to 17% of memory. Moreover, we show a performance improvement with our multi-threaded communication layer, where the OpenMP threads concurrently participate in the MPI communications.
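The overlap optimization mentioned in the second contribution targets the classic nonblocking pattern below: post the communication early, compute on independent data, and complete the request afterwards. A progress-capable multi-threaded communication layer is what makes such overlap effective in practice (a generic sketch, not the thesis's code):

```c
#include <mpi.h>

/* Classic communication/computation overlap: post the nonblocking send
   early, compute on data that does not depend on it, complete it later. */
void halo_exchange_overlap(double *halo, int n, int peer,
                           void (*compute_interior)(void))
{
    MPI_Request req;
    MPI_Isend(halo, n, MPI_DOUBLE, peer, /*tag=*/0, MPI_COMM_WORLD, &req);
    compute_interior();                 /* work not depending on the halo */
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* ideally already completed here */
}
```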
Vasseur, Romain. "Développements HPC pour une nouvelle méthode de docking inverse : applications aux protéines matricielles." Thesis, Reims, 2015. http://www.theses.fr/2015REIMS036.
This work is a methodological and software development of a so-called inverse molecular docking method. Through an in-house program, AMIDE (Automatic Reverse Docking Engine), this method distributes large numbers of molecular docking simulations on HPC architectures (computing clusters) with the AutoDock 4.2 and AutoDock Vina applications. The principle of the method is to test small molecules against a set of potential target proteins. The program's optimal parameters were defined from a pilot study, and the protocol was validated on ligands and peptides binding MMPs and EBP extracellular matrix proteins. This method improves the conformational search in docking computations on experimental structures compared to existing protocols (blind docking). We show that the AMIDE program is more efficient than blind docking at discriminating preferred binding sites in inverse protein screening experiments. These results are obtained through the implementation of search-space partitioning methods that also allow, via a hybrid distribution system, the deployment of a set of independent, embarrassingly parallel, perfectly scalable tasks.
Martsinkevich, Tatiana V. "Improving message logging protocols towards extreme-scale HPC systems." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112215.
Existing petascale machines have a Mean Time Between Failures (MTBF) in the order of several hours. It is predicted that in future systems the MTBF will decrease, so applications that run on these systems need to be able to tolerate frequent failures. Currently, the most common way to do this is the global application checkpoint/restart scheme: if some process fails, the whole application rolls back to its last checkpointed state and re-executes from that point. This solution will become infeasible at large scale due to its energy costs and inefficient resource usage; fine-grained failure containment is therefore a strongly required feature for fault tolerance techniques that target large-scale executions. In the context of message-passing MPI applications, message logging fault tolerance protocols provide good failure containment, as they require the restart of only one process or, in some cases, a bounded number of processes. However, existing logging protocols experience a number of issues which prevent their usage at large scale. In particular, they tend to have a high failure-free overhead, because they usually need to reliably store any nondeterministic events happening during the execution of a process in order to correctly restore its state in recovery. Next, as message logs are usually stored in volatile memory, logging may incur a large memory footprint, especially in communication-intensive applications; this is particularly important because future exascale systems are expected to have less memory available per core. Another important trend in HPC is the switch from MPI-only applications to hybrid programming models like MPI+threads and MPI+tasks, in response to the increasing number of cores per node. This opens opportunities for fault tolerance solutions that handle faults at the level of threads/tasks; such an approach has even better failure containment than message logging protocols, which handle failures at the level of processes. The work in this dissertation thus consists of three parts. First, we present a hierarchical log-based fault tolerance solution, called Scalable Pattern-Based Checkpointing (SPBC), for mitigating process fail-stop failures. The protocol leverages a new deterministic model called channel-determinism and a new always-happens-before relation for the partial ordering of events in the application. The protocol is scalable, has low overhead in failure-free execution, does not require logging any events, provides perfect failure containment and has a fully distributed recovery. Second, to address the memory limitation problem on compute nodes, we propose to use additional dedicated resources, or logger nodes: all the logs that do not fit in the memory of compute nodes are sent to the logger nodes and kept in their memory. In a series of experiments we show that not only is this approach feasible, but, combined with a hierarchical logging scheme like SPBC, logger nodes can be an ultimate solution to the problem of memory limitation for logging protocols. Third, we present a log-based fault tolerance protocol for hybrid applications adopting the MPI+tasks programming model. The protocol is used to tolerate detected uncorrected errors (DUEs) that happen during the execution of a task. Normally, a DUE causes the system to raise an exception, which leads to an application crash; the application then has to restart from a checkpoint.
In the proposed solution, we combine task checkpointing with message logging in order to support task re-execution. Such task-level failure containment can be beneficial in large-scale executions because it avoids the more expensive process-level restart. We demonstrate the advantages of this protocol on the example of hybrid MPI+OmpSs applications.
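Sender-based message logging, one ingredient of such protocols, can be sketched as follows: every outgoing payload is copied into an in-memory log before being sent, so that a restarted receiver can be replayed without rolling back the sender. The sketch is generic (fixed to MPI_DOUBLE payloads for brevity), not the SPBC implementation:

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Sender-based message logging: keep a copy of every outgoing payload in
   volatile memory so a restarted receiver can be replayed later. */
typedef struct log_entry {
    void *payload; int count; int dest; int tag;
    struct log_entry *next;
} log_entry;

static log_entry *msg_log = NULL;

int logged_send(const void *buf, int count, int dest, int tag)
{
    log_entry *e = malloc(sizeof *e);
    e->payload = malloc(count * sizeof(double));   /* MPI_DOUBLE assumed */
    memcpy(e->payload, buf, count * sizeof(double));
    e->count = count; e->dest = dest; e->tag = tag;
    e->next = msg_log; msg_log = e;     /* prepend to the in-memory log */

    return MPI_Send(buf, count, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
}

/* On receiver failure, entries destined to it are re-sent in order; when
   node memory fills up, entries can be shipped to dedicated logger nodes. */
```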
Le, Fevre Valentin. "Resilient scheduling algorithms for large-scale platforms." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN019.
This thesis focuses on a major problem for the HPC community: resilience. Computing platforms grow bigger and bigger in order to reach what we call exascale, i.e., a computing capacity of 10^18 FLOP/s, but they suffer numerous failures. Reducing the execution time and handling the errors are two linked problems: for instance, replication (computing redundancy) decreases the number of critical failures but also decreases the number of available resources. In particular, this thesis focuses on several "checkpoint/restart" mechanisms (saving the state of an application in order to restart from that save when a failure occurs). The first part investigates checkpointing at several levels, the use of additional resources to cope with system latency, and checkpointing in generic task graphs. The second part deals with optimal checkpointing strategies coupled with replication (in linear task graphs, on heterogeneous platforms, and with process duplication). The last part explores several scheduling problems linked to the increasing disruptions in large-scale platforms.
Emeras, Joseph. "Analyse et rejeu de traces de charge dans les grands systèmes de calcul distribués." PhD thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00940055.
Haferssas, Ryadh Mohamed. "Espaces grossiers pour les méthodes de décomposition de domaine avec conditions d'interface optimisées." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066450.
The objective of this thesis is to design an efficient domain decomposition method to solve solid and fluid mechanics problems; to this end, Optimized Schwarz Methods (OSM) are considered and revisited. Optimized Schwarz methods were introduced by P.L. Lions; they improve the classical Schwarz method by replacing the Dirichlet interface conditions with Robin interface conditions, and can be applied to both overlapping and non-overlapping subdomains. Robin conditions provide another way to optimize these methods for better convergence and more robustness when dealing with mechanical problems of nearly incompressible nature. In this thesis, a new theoretical framework is introduced which provides an Additive Schwarz method-type theory for optimized Schwarz methods, e.g. Lions' algorithm. We define an adaptive coarse space for which the convergence rate is guaranteed regardless of the regularity of the coefficients of the problem, and then give the formulation of a two-level preconditioner for the proposed method. A broad spectrum of applications is covered, such as incompressible linear elasticity, incompressible Stokes problems and the unsteady Navier-Stokes problem. Numerical results from large-scale parallel experiments with thousands of processes are provided; they clearly show the effectiveness and robustness of the proposed approach.
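For reference, Lions' algorithm with Robin interface conditions can be stated on a model Laplace problem over two subdomains, with α > 0 the parameter that OSM tunes (a textbook formulation, not the thesis's mechanics setting):

```latex
% Lions' parallel Schwarz iteration on two subdomains \Omega_1, \Omega_2
% for -\Delta u = f, with Robin transmission conditions of parameter \alpha:
\begin{aligned}
-\Delta u_1^{n+1} &= f \ \text{in } \Omega_1, &
\left(\partial_{n_1} + \alpha\right) u_1^{n+1} &= \left(\partial_{n_1} + \alpha\right) u_2^{n} \ \text{on } \partial\Omega_1 \cap \overline{\Omega_2},\\
-\Delta u_2^{n+1} &= f \ \text{in } \Omega_2, &
\left(\partial_{n_2} + \alpha\right) u_2^{n+1} &= \left(\partial_{n_2} + \alpha\right) u_1^{n} \ \text{on } \partial\Omega_2 \cap \overline{\Omega_1}.
\end{aligned}
```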
Emeras, Joseph. "Workload Traces Analysis and Replay in Large Scale Distributed Systems." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM081/document.
High Performance Computing is preparing the era of the transition from Petascale to Exascale. Distributed computing systems are already facing new scalability problems due to the increasing number of computing resources to manage. It is now necessary to study these systems in depth and comprehend their behaviors, strengths and weaknesses to better build the next generation. The complexity of managing user applications on the resources has led to the analysis of the workload the platform has to support, in order to provide users an efficient service. The need for workload comprehension has led to the collection of traces from production systems and to the proposal of a standard workload format. These contributions enabled the study of numerous traces, and also led to the construction of several models based on the statistical analysis of the different workloads from the collection. Until recently, existing workload traces did not enable researchers to study the consumption of resources by jobs in a temporal way. This is now changing with the need for characterizing job consumption patterns. In the first part of this thesis we propose a study of existing workload traces. We then contribute an observation of cluster workloads that considers the jobs' resource consumption over time, which highlights specific and unexpected patterns in the resource usage of users. Finally, we propose an extension of the former standard workload format that enables adding such temporal consumptions without losing the benefit of existing works. Experimental approaches based on workload models have also served the goal of distributed systems evaluation. Existing models describe the average behavior of observed systems. However, although the study of average behaviors is essential for the understanding of distributed systems, the study of critical cases and particular scenarios is also necessary: it gives a more complete view and understanding of the performance of resource and job management. In the second part of this thesis we propose an experimental method for the performance evaluation of distributed systems, based on the replay of production workload trace extracts. These extracts, replaced in their original context, enable experimenting with system configuration changes against an online workload and observing the results of the different configurations. Our technical contribution in this experimental approach is twofold: we propose a first tool to construct the environment in which the experimentation will take place, then a second set of tools that automate the experiment setup and replay the trace extract within its original context. Finally, these contributions, conducted together, enable gaining a better knowledge of HPC platforms. As future work, the approach proposed in this thesis will serve as a basis to further study larger infrastructures.
Said, Issam. "Apports des architectures hybrides à l'imagerie profondeur : étude comparative entre CPU, APU et GPU." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066531/document.
In an exploration context, Oil and Gas (O&G) companies rely on HPC to accelerate depth imaging algorithms. Solutions based on CPU clusters and hardware accelerators are widely embraced by the industry. Graphics Processing Units (GPUs), with huge compute power and high memory bandwidth, have attracted significant interest. However, deploying heavy imaging workflows, the Reverse Time Migration (RTM) being the most famous, on such hardware has suffered from a few limitations: namely, the lack of memory capacity, frequent CPU-GPU communications that may be bottlenecked by the PCI transfer rate, and high power consumption. Recently, AMD has launched the Accelerated Processing Unit (APU): a processor that merges a CPU and a GPU on the same die, with promising features, notably a unified CPU-GPU memory. Throughout this thesis, we explore how efficiently the APU technology may be applied in an O&G context, and study whether it can overcome the limitations that characterize the CPU- and GPU-based solutions. The APU is evaluated with the help of memory, applicative and power-efficiency OpenCL benchmarks. The feasibility of the hybrid utilization of APUs is surveyed, and the efficiency of a directive-based approach is also investigated. By means of a thorough review of a selection of seismic applications (modeling and RTM) at the node level and at the large-scale level, a comparative study between the CPU, the APU and the GPU is conducted. We show the relevance of overlapping I/O and MPI communications with computations for APU and GPU clusters, that APUs deliver performance between that of CPUs and that of GPUs, and that the APU can be as power efficient as the GPU.
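One common way to exploit the APU's unified CPU-GPU memory in OpenCL is zero-copy allocation; the following sketch (error handling elided, and only one of several possible zero-copy idioms) requests host-visible memory and maps it for direct CPU access:

```c
#include <CL/cl.h>
#include <stddef.h>

/* Zero-copy buffer on an APU (sketch): CL_MEM_ALLOC_HOST_PTR asks the
   runtime for host-visible memory, so CPU and GPU can share the data
   without an explicit PCI-style copy. */
cl_mem make_shared_buffer(cl_context ctx, cl_command_queue q,
                          size_t bytes, void **host_view)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR,
                                bytes, NULL, &err);
    /* map it once to obtain a host pointer the CPU can write directly */
    *host_view = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                    0, bytes, 0, NULL, NULL, &err);
    return buf;
}
```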
Bouguerra, Mohamed Slim. "Tolérance aux pannes dans des environnements de calcul parallèle et distribué : optimisation des stratégies de sauvegarde/reprise et ordonnancement." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENM023/document.
The parallel computing platforms available today are increasingly large. Typically, emerging parallel platforms will be composed of several millions of CPU cores running up to a billion threads. This intensive growth in the number of parallel threads will make applications subject to more and more failures. Consequently, it is necessary to develop efficient strategies providing safe and reliable completion for HPC parallel applications. Checkpointing is one of the most popular and efficient techniques for developing fault-tolerant applications in such a context. However, checkpoint operations are costly in terms of time, computation and network communications, which will certainly affect the global performance of the application. In the first part of this thesis, we propose a performance model that formally expresses the checkpoint scheduling problem. Two variants of the problem have been considered. In the first variant, the objective is the minimization of the expected completion time. Under this model we prove that when the failure rate and the checkpoint cost are constant, the optimal checkpoint strategy is necessarily periodic. For the general problem, where the failure rate and the checkpoint cost are arbitrary, we provide a numerical solution. In the second variant of the problem, we exhibit the tradeoff between the impact of the checkpoint operations and the computation lost due to failures. In particular, we prove that the checkpoint scheduling problem is NP-hard even in the simple case of a uniform failure distribution, and we present a dynamic programming scheme for determining the optimal checkpointing times in all variants of the problem. In the second part of this thesis, we design several fault-tolerant scheduling algorithms that minimize the application makespan and at the same time maximize the application reliability. In particular, we point out that the growth rate of the failure distribution determines the relationship between both objectives: when the failure rate is decreasing the two objectives are antagonistic, whereas when the failure rate is increasing both objectives are congruent. Finally, we provide approximation algorithms for both failure-rate cases.
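For the constant-rate, constant-cost setting in which the thesis proves periodicity, the classical first-order approximation of the optimal period (Young/Daly) gives the flavor of the result:

```latex
% First-order optimal checkpoint period (Young 1974, Daly 2006) for a
% constant checkpoint cost $C$ and platform MTBF $\mu$ under exponential
% failures; the thesis generalizes beyond this classical setting:
T^{\ast} \approx \sqrt{2\,C\,\mu}
```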
Heinrich, Franz. "Modélisation, prédiction et optimisation de la consommation énergétique d'applications MPI à l'aide de SimGrid." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM018/document.
The High-Performance Computing (HPC) community is currently undergoing disruptive technology changes in almost all fields, including a switch towards massive parallelism with several thousand compute cores on a single GPU or accelerator, and new, complex networks. Powering a massively parallel machine becomes a major challenge: the energy consumption of these machines will continue to grow in the future, making energy one of the principal cost factors of machine ownership. This explains why even the classic metric "flop/s", generally used to evaluate HPC applications and machines, is widely regarded as due to be replaced by an energy-centric metric "flop/watt". One approach to predict energy consumption is through simulation; however, a precise performance prediction is crucial to estimate the energy faithfully. In this thesis, we contribute to the performance and energy prediction of HPC architectures. We propose an energy model which we have implemented in the open-source SimGrid simulator. We validate this model by carefully and systematically comparing it with real experiments. We leverage this contribution to both evaluate existing DVFS governors and propose new ones that are particularly designed to suit the HPC context.
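A linear host power model of the kind commonly used for such simulations (e.g. in SimGrid's energy plugin) relates power to utilization per pstate and integrates it over time:

```latex
% Linear host power model: at utilization $u \in [0,1]$ for the current
% pstate, with calibrated idle and full-load wattages,
P(u) = P_{\mathrm{idle}} + u \left( P_{\mathrm{full}} - P_{\mathrm{idle}} \right),
\qquad
E = \int_{0}^{T} P\big(u(t)\big)\, dt .
```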
Genet, Damien. "Conception et réalisation d'un solveur pour les problèmes de dynamique des fluides pour les architectures many-core." Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0379/document.
Full textNumerical simulation is nowadays an essential part of engineering analysis, be it to design a new plane or to detect underground oil reservoirs. Numerical simulations have indeed become an important complement to theoretical and experimental investigation, allowing one to reduce the cost of engineering design processes. In order to achieve a high level of precision, one needs to increase the resolution of the computational domain. So, to keep getting results in reasonable time, one must find a way to speed up computations. To do this, we use high-performance computing (HPC) to exploit the complex architecture of modern supercomputers. Under these two constraints, and some others like the genericity of finite elements or the mesh dimension, we developed a new platform, AeroSol. In this thesis, we present the mathematical background and the two types of schemes that are implemented in the platform: the continuous finite element method and the discontinuous one. We then present the design choices made in the platform, and finally we study a sub-problem, the assembly operation, which can also be found in multi-frontal linear algebra methods.
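The assembly operation mentioned at the end is, in essence, a scatter-add of small dense element matrices into a large sparse matrix. A minimal sketch (with a hypothetical 1D Laplacian as the element contribution):

```python
import numpy as np
from scipy.sparse import coo_matrix

def assemble(n_dofs, elements, local_matrix):
    """Assemble a global sparse matrix from per-element contributions.
    elements: list of DOF-index tuples; local_matrix(e): dense element matrix."""
    rows, cols, vals = [], [], []
    for e in elements:
        ke = local_matrix(e)
        for a, i in enumerate(e):
            for b, j in enumerate(e):
                rows.append(i); cols.append(j); vals.append(ke[a, b])
    # duplicate (i, j) entries are summed when converting to CSR
    return coo_matrix((vals, (rows, cols)), shape=(n_dofs, n_dofs)).tocsr()

# 1D Laplacian on 4 nodes, 3 two-node elements
elems = [(0, 1), (1, 2), (2, 3)]
k_loc = lambda e: np.array([[1.0, -1.0], [-1.0, 1.0]])
print(assemble(4, elems, k_loc).toarray())
```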
Ho, Minh Quan. "Optimisation de transfert de données pour les processeurs pluri-coeurs, appliqué à l'algèbre linéaire et aux calculs sur stencils." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM042/document.
Full textThe upcoming Exascale target in High Performance Computing (HPC) and disruptive achievements in artificial intelligence have given rise to alternative non-conventional many-core architectures, with energy efficiency typical of embedded systems, providing the same software ecosystem as classic HPC platforms. A key enabler of energy-efficient computing on many-core architectures is the exploitation of data locality, specifically the use of scratchpad memories in combination with DMA engines in order to overlap computation and communication. Such a software paradigm raises considerable programming challenges for both the vendor and the application developer. In this thesis, we tackle the memory transfer and performance issues, as well as the programming challenges, of memory- and compute-intensive HPC applications on the Kalray MPPA many-core architecture. With the first, memory-bound use-case of the lattice Boltzmann method (LBM), we provide generic and fundamental techniques for decomposing three-dimensional iterative stencil problems onto clustered many-core processors fitted with scratchpad memories and DMA engines. The developed DMA-based streaming and overlapping algorithm delivers a 33% performance gain over the default cache-based implementation. High-dimensional stencil computation suffers from serious I/O bottlenecks and limited on-chip memory space. We developed a new in-place LBM propagation algorithm, which reduces the memory footprint by half and yields 1.5 times higher performance-per-byte efficiency than the state-of-the-art out-of-place algorithm. On the compute-intensive side, with dense linear algebra computations, we build an optimized matrix multiplication benchmark based on the exploitation of scratchpad memory and efficient asynchronous DMA communication. These techniques are then extended to a DMA module of the BLIS framework, which allows us to instantiate an optimized and portable level-3 BLAS numerical library on any DMA-based architecture in less than 100 lines of code. We achieve 75% of peak performance on the MPPA processor with the matrix multiplication operation (GEMM) from the standard BLAS library, without having to write thousands of lines of laboriously optimized code for the same result.
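The computation/communication overlap at the heart of this work follows the classic double-buffering pattern: while the kernel works on tile i in scratchpad memory, the DMA engine fetches tile i+1. A language-agnostic sketch (Python threads standing in for the MPPA's DMA engines; fetch and compute are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def process_tiles(tiles, fetch, compute):
    """Double-buffered streaming: prefetch tile i+1 while computing tile i.
    fetch(t) stands in for a DMA get into scratchpad; compute(b) is the kernel."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, tiles[0])        # prime the pipeline
        for nxt in tiles[1:]:
            buf = pending.result()                   # wait for current tile
            pending = dma.submit(fetch, nxt)         # asynchronous prefetch
            results.append(compute(buf))             # overlaps the transfer
        results.append(compute(pending.result()))    # drain the last tile
    return results

print(process_tiles(range(4), fetch=lambda t: [t] * 3, compute=sum))
```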
TANGHERLONI, ANDREA. "High-Performance Computing to tackle complex problems in life sciences." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2019. http://hdl.handle.net/10281/241217.
Full textRecent advances in several research fields of the Life Sciences, such as Bioinformatics, Computational Biology and Medical Imaging, are generating huge amounts of data that require effective computational tools to be analyzed, while other disciplines, like Systems Biology, typically deal with mathematical models of biochemical networks, where issues related to the lack of quantitative parameters and the efficient description of the emergent dynamics must be faced. In these contexts, High-Performance Computing (HPC) infrastructures represent a fundamental means to tackle these problems, allowing for both real-time processing of data and fast simulations. In recent years, the use of general-purpose many-core devices, such as Many Integrated Core coprocessors and Graphics Processing Units (GPUs), gained ground. The latter, which are pervasive, relatively cheap and extremely efficient parallel many-core coprocessors capable of achieving tera-scale performance on common workstations, have been extensively exploited in the work presented in this thesis. Moreover, some of the problems described here require the application of Computational Intelligence (CI) methods. As a matter of fact, the Parameter Estimation problem in Systems Biology, the Haplotype Assembly problem in Genome Analysis, as well as the enhancement and segmentation of medical images characterized by a bimodal gray-level intensity histogram, can all be viewed as optimization problems, which can be effectively addressed by relying on CI approaches. In the case of the Parameter Estimation problem, Evolutionary and Swarm Intelligence techniques were exploited and coupled with novel GPU-powered simulators (designed and developed in this thesis to execute both coarse-grained and fine-grained simulations), which were used to perform in a parallel fashion the biochemical simulations underlying the fitness functions required by these population-based approaches. The Haplotype Assembly problem and the enhancement of medical images were both addressed by means of Genetic Algorithms (GAs), which were shown to be very effective in solving combinatorial problems. Since the proposed GA-based approaches are computationally demanding, a Master-Slave paradigm was exploited to distribute the workload, reducing the required running time. The overall results show that coupling HPC and CI techniques is advantageous to address these problems and speed up the computational analyses in these research fields.
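The Master-Slave scheme mentioned above parallelizes the expensive part of a GA, the fitness evaluations, while the master keeps selection and variation sequential. A minimal sketch (toy sphere fitness and truncation selection, chosen for brevity, not taken from the thesis):

```python
import random
from multiprocessing import Pool

def fitness(ind):                     # placeholder objective (sphere function)
    return sum(x * x for x in ind)

def step(pop, pool, mut=0.1):
    """One generation: parallel evaluation (slaves), selection and mutation (master)."""
    scores = pool.map(fitness, pop)                  # fanned out to the workers
    ranked = [p for _, p in sorted(zip(scores, pop))]
    parents = ranked[: len(pop) // 2]                # keep the best half (minimize)
    children = [[x + random.gauss(0, mut) for x in p] for p in parents]
    return parents + children

if __name__ == "__main__":
    pop = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(32)]
    with Pool() as pool:
        for _ in range(20):
            pop = step(pop, pool)
    print(min(fitness(p) for p in pop))              # best individual found
```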
Galia, Antonio. "A Dynamic Homogenization Method for Nuclear Reactor Core Calculations." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASP042.
Full textThree-dimensional deterministic core calculations are typically based on the classical two-step approach, where the homogenized cross sections of an assembly type are pre-calculated and then interpolated to the actual state in the reactor. The weighting flux used for cross-section homogenization is determined assuming the fundamental-mode condition and using a critical-leakage model that does not account for the actual environment of an assembly. On the other hand, 3D direct transport calculations and the 2D/1D Fusion method, mostly based on the method of characteristics, have recently been applied, showing excellent agreement with reference Monte Carlo codes, but still remaining computationally expensive for multiphysics applications and core depletion calculations. In the present work, we propose a method of Dynamic Homogenization as an alternative technique for 3D core calculations, in the framework of a domain decomposition method that can be massively parallelized. It consists of an iterative process between core and assembly calculations that preserves assembly exchanges. The main features of this approach are: i) cross-section homogenization takes into account the environment of each assembly in the core; ii) the reflector can be homogenized with its realistic 2D geometry and its environment; iii) the method avoids expensive 3D transport calculations; iv) no “off-line” calculation and therefore v) no cross-section interpolation is required. Verification tests on 2D and 3D full-core problems are presented, applying several homogenization and equivalence techniques and comparing against a direct 3D transport calculation. For this analysis, we solved the NEA “PWR MOX/UO2 Core Benchmark” problem, which is characterized by strong radial heterogeneities due to the presence of different types of UOx and MOx assemblies at different burnups. The obtained results show the advantages of the proposed method in terms of precision with respect to the two-step approach and of performance with respect to the direct approach.
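A skeleton of the core/assembly fixed-point iteration described above (all callbacks are hypothetical placeholders; the actual operators are the thesis's 2D transport and 3D core solvers):

```python
def dynamic_homogenization(assemblies, homogenize, solve_core, tol=1e-5, max_it=50):
    """Iterate between assembly re-homogenization and the core solution until
    the effective multiplication factor k_eff stabilizes. `homogenize(a, env)`
    returns cross sections for assembly a in environment env; `solve_core(xs)`
    returns (k_eff, interface currents per assembly). Both are placeholders."""
    env = {a: None for a in assemblies}      # no environment on the first pass
    k_eff = 1.0
    for _ in range(max_it):
        xs = {a: homogenize(a, env[a]) for a in assemblies}   # 2D, in environment
        k_new, env = solve_core(xs)                           # 3D coarse solve
        if abs(k_new - k_eff) < tol:
            return k_new, xs
        k_eff = k_new
    raise RuntimeError("core/assembly iteration did not converge")
```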
Yildiz, Orcun. "Efficient Big Data Processing on Large-Scale Shared Platforms : managing I/Os and Failure." Thesis, Rennes, École normale supérieure, 2017. http://www.theses.fr/2017ENSR0009/document.
Full textAs of 2017, we live in a data-driven world where data-intensive applications are bringing fundamental improvements to our lives in many different areas such as business, science, health care and security. This has boosted the growth of data volumes (i.e., the deluge of Big Data). To extract useful information from this huge amount of data, different data processing frameworks have been emerging, such as MapReduce, Hadoop, and Spark. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. Usually, these large-scale platforms are used concurrently by multiple users and multiple applications with the goal of better utilization of resources. Though there are benefits to sharing these platforms, several challenges are raised when doing so, among which I/O and failure management are the major ones that can impact efficient data processing. To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems. We identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to further understand the root causes of this phenomenon. Then, we propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications based on the findings we obtain in our experimental study regarding the root causes of I/O interference. Finally, we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications. Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler which enables fast failure recovery while optimizing data locality, thus improving application performance.
Ponsard, Raphael. "Traitement en temps réel, haut débit et faible latence, d'images par coprocesseurs GPU & FPGA utilisant les techniques d'accès direct à la mémoire distante." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT071.
Full textThe constant evolution of X-ray photon sources, associated with the increasing performance of high-end X-ray detectors, allows cutting-edge experiments that can produce very high-throughput data streams and generate large volumes of data that are challenging to manage and store. In this context, it becomes fundamental to optimize processing architectures that allow real-time image processing such as raw data pre-treatment, data reduction, data compression and fast feedback. These data management challenges have still not been addressed in a fully satisfactory way as of today, and in any case not in a generic manner. This thesis is part of the ESRF RASHPA project, which aims at developing an RDMA-based Acquisition System for High Performance Applications. One of the main characteristics of this framework is direct data placement, straight from the detector head (data producer) to the processing computing infrastructure (data receiver), at the highest acceptable throughput, using Remote Direct Memory Access (RDMA) and zero-copy techniques with minimal Central Processing Unit (CPU) intervention. The work carried out in this thesis is a contribution to the RASHPA framework, enabling data transfer directly to the internal memory of accelerator boards. A low-latency synchronisation mechanism between the RDMA network interface cards (RNIC) and the processing unit is proposed to trigger data processing while keeping pace with the detector. Thus, a comprehensive solution fulfilling the online data analysis challenges is proposed, on standard computers and massively parallel coprocessors as well. The scalability and versatility of the proposed approach are exemplified by detector emulators, leveraging RoCEv2 (RDMA over Converged Ethernet) or PCI-Express links, and RASHPA Processing Units (RPUs) such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). Real-time data processing on FPGAs, seldom adopted in X-ray science, is evaluated and the benefits of high-level synthesis are exhibited. The framework is supplemented with an allocator of large contiguous memory chunks in main memory and an address translation system for accelerators, both geared towards DMA transfer. The assessment of the proposed pipeline was performed with online data analysis as found in serial diffraction experiments. This includes raw data pre-treatment as foreseen with adaptive-gain detectors, image rejection using Bragg peak counting, and data compression to sparse matrix format.
Palomares, Vincent. "Combiner approches statique et dynamique pour modéliser la performance de boucles HPC." Thesis, Versailles-St Quentin en Yvelines, 2015. http://www.theses.fr/2015VERS040V/document.
Full textThe complexity of CPUs has increased considerably since their beginnings, introducing mechanisms such as register renaming, out-of-order execution, vectorization, prefetchers and multi-core environments to keep performance rising with each product generation. However, so has the difficulty in making proper use of all these mechanisms, or even evaluating whether one's program makes good use of a machine, whether users' needs match a CPU's design, or, for CPU architects, knowing how each feature really affects customers. This thesis focuses on increasing the observability of potential bottlenecks in HPC computational loops and how they relate to each other in modern microarchitectures. We will first introduce a framework combining CQA and DECAN (respectively static and dynamic analysis tools) to get detailed performance metrics on small codelets in various execution scenarios. We will then present PAMDA, a performance analysis methodology leveraging elements obtained from codelet analysis to detect potential performance problems in HPC applications and help resolve them. A work extending the Cape linear model to better cover Sandy Bridge and give it more flexibility for HW/SW codesign purposes will also be described. It will be directly used in VP3, a tool evaluating the performance gains that vectorizing loops could provide. Finally, we will describe UFS, an approach combining static analysis and cycle-accurate simulation to very quickly estimate a loop's execution time while accounting for out-of-order limitations in modern CPUs.
Gouin, Florian. "Méthodologie de placement d'algorithmes de traitement d'images sur architecture massivement parallèle." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEM075.
Full textIn industry, the race of image sensors toward higher definitions increases the amount of data to be processed in the image processing domain. The algorithms concerned, applied in embedded solutions, also frequently have to meet real-time constraints. The main issues are therefore to moderate power consumption, attain high computing performance and provide high memory bandwidth for data delivery. The massively parallel design of GPUs is especially well adapted for this kind of task. However, this architecture is complex to handle. Among the reasons are its multiple hierarchical levels of memory and computation, and the usage of this accelerator inside a global heterogeneous architecture. Therefore, mapping algorithms onto GPUs, while exploiting the high-performance capacities of this architecture, is not a trivial operation. In this thesis, we have developed a mapping methodology for sequential algorithms and designed it for GPUs. This methodology is made up of code analysis phases, mapping criteria verifications, code transformations and a final code generation phase. Part of the defined mapping criteria has been designed to ensure the legality of the mapping, by considering GPU hardware specificities, whereas the other part is used to improve runtimes. In addition, we have studied the performance of GPU memories and the capacity of GPUs to efficiently support coarse-grain parallelism. This complementary work is a foundation for further improvements in the exploitation of GPU resources inside this mapping methodology. Finally, the experimental results have revealed the functional reliability of the codes mapped on GPUs and a speedup in the runtime of many C and C++ image processing applications used in industry.
Lasry, Jérémie. "Calculs de plaques fissurées en flexion avec la méthode des éléments finis étendue (XFEM)." Phd thesis, INSA de Toulouse, 2009. http://tel.archives-ouvertes.fr/tel-00465635.
Full text
Moustafa, Salli. "Massively Parallel Cartesian Discrete Ordinates Method for Neutron Transport Simulation." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0408/document.
Full textHigh-fidelity nuclear reactor core simulations require a precise knowledge of the neutron flux inside the reactor core. This flux is modeled by the linear Boltzmann equation, also called the neutron transport equation. In this thesis, we focus on solving this equation using the discrete ordinates method (SN) on Cartesian meshes. This method involves a source iteration scheme, including a sweep over the spatial mesh that gathers the vast majority of the computations performed in the SN method. Due to the large amount of computation involved in the resolution of the Boltzmann equation, numerous research works have focused on optimizing the time to solution by developing parallel algorithms for solving the transport equation. However, these algorithms were designed by considering a supercomputer as a collection of independent cores, and therefore do not explicitly take into account the memory hierarchy and multi-level parallelism available inside modern supercomputers. Therefore, we first proposed a strategy for designing an efficient parallel implementation of the sweep operation on modern architectures by combining the use of the SIMD paradigm, thanks to C++ generic programming techniques, with an emerging task-based runtime system: PaRSEC. We demonstrated the need for such an approach using theoretical performance models predicting optimal partitionings. We then studied the challenge of converging the source iteration scheme in highly diffusive media such as PWR cores. We implemented and studied the convergence of a new acceleration scheme (PDSA) that naturally suits our hybrid parallel implementation. The combination of all these techniques has enabled us to develop a massively parallel version of the SN solver Domino. It is capable of tackling the challenges posed by neutron transport simulations and compares favorably with state-of-the-art solvers such as Denovo. The performance of the PaRSEC implementation of the sweep operation reaches 6.1 Tflop/s on 768 cores, corresponding to 33.9% of the theoretical peak performance of this set of computational resources. For a typical 26-group PWR calculation involving 1.02×10¹² DoFs, the time to solution required by the Domino solver is 46 min using 1536 cores.
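For readers unfamiliar with the sweep, here is what a source iteration with upwind sweeps looks like in one dimension (a step-differenced slab toy with isotropic scattering and vacuum boundaries, not the thesis's 3D Cartesian solver):

```python
import numpy as np

def source_iteration(nx, dx, sig_t, sig_s, src, n_ang=8, tol=1e-8):
    """1D slab SN: sweep each ordinate across the mesh, update the scalar
    flux, and iterate on the scattering source until convergence."""
    mu, w = np.polynomial.legendre.leggauss(n_ang)   # ordinates and weights
    phi = np.zeros(nx)
    while True:
        q = 0.5 * (sig_s * phi + src)                # isotropic emission density
        phi_new = np.zeros(nx)
        for m, wt in zip(mu, w):
            cells = range(nx) if m > 0 else range(nx - 1, -1, -1)
            a, psi_in = abs(m) / dx, 0.0             # vacuum inflow
            for i in cells:                          # the sweep itself
                psi = (q[i] + a * psi_in) / (sig_t + a)
                phi_new[i] += wt * psi
                psi_in = psi
        if np.max(np.abs(phi_new - phi)) < tol:
            return phi_new
        phi = phi_new

print(source_iteration(nx=50, dx=0.2, sig_t=1.0, sig_s=0.5, src=np.ones(50))[:5])
```

The data dependency visible in the inner loop (each cell needs its upwind neighbour) is exactly what makes parallelizing the sweep hard, and what the task-based implementation addresses.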
Garlet Milani, Luís Felipe. "Autotuning assisté par apprentissage automatique de tâches OpenMP." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM022.
Full textModern computer architectures are highly complex, requiring great programming effort to obtain all the performance the hardware is capable of delivering. Indeed, while developers know potential optimizations, the only feasible way to tell which of them is faster for some platform is to test it. Furthermore, the many differences between two computer platforms, in the number of cores, cache sizes, interconnect, processor and memory frequencies, etc., make it very challenging to have the same code perform well over several systems. To extract the most performance, it is often necessary to fine-tune the code for each system. Consequently, developers adopt autotuning to achieve some degree of performance portability. This way, the potential optimizations can be specified once and, after testing each possibility on a platform, a high-performance version of the code is obtained for that particular platform. However, this technique requires tuning each application for each platform it targets. This is not only time-consuming, but the autotuning and the real execution of the application also differ. Differences in the data may trigger different behaviour, or there may be different interactions between the threads in the autotuning and the actual execution. This can lead to suboptimal decisions if the autotuner chooses a version that is optimal for the training but not for the real execution of the application. We propose the use of autotuning for selecting versions of the code relevant for a range of platforms and, during the execution of the application, the runtime system identifies the best version to use via one of three policies we propose: Mean, Upper Confidence Bound, and Gradient Bandit. This way, the training effort is decreased and the same set of versions can be used with different platforms without sacrificing performance. We conclude that the proposed policies can identify the version to use without incurring substantial performance losses. Furthermore, when the user does not know enough details of the application to configure optimally the explore-then-commit policy used by other runtime systems, the more adaptable UCB policy can be used in its place.
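The Upper Confidence Bound policy named above is a standard multi-armed bandit strategy; applied to version selection, each code version is an arm and the reward is its measured speed. A minimal sketch (synthetic per-version runtimes; UCB1 with its usual exploration bonus):

```python
import math, random

def ucb_select(stats, t, c=2.0):
    """UCB1: pick the version maximizing mean reward + exploration bonus."""
    for v, (n, _) in stats.items():
        if n == 0:
            return v                               # try every version once first
    return max(stats, key=lambda v: stats[v][1] / stats[v][0]
               + math.sqrt(c * math.log(t) / stats[v][0]))

versions = {"baseline": 0.010, "tiled": 0.007, "vectorized": 0.008}  # true cost (s)
stats = {v: (0, 0.0) for v in versions}            # (runs, cumulative reward)
for t in range(1, 201):
    v = ucb_select(stats, t)
    runtime = versions[v] * random.uniform(0.9, 1.1)   # stand-in for a real run
    n, s = stats[v]
    stats[v] = (n + 1, s + 1.0 / runtime)          # reward = speed
print(max(stats, key=lambda v: stats[v][0]))       # most-selected version: "tiled"
```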
Ben Hassan Saïdi, Ismaïl. "Numerical simulations of the shock wave-boundary layer interactions." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS390/document.
Full textSituations where an incident shock wave impinges upon a boundary layer are common in the aeronautical and space industries. Under certain circumstances (high Mach number, large shock angle, ...), the interaction between an incident shock wave and a boundary layer may create an unsteady separation bubble. This bubble, as well as the subsequent reflected shock wave, are known to oscillate in a low-frequency streamwise motion. This phenomenon, called the unsteadiness of the shock wave boundary layer interaction (SWBLI), subjects structures to oscillating loads that can damage the integrity of the solid structure. The aim of the present work is the unsteady numerical simulation of SWBLI, in order to contribute to a better understanding of the SWBLI unsteadiness and of the physical mechanism causing these low-frequency oscillations of the interaction zone. To perform this study, an original numerical approach is used. The one-step Finite Volume approach relies on the discretization of the convective fluxes of the Navier-Stokes equations using the OSMP scheme, developed up to 7th order both in space and time, the viscous fluxes being discretized using a standard centered Finite-Difference scheme. A Monotonicity-Preserving (MP) constraint is employed as a shock-capturing procedure. The validation of this approach demonstrates the correct accuracy of the OSMP scheme in predicting turbulent features and the great efficiency of the MP procedure in capturing discontinuities without spoiling the solution and with an almost negligible additional cost. It is also shown that using the highest tested order of the OSMP scheme is relevant in terms of the compromise between simulation time and accuracy. Moreover, an order of accuracy higher than second order for approximating the diffusive fluxes seems to have a negligible influence on the solution at such relatively high Reynolds numbers. By simulating the 3D unsteady interaction between a laminar boundary layer and an incident shock wave, we suppress the suspected influence of the large turbulent structures of the boundary layer on the SWBLI unsteadiness, the only remaining suspected cause of unsteadiness being the dynamics of the separation bubble. Results show that only the reattachment point oscillates, at low frequencies characteristic of the breathing of the separation bubble. The separation point of the recirculation bubble and the foot of the reflected shock wave keep a fixed location along the flat plate with respect to time. This shows that, in this configuration, the SWBLI unsteadiness is not observed. In order to reproduce and analyse the SWBLI unsteadiness, the simulation of a shock wave turbulent boundary layer interaction (SWTBLI) is performed. A Synthetic Eddy Method (SEM), adapted to compressible flows, has been developed and used at the inlet of the simulation domain to initiate the turbulent boundary layer without prohibitive additional computational costs. Analyses of the results are performed using, among other techniques, the snapshot Proper Orthogonal Decomposition (POD). For this simulation, the SWBLI unsteadiness has been observed. Results suggest that the dominant flapping mode of the recirculation bubble occurs at medium frequency. These cycles of successive enlargement and shrinkage of the separated zone are shown to be irregular in time, the maximum size of the recirculation bubble varying between successive cycles.
This behaviour of the separation bubble is responsible for a low-frequency temporal modulation of the amplitude of the separation- and reattachment-point motions, and thus for the low-frequency breathing of the separation bubble. These results tend to suggest that the SWBLI unsteadiness is related to this low-frequency dynamics of the recirculation bubble, the oscillations of the reflected shock's foot being in phase with the motion of the separation point.
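The snapshot POD used in this analysis reduces to a thin SVD of the mean-subtracted snapshot matrix. A compact sketch (the toy two-mode field is purely illustrative):

```python
import numpy as np

def snapshot_pod(snapshots, n_modes):
    """Snapshot POD via thin SVD: rows are flattened flow fields sampled in
    time; returns spatial modes, singular values and temporal coefficients."""
    x = snapshots - snapshots.mean(axis=0)           # remove the mean field
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_modes], s, u[:, :n_modes] * s[:n_modes]

# toy example: 100 time samples of a 500-point field with 2 coherent modes
t = np.linspace(0, 10, 100)[:, None]
x = np.linspace(0, 1, 500)[None, :]
field = np.sin(2 * np.pi * t) * np.sin(np.pi * x) + 0.3 * np.cos(5 * t) * x
modes, s, coeffs = snapshot_pod(field, 2)
print(s[:4] / s.sum())       # the first two modes capture nearly all the energy
```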
Al Hanbali, Ahmad. "Évaluation des performances des réseaux sans-fil mobiles." Nice, 2006. http://www.theses.fr/2006NICE4058.
Full textThis thesis deals with the impact of mobility on the performance of mobile ad hoc networks (MANETs). It contains two parts. The first part surveys the TCP protocol over MANETs. The main conclusion is that mobility degrades TCP performance, since it induces frequent route failures and extended network partitions. These implications were the motivation, in the second part, to introduce and evaluate new transmission schemes that rely on mobility to improve the capacity of MANETs. More precisely, in the absence of a direct route between two nodes, the rest of the nodes in the network can serve as relay nodes. In the beginning, the focus was on the performance of the relay nodes (throughput and relay buffer size) using a detailed queueing analysis. One of the main results was that random mobility models that have a uniform stationary distribution of node locations achieve the lowest relaying throughput. Next, in order to optimize the performance of the two-hop relay protocol, especially the delivery delay of packets, we evaluated the multicopy extension under the assumption that the lifetime of the packets is limited. The performance results (delivery delay, round-trip time, consumed energy) were derived using the theory of absorbing Markov chains and fluid approximations. These results were exploited to optimize the total energy consumed subject to a constraint on the delivery delay.
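The absorbing-Markov-chain machinery mentioned above yields the expected delivery delay directly from the fundamental matrix: if Q is the transient-to-transient block of the transition matrix, the expected times to absorption solve (I - Q) t = 1. A small sketch (the three-state relay model and its probabilities are illustrative only, not the thesis's model):

```python
import numpy as np

def expected_absorption_times(Q):
    """Expected steps to absorption from each transient state: t = (I - Q)^{-1} 1."""
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# toy two-hop relay in discrete time: the state is the number of nodes holding
# a copy of the packet (1..3); per step each copy independently meets the
# destination w.p. pd, and the source spawns one new copy w.p. ps
pd, ps = 0.05, 0.2
Q = np.zeros((3, 3))
for k in (1, 2, 3):
    deliver = 1 - (1 - pd) ** k          # at least one copy reaches the destination
    spawn = ps if k < 3 else 0.0         # number of copies capped at 3
    Q[k - 1, k - 1] = (1 - deliver) * (1 - spawn)
    if k < 3:
        Q[k - 1, k] = (1 - deliver) * spawn
print(expected_absorption_times(Q))      # mean delivery delay starting from k copies
```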
Al Hanbali, Ahmad, Eitan Altman, and Philippe Nain. "Évaluation des performances des réseaux sans-fil mobiles." [S.l.] : [s.n.], 2006. http://www-sop.inria.fr/dias/Theses/phd-218.pdf.
Full text
Gupta, Adarsh Baboo. "Numerical Simulations of the shock wave-boundary layer interaction in complex geometries." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPAST013.
Full textThe objective of the present thesis work is to provide better insight into the SWBLI unsteadiness due to the low-frequency streamwise oscillations of the separation bubble. To investigate this low-frequency motion, DNS of the interaction between a shock wave and a laminar boundary layer in complex geometries has been carried out. To perform those simulations, a modified numerical approach for curvilinear coordinates, implemented in an in-house parallel (MPI) Finite-Volume based DNS/LES solver (CHORUS) developed at LIMSI-CNRS, is used. The first part of the thesis is the validation of the modified numerical approach. The influence of mesh distortion has been analyzed on several test cases. The errors introduced by different types of deformation were identifiable for the three test cases, dealing with advection, turbulence and shock waves. The errors created by deformation of the mesh are found to be comparatively low if the control volumes stay close to a parallelepiped. In some cases, a significant rise has been seen due to the introduction of non-orthogonality in the mesh. The second part is the validation of the code in the framework of supersonic flows around a compression corner, which is the core of the present dissertation. The validation studies have been carried out for the cases of both inviscid and viscous flows over a compression ramp, and a comparison with theoretical as well as numerical data has been presented. This comparison has shown that the results obtained with the CHORUS code are in good agreement with the reference data. However, those studies are rather old, and a lot of progress has been made in numerical methods for high-speed flow simulations. Unfortunately, there are only a few recent studies concerning simulations or experiments of fully laminar flow around ramps or other complex geometries that could have helped to assess CHORUS's ability to compute such flows. It has then been decided to create our own test case using an extensively tested supersonic flow solver, rhoCentralFoam of the OpenFOAM open-source numerical package. The results obtained highlighted the differences between the two numerical approaches and allowed us to consider CHORUS as validated for DNS of compressible flows with shocks in complex geometries. Consequently, the last chapter deals with the physical analysis of the flow created by a laminar boundary layer developing around two geometries: a classical compression ramp and a compression-expansion ramp. As said earlier, the goal of those simulations was to determine whether the low-frequency oscillations of the recirculation zone can be related to the coherent structures in the incoming boundary layer. The results have demonstrated that, for both configurations, the separation shock is not subjected to longitudinal oscillations. However, when analysing the spectra from probes in the vicinity of the separation point, it appeared that all the frequency information is contained in those temporal signals. The conclusion of this study is that the absence of oscillations in the laminar case is not, as originally thought, due to the absence of coherent structures in the incoming boundary layer, but rather to the fact that, in the laminar case, the separation bubble extent is too large. As a consequence, even if the perturbations that make the bubble oscillate in the turbulent case are present for a laminar boundary layer, they are damped in such a way that they are not able to move the shock system and/or the recirculation zone.
The next step for this study would be to reduce either the freestream Mach number or the ramp angle, in order to have a smaller recirculation bubble, and to check whether the motion appears in that case.
Prat, Raphaël. "Équilibrage dynamique de charge sur supercalculateur exaflopique appliqué à la dynamique moléculaire." Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0174/document.
Full textIn the context of classical molecular dynamics applied to condensed matter physics, CEA researchers are studying complex phenomena at the atomic scale. To do this, it is essential to continuously optimize molecular dynamics codes for recent massively parallel supercomputers, to enable physicists to exploit their capacity to numerically reproduce more and more complex physical phenomena. Nevertheless, simulation codes must be adapted to balance the load between the cores of supercomputers. To do this, in this thesis we propose to incorporate the Adaptive Mesh Refinement (AMR) method into the ExaSTAMP molecular dynamics code. The main objective is to optimize the computation loop performing the calculation of particle interactions, using multi-threaded and vectorizable data structures. The structure also reduces the memory footprint of the simulation. The design of the AMR is guided by the need for load balancing and the adaptability required by sets of particles moving dynamically over time. The results of this thesis show that using an AMR structure in ExaSTAMP improves its performance. In particular, the AMR makes it possible to run the simulation of a violent shock causing a tin microjet of 1.249 billion atoms 1.31 times faster on 256 KNLs. In addition, simulations that were not conceivable so far can be carried out thanks to the AMR, such as the impact of a tin nanodroplet on a solid surface with more than 500 million atoms.
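The data structure at stake is a spatial decomposition of particles into cells sized by the interaction cutoff; the AMR of the thesis additionally refines cells where the particle density is high. A uniform-grid sketch of the underlying cell-list idea (periodic box and parameters are illustrative):

```python
import numpy as np
from collections import defaultdict

def neighbour_pairs(pos, box, rcut):
    """Cell-list neighbour search: bin particles into cells of edge >= rcut so
    that a particle's interaction partners lie in the 27 surrounding cells."""
    n = max(3, int(box // rcut))          # cells per axis (>= 3 for the stencil)
    edge = box / n
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p // edge).astype(int) % n)].append(i)
    pairs = []
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
               for dy in (-1, 0, 1) for dz in (-1, 0, 1)]
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            neigh = ((cx + dx) % n, (cy + dy) % n, (cz + dz) % n)
            for j in cells.get(neigh, ()):
                pairs.extend((i, j) for i in members if i < j)   # each pair once
    return pairs

pos = np.random.rand(200, 3) * 10.0       # 200 particles in a 10x10x10 box
print(len(neighbour_pairs(pos, box=10.0, rcut=2.5)))
```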
Sarton, Jonathan. "Visualisations interactives haute-performance de données volumiques massives : une approche out-of-core multi-résolution basée GPUs." Thesis, Reims, 2018. http://www.theses.fr/2018REIMS022/document.
Full textThis thesis work is part of the PIA2 project 3DNeuroSecure, which aims to provide a collaborative system for interactive multi-scale navigation within visual big data (VDB), with ultra-high-definition (tera-voxel), potentially multimodal, 3D biomedical imaging as the application framework. In addition, this system will be able to integrate a variety of processing and/or annotations (tags) through remote HPC resources. All of these treatments must be possible in an out-of-core context. Because of the visual big data, we have to decouple the location of acquisition from that of storage and high-performance computation, and from those where the data are manipulated (various connected devices, mobile or not: smartphone, PC, large display wall, virtual reality room...). The streaming visualization will be adapted to the user's device in terms of both resolution (Full HD to GigaPixel) and 3D rendering (classic rendering on 2D screens, stereoscopic with glasses or autostereoscopic without glasses). All these developments, carried out by the CReSTIC with the support of MaSCA (Maison de la Simulation de Champagne-Ardenne), can therefore be summarized as: - the definition and implementation of data structures adapted to the out-of-core visualization of the targeted visual big data; - the adaptation of the partners' specific treatments, such as interactive 3D rendering, to these new data structures; - the technical architecture choices for the HPC and the virtualization of the navigation software application, to take advantage of "ROMEO", the local datacenter. The auto-/stereoscopic rendering, with or without glasses, will be operated within the MINT software of the université de Reims Champagne-Ardenne.
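A sketch of the arithmetic behind out-of-core multi-resolution addressing (brick size and coordinates are illustrative; the actual data structures are those developed in the thesis): each level of the pyramid halves the grid, and fixed-size bricks are the unit the renderer pages in.

```python
def brick_key(voxel, level, brick=32):
    """Identify the brick containing `voxel` at a given pyramid level.
    Level 0 is full resolution; each coarser level halves every axis.
    Bricks of brick**3 voxels are the out-of-core paging unit: the renderer
    streams in only the keys visible at the current scale and viewpoint."""
    x, y, z = (c >> level for c in voxel)      # coordinates at that level
    return (level, x // brick, y // brick, z // brick)

# one voxel of a tera-voxel volume (e.g. 10240^3) addressed at two scales
print(brick_key((576, 5000, 9000), level=0))   # finest bricks
print(brick_key((576, 5000, 9000), level=4))   # 16x coarser on each axis
```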