Dissertations on the topic "Calculs haute performance"
Explore the top 50 dissertations for research on the topic "Calculs haute performance".
Galtier, Jérôme. "Structures de données irrégulières et architectures haute performance : une étude du calcul numérique intensif par le partitionnement de graphes." Versailles-St Quentin en Yvelines, 1997. http://www.theses.fr/1997VERS0001.
Guilloteau, Quentin. "Une approche autonomique à la régulation en ligne de systèmes HPC, avec un support pour la reproductibilité des expériences." Electronic Thesis or Diss., Université Grenoble Alpes, 2023. http://www.theses.fr/2023GRALM075.
High-Performance Computing (HPC) systems have become increasingly complex, and their performance and power consumption make them less predictable. This unpredictability requires cautious runtime management to guarantee an acceptable Quality of Service to end users. Such a regulation problem arises in the context of the computing grid middleware CiGri, which aims to harvest the idle computing resources of a set of clusters by injecting low-priority jobs. A too aggressive harvesting strategy can degrade performance for all users of the clusters, while a too timid one leaves resources idle and thus wastes computing power. There is therefore a tradeoff between the amount of resources that can be harvested and the resulting degradation of users' jobs, and this tradeoff can evolve at runtime based on Service Level Agreements and the current load of the system. We claim that such regulation challenges can be addressed with tools from Autonomic Computing, in particular when coupled with Control Theory. This thesis investigates several regulation problems in the context of CiGri with such tools. We focus on regulating the harvesting based on the load of a shared distributed file system, and on improving the overall usage of the computing resources. We also evaluate and compare the reusability of the proposed control-based solutions in the context of HPC systems. The experiments carried out in this thesis also led us to investigate new tools and techniques to improve the cost and reproducibility of experiments. We present a tool named NixOS-Compose that can generate and deploy reproducible distributed software environments. We also investigate techniques to reduce the number of machines needed to deploy experiments on grid or cluster middleware such as CiGri, while ensuring an acceptable level of realism for the final deployed system.
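The abstract does not specify the control law. As a hedged illustration only, the sketch below assumes a discrete proportional-integral (PI) controller, a common choice in control-based autonomic computing, that sets the number of low-priority jobs injected each cycle so that a measured file-system load tracks a reference value. The plant model (load proportional to injected jobs), the gains, and the names `PIController` and `simulate` are hypothetical and are not taken from CiGri.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time proportional-integral (PI) controller."""
    kp: float
    ki: float
    integral: float = 0.0

    def step(self, reference: float, measurement: float) -> float:
        error = reference - measurement
        self.integral += error  # accumulated error removes the steady-state offset
        return self.kp * error + self.ki * self.integral


def simulate(reference: float = 10.0, plant_gain: float = 0.5, steps: int = 50) -> list:
    """Toy closed loop with one control decision per cycle.

    Hypothetical static plant: measured load = plant_gain * injected jobs.
    """
    ctrl = PIController(kp=0.4, ki=0.6)
    load = 0.0
    history = []
    for _ in range(steps):
        jobs = max(0.0, ctrl.step(reference, load))  # cannot inject a negative number of jobs
        load = plant_gain * jobs
        history.append(load)
    return history
```

Here `simulate()` reaches a load of 5.0 after the first cycle and then settles near the reference of 10.0; the gains were picked only so that this toy loop is stable.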
Ho, Minh Quan. "Optimisation de transfert de données pour les processeurs pluri-coeurs, appliqué à l'algèbre linéaire et aux calculs sur stencils." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM042/document.
The upcoming Exascale target in High Performance Computing (HPC) and disruptive achievements in artificial intelligence have given rise to alternative, non-conventional many-core architectures that offer the energy efficiency typical of embedded systems while providing the same software ecosystem as classic HPC platforms. A key enabler of energy-efficient computing on many-core architectures is the exploitation of data locality, specifically the use of scratchpad memories combined with DMA engines to overlap computation and communication. This software paradigm raises considerable programming challenges for both the vendor and the application developer. In this thesis, we tackle the memory transfer and performance issues, as well as the programming challenges, of memory- and compute-intensive HPC applications on the Kalray MPPA many-core architecture. With the first, memory-bound use case of the lattice Boltzmann method (LBM), we provide generic and fundamental techniques for decomposing three-dimensional iterative stencil problems onto clustered many-core processors fitted with scratchpad memories and DMA engines. The developed DMA-based streaming and overlapping algorithm delivers a 33% performance gain over the default cache-based implementation. High-dimensional stencil computation suffers from a serious I/O bottleneck and limited on-chip memory space. We developed a new in-place LBM propagation algorithm, which halves the memory footprint and yields 1.5 times higher performance-per-byte efficiency than the state-of-the-art out-of-place algorithm. On the compute-intensive side, with dense linear algebra computations, we build an optimized matrix multiplication benchmark based on the exploitation of scratchpad memory and efficient asynchronous DMA communication.
These techniques are then extended to a DMA module of the BLIS framework, which allows us to instantiate an optimized and portable level-3 BLAS numerical library on any DMA-based architecture in less than 100 lines of code. We achieve 75% of peak performance on the MPPA processor with the matrix multiplication operation (GEMM) from the standard BLAS library, without having to write thousands of lines of laboriously optimized code for the same result.
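The matrix multiplication work described above relies on staging tiles of the operands into scratchpad memory. A minimal pure-Python sketch of the underlying loop tiling follows; it performs no DMA transfers, does not use the BLIS API, and the `block` size is an arbitrary assumption standing in for what would fit in local memory.

```python
def matmul_blocked(a, b, block=2):
    """C = A @ B computed tile by tile (pure Python, illustrative only).

    Each (block x block) tile of A and B stands in for a chunk that a
    scratchpad-based kernel would stage into local memory before computing.
    """
    n, k = len(a), len(a[0])
    m = len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions must agree")
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Multiply one tile of A by one tile of B and accumulate into C.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        aip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += aip * row_b[j]
    return c
```

The result is identical to a naive triple loop; only the traversal order changes, so each tile is reused while it is "resident".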
Pawlowski, Filip Igor. "High-performance dense tensor and sparse matrix kernels for machine learning." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN081.
In this thesis, we develop high-performance algorithms for certain computations involving dense tensors and sparse matrices. We address kernel operations that are useful for machine learning tasks, such as inference with deep neural networks (DNNs). We develop data structures and techniques to reduce memory use, improve data locality, and hence improve cache reuse of the kernel operations. We design both sequential and shared-memory parallel algorithms. In the first part of the thesis, we focus on dense tensor kernels. Tensor kernels include tensor-vector multiplication (TVM), tensor-matrix multiplication (TMM), and tensor-tensor multiplication (TTM). Among these, TVM is the most bandwidth-bound and constitutes a building block for many algorithms. We focus on this operation and develop a data structure and sequential and parallel algorithms for it. We propose a novel data structure that stores the tensor as blocks, ordered using the space-filling curve known as the Morton curve (or Z-curve). The key idea consists of dividing the tensor into blocks small enough to fit in cache and storing them according to the Morton order, while keeping a simple, multi-dimensional order on the individual elements within them. Thus, high-performance BLAS routines can be used as microkernels for each block. We evaluate our techniques in a set of experiments. The results not only demonstrate superior performance of the proposed approach over state-of-the-art variants, by up to 18%, but also show that the proposed approach induces 71% less sample standard deviation for the TVM across the d possible modes. We further show that our data structure naturally extends to other tensor kernels by demonstrating that it yields up to 38% higher performance for the higher-order power method. Finally, we investigate shared-memory parallel TVM algorithms that use the proposed data structure.
Several alternative parallel algorithms were characterized theoretically and implemented using OpenMP to compare them experimentally. Our results on systems with up to 8 sockets show near-peak performance for the proposed algorithm on 2-, 3-, 4-, and 5-dimensional tensors. In the second part of the thesis, we explore sparse computations in neural networks, focusing on the high-performance sparse deep inference problem. Sparse DNN inference is the task of using sparse DNNs to classify a batch of data elements forming, in our case, a sparse feature matrix. The performance of sparse inference hinges on efficient parallelization of the sparse matrix-sparse matrix multiplication (SpGEMM) repeated for each layer in the inference function. We first characterize efficient sequential SpGEMM algorithms for our use case. We then introduce model-parallel inference, which uses a two-dimensional partitioning of the weight matrices obtained with hypergraph partitioning software. The model-parallel variant uses barriers to synchronize at layers. Finally, we introduce tiling model-parallel and tiling hybrid algorithms, which increase cache reuse between layers and use a weak synchronization module to hide load imbalance and synchronization costs. We evaluate our techniques on the large network data from the IEEE HPEC 2019 Graph Challenge on shared-memory systems and report up to 2x speed-up over the baseline.
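The Morton-ordered block layout described above can be sketched as follows: bit interleaving turns a block's multi-dimensional coordinates into its position along the Z-curve. The function names, the 16-bit coordinate limit, and the convention that the first coordinate varies fastest are assumptions of this sketch, not details of the thesis.

```python
from itertools import product


def morton_index(coords, bits=16):
    """Interleave the bits of a block coordinate tuple into one Z-order key.

    Bit b of coordinate d goes to position b * len(coords) + d, so the first
    coordinate varies fastest along the curve.
    """
    ndim = len(coords)
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * ndim + d)
    return key


def morton_block_order(blocks_per_dim):
    """All block coordinates of a blocked tensor, sorted along the Morton curve."""
    coords = product(*(range(s) for s in blocks_per_dim))
    return sorted(coords, key=morton_index)
```

A block's storage offset would then be its rank in `morton_block_order(...)` times the block volume, while elements inside each block keep a plain multi-dimensional layout so that a BLAS routine can operate on them.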
Vienne, Jérôme. "Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband." PhD thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM043.
Manufacturers of computer clusters need tools to help them make better architectural design decisions. To address this need, this thesis focuses on the specific issues of estimating computation times and InfiniBand network congestion. These two problems are often dealt with globally; however, an overall approach does not explain the reasons for performance loss related to architectural choices, so our approach was to conduct a more detailed study. In this thesis, we focus on: 1) the estimation of computation time in a grid, and 2) the estimation of communication times over InfiniBand networks. To evaluate computation time, the proposed approach relies on a static or semi-static analysis of the source code, cutting it into blocks before micro-benchmarking these blocks on the target architecture. To estimate communication time, a bandwidth-sharing model for InfiniBand networks has been developed, allowing one to predict the impact of concurrent communications. This model was then incorporated into a simulator so that it could be validated on a set of synthetic communication graphs and on the Socorro application.
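The InfiniBand bandwidth-sharing model developed in this thesis is not described in the abstract and is not reproduced here. For orientation only, the sketch below computes max-min fair rates with the classic progressive-filling (bottleneck) algorithm, a common baseline for predicting how concurrent communications share links; link names and capacities are hypothetical.

```python
import math


def max_min_rates(capacity, flows):
    """Max-min fair rates by the progressive-filling (bottleneck) algorithm.

    capacity: dict mapping link name -> bandwidth.
    flows: list of sets; flows[f] holds the links crossed by flow f
    (every link must appear in `capacity`).
    """
    rates = [0.0] * len(flows)
    remaining = dict(capacity)
    active = set(range(len(flows)))
    while active:
        # Find the link offering the smallest equal share to its unfrozen flows.
        bottleneck, share = None, math.inf
        for link, cap in remaining.items():
            users = sum(1 for f in active if link in flows[f])
            if users and cap / users < share:
                bottleneck, share = link, cap / users
        if bottleneck is None:
            # Remaining flows cross no listed link: they are unconstrained.
            for f in active:
                rates[f] = math.inf
            break
        # Freeze every flow through the bottleneck at that share.
        for f in [f for f in active if bottleneck in flows[f]]:
            rates[f] = share
            active.discard(f)
            for link in flows[f]:
                remaining[link] -= share
    return rates
```

Dividing a message size by its flow's rate then gives a first, contention-aware estimate of its transfer time.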
Vienne, Jérôme. "Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband." PhD thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00728156.
Applencourt, Thomas. "Calcul haute performance & chimie quantique." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30162/document.
This thesis has two main objectives: (1) to develop and apply original electronic structure methods for quantum chemistry, and (2) to implement several computational strategies to achieve efficient large-scale computer simulations. In the first part, the Configuration Interaction (CI) and Quantum Monte Carlo (QMC) methods used in this work for calculating quantum properties are presented. We then describe more specifically the selected CI approach (the so-called CIPSI approach, Configuration Interaction using a Perturbative Selection done Iteratively) that we used to build trial wavefunctions for QMC simulations. As a first application, we present the QMC calculation of the total non-relativistic energies of the 3d transition metal atoms. This work, which required the implementation of Slater-type basis functions in our codes, led to the best values ever published for these atoms. We then present our original implementation of pseudopotentials for QMC and discuss the calculation of atomization energies for a benchmark set of 55 organic molecules. The second part is devoted to the High Performance Computing (HPC) aspects. The objective is to make possible and/or facilitate the deployment of very large-scale simulations. From the developer's point of view, this includes the use of original programming paradigms, single-core optimization, massively parallel calculations on grids (supercomputers and cloud), and the development of collaborative tools; from the user's point of view, it includes improved code installation, management of input/output parameters, a GUI, and interfacing with other codes.
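The perturbative selection at the heart of CIPSI can be sketched on a toy, explicitly stored Hamiltonian: each external determinant D is ranked by its Epstein-Nesbet second-order contribution e_D = <D|H|Psi>^2 / (E - H_DD). This is a simplified illustration under stated assumptions: a real implementation generates candidates through single and double excitations and never stores the full matrix, and the function name is hypothetical.

```python
def en_pt2_selection(h, selected, coeffs, energy, n_add):
    """One CIPSI-style selection step on a small explicit Hamiltonian (toy).

    h: full symmetric Hamiltonian as a list of lists;
    selected: indices of determinants already in the variational space;
    coeffs: their normalized CI coefficients; energy: the variational energy.
    Returns the n_add external determinants with the largest |e_D| and the
    total Epstein-Nesbet second-order correction.
    """
    inside = set(selected)
    contrib = {}
    for d in range(len(h)):
        if d in inside:
            continue
        coupling = sum(h[d][i] * c for i, c in zip(selected, coeffs))  # <D|H|Psi>
        contrib[d] = coupling ** 2 / (energy - h[d][d])
    ranked = sorted(contrib, key=lambda d: abs(contrib[d]), reverse=True)
    return ranked[:n_add], sum(contrib.values())
```

In a full iteration, the selected determinants would be added to the variational space, the enlarged CI matrix rediagonalized, and the selection repeated.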
Perotin, Matthieu Martineau Patrick. "Calcul haute performance sur matériel générique." S. l. : S. n, 2008. http://theses.abes.fr/2008TOUR4022.
Pérotin, Matthieu. "Calcul haute performance sur matériel générique." Thesis, Tours, 2008. http://www.theses.fr/2008TOUR4022/document.
Two facts motivate this work: researchers' demand for High Performance Computing and the low usage of the computing power of teaching resources. This thesis aims to answer the demand for HPC while preserving the teaching resources for teaching. This work sought a solution that would be simple and straightforward for end users. Their needs and wishes led to a set of specifications, most of whose constraints could be satisfied with a well-designed software stack. Others, however, cannot be satisfied with existing solutions alone; they define a new scheduling problem, in which the goal is to schedule processes on the available resources. This problem was studied and solved with various heuristics, whose performance was compared using a simulator before being implemented in an experimental setup.
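The abstract does not name the scheduling heuristics that were compared. To show the shape of the problem only, the sketch below substitutes a classic greedy rule for them, Longest-Processing-Time-first (LPT) list scheduling, which assigns each job, longest first, to the currently least-loaded machine; job durations are assumed to be known in advance.

```python
import heapq


def lpt_schedule(durations, n_machines):
    """Longest-Processing-Time-first list scheduling.

    Returns the job indices assigned to each machine and the makespan.
    """
    heap = [(0.0, m) for m in range(n_machines)]  # (current load, machine id)
    assignment = [[] for _ in range(n_machines)]
    for job in sorted(range(len(durations)), key=lambda j: durations[j], reverse=True):
        load, m = heapq.heappop(heap)  # least-loaded machine so far
        assignment[m].append(job)
        heapq.heappush(heap, (load + durations[job], m))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

For durations `[7, 5, 4, 3, 3]` on two machines, LPT yields a makespan of 12, whereas the optimum is 11 (7 + 4 and 5 + 3 + 3), which illustrates why such heuristics are worth comparing in simulation.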
Mena Morales, Valentin. "Approche de conception haut-niveau pour l'accélération matérielle de calcul haute performance en finance." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0018/document.
The need for resources in High Performance Computing (HPC) is generally met by scaling up server farms, to the detriment of the energy consumption of such a solution. Accelerating HPC applications on heterogeneous platforms, such as FPGAs or GPUs, offers a better architectural compromise, as they can reduce the energy consumption of a deployed system. Supporting this heterogeneous acceleration, however, requires a change of programming paradigm, which increases the programming complexity faced by software experts. This is most notably the case for developers in quantitative finance. Applications in this field constantly evolve and grow in complexity to stay competitive and comply with legislative changes, which puts even more pressure on the programmability of acceleration solutions. In this context, the use of high-level development and design flows, such as High-Level Synthesis (HLS) for programming FPGAs, is not enough. A domain-specific approach can help reach performance requirements without impairing the programmability of accelerated applications. In this thesis, we propose a high-level design approach that relies on OpenCL as a heterogeneous programming standard. More precisely, a recent implementation of OpenCL for Altera FPGAs is used.
In this context, four main contributions are proposed in this thesis: (1) an initial study of the integration of hardware computing cores into a software library for quantitative finance (QuantLib); (2) an exploration of different architectures and their respective performance, as well as the design of a dedicated architecture for the pricing of American options and their implied volatility, based on a high-level design flow; (3) a detailed characterization of an Altera OpenCL platform, from elementary operators, memory accesses, and control overlays up to the communication links it is made of; and (4) a compilation flow specific to the quantitative finance domain, relying on the aforementioned characterization and on the description of the considered financial applications (option pricing).
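The abstract does not state which numerical method the dedicated pricing architecture implements. As a software reference under that uncertainty, the sketch below prices American options with the standard Cox-Ross-Rubinstein binomial tree (backward induction with an early-exercise check at every node) and recovers implied volatility by bisection; it assumes no dividends and is not an FPGA or OpenCL kernel.

```python
import math


def crr_price(spot, strike, rate, sigma, maturity, steps=200, call=False, american=True):
    """Cox-Ross-Rubinstein binomial price of a vanilla option (no dividends).

    Requires sigma * sqrt(dt) > rate * dt so the up-probability lies in (0, 1).
    """
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-rate * dt)
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral probability of an up move

    def payoff(s):
        return max(s - strike, 0.0) if call else max(strike - s, 0.0)

    # Option values at maturity, indexed by the number j of up moves.
    values = [payoff(spot * u**j * d**(steps - j)) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):  # backward induction through the tree
        for j in range(n + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:  # early exercise check at node (n, j)
                value = max(value, payoff(spot * u**j * d**(n - j)))
            values[j] = value
    return values[0]


def implied_vol(price, spot, strike, rate, maturity, lo=0.01, hi=3.0, tol=1e-6, **kwargs):
    """Invert crr_price for sigma by bisection, assuming price increases with sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if crr_price(spot, strike, rate, mid, maturity, **kwargs) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because each bisection step re-prices the whole tree, implied volatility costs many pricings, which is the kind of workload that motivates hardware acceleration.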
Lagardère, Louis. "Calcul haute-performance et dynamique moléculaire polarisable." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066042.
This work lies at the interface between theoretical chemistry, scientific computing, and applied mathematics. We study different algorithms used to solve the specific equations that arise in polarizable molecular dynamics in a massively parallel context. This family of models indeed requires solving more complex equations than in the classical case, making the use of supercomputers mandatory to obtain significant results. More specifically, we study different types of boundary conditions that represent different ways to model solvation effects: first the Particle Mesh Ewald method to treat periodic boundary conditions, and then a continuum solvation model discretized within a domain decomposition strategy, ddCOSMO. The outline of this thesis is as follows. First, the different parallel strategies in the general context of molecular dynamics are reviewed. Then, several methods to adapt these strategies to the specific case of polarizable force fields are presented. After that, strategies that circumvent certain limitations due to the use of iterative methods in polarizable molecular dynamics are presented and studied. Next, the adaptation of these methods to different boundary conditions is presented: first the Particle Mesh Ewald method for periodic boundary conditions, and then a particular continuum solvation model discretized with a domain decomposition strategy, ddCOSMO. Finally, various numerical results and applications are presented.
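In polarizable force fields, the induced dipoles mu satisfy a linear system of the form mu = alpha (E + T mu), which is typically solved iteratively. The toy sketch below replaces the 3N x 3N dipole interaction tensor with a hypothetical symmetric scalar coupling matrix and solves (alpha^-1 - T) mu = E with plain conjugate gradient; the solvers and preconditioners actually studied in the thesis are not reproduced.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def induced_dipoles(alpha, coupling, field):
    """Toy scalar model: solve (diag(1/alpha) - T) mu = E for the induced dipoles."""
    n = len(field)

    def matvec(v):
        return [v[i] / alpha[i] - sum(coupling[i][j] * v[j] for j in range(n))
                for i in range(n)]

    return conjugate_gradient(matvec, field)
```

The system matrix must stay positive definite, which here means the couplings must be small relative to 1/alpha; this holds for weakly coupled toy inputs but is an assumption, not a property of real force fields.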
Pasca, Bogdan Mihai. "Calcul flottant haute performance sur circuits reconfigurables." PhD thesis, Ecole normale supérieure de Lyon - ENS Lyon, 2011. http://tel.archives-ouvertes.fr/tel-00654121.
Perarnau, Swann. "Environnements pour l'analyse expérimentale d'applications de calcul haute performance." PhD thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00650047.
Aubert, Pierre. "Calcul haute performance pour la détection de rayon Gamma." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLV058/document.
Next-generation research experiments will add a huge surge of data to the continuously increasing data production of current experiments. This growing data rate causes upheavals at many levels, such as data storage, analysis, dissemination, and preservation. From 2021, the CTA project will become the foremost ground-based gamma-ray astronomy observatory. It will generate hundreds of petabytes of data by 2030, data that will have to be stored, compressed, and analyzed each year. This work addresses the optimization of data analysis using high performance computing techniques: an efficient data format generator, very low-level programming to optimize the CPU pipeline, and vectorization of existing algorithms. It also introduces a fast compression algorithm for integers and finally presents a new analysis algorithm based on efficient image comparison.
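The integer compression algorithm introduced in the thesis is not described in the abstract. As a hedged stand-in, the sketch shows a common baseline for integer streams: delta encoding, zigzag mapping of signed deltas, then variable-byte (LEB128-style) packing with 7 payload bits per byte.

```python
def _zigzag(n: int) -> int:
    """Map signed ints to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def _unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode(values):
    """Delta + zigzag + variable-byte encoding of a list of ints."""
    out = bytearray()
    prev = 0
    for v in values:
        z = _zigzag(v - prev)
        prev = v
        while True:
            byte = z & 0x7F
            z >>= 7
            if z:
                out.append(byte | 0x80)  # high bit set: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


def decode(data):
    """Inverse of encode."""
    values, prev, shift, acc = [], 0, 0, 0
    for byte in data:
        acc |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += _unzigzag(acc)
            values.append(prev)
            shift, acc = 0, 0
    return values
```

Slowly varying sequences compress well because their deltas fit in one byte each; for example, `encode([0, 1, 2, 3])` is 4 bytes long.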
Partimbene, Vincent. "Calcul haute performance pour la simulation d'interactions fluide-structure." PhD thesis, Toulouse, INPT, 2018. http://oatao.univ-toulouse.fr/20524/1/PARTIMBENE_Vincent.pdf.
Jolivet, Pierre. "Méthodes de décomposition de domaine. Application au calcul haute performance." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM040/document.
This thesis introduces a unified framework for various domain decomposition methods: those with overlap, so-called Schwarz methods, and those based on Schur complements, so-called substructuring methods. It is then possible to switch between methods with a high level of abstraction and to build different preconditioners to accelerate the iterative solution of large sparse linear systems. Such systems are frequently encountered in industrial or scientific problems after the discretization of continuous models. Even though these preconditioners naturally exhibit good parallelism properties on distributed architectures, they can show inadequate numerical performance for complex decompositions or multiscale physics. This lack of robustness may be alleviated by concurrently solving sparse or dense local generalized eigenvalue problems, thus identifying a priori the modes that hinder the convergence of the underlying iterative methods. Using these modes, it is then possible to define projection operators based on what is usually referred to as a coarse solver. These auxiliary tools tend to solve the aforementioned issues, but typically decrease the parallel efficiency of the preconditioners. In this dissertation, the newly developed construction is shown to be efficient in three ways: (1) through large-scale numerical experiments on Curie, a European supercomputer, and through comparisons with state-of-the-art (2) multigrid and (3) direct solvers.
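The overlapping Schwarz methods discussed above can be illustrated with a one-level additive Schwarz preconditioner, M^-1 = sum_i R_i^T A_i^-1 R_i, inside preconditioned conjugate gradient. This minimal sketch assumes a 1D Laplacian, exact tridiagonal subdomain solves, and hypothetical subdomain and overlap sizes; it deliberately omits the coarse space built from local generalized eigenvalue problems, which is what the thesis adds for robustness.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def apply_laplacian(x):
    """y = A x for the 1D Laplacian A = tridiag(-1, 2, -1) with Dirichlet ends."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def solve_local(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1) y = rhs on one subdomain."""
    m = len(rhs)
    cp, dp = [0.0] * m, [0.0] * m
    for i in range(m):
        denom = 2.0 + (cp[i - 1] if i > 0 else 0.0)
        cp[i] = -1.0 / denom
        dp[i] = (rhs[i] + (dp[i - 1] if i > 0 else 0.0)) / denom
    y = [0.0] * m
    for i in range(m - 1, -1, -1):
        y[i] = dp[i] - (cp[i] * y[i + 1] if i < m - 1 else 0.0)
    return y


def additive_schwarz(n, n_sub, overlap):
    """One-level additive Schwarz: M^-1 r = sum_i R_i^T A_i^-1 R_i r."""
    size = n // n_sub
    ranges = [(max(0, s * size - overlap),
               n if s == n_sub - 1 else min(n, (s + 1) * size + overlap))
              for s in range(n_sub)]

    def apply(r):
        z = [0.0] * n
        for start, stop in ranges:  # local solves could run in parallel
            for k, v in enumerate(solve_local(r[start:stop])):
                z[start + k] += v
        return z

    return apply


def pcg(matvec, precond, b, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Without a coarse space, the iteration count of such a one-level method grows with the number of subdomains, which is the scalability issue the coarse solver is meant to address.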
Yu, Huafeng. "Un Modèle Réactif Basé sur MARTE Dédié au Calcul Intensif à Parallélisme de Données : Transformation vers le Modèle Synchrone." PhD thesis, Université des Sciences et Technologie de Lille - Lille I, 2008. http://tel.archives-ouvertes.fr/tel-00497248.
The work in this thesis falls within the scope of formal validation and reactive control of high-performance computations on systems-on-chip (SoC).
In this context, the first contribution is synchronous modeling, together with a transformation of applications into synchronous equations. Synchronous models make it possible to address several questions related to formal validation by using the formal tools and techniques offered by synchronous technology. The transformations are developed following the Model-Driven Engineering (MDE) approach.
The second contribution is an extension and improvement of control mechanisms for high-performance computations, in the form of high-level language constructs and their semantics. They were defined to enable verification, synthesis, and code generation. The aim is to determine a level of abstraction for representing systems at which the control part is extracted and modeled as finite-state automata. This makes it possible to specify and implement changes between computation modes, which differ, for example, in the resources used, the quality of service provided, or the choice of algorithm that fulfills a given functionality.
These contributions enable the use of analysis and verification tools, such as checking single-assignment and acyclic-dependency properties, and model checking. The use of discrete controller synthesis techniques is also addressed. These techniques can ensure correctness constructively: from a partial specification of the control, the missing part required for the properties to be satisfied is computed. Thanks to these techniques, the specification is simplified when developing the control part, and the result is guaranteed to be correct by construction.
The synchronous and control models are based on MARTE and UML. The work of this thesis has been partially implemented within Gaspard, which is dedicated to data-intensive processing applications. A case study is presented, focusing on an embedded-system application for a multimedia mobile phone.
Jamal, Aygul. "A parallel iterative solver for large sparse linear systems enhanced with randomization and GPU accelerator, and its resilience to soft errors." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS269/document.
In this PhD thesis, we address three challenges faced by linear algebra solvers in the perspective of future exascale systems: accelerating convergence using innovative techniques at the algorithm level; taking advantage of GPU (Graphics Processing Unit) accelerators to enhance the performance of computations on hybrid CPU/GPU systems; and evaluating the impact of errors in the context of an increasing level of parallelism in supercomputers. We are interested in methods that accelerate the convergence and execution time of iterative solvers for large sparse linear systems. The solver specifically considered in this work is the parallel Algebraic Recursive Multilevel Solver (pARMS), a distributed-memory parallel solver based on Krylov subspace methods. First, we integrate a randomization technique referred to as Random Butterfly Transformations (RBT), which has been successfully applied to remove the cost of pivoting in the solution of dense linear systems. Our objective is to apply this method in the ARMS preconditioner to solve more efficiently the last Schur complement system in the application of the recursive multilevel process in pARMS. The experimental results show an improvement in convergence and accuracy. Due to memory concerns for some test problems, we also propose to use a sparse variant of RBT followed by a sparse direct solver (SuperLU), resulting in an improvement of the execution time. Then, we explain how a non-intrusive approach can be applied to implement GPU computing in the pARMS solver, especially for the local preconditioning phase, which represents a significant part of the time needed to compute the solution. We compare the CPU-only and hybrid CPU/GPU variants of the solver on several test problems coming from physical applications.
The performance results of the hybrid CPU/GPU solver using ARMS preconditioning combined with RBT, or ILU(0) preconditioning, show a performance gain of up to 30% on the test problems considered in our experiments. Finally, we study the effect of soft faults on the convergence of the commonly used flexible GMRES (FGMRES) algorithm, which is also used to solve the preconditioned system in pARMS. The test problem in our experiments is an elliptic PDE problem on a regular grid. We consider two types of preconditioners: an incomplete LU factorization with dual threshold (ILUT), and the ARMS preconditioner combined with RBT randomization. We consider two soft-fault modeling approaches, in which we perturb the matrix-vector multiplication and the application of the preconditioner, respectively, and we compare their potential impact on the convergence of the solver.
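The random butterfly transformation mentioned above can be sketched on a small dense system: random butterflies U and V make the transformed matrix U^T A V safe, with high probability, for Gaussian elimination without pivoting, and the solution is recovered as x = V y. This sketch assumes depth-1 butterflies, an even matrix size, and random diagonal entries close to 1; it does not cover the sparse RBT variant or its use inside the ARMS preconditioner.

```python
import math
import random


def butterfly(n, rng):
    """Dense depth-1 random butterfly B = (1/sqrt 2) [[R0, R1], [R0, -R1]]."""
    h = n // 2
    r0 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    r1 = [math.exp(rng.uniform(-0.5, 0.5) / 10) for _ in range(h)]
    s = 1 / math.sqrt(2)
    b = [[0.0] * n for _ in range(n)]
    for i in range(h):
        b[i][i], b[i][h + i] = s * r0[i], s * r1[i]
        b[h + i][i], b[h + i][h + i] = s * r0[i], -s * r1[i]
    return b


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve_no_pivot(a, rhs):
    """Gaussian elimination WITHOUT pivoting (fails if a zero pivot appears)."""
    n = len(a)
    a, rhs = [row[:] for row in a], rhs[:]
    for k in range(n):
        if a[k][k] == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}")
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            rhs[i] -= f * rhs[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (rhs[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def rbt_solve(a, b, seed=0):
    """Solve A x = b via (U^T A V) y = U^T b and x = V y, without pivoting."""
    rng = random.Random(seed)
    n = len(a)
    u, v = butterfly(n, rng), butterfly(n, rng)
    ut = transpose(u)
    a_r = matmul(matmul(ut, a), v)
    b_r = [sum(ut[i][k] * b[k] for k in range(n)) for i in range(n)]
    y = solve_no_pivot(a_r, b_r)
    return [sum(v[i][k] * y[k] for k in range(n)) for i in range(n)]
```

For example, a nonsingular 4 x 4 matrix with a zero in position (0, 0) makes `solve_no_pivot` fail at the first step, whereas `rbt_solve` typically solves it without any pivoting.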
Ben El Haj Ali, Amin. "Calcul de haute performance en aéroélasticité et en écoulements turbulents tridimentionnels." Mémoire, École de technologie supérieure, 2008. http://espace.etsmtl.ca/159/1/BEN_HAJ_ALI_Amine.pdf.
Notargiacomo, Thibault. "Approche parcimonieuse et calcul haute performance pour la tomographie itérative régularisée." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAT013/document.
X-ray computed tomography (CT) is a technique that aims to provide a measure of a given property of the interior of a physical object, given a set of exterior projection measurements. Although CT is a mature technology, most of the algorithms used for image reconstruction in commercial applications are based on analytical methods such as filtered back-projection. The main idea of this thesis is to exploit the latest advances in applied mathematics and computer science to study, design, and implement algorithms dedicated to 3D cone-beam reconstruction from X-ray flat-panel detectors, targeting clinically relevant use cases, including low-dose and few-view acquisitions. In this work, we studied various strategies to model the tomographic operators and how they can be implemented on a multi-GPU platform. We then proposed to use the 3D complex wavelet transform to regularize the reconstruction problem.
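Regularized iterative reconstruction of the kind described above can be sketched with the iterative shrinkage-thresholding algorithm (ISTA) for min_x 0.5 ||A x - y||^2 + lambda ||x||_1. This is a simplified stand-in: the thesis regularizes with a 3D complex wavelet transform and multi-GPU tomographic operators, whereas here a tiny dense matrix plays the projector and sparsity is imposed directly on the image values.

```python
import math


def objective(a, y, lam, x):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    resid = [sum(aij * xj for aij, xj in zip(row, x)) - yi for row, yi in zip(a, y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(v) for v in x)


def ista(a, y, lam, n_iter=200):
    """ISTA; returns (x, objective value after each iteration, starting at x = 0)."""
    m, n = len(a), len(a[0])
    # Step 1/||A||_F^2 is at most 1/||A||_2^2, which guarantees monotone descent.
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [0.0] * n
    history = [objective(a, y, lam, x)]
    for _ in range(n_iter):
        resid = [sum(a[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(a[i][j] * resid[i] for i in range(m)) for j in range(n)]  # A^T (A x - y)
        z = [x[j] - step * grad[j] for j in range(n)]
        # Soft-thresholding: the proximal operator of step * lam * ||.||_1.
        x = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in z]
        history.append(objective(a, y, lam, x))
    return x, history
```

Replacing the identity sparsity basis with a wavelet transform and its adjoint around the soft-thresholding step is the usual way such a scheme is adapted to wavelet-domain regularization.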
Esteghamatian, Amir. "Calcul haute performance pour la simulation multi-échelles des lits fluidisés." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEC037/document.
Fluidized beds are a particular hydrodynamic configuration in which a pack (either dense or loose) of particles laid inside a container is re-suspended by an upward flow imposed at the bottom of the pack. Such systems are widely used in the chemical engineering industry, where catalytic cracking or polymerization processes involve chemical reactions between the catalyst particles and the surrounding fluid, and fluidizing the bed is known to benefit the efficiency of the process. Due to the wide range of spatial scales and the complex features of solid/solid and solid/fluid interactions in a dense fluidized bed, the system can be studied at different length scales, namely micro, meso, and macro. In this work, we focus on micro/meso simulations of fluidized beds. The workflow we use is based on in-house high-fidelity numerical tools: GRAINS3D (Pow. Tech., 224:374-389, 2012) for the granular dynamics of convex particles and PeliGRIFF (Parallel Efficient LIbrary for GRains In Fluid Flows, Comp. Fluids, 38(8):1608-1628, 2009) for reactive fluid/solid flows. The objectives of our micro/meso simulations of such systems are twofold: (i) to understand the multi-scale features of the system from a hydrodynamic standpoint, and (ii) to analyze the performance of our meso-scale numerical model and improve it accordingly. To this end, we first perform Particle Resolved Simulations (PRS) of liquid/solid and gas/solid fluidization of a 2000-particle system. The accuracy of the numerical results is examined by assessing the spatial convergence of the computed solution, to guarantee that our PRS results can be reliably considered a reference solution for this problem. The computational challenge of our PRS is to combine a mesh fine enough to resolve all flow length scales with a physical simulation time long enough to extract time-converged statistics.
For that task, High Performance Computing and highly parallel codes such as GRAINS3D/PeliGRIFF are extremely helpful. Second, we carry out a detailed cross-comparison of PRS results with those of locally averaged Euler-Lagrange simulations. Results show acceptable agreement between the micro- and meso-scale predictions of integral measures such as pressure drop and bed height. However, particle fluctuations are markedly underpredicted by the meso-scale model, especially in the direction transverse to the main flow. We explore different directions for improving the meso-scale model, such as (a) improving the inter-phase coupling scheme and (b) introducing a stochastic formulation for the drag law derived from the PRS results. We show that both improvements (a) and (b) are required to obtain a satisfactory match between meso-scale and PRS results. The new stochastic drag law, which incorporates information on the first- and second-order moments of the PRS results, shows promise for recovering the appropriate level of particle fluctuations. It now needs to be validated on a wider range of flow regimes.
Birgle, Nabil. "Écoulement dans le sous-sol, méthodes numériques et calcul haute performance." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066050/document.
We develop a reliable numerical method to approximate flow in a porous medium, modeled by an elliptic equation. The simulation is made difficult by the strong heterogeneities of the medium and by the size and complex geometry of the domain. A regular hexahedral mesh cannot accurately describe the geological layers of the domain, which leads us to work with a mesh made of deformed cubes. Several finite volume and finite element methods address this issue. For our method, we want only one degree of freedom per element for the pressure and one degree of freedom per face for the Darcy velocity, to stay as close as possible to the conventions of industrial software. Since standard mixed finite element methods do not converge in this setting, our method is based on composite mixed finite elements. In two dimensions, each polygonal element is split into triangles by adding a node at the barycenter of its vertices, and an explicit formulation of the basis functions was obtained. In three dimensions, the method extends naturally to the case of pyramidal meshes. In the case of a hexahedron or a deformed cube, the element is divided into 24 tetrahedra by adding a node at the barycenter of its vertices and splitting each face into 4 triangles. The basis functions are then built by solving a discrete problem. The proposed methods have been analyzed theoretically and complemented with a posteriori estimators. They have been tested on academic and realistic examples using parallel computation.
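The hexahedron subdivision described above can be sketched geometrically: add a node at the barycenter of the 8 vertices, split each face into 4 triangles, and join every triangle to that node. The vertex ordering convention, and the choice of each face's vertex barycenter as the extra face node, are assumptions of this sketch; the composite basis functions themselves are not built.

```python
def centroid(points):
    """Arithmetic mean of a list of 3D points (barycenter of the vertices)."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def split_hexahedron(v):
    """Split a (possibly deformed) hexahedron into 24 tetrahedra.

    Assumed convention: v[0..3] is the bottom face and v[4..7] the top face,
    listed in the same rotational order. Each face is split into 4 triangles
    around its vertex barycenter; each triangle is joined to the cell barycenter.
    """
    faces = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
             (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]
    center = centroid(v)
    tets = []
    for face in faces:
        pts = [v[i] for i in face]
        fc = centroid(pts)
        for k in range(4):
            tets.append((pts[k], pts[(k + 1) % 4], fc, center))
    return tets


def tet_volume(a, b, c, d):
    """Unsigned tetrahedron volume |det(b - a, c - a, d - a)| / 6."""
    u, w, t = ([p[i] - a[i] for i in range(3)] for p in (b, c, d))
    det = (u[0] * (w[1] * t[2] - w[2] * t[1])
           - u[1] * (w[0] * t[2] - w[2] * t[0])
           + u[2] * (w[0] * t[1] - w[1] * t[0]))
    return abs(det) / 6.0
```

For the unit cube, each of the 24 tetrahedra has volume 1/24, so together they recover the cube's volume.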
Laurencin, Tanguy. "Étude de la rhéologie des suspensions de fibres non-newtoniennes par imagerie et simulation numérique 3D à l'échelle des fibres." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAI013/document.
This study focuses on the processing of short-fibre-reinforced polymer composites. The physical and mechanical properties of these materials depend mainly on the position and orientation distribution of the fibres induced during forming. We therefore analysed the flow-induced micro-mechanisms that arise at the fibre scale during the forming of these complex systems, which behave as non-Newtonian fibre suspensions. For that purpose, an original approach was developed that combines 3D imaging and direct numerical simulation, both performed at the fibre scale. Several model fibre suspensions with a non-Newtonian suspending fluid and concentration regimes ranging from dilute to concentrated were prepared. They were subjected to confined lubricated compression loadings using a rheometer mounted on a synchrotron X-ray microtomograph. Thanks to very short scanning times, 3D images of the evolving fibrous microstructures were recorded in real time at high spatial resolution. These experiments were also simulated using a dedicated finite element library that accurately describes fibre kinematics in complex suspending fluids, thanks to high-performance computing, level sets and adaptive anisotropic meshing. The efficiency of the numerical simulation in the dilute to semi-dilute concentration regimes was assessed by comparing experimental and numerical results. We then showed that confinement and the non-Newtonian rheology of the suspending fluid had a weak effect on fibre kinematics when the fibres were sufficiently far from the compression platens, typically when the fibre-platen distance exceeded twice the fibre diameter. Otherwise, confinement effects occurred. Some extensions of the dumbbell model were proposed to correct the fibre kinematics under these flow conditions.
In the semi-dilute regime, deviations of the fibre kinematics from Jeffery's predictions were also observed and related to hydrodynamic interactions between fibres. In this case, the predictions of Jeffery's model and the related assumption of affine fibre motion are less relevant. In the concentrated regime, even though the overall orientation of the fibre suspension could be described astonishingly well by Jeffery's model, strong fluctuations in the motion and rotation of individual fibres were observed. These deviations were induced by the numerous fibre-fibre contacts, which the tube model could predict correctly.
Rubeck, Christophe. "Calcul hautes performances pour les formulations intégrales en électromagnétisme basses fréquences." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00793505.
Bouvier, Clément. "Sélection de caractéristiques stables pour la segmentation d'images histologiques par calcul haute performance." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS004.
In preclinical research, and more specifically in neurobiology, histology uses images produced by increasingly powerful optical microscopes that digitize entire sections at cell scale. Quantification of stained tissue such as neurons relies on machine-learning-driven segmentation. However, such methods need a large amount of additional information, or features, extracted from the raw data, which multiplies the quantity of data to process. As a result, the number of features is becoming an obstacle to processing large series of histological images quickly and robustly. Feature selection methods could reduce the amount of required information, but the selected subsets lack stability. We propose a novel methodology that runs on high-performance computing (HPC) infrastructures and aims to find small, stable sets of features for fast and robust segmentation of high-resolution whole histological sections. The selection proceeds in two steps: first at the scale of feature families (an intermediate grouping between the whole feature space and individual features), then on individual features within the pre-selected families. In this work, the selected feature sets are stable for two different neuron stainings. Furthermore, feature selection significantly reduces computation time and memory cost. This methodology could enable exhaustive histological studies at high resolution on HPC infrastructures in both preclinical and clinical research.
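The abstract does not say how stability is measured. One common way to quantify the stability of a feature selection, assumed here purely for illustration and not necessarily the measure used in the thesis, is the average pairwise Jaccard similarity between the feature subsets selected on different runs or data resamplings. The feature names below are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def selection_stability(subsets):
    """Average pairwise Jaccard similarity of the subsets selected on several runs.
    1.0 means every run selected exactly the same features."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two selected subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical feature names selected on three runs
runs = [{"haralick", "gabor_3"}, {"haralick", "gabor_3"}, {"haralick", "lbp"}]
print(round(selection_stability(runs), 3))  # 0.556
```

A selection method that returns small subsets scoring close to 1.0 across resamplings is "stable" in this sense; the two-step family-then-feature selection described above aims at that property.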
Bouvier, Clément. "Sélection de caractéristiques stables pour la segmentation d'images histologiques par calcul haute performance." Electronic Thesis or Diss., Sorbonne université, 2019. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2019SORUS004.pdf.
Colin, Alexis. "De la collecte de trace à la prédiction du comportement d'applications parallèles." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS020.
Runtime systems are commonly used by parallel applications to exploit the underlying hardware resources efficiently. A runtime system hides the complexity of hardware management and exposes a high-level interface to application developers. To do so, it makes decisions based on heuristics that estimate the future behavior of the application. We propose Pythia, a library that serves as an oracle capable of predicting the future behavior of an application, so that the runtime system can make better-informed decisions. Pythia builds on the deterministic nature of many HPC applications: by recording an execution trace, it captures the application's main behavior. The trace can then be supplied to future executions of the application, during which a runtime system can ask for predictions of future program behavior. We evaluate Pythia on 13 MPI applications and show that it can accurately predict the future behavior of most of them, even when the problem size varies. We demonstrate how Pythia's predictions can guide runtime optimization by implementing an adaptive thread-parallelism strategy in the GNU OpenMP runtime system. The evaluation shows that, thanks to Pythia's predictions, the adaptive strategy reduces the execution time of an application by up to 38%.
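As a rough illustration of predicting an execution from a previously recorded trace, the sketch below builds a lookup table from fixed-length event contexts in a recorded trace and predicts the next event of a new run from its last k events. This is a hypothetical, simplified model: the class name `TracePredictor`, the context length `k` and the event names are invented for the example, and Pythia's actual trace representation and API are not described in the abstract.

```python
from collections import Counter, defaultdict

class TracePredictor:
    """Hypothetical sketch: predict the next event of a run from a trace recorded
    during an earlier run, using the last k observed events as context."""

    def __init__(self, recorded_trace, k=2):
        if k < 1:
            raise ValueError("context length k must be at least 1")
        self.k = k
        # Map each k-event context in the recorded trace to the events that followed it
        self.table = defaultdict(Counter)
        for i in range(k, len(recorded_trace)):
            context = tuple(recorded_trace[i - k:i])
            self.table[context][recorded_trace[i]] += 1
        self.history = []

    def observe(self, event):
        """Record an event of the current run."""
        self.history.append(event)

    def predict(self):
        """Most frequent successor of the current context, or None if unknown."""
        if len(self.history) < self.k:
            return None
        counts = self.table.get(tuple(self.history[-self.k:]))
        if not counts:
            return None
        return counts.most_common(1)[0][0]

recorded = ["init", "send", "recv", "compute", "send", "recv", "compute",
            "send", "recv", "finalize"]
p = TracePredictor(recorded)
p.observe("compute")
p.observe("send")
print(p.predict())  # recv
```

A runtime system could query such a predictor before a parallel region, for example to choose a thread count; the adaptive OpenMP strategy in the thesis is a real-world instance of that idea, not this code.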
González, Martha. "Application de techniques orientées-objet pour le calcul réparti de haute performance." Paris 6, 2002. http://www.theses.fr/2002PA066161.
Gueunet, Charles. "Calcul haute performance pour l'analyse topologique de données par ensembles de niveaux." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS120.
Topological Data Analysis requires efficient algorithms to cope with the continuously increasing size and level of detail of data sets. In this manuscript, we focus on three fundamental topological abstractions based on level sets: merge trees, contour trees and Reeb graphs. We propose three new efficient parallel algorithms for computing these abstractions on multi-core shared-memory workstations. The first algorithm developed in this thesis uses multi-threaded parallelism to compute the contour tree. A second algorithm revisits the reference sequential algorithm for this abstraction and relies on local propagations expressible as parallel tasks. In practice, this new algorithm is twice as fast sequentially as the reference algorithm designed in 2000, and it achieves speedups of one order of magnitude in parallel. A last algorithm, also relying on task-based local propagations, computes a more generic abstraction: the Reeb graph. Unlike competing approaches, these methods compute the augmented version of these structures, enabling the full extent of level-set-based analysis. The algorithms presented in this manuscript currently yield the fastest available implementations for computing these abstractions. This work has been integrated into the open-source platform the Topology Toolkit (TTK).
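The merge tree of sublevel sets can be illustrated on the simplest possible domain, a 1D piecewise-linear scalar field. The sketch below is the classical sequential sweep with a union-find structure (vertices processed by increasing value); it is not the parallel task-based algorithm of the thesis or the TTK implementation. The function name and output format are assumptions made for the example, and vertex values are assumed to be distinct.

```python
def merge_tree_1d(values):
    """Merge tree of the sublevel sets of a 1D piecewise-linear field (distinct values).
    Returns (minima, merges): local minima in order of appearance and, for each merge
    saddle, (saddle_index, sorted minima of the two components it joins)."""
    n = len(values)
    parent = list(range(n))   # union-find forest over vertex indices
    lowest = {}               # root -> deepest minimum of its component
    seen = [False] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    minima, merges = [], []
    for v in sorted(range(n), key=lambda i: values[i]):  # sweep by increasing value
        roots = {find(u) for u in (v - 1, v + 1) if 0 <= u < n and seen[u]}
        seen[v] = True
        if not roots:          # no lower neighbor: a new component is born at a minimum
            minima.append(v)
            lowest[v] = v
            continue
        if len(roots) == 2:    # two components meet: v is a merge saddle
            merges.append((v, sorted(lowest[r] for r in roots)))
        # v becomes the root of the union and inherits the deepest minimum
        lowest[v] = min((lowest[r] for r in roots), key=lambda m: values[m])
        for r in roots:
            parent[r] = v
    return minima, merges

print(merge_tree_1d([3, 1, 4, 0, 5, 2, 6]))
# ([3, 1, 5], [(2, [1, 3]), (4, [3, 5])])
```

Each sweep step depends on all lower vertices, which is why this formulation is inherently sequential; the local-propagation approach described above is designed to avoid that global ordering.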
Pourroy, Jean. "Calcul Haute Performance : Caractérisation d’architectures et optimisation d’applications pour les futures générations de supercalculateurs." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM028.
Information systems and High-Performance Computing (HPC) infrastructures play an active role in the advancement of scientific knowledge and the evolution of our societies. The field of HPC is expanding rapidly, and users need increasingly powerful architectures to analyze the flood of data (numerical simulations, IoT), to make more complex decisions (artificial intelligence), and to make them faster (connected cars, weather forecasting). In this thesis, we discuss several challenges (power consumption, cost, complexity) in developing new generations of Exascale supercomputers. Given that industrial applications achieve no more than 10% of theoretical peak performance, we show the need to rethink platform architecture, in particular by using energy-optimized architectures. We then present some of the emerging technologies that will make this development possible: 3D memories (HBM), Storage Class Memory (SCM) and photonic interconnects. These technologies, combined with a new communication protocol (Gen-Z), will help execute the different parts of an application optimally. However, without a method for fine-grained characterization of code performance, these emerging architectures may be doomed, since few experts know how to exploit them. Our contribution consists of developing benchmarks and performance analysis tools. The first aim is to finely characterize specific parts of the microarchitecture: two microbenchmarks were developed to characterize the memory system and the floating-point unit (FPU). The second family of tools is used to study application performance. A first tool monitors memory bus traffic, a critical resource in modern architectures.
A second tool profiles applications by extracting and characterizing critical loops (hot spots). To take advantage of platform heterogeneity, we propose a 5-step methodology to identify and characterize these new platforms, model the performance of an application, and finally port its code to the selected architecture. Finally, we show how the tools can help developers extract maximum performance from an architecture. By releasing our tools as open source, we aim to raise users' awareness of this approach and build a community around performance characterization and analysis.
Yenke, Blaise. "Ordonnancement des sauvegardes/reprises d'applications de calcul haute performance dans les environnements dynamiques." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00685856.
Yenke, Blaise Omer. "Ordonnancement des sauvegardes/reprises d'applications de calcul haute performance dans les environnements dynamiques." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENM003/document.
Technological advances have led major organizations such as enterprises, universities and research institutes to acquire intranets consisting of several servers and many workstations. However, in some of these organizations, the resources are rarely used at night, on weekends and on holidays, leaving a large amount of computing power available but unused. This thesis discusses exploiting the idle periods of workstations to run HPC applications. The selected workstations are restarted and integrated into dynamically formed clusters. However, the idle periods do not always allow the computations allocated to them to complete. Checkpointing mechanisms are then used to save the execution context of applications within a given period, for a possible restart. It is worth noting that checkpointing all the processes within the required period is not always possible. We propose a model for scheduling checkpoints in parallel that takes into account the imposed time constraints and the bandwidth constraints (network and disk), in order to maximize the computation time saved for the applications to be checkpointed.
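To make the optimization problem concrete, here is a deliberately simplified model, assumed for illustration and not the scheduling model of the thesis: checkpoint images are transferred over a single shared link of fixed bandwidth before the idle window closes, and each application has an amount of computation that would be lost without a checkpoint. Choosing which applications to checkpoint then reduces to a 0/1 knapsack problem, solved below by dynamic programming over the transferable volume (sizes in whole megabytes). Application names and figures are hypothetical.

```python
def select_checkpoints(apps, window_s, bandwidth_mb_s):
    """Hypothetical model: choose which applications to checkpoint before the idle
    window closes. apps: list of (name, checkpoint_size_mb, computation_at_risk_s),
    sizes in whole MB, all checkpoints sharing one link of the given bandwidth.
    Returns (computation saved in seconds, names of the applications to checkpoint)."""
    capacity = int(window_s * bandwidth_mb_s)  # MB transferable before the deadline
    # best[c] = (computation saved, chosen apps) using at most c MB of transfer
    best = [(0.0, []) for _ in range(capacity + 1)]
    for name, size_mb, at_risk_s in apps:
        # Iterate capacities downwards so each application is chosen at most once (0/1)
        for c in range(capacity, size_mb - 1, -1):
            candidate = best[c - size_mb][0] + at_risk_s
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size_mb][1] + [name])
    return best[capacity]

apps = [("a", 40, 100.0), ("b", 30, 70.0), ("c", 30, 60.0)]
print(select_checkpoints(apps, window_s=10, bandwidth_mb_s=6))  # (130.0, ['b', 'c'])
```

The example shows why a greedy choice can fail: the single most valuable checkpoint ("a") blocks two smaller ones whose combined saved computation is larger. A realistic model would also account for disk bandwidth and concurrent transfers, as the abstract indicates.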
Vömel, Christof. "Contributions à la recherche en calcul scientifique haute performance pour les matrices creuses." Toulouse, INPT, 2003. http://www.theses.fr/2003INPT003H.
Guermouche, Amina. "Nouveaux Protocoles de Tolérances aux Fautes pour les Applications MPI du Calcul Haute Performance." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00666063.
Maillard, Nicolas. "Calcul Haute-Performance et Mécanique Quantique : analyse des ordonnancements en temps et en mémoire." Phd thesis, Université Joseph Fourier (Grenoble), 2001. http://tel.archives-ouvertes.fr/tel-00004684.
Visseq, Vincent. "Calcul haute performance en dynamique des contacts via deux familles de décomposition de domaine." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00848363.
Baboulin, Marc. "Résolutions rapides et fiables pour les solveurs d'algèbre linéaire numérique en calcul haute performance." Habilitation à diriger des recherches, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00967523.
Latu, Guillaume. "Algorithmique parallèle et calcul haute performance dédiés à la simulation d'un système hôte-macroparasite." Bordeaux 1, 2002. http://www.theses.fr/2002BOR12632.
Relun, Nicolas. "Stratégie multiparamétrique pour la conception robuste en fatigue." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00669449.
Cargnelli, Matthieu. "OpenWP : étude et extension des technologies de Workflows pour le calcul haute performance sur grille." Paris 11, 2008. http://www.theses.fr/2008PA112265.
This thesis was conducted in an industrial context. It studies refactoring a sequential scientific code into a grid-enabled program. The proposed approach is based on workflow technologies, which are well suited to grids. After presenting existing solutions for executing workflows on the grid, as well as a solution for code parallelization (OpenMP), the author presents his proposal: OpenWP. OpenWP is a directive-based workflow definition language that turns a sequential code into a workflow. OpenWP allows the controlled execution of this workflow on the grid using a third-party workflow enactment engine. A distributed virtual shared memory system is proposed. The language is presented in detail, and its expressiveness is critically discussed and compared to that of OpenMP. The design of OpenWP is then described and the technology choices are explained. A prototype is presented. The document then provides a proof of concept and a series of performance evaluations of OpenWP on a few programs, including an industrial mesher used by EADS. A hybrid system based on OpenWP and OpenMP is also described. This system is intended to let OpenWP exploit the resource hierarchy found in the grid by using shared-memory multiprocessor machines through OpenMP whenever possible. A proof-of-concept test case is provided and discussed.
Möller, Nathalie. "Adaptation de codes industriels de simulation en Calcul Haute Performance aux architectures modernes de supercalculateurs." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLV088.
For many years, the stability of the architectural paradigm has facilitated the performance portability of large HPC codes from one generation of supercomputers to the next. The announced end of Moore's Law, which governs the progress of microprocessor fabrication, ends this model and requires new efforts on the software side. Code modernization, based on algorithms well suited to future systems, is mandatory. This modernization relies on well-known principles such as computational concurrency, or degree of parallelism, and data locality. However, implementing these principles in large industrial applications, which are often the result of years of development effort, turns out to be far more difficult than expected. The contributions of this thesis are twofold. On the one hand, we explore a software modernization methodology based on the concept of proto-applications and compare it with the direct approach, while optimizing two simulation codes developed in a similar context. On the other hand, we focus on identifying the main challenges for architectures, programming models and applications. The two chosen application fields are Computational Fluid Dynamics and Computational Electromagnetics.
El Gharbi, Yannis. "Une approche à deux niveaux pour le calcul de structures haute performance : décomposition -- maillage -- résolution." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST001.
Numerical simulations play only a minor part in the certification process for critical parts in industry. However, a greater role would yield significant cost savings during design phases by avoiding expensive physical tests. In cases of strong localized heterogeneities throughout the structure, it becomes hard, if not impossible, to run these simulations successfully in reasonable time, because of the very large number of unknowns needed to compute the structure's response reliably. Obtaining this response requires large-scale parallel solution methods. Domain decomposition methods belong to this family and are the ones investigated in this thesis. The goal is to make these simulations possible with domain decomposition methods. Indeed, both solving the problem and meshing the structure become expensive, and the use of parallel methods becomes essential. To this end, a two-level substructuring method is proposed. During the pre-processing step, it produces regular-shaped, homogeneous subdomains that can be meshed in parallel. In addition, it significantly reduces the condition number of strongly heterogeneous problems solved with a FETI solver. A mixed domain decomposition method with a two-level Robin condition, adapted to this decomposition, could then be developed. The long-term objective is to handle problems of quasi-industrial complexity, such as computations at the global structural scale with multi-scale materials like three-dimensional woven composites, which are increasingly used in the aeronautical industry, for instance.
Gholami, Bahman. "Application des systèmes de calcul à haute performance dans les études électrothermiques à l'échelle nanoscopique." Thèse, Université du Québec à Trois-Rivières, 2011. http://depot-e.uqtr.ca/2065/1/030259746.pdf.
Wanza, Weloli Joël. "Modélisation, simulation de différents types d’architectures de noeuds de calcul basés sur l’architecture ARM et optimisés pour le calcul haute-performance." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4042.
This work is part of a family of European projects called Mont-Blanc, whose objective is to develop the next generation of Exascale systems. It specifically addresses energy efficiency, first at the micro-architectural level, by considering 64-bit Armv8-A compute nodes and an associated SoC topology, and then at the runtime level, notably by studying power management strategies better suited to the constraints of highly parallel HPC processing. A design space exploration methodology capable of simulating large manycore computing clusters was developed; it led us to propose, design and evaluate multi-SoC architectures and their associated SoC Coherent Interconnect (SCI) models. This approach is then used to define a pre-exascale architecture that reduces the overall complexity and cost of chip development without sacrificing performance. The resulting partitioning scheme opens interesting perspectives at the technology level, such as integrating more compute nodes directly into an interposer-based System-in-Package (SiP), possibly using 3D Through-Silicon Vias (TSVs) and High Bandwidth Memory (HBM). Energy efficiency is then addressed more directly by studying current power management policies and proposing two strategies to reduce power while preserving performance. The first exploits finer knowledge of application execution to adjust the frequency of parallel threads and better balance their execution times. The second reduces core frequencies at job synchronization points to avoid running cores at full speed when it is not necessary. Experimental results with these strategies, both in simulation and on real hardware, show the potential of this approach to meet the demanding requirements of Exascale platforms.
Boyer, Alexandre. "Contributions to Computing needs in High Energy Physics Offline Activities : Towards an efficient exploitation of heterogeneous, distributed and shared Computing Resources." Electronic Thesis or Diss., Université Clermont Auvergne (2021-...), 2022. http://www.theses.fr/2022UCFAC108.
Pushing the boundaries of science and providing more advanced services to individuals and communities continuously demand more sophisticated software, specialized hardware, and ever more computing power and storage. At the beginning of the 2020s, we are entering a heterogeneous and distributed computing era in which resources will be limited and constrained. Grid communities need to adapt their approach: (i) applications need to support various architectures; (ii) workload management systems have to handle various computing paradigms and guarantee proper execution of the applications, regardless of the constraints of the underlying systems. This thesis focuses on the latter point through the case of the LHCb experiment. The LHCb collaboration currently relies on an infrastructure involving 170 computing centers across the world, the Worldwide LHC Computing Grid, to process a growing amount of Monte Carlo simulations reproducing the experimental conditions of the experiment. Despite its huge size, this infrastructure will be unable to handle the simulations from the next LHC runs in a reasonable time. Meanwhile, national science programs are consolidating computing resources and encouraging the use of supercomputers, which provide a tremendous amount of computing power but pose greater integration challenges. In this thesis, we propose different approaches to supply distributed and shared computing resources with LHCb tasks. We developed methods to increase the number and duration of computing resource allocations, which improved the LHCb job throughput on a grid infrastructure by 40.86%. We also designed a series of software solutions to address issues found in highly constrained environments such as supercomputers, including lack of external connectivity and software dependencies. We applied these concepts to leverage computing power from four partitions of supercomputers ranked in the Top500.
Monna, Florence. "Ordonnancement pour les nouvelles plateformes de calcul avec GPUs." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066390/document.
More and more computers use hybrid architectures combining multi-core processors (CPUs) and hardware accelerators such as GPUs (Graphics Processing Units). These hybrid parallel platforms require new scheduling strategies. This work is devoted to characterizing this new type of scheduling problem. The objective studied most in this work is minimizing the makespan, a crucial problem for reaching the potential of new High Performance Computing platforms. The core problem is to schedule n independent sequential tasks efficiently on m CPUs and k GPUs, where each task can be processed either on a CPU or on a GPU, with minimum makespan. This problem is NP-hard, so we propose approximation algorithms with performance ratios ranging from 2 down to (2q+1)/(2q) + 1/(2qk), q > 0, with corresponding polynomial time complexities. The proposed solving method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used for practical purposes. Several variants of the core problem are studied: a special case where all tasks are accelerated when assigned to a GPU, with a 3/2-approximation algorithm; a case where preemption is allowed on CPUs; and the same problem with malleable tasks, with an algorithm achieving a ratio of 3/2. Finally, we study the problem with dependent tasks and provide a 6-approximation algorithm. Experiments based on realistic benchmarks have been conducted. Some algorithms have been integrated into the scheduler of the xKaapi runtime system for linear algebra kernels and compared with the state-of-the-art algorithm HEFT.
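To illustrate the core problem, the sketch below applies a simple greedy list-scheduling rule: tasks are taken in decreasing order of their shortest processing time, and each is placed on the CPU or GPU where it would finish earliest, similar in spirit to the earliest-finish-time rule used by HEFT. This heuristic is an assumption for illustration only; it is not one of the thesis's approximation algorithms and carries no performance guarantee. Each task is given as a hypothetical pair (CPU time, GPU time).

```python
import heapq

def greedy_hybrid_schedule(tasks, m, k):
    """Hypothetical earliest-finish-time heuristic (no approximation guarantee).
    tasks: list of (cpu_time, gpu_time); m CPUs, k GPUs.
    Returns (makespan, assignment) with assignment[t] = ("cpu" or "gpu", unit index)."""
    if m + k == 0:
        raise ValueError("need at least one processing unit")
    # Min-heaps of (time at which the unit becomes free, unit index); sorted lists
    # of identical start times are already valid heaps
    cpus = [(0.0, i) for i in range(m)]
    gpus = [(0.0, i) for i in range(k)]
    # Longest tasks first, measured by their best-case processing time
    order = sorted(range(len(tasks)), key=lambda t: min(tasks[t]), reverse=True)
    assignment, makespan = {}, 0.0
    for t in order:
        cpu_time, gpu_time = tasks[t]
        cpu_finish = cpus[0][0] + cpu_time if cpus else float("inf")
        gpu_finish = gpus[0][0] + gpu_time if gpus else float("inf")
        # Place the task on the earliest-free unit of the type where it finishes first
        if cpu_finish <= gpu_finish:
            heap, kind, finish = cpus, "cpu", cpu_finish
        else:
            heap, kind, finish = gpus, "gpu", gpu_finish
        _, unit = heapq.heappop(heap)
        heapq.heappush(heap, (finish, unit))
        assignment[t] = (kind, unit)
        makespan = max(makespan, finish)
    return makespan, assignment

print(greedy_hybrid_schedule([(4, 1), (4, 1), (2, 2), (3, 6)], m=1, k=1)[0])  # 4.0
```

In the example, the task that is slow on the GPU (3 vs 6) goes to the CPU while the GPU absorbs the three tasks it accelerates. Rules of this kind are easy to implement but can be far from optimal in adversarial cases, which is what the guaranteed ratios above address.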
Pérache, Marc. "Contribution à l'élaboration d'environnements de programmation dédiés au calcul scientifique hautes performances." Bordeaux 1, 2006. http://www.theses.fr/2006BOR13238.
Perache, Marc. "Contribution à l'élaboration d'environnements de programmation dédiés au calcul scientifique hautes performances /." [Gif-sur-Yvette] : [CEA Saclay, Direction des systèmes d'information], 2007. http://catalogue.bnf.fr/ark:/12148/cb410047057.
Dao, Van Toan. "Calcul à haute performance et simulations stochastiques : Etude de la reproductibilité numérique sur architectures multicore et manycore." Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC005/document.
The reproducibility of numerical experiments on high-performance computing systems is sometimes overlooked. Moreover, the numerical methods used for rigorous parallelization of stochastic simulations are often little known. Indeed, results obtained for a stochastic simulation on high-performance computing systems can differ from run to run with the same parameters and the same execution context, due to new architectures, accelerators, compilers or operating systems, or to changes in the order in which floating-point arithmetic operations are executed within microprocessors by parallelizing optimizations. If numerical experiments are not repeatable, how can we seriously develop a scientific application? What credit can be given to parallel software developed this way? In this thesis, we synthesize the main causes of non-reproducibility for a parallel stochastic simulation on high-performance computing systems. Unlike typical work on parallelism, we do not focus on improving performance but on obtaining numerically repeatable results from one experiment to another. We present reproducibility and its contributions to the science of experimental and numerical computing. Furthermore, we propose several contributions, in particular: verifying the reproducibility and portability of state-of-the-art pseudo-random number generators, detecting correlations between parallel streams issued from such generators, and repeating and reproducing the numerical results of independent parallel stochastic simulations.
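One common ingredient of repeatable parallel stochastic simulations is to give each independent replication its own random stream determined only by its identifier, and to combine results in an order that does not depend on scheduling. The sketch below derives per-replication seeds by hashing (master seed, replication ID) with SHA-256 and seeding Python's Mersenne Twister; this seeding scheme and the function names are assumptions for illustration. It guarantees repeatability, but it does not by itself guarantee statistical independence between streams, which is precisely the kind of correlation the thesis proposes to detect; dedicated techniques such as counter-based or parameterized generators are generally preferred.

```python
import hashlib
import math
import random

def stream_for(master_seed, replication_id):
    """Per-replication generator whose seed depends only on (master_seed, replication_id),
    not on which worker runs the replication or when."""
    digest = hashlib.sha256(f"{master_seed}:{replication_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

def replicate(master_seed, replication_id, n=10_000):
    """Toy Monte Carlo replication: estimate pi from n points in the unit square."""
    rng = stream_for(master_seed, replication_id)
    hits = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n

def run(master_seed, replication_ids):
    """Replications may complete in any order (as on a parallel machine), but results
    are combined in sorted-ID order with a correctly rounded sum (math.fsum)."""
    results = {r: replicate(master_seed, r) for r in replication_ids}
    return math.fsum(results[r] for r in sorted(results)) / len(results)

# Same master seed, different execution order: identical aggregate
print(run(42, range(8)) == run(42, [7, 3, 0, 5, 1, 6, 2, 4]))  # True
```

Tying each stream to a replication ID rather than to a thread or process makes the result independent of the number of workers and of their scheduling, while `math.fsum` removes the dependence of the final sum on the order of floating-point additions.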
Bernal, Norena Alvaro. "Conception et étude d'une architecture de haute performance pour le calcul de la fonction exponentielle modulaire." Grenoble INPG, 1999. http://www.theses.fr/1999INPG0112.