Dissertations / Theses on the topic 'Calcul haut performance'
Consult the top 50 dissertations / theses for your research on the topic 'Calcul haut performance.'
Mena Morales, Valentin. "Approche de conception haut-niveau pour l'accélération matérielle de calcul haute performance en finance." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0018/document.
The need for resources in High Performance Computing (HPC) is generally met by scaling up server farms, to the detriment of the energy consumption of such a solution. Accelerating HPC applications on heterogeneous platforms, such as FPGAs or GPUs, offers a better architectural compromise, as they can reduce the energy consumption of a deployed system. Therefore, a change of programming paradigm is needed to support this heterogeneous acceleration, which translates into an increased level of programming complexity to be tackled by software experts. This is most notably the case for developers in quantitative finance. Applications in this field are constantly evolving and increasing in complexity to stay competitive and comply with legislative changes. This puts even more pressure on the programmability of acceleration solutions. In this context, the use of high-level development and design flows, such as High-Level Synthesis (HLS) for programming FPGAs, is not enough. A domain-specific approach can help to reach performance requirements without impairing the programmability of accelerated applications. We propose in this thesis a high-level design approach that relies on OpenCL as a heterogeneous programming standard. More precisely, a recent implementation of OpenCL for Altera FPGAs is used. In this context, four main contributions are proposed: (1) an initial study of the integration of hardware computing cores into a software library for quantitative finance (QuantLib); (2) an exploration of different architectures and their respective performances, as well as the design of a dedicated architecture for the pricing of American options and their implied volatility, based on a high-level design flow; (3) a detailed characterization of an Altera OpenCL platform, from elemental operators, memory accesses and control overlays up to the communication links it is made of; (4) a compilation flow specific to the quantitative finance domain, relying on the aforementioned characterization and on the description of the considered financial applications (option pricing).
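As background to the pricing kernel this entry accelerates, the sketch below gives a minimal CPU reference for American option valuation using the classical Cox-Ross-Rubinstein binomial tree; all parameter values are illustrative, and this generic baseline is not the FPGA architecture described in the thesis.

```python
import numpy as np

def american_put_crr(s0, strike, rate, sigma, maturity, steps=1000):
    """Cox-Ross-Rubinstein binomial tree for an American put."""
    dt = maturity / steps
    u = np.exp(sigma * np.sqrt(dt))             # up move
    d = 1.0 / u                                 # down move
    p = (np.exp(rate * dt) - d) / (u - d)       # risk-neutral probability
    disc = np.exp(-rate * dt)
    prices = s0 * u ** np.arange(steps, -1, -1) * d ** np.arange(steps + 1)
    values = np.maximum(strike - prices, 0.0)   # payoff at maturity
    for step in range(steps, 0, -1):            # backward induction
        prices = prices[:step] * d              # asset prices one layer earlier
        cont = disc * (p * values[:step] + (1 - p) * values[1:step + 1])
        values = np.maximum(cont, strike - prices)  # early-exercise check
    return float(values[0])

print(american_put_crr(s0=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0))
```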
Pasca, Bogdan Mihai. "Calcul flottant haute performance sur circuits reconfigurables." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2011. http://tel.archives-ouvertes.fr/tel-00654121.
Boyer, Alexandre. "Contributions to Computing needs in High Energy Physics Offline Activities : Towards an efficient exploitation of heterogeneous, distributed and shared Computing Resources." Electronic Thesis or Diss., Université Clermont Auvergne (2021-...), 2022. http://www.theses.fr/2022UCFAC108.
Pushing the boundaries of science and providing more advanced services to individuals and communities continuously demands more sophisticated software, specialized hardware, and a growing amount of computing power and storage. At the beginning of the 2020s, we are entering a heterogeneous and distributed computing era where resources will be limited and constrained. Grid communities need to adapt their approach: (i) applications need to support various architectures; (ii) workload management systems have to manage various computing paradigms and guarantee a proper execution of the applications, regardless of the constraints of the underlying systems. This thesis focuses on the latter point through the case of the LHCb experiment. The LHCb collaboration currently relies on an infrastructure involving 170 computing centers across the world, the Worldwide LHC Computing Grid, to process a growing amount of Monte Carlo simulations reproducing the experimental conditions of the experiment. Despite its huge size, it will be unable to handle simulations coming from the next LHC runs in a decent time. In the meantime, national science programs are consolidating computing resources and encourage the use of supercomputers, which provide a tremendous amount of computing power but pose higher integration challenges. In this thesis, we propose different approaches to supply distributed and shared computing resources with LHCb tasks. We developed methods to increase the number of computing resource allocations and their duration, resulting in an improvement of the LHCb job throughput on a grid infrastructure (+40.86%). We also designed a series of software solutions to address the highly constrained environments found in supercomputers, such as the lack of external connectivity and of software dependencies. We applied these concepts to leverage computing power from four partitions of supercomputers ranked in the Top500.
Didelot, Sylvain. "Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications." Thesis, Versailles-St Quentin en Yvelines, 2014. http://www.theses.fr/2014VERS0029/document.
A recent trend in high performance computing shows a rising number of cores per compute node, while the total amount of memory per compute node remains constant. To scale parallel applications on such large machines, one of the major challenges is to keep a low memory consumption. This thesis develops a multi-threaded communication layer over InfiniBand which provides both good communication performance and a low memory consumption. We target scientific applications parallelized using the MPI standard, either in pure mode or combined with a shared memory programming model. Starting with the observation that network endpoints and communication buffers are critical for the scalability of MPI runtimes, the first contribution proposes three approaches to control their usage. We introduce a scalable and fully-connected virtual topology for connection-oriented high-speed networks. In the context of multirail configurations, we then detail a runtime technique which reduces the number of network connections. We finally present a protocol for dynamically resizing network buffers over the RDMA technology. The second contribution proposes a runtime optimization to enforce the overlap potential of MPI communications, showing a 2x improvement factor on communications. The third contribution evaluates the performance of several MPI runtimes running a seismic modeling application in a hybrid context. On large compute nodes of up to 128 cores, the introduction of OpenMP in the MPI application saves up to 17% of memory. Moreover, we show a performance improvement with our multi-threaded communication layer, where the OpenMP threads concurrently participate in the MPI communications.
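The overlap optimization described above targets the standard non-blocking MPI pattern sketched below with mpi4py; note that the computation only truly overlaps the transfers if the runtime progresses communication in the background, which is precisely what a multi-threaded communication layer provides. The two-rank layout and buffer size are illustrative.

```python
# Run with: mpiexec -n 2 python overlap.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                                   # assumes exactly 2 ranks

send_buf = np.full(1_000_000, rank, dtype=np.float64)
recv_buf = np.empty_like(send_buf)

# Post non-blocking communications first...
requests = [comm.Isend(send_buf, dest=peer, tag=0),
            comm.Irecv(recv_buf, source=peer, tag=0)]

# ...then do independent computation while the network (ideally) progresses.
local = np.sin(send_buf).sum()

MPI.Request.Waitall(requests)   # communications must complete before use
total = local + recv_buf[0]
```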
Martelli, Maxime. "Approche haut niveau pour l’accélération d’algorithmes sur des architectures hétérogènes CPU/GPU/FPGA. Application à la qualification des radars et des systèmes d’écoute électromagnétique." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS581/document.
As the semiconductor industry faces major challenges in sustaining its growth, new High-Level Synthesis tools are repositioning FPGAs as a leading technology for algorithm acceleration in the face of CPU- and GPU-based clusters. But as it stands, for a software engineer, these tools do not guarantee, without expertise in the underlying hardware, that these technologies will be harnessed to their full potential. This can be a deal breaker for their democratization. From this observation, we propose a methodology for algorithm acceleration on FPGAs. After presenting a high-level model of this architecture, we detail possible optimizations in OpenCL, and finally define a relevant exploration strategy for accelerating algorithms on FPGAs. Applied to different case studies, from tomographic reconstruction to the modelling of an airborne radar jammer, we evaluate our methodology according to three main performance criteria: development time, execution time, and energy efficiency.
Moy, Matthieu. "Modélisation à haut niveau d'abstraction pour les systèmes embarqués." Habilitation à diriger des recherches, Université de Grenoble, 2014. http://tel.archives-ouvertes.fr/tel-01054555.
Carpen-Amarie, Alexandra. "Utilisation de BlobSeer pour le stockage de données dans les Clouds: auto-adaptation, intégration, évaluation." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00696012.
Full textVienne, Jérôme. "Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband." Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM043.
Manufacturers of computer clusters require tools to assist them in making better decisions in terms of architectural design. To address this need, in this thesis work we focus on the specific issues of estimating computation times and InfiniBand network congestion. These two problems are often dealt with globally. However, an overall approach does not explain the reasons for performance loss related to architectural choices, so our approach was to conduct a more detailed study. In this thesis work, we focus on the following: 1) the estimation of computation time in a grid, and 2) the estimation of communication times over InfiniBand networks. To evaluate the computation time, the proposed approach is based on a static or semi-static analysis of the source code, cutting it into blocks before micro-benchmarking these blocks on the targeted architecture. To estimate the communication time, a bandwidth-sharing model for InfiniBand networks has been developed, allowing one to predict the impact of concurrent communications. This model was then incorporated into a simulator and validated on a set of synthetic communication graphs and on the application Socorro.
Vienne, Jérôme. "Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband." Phd thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00728156.
Applencourt, Thomas. "Calcul haute performance & chimie quantique." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30162/document.
This thesis work has two main objectives: 1. to develop and apply original electronic structure methods for quantum chemistry; 2. to implement several computational strategies to achieve efficient large-scale computer simulations. In the first part, both the Configuration Interaction (CI) and the Quantum Monte Carlo (QMC) methods used in this work for calculating quantum properties are presented. We then describe more specifically the selected CI approach (the so-called CIPSI approach, Configuration Interaction using a Perturbative Selection done Iteratively) that we used for building trial wavefunctions for QMC simulations. As a first application, we present the QMC calculation of the total non-relativistic energies of transition metal atoms of the 3d series. This work, which required the implementation of Slater-type basis functions in our codes, has led to the best values ever published for these atoms. We then present our original implementation of pseudo-potentials for QMC and discuss the calculation of atomization energies for a benchmark set of 55 organic molecules. The second part is devoted to the High Performance Computing (HPC) aspects. The objective is to make possible and/or facilitate the deployment of very large-scale simulations. From the point of view of the developer, this includes the use of original programming paradigms, single-core optimization, massively parallel calculations on grids (supercomputer and Cloud), and the development of collaborative tools; and from the user's point of view: improved code installation, management of input/output parameters, GUI, interfacing with other codes, etc.
Perotin, Matthieu. "Calcul haute performance sur matériel générique." S.l.: s.n., 2008. http://theses.abes.fr/2008TOUR4022.
Pérotin, Matthieu. "Calcul haute performance sur matériel générique." Thesis, Tours, 2008. http://www.theses.fr/2008TOUR4022/document.
Two facts motivate this work: researchers' demand for High Performance Computing and the low usage of the computing power of pedagogic resources. This thesis aims at answering the demand for HPC while preserving the pedagogic resources for teaching. This work looked for a solution that would be simple and straightforward for the final users. Their needs and wishes led to the definition of some specifications, in which most of the constraints could be satisfied with the use of a well-designed software stack. Some others, however, cannot be satisfied with existing solutions only; they define a new scheduling problem, in which the goal is to schedule the processes on the available resources. This problem was studied and solved with various heuristics, whose performances were compared with a simulator before being implemented in an experimental setup.
Lagardère, Louis. "Calcul haute-performance et dynamique moléculaire polarisable." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066042.
This work is at the interface between theoretical chemistry, scientific computing and applied mathematics. We study different algorithms used to solve the specific equations that arise in polarizable molecular dynamics in a massively parallel context. This family of models indeed requires solving more complex equations than in the classical case, making the use of supercomputers mandatory in order to get significant results. We more specifically study different types of boundary conditions that represent different ways to model solvation effects: first the Particle Mesh Ewald method to treat periodic boundary conditions, and then a continuum solvation model discretized within a domain decomposition strategy, ddCOSMO. The outline of this thesis is as follows: first, the different parallel strategies in the general context of molecular dynamics are reviewed. Then several methods to adapt these strategies to the specific case of polarizable force fields are presented. After that, strategies that circumvent certain limits due to the use of iterative methods in polarizable molecular dynamics are presented and studied. Then, the adaptation of these methods to different boundary conditions is presented: first for the Particle Mesh Ewald method for periodic boundary conditions, and then for a particular continuum solvation model discretized with a domain decomposition strategy, ddCOSMO. Finally, various numerical results and applications are presented.
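As a point of reference for the iterative methods discussed above, the polarization equations reduce to a fixed-point problem of the form mu = alpha (E + T mu); the sketch below shows the plain self-consistent iteration with a dense toy stand-in for the dipole interaction tensor, whereas production codes accelerate exactly this loop with preconditioned Krylov or DIIS schemes.

```python
import numpy as np

def induced_dipoles(T, alpha, E_field, tol=1e-8, max_iter=500):
    """Fixed-point iteration for mu = alpha * (E_field + T @ mu)."""
    mu = alpha * E_field                         # direct-field initial guess
    for _ in range(max_iter):
        mu_new = alpha * (E_field + T @ mu)
        if np.linalg.norm(mu_new - mu) <= tol * np.linalg.norm(mu_new):
            return mu_new
        mu = mu_new
    raise RuntimeError("dipole iteration did not converge")

rng = np.random.default_rng(0)
T = rng.standard_normal((9, 9))
T = 0.05 * (T + T.T)                             # small symmetric toy coupling
mu = induced_dipoles(T, alpha=0.4, E_field=rng.standard_normal(9))
```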
Perarnau, Swann. "Environnements pour l'analyse expérimentale d'applications de calcul haute performance." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00650047.
Aubert, Pierre. "Calcul haute performance pour la détection de rayon Gamma." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLV058/document.
The new generation of research experiments will add a huge data surge to the continuously increasing data production of current experiments. This increasing data rate causes upheavals at many levels, such as data storage, analysis, diffusion and conservation. The CTA project will become the leading ground-based gamma-ray astronomy observatory from 2021. It will generate hundreds of petabytes of data by 2030, which will have to be stored, compressed and analyzed each year. This work addresses the problems of data analysis optimization using high performance computing techniques: an efficient data format generator, very low level programming to optimize the CPU pipeline, and vectorization of existing algorithms; it also introduces a fast compression algorithm for integers and finally presents a new analysis algorithm based on efficient picture comparison.
Partimbene, Vincent. "Calcul haute performance pour la simulation d'interactions fluide-structure." Phd thesis, Toulouse, INPT, 2018. http://oatao.univ-toulouse.fr/20524/1/PARTIMBENE_Vincent.pdf.
Carpen-Amarie, Alexandra. "BlobSeer as a data-storage facility for clouds : self-Adaptation, integration, evaluation." Thesis, Cachan, Ecole normale supérieure, 2011. http://www.theses.fr/2011DENS0066/document.
The emergence of Cloud computing brings forward many challenges that may limit the adoption rate of the Cloud paradigm. As data volumes processed by Cloud applications increase exponentially, designing efficient and secure solutions for data management emerges as a crucial requirement. The goal of this thesis is to enhance a distributed data-management system with self-management capabilities, so that it can meet the requirements of Cloud storage services in terms of scalability, data availability, reliability and security. Furthermore, we aim at building a Cloud data service both compatible with state-of-the-art Cloud interfaces and able to deliver high-throughput data storage. To meet these goals, we proposed generic self-awareness, self-protection and self-configuration components targeted at distributed data-management systems. We validated them on top of BlobSeer, a large-scale data-management system designed to optimize highly-concurrent data accesses. Next, we devised and implemented a BlobSeer-based file system optimized to efficiently serve as a storage backend for Cloud services. We then integrated it within a real-world Cloud environment, the Nimbus platform. The benefits and drawbacks of using Cloud storage for real-life applications have been emphasized in evaluations involving data-intensive MapReduce applications and tightly-coupled, high-performance computing applications.
Jolivet, Pierre. "Méthodes de décomposition de domaine. Application au calcul haute performance." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM040/document.
This thesis introduces a unified framework for various domain decomposition methods: those with overlap, so-called Schwarz methods, and those based on Schur complements, so-called substructuring methods. It is then possible to switch with a high level of abstraction between methods and to build different preconditioners to accelerate the iterative solution of large sparse linear systems. Such systems are frequently encountered in industrial or scientific problems after discretization of continuous models. Even though these preconditioners naturally exhibit good parallelism properties on distributed architectures, they can prove inadequate in numerical performance for complex decompositions or multiscale physics. This lack of robustness may be alleviated by concurrently solving sparse or dense local generalized eigenvalue problems, thus identifying a priori modes that hinder the convergence of the underlying iterative methods. Using these modes, it is then possible to define projection operators based on what is usually referred to as a coarse solver. These auxiliary tools tend to solve the aforementioned issues, but typically decrease the parallel efficiency of the preconditioners. In this dissertation, it is shown in three points that the newly developed construction is efficient: 1) by performing large-scale numerical experiments on Curie, a European supercomputer, and by comparing it with state-of-the-art 2) multigrid and 3) direct solvers.
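To make the overlapping-Schwarz idea above concrete, here is a minimal one-level additive Schwarz sketch on a 1D Poisson matrix; the dense linear algebra and the damped Richardson loop are purely illustrative, while the thesis's preconditioners are two-level, matrix-free and used inside Krylov solvers.

```python
import numpy as np

def poisson1d(n):
    """Tridiagonal 1D Poisson matrix, a toy stand-in for a discretized PDE."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def additive_schwarz_inverse(A, subdomains):
    """One-level additive Schwarz: M^{-1} = sum_i R_i^T (R_i A R_i^T)^{-1} R_i."""
    n = A.shape[0]
    M_inv = np.zeros((n, n))
    for idx in subdomains:                       # overlapping index sets
        R = np.zeros((len(idx), n))
        R[np.arange(len(idx)), idx] = 1.0        # Boolean restriction matrix
        M_inv += R.T @ np.linalg.inv(R @ A @ R.T) @ R
    return M_inv

A = poisson1d(40)
subdomains = [np.arange(0, 22), np.arange(18, 40)]   # two subdomains, 4-point overlap
M_inv = additive_schwarz_inverse(A, subdomains)

b = np.ones(40)
x = np.zeros(40)
for _ in range(60):                              # damped preconditioned Richardson
    x = x + 0.5 * M_inv @ (b - A @ x)
print(np.linalg.norm(b - A @ x))                 # residual shrinks steadily
```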
Ben, El Haj Ali Amin. "Calcul de haute performance en aéroélasticité et en écoulements turbulents tridimentionnels." Mémoire, École de technologie supérieure, 2008. http://espace.etsmtl.ca/159/1/BEN_HAJ_ALI_Amine.pdf.
Full textNotargiacomo, Thibault. "Approche parcimonieuse et calcul haute performance pour la tomographie itérative régularisée." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAT013/document.
X-ray computed tomography (CT) is a technique that aims at providing a measure of a given property of the interior of a physical object, given a set of exterior projection measurements. Although CT is a mature technology, most of the algorithms used for image reconstruction in commercial applications are based on analytical methods such as filtered back-projection. The main idea of this thesis is to exploit the latest advances in the fields of applied mathematics and computer science in order to study, design and implement algorithms dedicated to 3D cone-beam reconstruction from X-ray flat-panel detectors, targeting clinically relevant use cases including low-dose and few-view acquisitions. In this work, we studied various strategies to model the tomographic operators and how they can be implemented on a multi-GPU platform. We then proposed to use the 3D complex wavelet transform to regularize the reconstruction problem.
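The sparsity-regularized iterative reconstruction mentioned above is typically solved with proximal gradient methods; the sketch below shows plain ISTA for min 0.5*||Ax - b||^2 + lam*||x||_1 on a small dense problem, with the identity transform standing in for the thesis's 3D complex wavelet transform and a random matrix standing in for the cone-beam projector.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, n_iter=300):
    """Iterative Shrinkage-Thresholding for 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)             # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 100))           # under-determined toy projector
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]       # sparse ground truth
x_rec = ista(A, A @ x_true, lam=0.1)
```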
Esteghamatian, Amir. "Calcul haute performance pour la simulation multi-échelles des lits fluidisés." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEC037/document.
Fluidized beds are a particular hydrodynamic configuration in which a pack (either dense or loose) of particles laid inside a container is re-suspended by an upward-oriented flow imposed at the bottom of the pack. This kind of system is widely used in the chemical engineering industry, where catalytic cracking or polymerization processes involve chemical reactions between the catalyst particles and the surrounding fluid, and fluidizing the bed is admittedly beneficial to the efficiency of the process. Due to the wide range of spatial scales and the complex features of solid/solid and solid/fluid interactions in a dense fluidized bed, the system can be studied at different length scales, namely micro, meso and macro. In this work we focus on micro/meso simulations of fluidized beds. The workflow we use is based on home-made high-fidelity numerical tools: GRAINS3D (Pow. Tech., 224:374-389, 2012) for granular dynamics of convex particles and PeliGRIFF (Parallel Efficient LIbrary for GRains In Fluid Flows, Comp. Fluids, 38(8):1608-1628, 2009) for reactive fluid/solid flows. The objectives of our micro/meso simulations of such systems are twofold: (i) to understand the multi-scale features of the system from a hydrodynamic standpoint and (ii) to analyze the performance of our meso-scale numerical model and to improve it accordingly. To this end, we first perform Particle Resolved Simulations (PRS) of liquid/solid and gas/solid fluidization of a 2000-particle system. The accuracy of the numerical results is examined by assessing the space convergence of the computed solution, in order to guarantee that our PRS results can reliably be considered as a reference solution for this problem. The computational challenge of our PRS combines a mesh fine enough to properly resolve all flow length scales with a physical simulation time long enough to extract time-converged statistics. For that task, high performance computing and highly parallel codes such as GRAINS3D/PeliGRIFF are extremely helpful. Second, we carry out a detailed cross-comparison of PRS results with those of locally averaged Euler-Lagrange simulations. Results show an acceptable agreement between the micro- and meso-scale predictions on integral measures such as pressure drop, bed height, etc. However, particle fluctuations are remarkably underpredicted by the meso-scale model, especially in the direction transverse to the main flow. We explore different directions for the improvement of the meso-scale model, such as (a) improving the inter-phase coupling scheme and (b) introducing a stochastic formulation of the drag law derived from the PRS results. We show that both improvements (a) and (b) are required to yield a satisfactory match of meso-scale results with PRS results. The new stochastic drag law, which incorporates information on the first- and second-order moments of the PRS results, shows promise in recovering the appropriate level of particle fluctuations. It now deserves to be validated on a wider range of flow regimes.
Birgle, Nabil. "Écoulement dans le sous-sol, méthodes numériques et calcul haute performance." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066050/document.
We develop a reliable numerical method to approximate a flow in a porous medium, modeled by an elliptic equation. The simulation is made difficult by the strong heterogeneities of the medium and the size and complex geometry of the domain. A regular hexahedral mesh does not allow an accurate description of the geological layers of the domain. Consequently, this leads us to work with a mesh made of deformed cubes. There exist several methods of finite volume or finite element type which solve this issue. For our method, we wish to have only one degree of freedom per element for the pressure and one degree of freedom per face for the Darcy velocity, to stay as close as possible to the habits of industrial software. Since standard mixed finite element methods do not converge, our method is based on composite mixed finite elements. In two dimensions, a polygonal mesh is split into triangles by adding a node at the barycenter of its vertices, and an explicit formulation of the basis functions was obtained. In three dimensions, the method extends naturally to the case of pyramidal meshes. In the case of a hexahedron or a deformed cube, the element is divided into 24 tetrahedra by adding a node at the barycenter of its vertices and splitting the faces into 4 triangles. The basis functions are then built by solving a discrete problem. The proposed methods have been theoretically analyzed and completed by a posteriori estimators. They have been tested on academic and realistic examples using parallel computation.
Bouvier, Clément. "Sélection de caractéristiques stables pour la segmentation d'images histologiques par calcul haute performance." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS004.
In preclinical research, and more specifically in neurobiology, histology uses images produced by increasingly powerful optical microscopes digitizing entire sections at cell scale. Quantification of stained tissue such as neurons relies on machine-learning-driven segmentation. However, such methods need a lot of additional information, or features, extracted from raw data, multiplying the quantity of data to process. As a result, the quantity of features is becoming a drawback for processing large series of histological images in a fast and robust manner. Feature selection methods could reduce the amount of required information, but the selected subsets lack stability. We propose a novel methodology operating on high performance computing (HPC) infrastructures and aiming at finding small and stable sets of features for fast and robust segmentation of high-resolution histological whole sections. This selection has two steps: first at the scale of feature families (an intermediate pool of features, between the whole feature space and individual features); second, feature selection is performed on the pre-selected feature families. In this work, the selected sets of features are stable for two different neuron stainings. Furthermore, the feature selection results in a significant reduction of computation time and memory cost. This methodology can potentially enable exhaustive histological studies at high resolution on HPC infrastructures, in both preclinical and clinical research settings.
González, Martha. "Application de techniques orientées-objet pour le calcul réparti de haute performance." Paris 6, 2002. http://www.theses.fr/2002PA066161.
Gueunet, Charles. "Calcul haute performance pour l'analyse topologique de données par ensembles de niveaux." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS120.
Topological Data Analysis requires efficient algorithms to deal with the continuously increasing size and level of detail of data sets. In this manuscript, we focus on three fundamental topological abstractions based on level sets: merge trees, contour trees and Reeb graphs. We propose three new efficient parallel algorithms for the computation of these abstractions on multi-core shared-memory workstations. The first algorithm developed in the context of this thesis is based on multi-thread parallelism for the contour tree computation. A second algorithm revisits the reference sequential algorithm for this abstraction and is based on local propagations expressible as parallel tasks. This new algorithm is in practice twice as fast in sequential as the reference algorithm designed in 2000, and offers one order of magnitude speedups in parallel. A last algorithm, also relying on task-based local propagations, computes a more generic abstraction: the Reeb graph. Contrary to concurrent approaches, these methods provide the augmented version of these structures, hence enabling the full extent of level-set based analysis. The algorithms presented in this manuscript result today in the fastest implementations available to compute these abstractions. This work has been integrated into an open-source platform: the Topology Toolkit (TTK).
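For readers unfamiliar with these abstractions, the classical sequential building block behind merge tree computation is a sorted sweep combined with union-find; the minimal 1D sketch below records which minima merge at which saddle (distinct values assumed), while the thesis parallelizes far more general, augmented versions of this computation.

```python
import numpy as np

def join_tree_events(f):
    """Merge (join) tree of a 1D scalar field via a sorted sweep and
    union-find. Returns (saddle, min_a, min_b) merge events, each pairing
    a saddle with the two minima whose branches it joins."""
    n = len(f)
    parent = list(range(n))
    comp_min = list(range(n))               # deepest minimum of each component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    events = []
    for v in np.argsort(f):                 # sweep vertices by increasing value
        for u in (v - 1, v + 1):            # 1D neighbourhood
            if 0 <= u < n and f[u] < f[v]:  # neighbour already swept
                ru, rv = find(u), find(int(v))
                if ru == rv:
                    continue
                if comp_min[rv] != v:       # v already absorbed a branch: saddle
                    events.append((int(v), comp_min[ru], comp_min[rv]))
                parent[rv] = ru             # union the two components
                if f[comp_min[rv]] < f[comp_min[ru]]:
                    comp_min[ru] = comp_min[rv]
    return events

print(join_tree_events(np.array([3., 1., 4., 0., 5., 2., 6.])))
# [(2, 3, 1), (4, 5, 3)]: saddles at indices 2 and 4 join the three minima.
```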
Pourroy, Jean. "Calcul Haute Performance : Caractérisation d’architectures et optimisation d’applications pour les futures générations de supercalculateurs." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM028.
Information systems and High-Performance Computing (HPC) infrastructures play an active role in the improvement of scientific knowledge and the evolution of our societies. The field of HPC is expanding rapidly, and users need increasingly powerful architectures to analyze the tsunami of data (numerical simulations, IoT), to make more complex decisions (artificial intelligence), and to make them faster (connected cars, weather). In this thesis work, we discuss several challenges (power consumption, cost, complexity) for the development of new generations of Exascale supercomputers. While industrial applications do not manage to achieve more than 10% of the theoretical performance, we show the need to rethink the architecture of platforms, in particular by using energy-optimized architectures. We then present some of the emerging technologies that will allow their development: 3D memories (HBM), Storage Class Memory (SCM) and photonic interconnection technologies. These new technologies, associated with a new communication protocol (Gen-Z), will help to optimally execute the different parts of an application. However, in the absence of a method for fine characterization of code performance, these emerging architectures are potentially condemned, since few experts know how to exploit them. Our contribution consists in the development of benchmarks and performance analysis tools. The first family of tools aims at finely characterizing specific parts of the microarchitecture: two microbenchmarks have been developed to characterize the memory system and the floating-point unit (FPU). The second family of tools is used to study the performance of applications: a first tool monitors the memory bus traffic, a critical resource of modern architectures, and a second tool profiles applications by extracting and characterizing critical loops (hot spots). To take advantage of the heterogeneity of platforms, we propose a 5-step methodology to identify and characterize these new platforms, to model the performance of an application, and finally to port its code to the selected architecture. Finally, we show how the tools can help developers extract the maximum performance from an architecture. By providing our tools in open source, we want to raise users' awareness of this approach and develop a community around performance characterization and analysis work.
Yenke, Blaise. "Ordonnancement des sauvegardes/reprises d'applications de calcul haute performance dans les environnements dynamiques." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00685856.
Full textYenke, Blaise Omer. "Ordonnancement des sauvegardes/reprises d'applications de calcul haute performance dans les environnements dynamiques." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENM003/document.
Technological advances have led major organizations such as enterprises, universities and research institutes to acquire intranets consisting of several servers and many workstations. However, in some of these organizations, the resources are rarely used at night, on weekends and on holidays, leaving a large amount of computing power available and unused. This thesis discusses the exploitation of the idle periods of workstations in order to run HPC applications. The workstations retained are restarted and integrated into dynamically formed clusters. However, the idle periods do not always permit the complete carrying out of the computations allocated to them. Checkpointing mechanisms are then used to save, within a given period, the execution context of applications for a possible restart. It is worth noting that checkpointing all the processes in the required period is not always possible. We propose a scheduling model for parallel checkpointing which takes into account the imposed time constraints and the bandwidth constraints (network and disk) to maximize the computation time already taken by the applications which are to be checkpointed.
Vömel, Christof. "Contributions à la recherche en calcul scientifique haute performance pour les matrices creuses." Toulouse, INPT, 2003. http://www.theses.fr/2003INPT003H.
Relun, Nicolas. "Stratégie multiparamétrique pour la conception robuste en fatigue." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00669449.
Guermouche, Amina. "Nouveaux Protocoles de Tolérances aux Fautes pour les Applications MPI du Calcul Haute Performance." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00666063.
Maillard, Nicolas. "Calcul Haute-Performance et Mécanique Quantique : analyse des ordonnancements en temps et en mémoire." Phd thesis, Université Joseph Fourier (Grenoble), 2001. http://tel.archives-ouvertes.fr/tel-00004684.
Visseq, Vincent. "Calcul haute performance en dynamique des contacts via deux familles de décomposition de domaine." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00848363.
Baboulin, Marc. "Résolutions rapides et fiables pour les solveurs d'algèbre linéaire numérique en calcul haute performance." Habilitation à diriger des recherches, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00967523.
Latu, Guillaume. "Algorithmique parallèle et calcul haute performance dédiés à la simulation d'un système hôte-macroparasite." Bordeaux 1, 2002. http://www.theses.fr/2002BOR12632.
Guilloteau, Quentin. "Une approche autonomique à la régulation en ligne de systèmes HPC, avec un support pour la reproductibilité des expériences." Electronic Thesis or Diss., Université Grenoble Alpes, 2023. http://www.theses.fr/2023GRALM075.
High-Performance Computing (HPC) systems have become increasingly complex, and their performance and power consumption make them less predictable. This unpredictability requires cautious runtime management to guarantee an acceptable Quality-of-Service to the end users. Such a regulation problem arises in the context of the computing grid middleware CiGri, which aims at harvesting the idle computing resources of a set of clusters by injecting low-priority jobs. An overly aggressive harvesting strategy can degrade the performance for all users of the clusters, while an overly timid one will leave resources idle and thus lose computing power. There is thus a tradeoff between the amount of resources that can be harvested and the resulting degradation of user jobs, which can evolve at runtime based on Service Level Agreements and the current load of the system. We claim that such regulation challenges can be addressed with tools from Autonomic Computing, in particular when coupled with Control Theory. This thesis investigates several regulation problems in the context of CiGri with such tools. We focus on regulating the harvesting based on the load of a shared distributed file system, and on improving the overall usage of the computing resources. We also evaluate and compare the reusability of the proposed control-based solutions in the context of HPC systems. The experiments done in this thesis also led us to investigate new tools and techniques to improve the cost and reproducibility of experiments. We present a tool named NixOS-Compose, able to generate and deploy reproducible distributed software environments. We also investigate techniques to reduce the number of machines needed to deploy experiments on grid or cluster middlewares, such as CiGri, while ensuring an acceptable level of realism for the final deployed system.
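In the spirit of the control-theoretic regulation described above, the sketch below shows a toy proportional-integral loop choosing how many low-priority jobs to inject so that a measured load tracks a reference; the gains, sensor values and actuator semantics are illustrative placeholders, not CiGri's actual controller.

```python
def make_pi_controller(kp, ki, reference):
    """Return a PI control step: load measurement -> number of jobs to inject."""
    integral = 0.0
    def step(measured_load):
        nonlocal integral
        error = reference - measured_load    # positive: room to harvest more
        integral += error
        jobs = kp * error + ki * integral    # PI control law
        return max(0, int(jobs))             # actuator saturation at zero
    return step

controller = make_pi_controller(kp=20.0, ki=5.0, reference=0.7)
for load in [0.2, 0.4, 0.65, 0.8]:            # fake file-system load readings
    print(controller(load))                   # injection shrinks as load rises
```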
Wanza, Weloli Joël. "Modélisation, simulation de différents types d’architectures de noeuds de calcul basés sur l’architecture ARM et optimisés pour le calcul haute-performance." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4042.
This work is part of a family of European projects called Mont-Blanc, whose objective is to develop the next generation of Exascale systems. It specifically addresses the issue of energy efficiency, first at the micro-architectural level by considering the use of 64-bit Armv8-A based compute nodes and an associated relevant SoC topology, and then at the runtime level, notably with the study of power management strategies better suited to the constraints of highly parallel HPC processing. A design space exploration methodology capable of supporting the simulation of large manycore computing clusters is developed, leading to the proposal, design and evaluation of multi-SoC architectures and their associated SoC Coherent Interconnect (SCI) models. This approach is then used to define a pre-exascale architecture allowing a global reduction of the complexity and cost of chip developments without sacrificing performance. The resulting partitioning scheme introduces interesting perspectives at the technology level, such as the integration of more compute nodes directly on an interposer-based System-in-Package (SiP), possibly using 3D Through-Silicon Vias (TSVs) and High Bandwidth Memory (HBM). Energy efficiency is addressed more directly in a second stage by studying current power management policies and proposing two strategies to help reduce power while preserving performance. The first one exploits finer application execution knowledge to adjust the frequency of extensive parallel threads and better balance their execution time. The second strategy reduces core frequencies at the synchronisation points of jobs, to avoid running the cores at full speed when it is not necessary. Experimental results with these strategies, both in simulation and on real hardware, show the possibilities offered by this approach to address the strong requirements of Exascale platforms.
Monna, Florence. "Ordonnancement pour les nouvelles plateformes de calcul avec GPUs." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066390/document.
More and more computers use hybrid architectures combining multi-core processors (CPUs) and hardware accelerators like GPUs (Graphics Processing Units). These hybrid parallel platforms require new scheduling strategies. This work is devoted to a characterization of this new type of scheduling problem. The most studied objective in this work is the minimization of the makespan, which is a crucial problem for reaching the potential of new platforms in High Performance Computing. The core problem studied in this work is scheduling efficiently n independent sequential tasks on m CPUs and k GPUs, where each task of the application can be processed either on a CPU or on a GPU, with minimum makespan. This problem is NP-hard, therefore we propose approximation algorithms with performance ratios ranging from 2 to (2q+1)/(2q) + 1/(2qk), q > 0, and corresponding polynomial time complexities. The proposed solving method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used for practical purposes. Some variants of the core problem are studied: a special case where all the tasks are accelerated when assigned to a GPU, with a 3/2-approximation algorithm; a case where preemptions are allowed on CPUs; and the same problem with malleable tasks, with an algorithm with a ratio of 3/2. Finally, we studied the problem with dependent tasks, providing a 6-approximation algorithm. Experiments based on realistic benchmarks have been conducted. Some algorithms have been integrated into the scheduler of the xKaapi runtime system for linear algebra kernels, and compared to the state-of-the-art algorithm HEFT.
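To fix ideas on the core problem above (n independent tasks, m CPUs, k GPUs, each task with its own processing time on each resource type), the sketch below implements a simple earliest-completion-time greedy baseline; unlike the algorithms developed in the thesis, it carries no worst-case approximation guarantee.

```python
import heapq

def greedy_cpu_gpu(tasks, m, k):
    """Earliest-completion-time list scheduling of independent tasks on
    m CPUs and k GPUs. tasks: list of (cpu_time, gpu_time) pairs.
    Returns the resulting makespan."""
    cpus = [0.0] * m                              # next-free times per CPU
    gpus = [0.0] * k                              # next-free times per GPU
    heapq.heapify(cpus)
    heapq.heapify(gpus)
    for p_cpu, p_gpu in sorted(tasks, key=lambda t: -min(t)):  # longest first
        c_end = cpus[0] + p_cpu                   # finish time on earliest CPU
        g_end = gpus[0] + p_gpu                   # finish time on earliest GPU
        if c_end <= g_end:
            heapq.heapreplace(cpus, c_end)        # place task on that CPU
        else:
            heapq.heapreplace(gpus, g_end)        # place task on that GPU
    return max(max(cpus), max(gpus))              # makespan

print(greedy_cpu_gpu([(4, 1), (3, 1), (2, 2), (6, 2)], m=2, k=1))
```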
Holzer, Markus. "Génération de code automatique pour le calcul exaflopique pour la méthode de boltzmann sur réseaux." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP155.
Exascale supercomputers are computing systems capable of performing 10^18 floating-point operations per second. The supercomputer named Frontier first broke this barrier of one exaFLOPS and officially initiated the era of exascale computing in 2022. The immense scale of systems like this imposes significant challenges in developing codes that can fully exploit this computing power. Furthermore, the increasingly heterogeneous hardware employed by today's leading supercomputers adds another layer of complexity. In the field of computational fluid dynamics, it is therefore crucial to carefully consider every aspect of a numerical simulation, starting with the design and selection of algorithms suited to such environments. For example, algorithms like the lattice Boltzmann method are explicitly designed with massive parallelism in mind, making them a notable alternative to other more established methods. Nevertheless, a highly efficient implementation of this algorithm must be tailored to the respective hardware for optimal usage of resources. To address these challenges, this thesis explores the use of code generation through an embedded domain-specific language. Code generation enables us to target specific hardware architectures and apply precise optimisations that leverage domain-specific knowledge. In this research, we extend and redesign the Python package LBMPY to support state-of-the-art variants of the lattice Boltzmann method. LBMPY represents the lattice Boltzmann method symbolically using a computer algebra system, allowing the automatic derivation of discretised equations based on user-defined specifications. To obtain equations with a minimal amount of floating-point operations, we fundamentally enhance the simplification capabilities of LBMPY in this work. The discretised equations derived by LBMPY are provided to the Python package PYSTENCILS, which generates highly optimised, architecture-specific compute kernels in a lower-level language from them. We expand the range of supported hardware platforms and overhaul crucial aspects of the code generation process, such as the typing system, to improve performance and maintainability. A sophisticated integration of these compute kernels into the massively parallel multiphysics framework WALBERLA is also developed, with an in-depth discussion of the key implementation components. One of the most significant advancements in this integration is the generation of highly specialised interpolation kernels. These kernels are essential for transferring information between cells of differing resolutions within the simulation domain, ensuring the accuracy and consistency of the data across varying grid sizes. This development has enabled us to perform the largest simulation run to date using the lattice Boltzmann method on a non-uniform domain, utilising more than 4000 AMD MI250X graphics processing units. The ability to efficiently manage such a vast and heterogeneous computational environment underscores the effectiveness of our approach in scaling complex simulations on next-generation hardware platforms. We verify and validate our approach by simulating turbulent single-phase flow around a sphere using a non-uniform mesh configuration on graphics processing units, successfully reproducing the drag crisis, a complex phenomenon that occurs at Reynolds numbers above 200,000. Additionally, we demonstrate the capabilities of our method through slug flow simulations, offering new insights into the behaviour of Taylor bubbles in complex annular pipe configurations. Finally, we analyse the trajectories of droplets under the influence of a laser heat source in three-dimensional thermocapillary flows. To evaluate the performance of our approach, we present results from all these scenarios on the latest central processing unit and general-purpose graphics processing unit hardware. We provide single-node performance data and offer valuable insights by contextualising the measured results with appropriate performance models.
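The thesis generates architecture-specific kernels from a symbolic description of the method; the toy sketch below uses plain SymPy to illustrate the principle (it is not the actual LBMPY/PYSTENCILS API): a BGK collision rule for a D1Q3 lattice is derived symbolically, simplified, and emitted as C expressions.

```python
import sympy as sp

f0, f1, f2, omega = sp.symbols("f0 f1 f2 omega")
w = [sp.Rational(2, 3), sp.Rational(1, 6), sp.Rational(1, 6)]  # D1Q3 weights
c = [0, 1, -1]                                                 # lattice velocities

rho = f0 + f1 + f2                       # density (zeroth moment)
u = (c[1] * f1 + c[2] * f2) / rho        # velocity (first moment)
feq = [w[i] * rho * (1 + 3 * c[i] * u + sp.Rational(9, 2) * (c[i] * u) ** 2
                     - sp.Rational(3, 2) * u ** 2) for i in range(3)]

# BGK relaxation towards equilibrium, simplified to reduce operation count.
post = [sp.simplify(f + omega * (fe - f)) for f, fe in zip([f0, f1, f2], feq)]

for i, expr in enumerate(post):          # emit one C expression per population
    print(f"f{i}_new = {sp.ccode(expr)};")
```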
Cargnelli, Matthieu. "OpenWP : étude et extension des technologies de Workflows pour le calcul haute performance sur grille." Paris 11, 2008. http://www.theses.fr/2008PA112265.
This thesis was conducted in an industrial context. It studies the refactoring of a sequential scientific code into a grid-enabled program. The proposed approach is based on workflow technologies, which are well suited to grids. After a presentation of existing solutions for workflow execution on the grid, as well as a solution for code parallelization (OpenMP), the author presents his proposal: OpenWP. OpenWP is a workflow definition language based on directives to turn a sequential code into a workflow. OpenWP allows the controlled execution of this workflow on the grid using a third-party workflow enactment engine. A distributed virtually shared memory system is proposed. The defined language is presented in detail and its expressivity is assessed and compared to OpenMP's. The design of OpenWP is then described and the technology choices made are explained. A prototype is presented. The document then shows a proof of concept and a series of performance evaluations of OpenWP on a few programs, among which an industrial mesher used by EADS. A hybrid system based on OpenWP and OpenMP is also described. This system is meant to give OpenWP the ability to exploit the resource hierarchy found in the grid, by using shared-memory multiprocessor machines whenever possible through OpenMP. A proof-of-concept test case is provided and commented on.
Möller, Nathalie. "Adaptation de codes industriels de simulation en Calcul Haute Performance aux architectures modernes de supercalculateurs." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLV088.
For many years, the stability of the architecture paradigm has facilitated the performance portability of large HPC codes from one generation of supercomputers to another. The announced end of Moore's Law, which governs the progress of microprocessor process technology, puts an end to this model and requires new efforts on the software side. Code modernization, based on algorithms well adapted to future systems, is mandatory. This modernization relies on well-known principles such as computation concurrency, or degree of parallelism, and data locality. However, the implementation of these principles in large industrial applications, which often are the result of years of development efforts, turns out to be far more difficult than expected. The contributions of this thesis are twofold. On the one hand, we explore a methodology of software modernization based on the concept of proto-applications and compare it with the direct approach, while optimizing two simulation codes developed in a similar context. On the other hand, we focus on the identification of the main challenges for the architecture, the programming models and the applications. The two chosen application fields are Computational Fluid Dynamics and Computational Electromagnetics.
El Gharbi, Yannis. "Une approche à deux niveaux pour le calcul de structures haute performance : décomposition -- maillage -- résolution." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST001.
Numerical simulations represent a minor part of the certification process for critical parts in industry. However, relying on them more would result in significant cost savings during design phases, avoiding expensive physical tests. Indeed, in cases of strong localized heterogeneities across the whole structure, it becomes hard, if not impossible, to run these simulations successfully in reasonable time, because of the very large number of unknowns needed for a reliable response of the structure. To obtain this response, large-scale parallel solving methods are necessary. Domain decomposition methods, which belong to this class, are the ones investigated in this thesis. The goal is to make these simulations possible thanks to domain decomposition methods. Indeed, both the resolution of the problem and the meshing of the structure become expensive, and the use of parallel methods becomes essential. For this purpose, a two-level substructuring method is proposed. It aims at producing, during the pre-processing step, regularly shaped and homogeneous subdomains that can be meshed in parallel. In addition, it allows a significant reduction of the condition number for strongly heterogeneous problems solved by a FETI solver. A mixed domain decomposition method with a two-level Robin condition adapted to this decomposition could then be developed. The long-term objective is to deal with problems of quasi-industrial complexity, such as computations at the global structural scale with multi-scale materials like three-dimensional woven composites, which are used increasingly intensively in the aeronautical industry.
Gholami, Bahman. "Application des systèmes de calcul à haute performance dans les études électrothermiques à l'échelle nanoscopique." Thèse, Université du Québec à Trois-Rivières, 2011. http://depot-e.uqtr.ca/2065/1/030259746.pdf.
Lebre, Adrien. "aIOLi : contrôle, ordonnancement et régulation des accès aux données persistantes dans les environnements multi-applicatifs haute performance." Grenoble INPG, 2006. http://www.theses.fr/2006INPG0084.
Many scientific applications use and create vast amounts of data, often accessed in non-sequential patterns (strided requests). To avoid performance loss, parallel I/O libraries such as ROMIO are often used to aggregate small separate requests into large contiguous ones. However, optimizations for a given application are not aware of the whole set of interactions with the other applications running at the same time on the cluster. As a consequence, most of the optimization work is lost, because it is disturbed by the other applications. This document presents a software service, named aIOLi, whose role is to control, reschedule and regulate the whole set of interactions coming from all applications running simultaneously on a cluster. Besides, the traditional POSIX API is maintained and used. In such a context, trade-offs have to be found between performance, fairness and response time. To achieve this, an I/O scheduling algorithm together with a request aggregator, considering both application access patterns and global system load, has been designed and merged into aIOLi. The aIOLi service consists of a new generic framework pluggable into any I/O file system. Several concurrent runs of the IOR benchmarks show significant improvements on read accesses with regard to POSIX and ROMIO calls.
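The kind of aggregation aIOLi performs across applications can be illustrated by coalescing overlapping or adjacent (offset, size) requests into contiguous ones, as in the sketch below; the max_gap parameter, which allows merging across small holes, is an illustrative addition rather than an aIOLi feature.

```python
def aggregate_requests(requests, max_gap=0):
    """Coalesce (offset, size) I/O requests into large contiguous ones."""
    merged = []
    for off, size in sorted(requests):           # sort by offset
        if merged and off <= merged[-1][0] + merged[-1][1] + max_gap:
            last_off, last_size = merged[-1]     # extend the previous request
            merged[-1] = (last_off, max(last_size, off + size - last_off))
        else:
            merged.append((off, size))           # start a new contiguous request
    return merged

# Strided accesses from two applications collapse into two large reads:
print(aggregate_requests([(0, 4), (4, 4), (8, 4), (100, 8), (108, 8)]))
# [(0, 12), (100, 16)]
```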
Capra, Antoine. "Virtualisation en contexte HPC." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0436/document.
To meet the growing needs of digital simulation and remain at the forefront of technology, supercomputers must be constantly improved. These improvements can be of a hardware or software nature, which forces applications to adapt to a new programming environment throughout their development. It then becomes necessary to raise the question of the sustainability of applications and their portability from one machine to another. The use of virtual machines may be a first answer to this need, by stabilizing programming environments. With virtualization, applications can be developed in a fixed environment, without being directly impacted by the current environment of a physical machine. However, the additional abstraction induced by virtual machines in practice leads to a loss of performance. We propose in this thesis a set of tools and techniques to enable the use of virtual machines in an HPC context. First, we show that it is possible to optimize the operation of a hypervisor to respond precisely to the constraints of HPC, namely the placement of execution threads and memory data locality. Then, based on this, we propose a service for partitioning the resources of a compute node through virtual machines. Finally, to extend our work to MPI applications, we study network solutions and the performance of virtual machines.
Dao, Van Toan. "Calcul à haute performance et simulations stochastiques : Etude de la reproductibilité numérique sur architectures multicore et manycore." Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC005/document.
The reproducibility of numerical experiments on high performance computing systems is sometimes overlooked. Moreover, the numerical methods used for the rigorous parallelization of stochastic simulations are often unknown. Indeed, the results obtained for a stochastic simulation using high performance computing systems can differ from run to run with the same parameters and the same execution contexts, due to the impact of new architectures, accelerators, compilers and operating systems, or to a change in the order of execution of the floating-point arithmetic operations within the microprocessors caused by parallelization optimizations. If numerical experiments are not repeatable, how can we seriously develop a scientific application? What credit can be given to the parallel software thus developed? In this thesis, we synthesize the main causes of non-reproducibility for a parallel stochastic simulation using high performance computing systems. Unlike usual work on parallelism, we do not focus on improving performance, but on obtaining numerically repeatable results from one experiment to another. We present reproducibility and its contributions to the science of experimental and numerical computing. Furthermore, we propose several contributions, in particular: verifying the reproducibility and portability of top modern pseudo-random number generators, detecting correlations between parallel streams issued from such generators, and repeating and reproducing the numerical results of independent parallel stochastic simulations.
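One ingredient of numerically repeatable parallel stochastic simulation is giving each worker a statistically independent, deterministically derived random stream; the sketch below shows this with NumPy's SeedSequence spawning and the counter-based Philox generator, the simulation itself being a placeholder.

```python
import numpy as np

root = np.random.SeedSequence(20170042)          # single logged master seed
streams = [np.random.Generator(np.random.Philox(s))
           for s in root.spawn(4)]               # 4 independent substreams

results = [g.standard_normal(1000).mean() for g in streams]
# Re-running with the same master seed reproduces `results` exactly,
# regardless of which worker executes which substream.
```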
Bernal Norena, Alvaro. "Conception et étude d'une architecture de haute performance pour le calcul de la fonction exponentielle modulaire." Grenoble INPG, 1999. http://www.theses.fr/1999INPG0112.
Bruned, Vianney. "Analyse statistique et interprétation automatique de données diagraphiques pétrolières différées à l'aide du calcul haute performance." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS064.
In this thesis, we investigate the automation of the identification and characterization of geological strata using well logs. For a single well, geological strata are determined by segmenting the logs, which are comparable to multivariate time series. The identification of strata across different wells from the same field requires correlation methods for time series. We propose a new global well-correlation method using multiple sequence alignment algorithms from bioinformatics. The determination of the mineralogical composition and the percentage of fluids inside a geological stratum results in an ill-posed inverse problem. Current methods are based on experts' choices: the selection of a subset of minerals for a given stratum. Because the model has an intractable likelihood, an approximate Bayesian computation (ABC) method assisted by a density-based clustering algorithm can characterize the mineral composition of the geological layer. The classification step is necessary to deal with the identifiability issue of the minerals. Finally, the workflow is tested on a case study.
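The likelihood-free inversion mentioned above can be illustrated with a minimal ABC rejection sketch; the forward model, prior and observed value below are toy placeholders, and the thesis additionally couples the accepted samples with density-based clustering to handle mineral identifiability.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(fractions):
    """Toy tool-response model: linear mix of per-component responses."""
    responses = np.array([2.65, 2.71, 1.00])     # e.g. densities of 3 components
    return fractions @ responses

observed = 2.30
accepted = []
for _ in range(50_000):
    theta = rng.dirichlet([1.0, 1.0, 1.0])       # prior over volume fractions
    if abs(forward_model(theta) - observed) < 0.01:   # ABC distance threshold
        accepted.append(theta)
posterior = np.array(accepted)                   # approximate posterior sample
```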