Dissertations / Theses on the topic 'High performances calculus'
Jerad, Sadok. "Approches du second ordre de d'ordre élevées pour l'optimisation nonconvex avec variantes sans évaluation de la fonction objective." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP024.
Even though nonlinear optimization seems (a priori) to be a mature field, new minimization schemes are proposed or rediscovered for modern large-scale problems. As an example, and in retrospect of the last decade, we have seen a surge of first-order methods with different analyses, despite the fact that the well-known theoretical limitations of the previous methods have been thoroughly discussed. This thesis explores two main lines of research in the field of nonconvex optimization, with a narrow focus on second- and higher-order methods. In the first series, we focus on algorithms that do not compute function values and operate without knowledge of any parameters, as the most popular first-order methods currently in use fall into this category. We start by redefining the well-known Adagrad algorithm in a trust-region framework and use this paradigm to study two first-order deterministic OFFO (Objective-Function-Free Optimization) classes. To enable faster exact OFFO algorithms, we then propose a pth-order deterministic adaptive regularization method that avoids the computation of function values. This approach recovers the well-known convergence rate of the standard framework when searching for stationary points, while using significantly less information. In the second set of papers, we analyze adaptive algorithms in the more classical framework where function values are used to adapt parameters. We extend adaptive regularization methods to a specific class of Banach spaces by developing a Hölder gradient descent algorithm. In addition, we investigate a second-order algorithm that alternates between negative curvature and Newton steps with a near-optimal convergence rate. To handle large problems, we propose subspace versions of the algorithm that show promising numerical performance. Overall, this research covers a wide range of optimization techniques and provides valuable insights and contributions to both parameter-free and adaptive optimization algorithms for nonconvex functions. It also opens the door for subsequent theoretical developments and the introduction of faster numerical algorithms.
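As a rough illustration of what "objective-function-free" means in the abstract above, the following minimal sketch runs a plain coordinate-wise Adagrad iteration that only ever evaluates the gradient, never the objective. It is not the trust-region reformulation studied in the thesis; the names `adagrad_offo` and `toy_grad`, the step size `eta`, and the toy objective are all assumptions made for this example.

```python
import math

def adagrad_offo(grad, x0, steps=500, eta=1.0, eps=1e-8):
    """Plain Adagrad: uses grad(x) only, the objective value is never computed."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradient components
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Step size shrinks automatically as gradient information accumulates.
            x[i] -= eta * gi / math.sqrt(accum[i] + eps)
    return x

def toy_grad(x):
    """Gradient of the hypothetical nonconvex test function x0^4/4 - x0^2/2 + x1^2."""
    return [x[0] ** 3 - x[0], 2.0 * x[1]]

x_final = adagrad_offo(toy_grad, [1.5, -0.7])
grad_norm = math.sqrt(sum(g * g for g in toy_grad(x_final)))
```

Here `grad_norm` measures how close the final iterate is to a stationary point, which is the convergence criterion the abstract refers to.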
Peretti, Pezzi Guilherme. "High performance hydraulic simulations on the grid using Java and ProActive." Nice, 2011. http://www.theses.fr/2011NICE4118.
Optimizing water distribution is a crucial issue that has already been targeted by many modeling tools. Useful models implemented decades ago need to evolve toward more recent formalisms and computing environments. This thesis presents the redesign of a legacy hydraulic simulation software package (IRMA) written in FORTRAN, which has been used for more than 30 years by the Société du Canal de Provence to design and maintain water distribution networks. IRMA was developed primarily for irrigation networks, using Clément's probabilistic demand estimation model, and today it is used to manage more than 6000 km of pressurized water networks. The growing complexity and size of the networks highlight the need to modernize IRMA and rewrite it in a more current language (Java). This thesis presents the simulation model implemented in IRMA, including the head loss equations, the linearization methods, the topology analysis algorithms, the equipment modeling and the construction of the linear system. Several new types of simulation are presented: peak demand with a probabilistic estimate of consumption (Clément flow rate), pump sizing (indexed characteristics), optimization of pipe diameters, and pressure-dependent consumption. The new approach adopted to solve the linear system is described, and a comparison with existing Java solvers is presented. The results are validated first by comparing the results obtained with the old FORTRAN version and with the new solution for all the networks maintained by the Société du Canal de Provence. A second validation compares the results with those obtained from a standard, well-known simulation tool (EPANET). Regarding the performance of the new solution, sequential timing measurements are presented and compared with the old FORTRAN version. Finally, two use cases are presented to demonstrate the ability to run distributed simulations on a grid infrastructure using the ProActive solution. The new solution has already been deployed in a production environment and clearly demonstrates its effectiveness, with a significant reduction in computation time, improved quality of results, and easier integration into the information system of the Société du Canal de Provence, in particular its spatial database.
Bondouy, Manon. "Construction de modèles réduits pour le calcul des performances des avions." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30027/document.
The objective of this thesis is to provide a methodology and the associated tools to standardize the process of building performance and handling-quality models. This typically means building surrogate models that satisfy conflicting industrial objectives of memory size, accuracy and computation time. After listing the steps of a surrogate construction methodology and conducting a critical review of the state of the art, Neural Networks and High Dimensional Model Representation methods were selected and validated on low-dimensional functions. For functions of higher dimension, a reduction method based on the optimal selection of submodel surrogates has been developed, which makes it possible to satisfy the requirements on accuracy, computation time and memory size. The efficiency of this method has been demonstrated on an aircraft performance model that will be embedded into the avionic systems.
Pawlowski, Filip Igor. "High-performance dense tensor and sparse matrix kernels for machine learning." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN081.
In this thesis, we develop high-performance algorithms for certain computations involving dense tensors and sparse matrices. We address kernel operations that are useful for machine learning tasks, such as inference with deep neural networks (DNNs). We develop data structures and techniques to reduce memory use, to improve data locality and hence to improve cache reuse of the kernel operations. We design both sequential and shared-memory parallel algorithms. In the first part of the thesis we focus on dense tensor kernels. Tensor kernels include the tensor-vector multiplication (TVM), tensor-matrix multiplication (TMM), and tensor-tensor multiplication (TTM). Among these, TVM is the most bandwidth-bound and constitutes a building block for many algorithms. We focus on this operation and develop a data structure and sequential and parallel algorithms for it. We propose a novel data structure which stores the tensor as blocks, which are ordered using the space-filling curve known as the Morton curve (or Z-curve). The key idea consists of dividing the tensor into blocks small enough to fit in cache, and storing them according to the Morton order, while keeping a simple, multi-dimensional order on the individual elements within them. Thus, high-performance BLAS routines can be used as microkernels for each block. We evaluate our techniques on a set of experiments. The results not only demonstrate superior performance of the proposed approach over the state-of-the-art variants by up to 18%, but also show that the proposed approach induces 71% less sample standard deviation for the TVM across the d possible modes. We also show that our data structure naturally extends to other tensor kernels by demonstrating that it yields up to 38% higher performance for the higher-order power method. Finally, we investigate shared-memory parallel TVM algorithms which use the proposed data structure. Several alternative parallel algorithms were characterized theoretically and implemented using OpenMP to compare them experimentally. Our results on systems with up to 8 sockets show near-peak performance for the proposed algorithm for 2-, 3-, 4-, and 5-dimensional tensors. In the second part of the thesis, we explore sparse computations in neural networks, focusing on the high-performance sparse deep inference problem. Sparse DNN inference is the task of using sparse DNNs to classify a batch of data elements forming, in our case, a sparse feature matrix. The performance of sparse inference hinges on efficient parallelization of the sparse matrix-sparse matrix multiplication (SpGEMM) repeated for each layer in the inference function. We first characterize efficient sequential SpGEMM algorithms for our use case. We then introduce model-parallel inference, which uses a two-dimensional partitioning of the weight matrices obtained using hypergraph partitioning software. The model-parallel variant uses barriers to synchronize at layers. Finally, we introduce tiling model-parallel and tiling hybrid algorithms, which increase cache reuse between the layers, and use a weak synchronization module to hide load imbalance and synchronization costs. We evaluate our techniques on the large network data from the IEEE HPEC 2019 Graph Challenge on shared-memory systems and report up to 2x speed-up versus the baseline.
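The Morton (Z-curve) ordering of tensor blocks mentioned in the abstract above can be sketched by interleaving the bits of the block coordinates. The snippet below is only an illustration of that ordering, not the thesis's data structure; the function names, the `bits` limit, and the convention that the first mode occupies the lowest bit are assumptions.

```python
def morton_index(block_coords, bits=10):
    """Interleave the bits of the block coordinates into a single Z-curve index."""
    d = len(block_coords)
    index = 0
    for bit in range(bits):
        for dim, c in enumerate(block_coords):
            # Bit `bit` of coordinate `dim` lands at position bit * d + dim.
            index |= ((c >> bit) & 1) << (bit * d + dim)
    return index

def morton_block_order(blocks_per_mode):
    """List every block coordinate of a blocked tensor in Morton order."""
    coords = [()]
    for n in blocks_per_mode:
        coords = [c + (i,) for c in coords for i in range(n)]
    return sorted(coords, key=morton_index)

# A 2 x 2 grid of blocks is visited in the familiar "Z" pattern.
order_2x2 = morton_block_order((2, 2))
```

Blocks that are close in the multi-dimensional grid tend to be close in this one-dimensional order, which is the locality property the blocked layout relies on; elements inside each block would keep an ordinary row- or column-major layout.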
Cohet, Romain. "Transport des rayons cosmiques en turbulence magnétohydrodynamique." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS051/document.
In this thesis, we study the transport properties of high-energy charged particles in turbulent electromagnetic fields. These fields were generated using the magnetohydrodynamic (MHD) code RAMSES, which solves the compressible ideal MHD equations. We have developed a module for generating the MHD turbulence using a large-scale forcing technique. The MHD equations induce a cascade of energy from large scales to small ones, developing an energy spectrum that follows a power law over a range of scales called the inertial range. We have developed a module for computing the charged particle trajectories once the turbulent spectrum is established. By injecting particles at energies such that the inverse of the particle Larmor radius corresponds to a mode in the inertial range of the Fourier spectrum, we have highlighted systematic effects related to the power-law spectrum. This method showed that the mean free path is independent of the particle energy until the Larmor radius takes values close to the turbulence coherence scale. The mean free path follows a power law in the Alfvénic Mach number. We have also developed a technique to measure the effect of the anisotropy of MHD turbulence on cosmic-ray transport properties through the calculation of local magnetic fields. This study has shown an effect on the pitch-angle scattering coefficient, which confirmed the assumption that the particles are more sensitive to changes in small-scale fluctuations.
Applencourt, Thomas. "Calcul haute performance & chimie quantique." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30162/document.
This thesis work has two main objectives: 1. To develop and apply original electronic structure methods for quantum chemistry. 2. To implement several computational strategies to achieve efficient large-scale computer simulations. In the first part, both the Configuration Interaction (CI) and the Quantum Monte Carlo (QMC) methods used in this work for calculating quantum properties are presented. We then describe more specifically the selected CI approach (the so-called CIPSI approach, Configuration Interaction using a Perturbative Selection done Iteratively) that we used for building trial wavefunctions for QMC simulations. As a first application, we present the QMC calculation of the total non-relativistic energies of transition metal atoms of the 3d series. This work, which required the implementation of Slater-type basis functions in our codes, has led to the best values ever published for these atoms. We then present our original implementation of the pseudo-potentials for QMC and discuss the calculation of atomization energies for a benchmark set of 55 organic molecules. The second part is devoted to the High Performance Computing (HPC) aspects. The objective is to make possible and/or facilitate the deployment of very large-scale simulations. From the developer's point of view, this includes the use of original programming paradigms, single-core optimization, massively parallel calculations on grids (supercomputer and Cloud), development of collaborative tools, etc.; and from the user's point of view, improved code installation, management of the input/output parameters, GUI, interfacing with other codes, etc.
Lagardère, Louis. "Calcul haute-performance et dynamique moléculaire polarisable." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066042.
This work is at the interface between theoretical chemistry, scientific computing and applied mathematics. We study different algorithms used to solve the specific equations that arise in polarizable molecular dynamics in a massively parallel context. This family of models requires solving more complex equations than in the classical case, making the use of supercomputers mandatory in order to get significant results. We will more specifically study different types of boundary conditions that represent different ways to model solvation effects: first the Particle Mesh Ewald method to treat periodic boundary conditions, and then a continuum solvation model discretized within a domain decomposition strategy, ddCOSMO. The outline of this thesis is as follows: first, the different parallel strategies in the general context of molecular dynamics are reviewed. Then several methods to adapt these strategies to the specific case of polarizable force fields are presented. After that, strategies that make it possible to circumvent certain limits due to the use of iterative methods in the context of polarizable molecular dynamics are presented and studied. Then, the adaptation of these methods to different cases of boundary conditions is presented: first in the case of the Particle Mesh Ewald method to treat periodic boundary conditions, and then in the case of a particular continuum solvation model discretized with a domain decomposition strategy, ddCOSMO. Finally, various numerical results and applications are presented.
Guilloteau, Quentin. "Une approche autonomique à la régulation en ligne de systèmes HPC, avec un support pour la reproductibilité des expériences." Electronic Thesis or Diss., Université Grenoble Alpes, 2023. http://www.theses.fr/2023GRALM075.
High-Performance Computing (HPC) systems have become increasingly complex, and their performance and power consumption make them less predictable. This unpredictability requires cautious runtime management to guarantee an acceptable Quality-of-Service to the end users. Such a regulation problem arises in the context of the computing grid middleware CiGri, which aims at harvesting the idle computing resources of a set of clusters by injecting low-priority jobs. An overly aggressive harvesting strategy can degrade performance for all the users of the clusters, while an overly cautious one will leave resources idle and thus lose computing power. There is thus a tradeoff between the amount of resources that can be harvested and the resulting degradation of users' jobs, which can evolve at runtime based on Service Level Agreements and the current load of the system. We claim that such regulation challenges can be addressed with tools from Autonomic Computing, in particular when coupled with Control Theory. This thesis investigates several regulation problems in the context of CiGri with such tools. We will focus on regulating the harvesting based on the load of a shared distributed file system, and on improving the overall usage of the computing resources. We will also evaluate and compare the reusability of the proposed control-based solutions in the context of HPC systems. The experiments done in this thesis also led us to investigate new tools and techniques to improve the cost and reproducibility of the experiments. We will present a tool named NixOS-Compose, which is able to generate and deploy reproducible distributed software environments. We will also investigate techniques to reduce the number of machines needed to deploy experiments on grid or cluster middlewares, such as CiGri, while ensuring an acceptable level of realism for the final deployed system.
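To make the control-theoretic idea in the abstract above concrete, here is a minimal sketch of a discrete proportional-integral (PI) controller that adjusts how many low-priority jobs are injected so that a measured file-system load tracks a target. Everything in it is an assumption for illustration: the linear load model, the gains `kp` and `ki`, and the function names are hypothetical and are not taken from CiGri or from the thesis.

```python
def pi_controller(setpoint, kp, ki):
    """Return a stateful step(measured) function implementing a discrete PI law."""
    integral = 0.0

    def step(measured):
        nonlocal integral
        error = setpoint - measured
        integral += error  # accumulated error removes the steady-state offset
        return kp * error + ki * integral

    return step

def simulate(steps=200, setpoint=0.7, base_load=0.2, load_per_job=0.01):
    """Closed loop on a hypothetical plant: load = base_load + load_per_job * jobs."""
    control = pi_controller(setpoint, kp=10.0, ki=50.0)
    jobs, load = 0.0, base_load
    for _ in range(steps):
        jobs = max(0.0, control(load))            # cannot inject a negative number of jobs
        load = base_load + load_per_job * jobs    # assumed linear response of the file system
    return jobs, load

final_jobs, final_load = simulate()
```

In this toy model the loop settles at the number of jobs that brings the load exactly to the setpoint; a real system would add noise, delays, and saturation, which is where the tuning questions discussed in the abstract arise.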
Jolivet, Pierre. "Méthodes de décomposition de domaine. Application au calcul haute performance." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM040/document.
This thesis introduces a unified framework for various domain decomposition methods: those with overlap, so-called Schwarz methods, and those based on Schur complements, so-called substructuring methods. It is then possible to switch with a high level of abstraction between methods and to build different preconditioners to accelerate the iterative solution of large sparse linear systems. Such systems are frequently encountered in industrial or scientific problems after discretization of continuous models. Even though these preconditioners naturally exhibit good parallelism properties on distributed architectures, they can exhibit inadequate numerical performance for complex decompositions or multiscale physics. This lack of robustness may be alleviated by concurrently solving sparse or dense local generalized eigenvalue problems, thus identifying a priori the modes that hinder the convergence of the underlying iterative methods. Using these modes, it is then possible to define projection operators based on what is usually referred to as a coarse solver. These auxiliary tools tend to solve the aforementioned issues, but typically decrease the parallel efficiency of the preconditioners. In this dissertation, it is shown in three points that the newly developed construction is efficient: 1) by performing large-scale numerical experiments on Curie, a European supercomputer, and by comparing it with state-of-the-art 2) multigrid and 3) direct solvers.
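For readers unfamiliar with overlapping Schwarz methods, the following sketch applies the classical alternating Schwarz iteration to the 1D problem -u'' = 1 on (0, 1) with u(0) = u(1) = 0, split into two overlapping subdomains. It is a textbook toy and not the preconditioner framework of the thesis; the grid size, overlap width, and function names are assumptions made for this example.

```python
def solve_poisson_1d(f_values, left, right, h):
    """Solve -u'' = f with Dirichlet values left/right (Thomas algorithm)."""
    n = len(f_values)
    rhs = [f * h * h for f in f_values]
    rhs[0] += left
    rhs[-1] += right
    diag = [2.0] * n
    for i in range(1, n):          # forward elimination of the -1 sub-diagonal
        w = -1.0 / diag[i - 1]
        diag[i] += w
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u

def alternating_schwarz(n=19, overlap=4, sweeps=30):
    """Alternate local solves on two overlapping subdomains of n interior nodes."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)             # nodes 0 and n + 1 hold the boundary values
    mid = (n + 1) // 2
    a_end = mid + overlap // 2      # last interior node of subdomain 1
    b_start = mid - overlap // 2    # first interior node of subdomain 2
    for _ in range(sweeps):
        # Subdomain 1 takes its right boundary value from the current iterate.
        u[1:a_end + 1] = solve_poisson_1d([1.0] * a_end, u[0], u[a_end + 1], h)
        # Subdomain 2 takes its left boundary value from the freshly updated iterate.
        u[b_start:n + 1] = solve_poisson_1d([1.0] * (n - b_start + 1),
                                            u[b_start - 1], u[n + 1], h)
    return u, h

u, h = alternating_schwarz()
max_error = max(abs(u[i] - (i * h) * (1 - i * h) / 2) for i in range(len(u)))
```

The iterate is compared against the exact solution x(1 - x)/2. With many subdomains the convergence of such one-level iterations deteriorates, which is the motivation for the coarse solvers discussed in the abstract.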
Hascoët, Julien. "Contributions to Software Runtime for Clustered Manycores Applied to Embedded and High-Performance Applications." Thesis, Rennes, INSA, 2018. http://www.theses.fr/2018ISAR0029/document.
The growing need for computing is more and more challenging, especially in the embedded system world with autonomous cars, drones, and smartphones. New highly parallel and heterogeneous processors emerge to answer this challenge. They operate in constrained environments with real-time requirements, reduced power consumption, and safety. Programming these new chips is a time-consuming and challenging task leading to huge software development costs. The Kalray MPPA® processor is a competitive example for low-power super-computing on a single chip. It integrates up to 288 VLIW cores grouped in 18 clusters, each fitted with shared local memory. These clusters are interconnected with a high-bandwidth network-on-chip, and DMA engines are used to communicate. This processor is used in this thesis for experimental results. We propose the AOS library, which enables high-performance communications and synchronizations of distributed local memories on clustered manycores. AOS provides 70% of the peak hardware throughput for transfers larger than 8 KB. We propose tools for the implementation of static and dynamic dataflow programs based on AOS to accelerate the development of parallel applications on clustered manycores. We propose an implementation of OpenVX for clustered manycores on top of AOS. OpenVX is a standard based on dataflow for the development of computer vision and neural network computing. The proposed OpenVX implementation includes automatic optimizations like data prefetch to overlap communications and computations, or kernel fusion to avoid the main memory bandwidth bottleneck. Results show super-linear speedups.
Bouvier, Clément. "Sélection de caractéristiques stables pour la segmentation d'images histologiques par calcul haute performance." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS004.
In preclinical research, and more specifically in neurobiology, histology uses images produced by increasingly powerful optical microscopes that digitize entire sections at cell scale. Quantification of stained tissue such as neurons relies on machine-learning-driven segmentation. However, such methods need a lot of additional information, or features, which are extracted from raw data, multiplying the quantity of data to process. As a result, the quantity of features is becoming an obstacle to processing large series of histological images in a fast and robust manner. Feature selection methods could reduce the amount of required information, but the selected subsets lack stability. We propose a novel methodology operating on high performance computing (HPC) infrastructures and aiming at finding small and stable sets of features for fast and robust segmentation of high-resolution histological whole sections. The selection proceeds in two steps: first at the scale of feature families (an intermediate pool of features, between the full feature space and individual features); second, feature selection is performed within the pre-selected feature families. In this work, the selected sets of features are stable for two different neuron stainings. Furthermore, the feature selection results in a significant reduction of computation time and memory cost. This methodology can potentially enable exhaustive histological studies at a high-resolution scale on HPC infrastructures for both preclinical and clinical research settings.
Rubeck, Christophe. "Calcul hautes performances pour les formulations intégrales en électromagnétisme basses fréquences." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00793505.
Trahay, François. "De l’interaction des communications et de l’ordonnancement de threads au sein des grappes de machines multi-cœurs." Thesis, Bordeaux 1, 2009. http://www.theses.fr/2009BOR13870/document.
The current trend among hardware vendors for scientific computation is to build clusters whose nodes include an increasing number of cores. The classical programming model that is based only on MPI is being replaced by hybrid approaches that mix communication and multi-threading. This evolution of the programming model leads to numerous problems, since MPI implementations were not designed for multi-threaded applications. In this thesis, in order to guarantee a smooth behavior of communication, we propose a software module that interacts with both the thread scheduler and the communication library. This module, by working closely with the thread scheduler, makes communication progress in the background and guarantees a high level of reactivity to network events, even when the node is overloaded. We show that this background progression makes it possible to overlap communication and computation. The parallelization of the communication library is also made easier thanks to a task onloading mechanism that is able to exploit the available cores while taking data locality into account. The results we obtain on synthetic applications as well as real-life applications show that the interaction between the thread scheduler and the communication library reduces the overhead of communication and thus improves application performance.
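The notion of making communication progress in the background so that it overlaps with computation can be illustrated with a small thread-based sketch. This is only an analogy: the thesis works at the level of MPI libraries and thread schedulers, whereas the code below simulates a transfer with `time.sleep`, and the names `run_with_overlap` and `transfer_seconds` are hypothetical.

```python
import threading
import time

def run_with_overlap(messages, compute, transfer_seconds=0.01):
    """Run compute() while a background thread 'receives' the messages."""
    received = []

    def progress_communication():
        for message in messages:
            time.sleep(transfer_seconds)  # stand-in for a network transfer
            received.append(message)

    worker = threading.Thread(target=progress_communication)
    worker.start()        # communication now progresses in the background
    result = compute()    # the main thread keeps computing in the meantime
    worker.join()         # wait for any transfers still in flight
    return result, received

result, received = run_with_overlap(
    ["halo-0", "halo-1", "halo-2"],
    lambda: sum(i * i for i in range(10_000)),
)
```

When the transfer time and the compute time are comparable, the total elapsed time approaches the larger of the two rather than their sum, which is the benefit the abstract describes.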
Durocher, Arnaud. "Simulations massives de Dynamique des Dislocations : fiabilité et performances sur architectures parallèles et distribuées." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0423/document.
Dislocation dynamics simulations investigate the behavior of linear defects, called dislocations, in crystalline materials. They are an essential part of the multiscale modelling of materials, used for instance in the nuclear industry to characterize the behavior and aging of materials under irradiation. The ability of dislocations to multiply, annihilate and interact presents many challenges, for instance in terms of storage and access to data. This thesis addresses some challenges of dislocation dynamics simulation on parallel and distributed computers. In this thesis, I improve the Optidis simulator to open the way to more complex simulations. My contributions focus mainly on improving the reliability and performance of Optidis. A new interface to access simulation data is proposed to dissociate its implementation from the physical algorithms. This data structure allows better performance as well as better code maintainability, even with distributed data. A new fast and reliable collision detection and handling algorithm has been implemented. Collision detection techniques from the robotics and 3D animation industries are used to speed up the detection process. With the use of the new data structure and a more reliable design, this algorithm enables more precise collision handling and the use of a larger simulation timestep. The precision of the results has been measured by comparing Optidis to Numodis. The performance of the code has been studied on larger-scale simulations with millions of segments and hundreds of CPU cores, demonstrating that such simulations can now be achieved.
Cornea, Bogdan Florin. "Prédiction de performances d’applications de calcul distribué exécutées sur une architecture pair-à-pair." Thesis, Besançon, 2011. http://www.theses.fr/2011BESA2012/document.
In the field of high performance computing, architectures evolve continuously. Increasing the number of computing nodes or the network speed requires an important investment, in terms of both time and money. Performance prediction methods aim at assisting in finding the best trade-off for such an investment. At the same time, P2P HPC systems have seen an increase in development. These heterogeneous architectures would allow solving scientific problems at a low cost compared with dedicated systems. The manuscript presents a new method for performance prediction. This method applies to real applications for distributed computing, considered in a real execution environment. This method uses information about the different compiler optimization levels. The prediction results are obtained with reduced slowdown and are scalable. This thesis took shape in the development of the dPerf tool. dPerf predicts the performance of C, C++, and Fortran applications that use MPI or P2P-SAP to communicate. The applications modeled by dPerf are meant for execution on P2P heterogeneous architectures with a decentralized communication topology. The accuracy of dPerf has been studied on three applications: (i) the Laplace transform, for sequential codes, (ii) the NAS Integer Sort benchmark, for distributed MPI programs, and (iii) the obstacle problem, for decentralized P2P computing and the scaling of the number of computing nodes.
Dao, Quang Minh. "High performance processing of metagenomics data." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS203.
The assessment and characterization of the gut microbiome has become a focus of research in the area of human autoimmune diseases. Many diseases such as obesity, inflammatory bowel (IBD), lean or beses twins, colorectal cancers and so on (Qin et al. 2010; Turnbaugh et al. 2009) have already been found to be associated with changes in the human microbiome. To investigate these relationships, quantitative metagenomics (QM) studies based on sequencing data could be performed. Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for precision medicine and for the medical management of chronic diseases. Results from such QM studies which report the organisms present in the samples and profile their abundances, will be used for continuous analyses. The terms microbiome and microbiota are used indistinctly to describe the community of microorganisms that live in a given environment. The development of high-throughput DNA sequencing technologies has boosted microbiome research through the study of microbial genomes allowing a more precise quantification of microbial and functional abundance. However, microbiome data analysis is challenging because it involves high-dimensional structured multivariate sparse data and because of its compositional structure of microbiome data. The data preprocessing is typically implemented as a pipeline (workflow) with third-party software that each process input files and produces output files. The pipelines are often deep, with ten or more tools, which could be very diverse from different languages such as R, Python, Perl etc. and integrated into different frameworks (Leipzig 2017) such as Galaxy, Apache Taverna, Toil etc. 
The challenge with existing approaches is that they are not always efficient with very large datasets: individual tools in a metagenomics pipeline do not scale well, and their execution speed has not met the expectations of bioinformaticians. To date, more and more data are captured or generated in many different research areas such as Physics, Climatology, Sociology, Remote sensing or Management, as well as bioinformatics. Indeed, Big Data Analytics (BDA) describes the unprecedented growth of data generated and collected from all kinds of data sources as mentioned above. This growth may be in the volume of data, in the speed of data moving in and out, or in the speed of analyzing data, which depends on high-performance computing (HPC) technologies. In the past few decades since the invention of the computer, HPC has contributed significantly to our quality of life - driving scientific innovation, enhancing engineering design and consumer goods manufacturing, as well as strengthening national and international security. This has been recognised and emphasised by both government and industry, with major ongoing investments in areas encompassing weather forecasting, scientific research and development as well as drug design and healthcare outcomes. In many ways, those two worlds (HPC and big data) are slowly but surely converging. They are the keys to overcoming the limitations of bioinformatics analysis in general and quantitative metagenomics analysis in particular. Within the scope of this thesis, we contributed a novel bioinformatics framework and pipeline called QMSpy, which helps bioinformaticians overcome limitations related to the HPC and big data domains in the context of quantitative metagenomics. QMSpy tackles two challenges introduced by large-scale NGS data: (i) sequencing data alignment, a computation-intensive task, and (ii) quantification of metagenomics objects, a memory-intensive task. 
By leveraging the powerful distributed computing engine (Apache Spark), in combination with the workflow management of big data processing (Hortonworks Data Platform), QMSpy allows us not only to bypass [...]
Barthou, Denis. "Contributions to code optimization and high performance library generation." Habilitation à diriger des recherches, Université de Versailles-Saint Quentin en Yvelines, 2008. http://tel.archives-ouvertes.fr/tel-00551683.
Capra, Antoine. "Virtualisation en contexte HPC." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0436/document.
To meet the growing needs of numerical simulation and remain at the forefront of technology, supercomputers must be constantly improved. These improvements can concern hardware or software. This forces applications to adapt to a new programming environment throughout their development. It then becomes necessary to raise the question of the sustainability of applications and their portability from one machine to another. The use of virtual machines may be a first response to this need for sustainable, stable programming environments. With virtualization, applications can be developed in a fixed environment, without being directly impacted by the current environment on a physical machine. However, the additional abstraction induced by virtual machines leads in practice to a loss of performance. We propose in this thesis a set of tools and techniques to enable the use of virtual machines in an HPC context. First, we show that it is possible to optimize the operation of a hypervisor to respond accurately to the constraints of HPC, namely the placement of execution threads and memory data locality. Then, based on this, we propose a service for partitioning the resources of a compute node through virtual machines. Finally, to extend our work to MPI applications, we studied the network solutions and performance of a virtual machine.
Verdicchio, Marco. "Molecular simulations as test beds for bridging high throughput and high performance computing." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/2236/.
The strong connotation of computational chemistry in terms of computer technologies is at the same time the strength and the weakness of molecular simulations. As a matter of fact, in order to perform such studies (even for few-atom systems) we first need to carry out high-level electronic structure calculations. These calculations typically require nodes (or clusters of nodes) equipped with large (of the order of many GB) memories and processors performing at the level of several Gigaflops. This is because the whole Potential Energy Surface (PES) governing the nuclear motion needs to be worked out first. On the High Performance Computing (HPC) platforms with enhanced parallel capabilities we can run concurrently, on several single multicore (or clusters of) processors, the calculations required by the (large number of) potential energy values necessary to describe the PES explored by a reactive chemical process. The real bottleneck in carrying out related computational campaigns, indeed, is represented by the availability of a computing platform having the proper computational requirements in terms of computing time and physical memory. The (limited) computing capabilities in general available to the scientific community, in fact, still set severe limitations to the development of full a priori computational simulations of molecular processes. Fortunately, innovative computing technologies combining concurrency and networking (such as distributed computing, virtual laboratories, supercomputing, Grid computing) are opening new prospects to the possibility of achieving significant computational throughputs and, therefore, of developing a priori molecular simulations of real systems. The theoretical foundations and the computing paradigms employed for the assemblage of the components of the Grid Empowered Molecular Simulator GEMS are described in Chapter 1. 
In that chapter the development of grid-based workflows allowing the ab initio evaluation of the observable properties of small chemical systems, starting from the calculation of the electronic properties, is illustrated. In Chapter 2 the issue of interoperability between computational codes across different stages of the workflow is addressed. The Chapter proposes the Q5cost and D5cost common data models as de facto standard formats for quantum chemistry calculations. Chapter 3 relates to the results of standalone ab initio calculations performed on different small chemical systems (X4 clusters and BeH- dimer). The Chapter discusses particular and interesting chemical bonds that require high-level quantum methods in order to be rationalized. Finally, Chapter 4 and Chapter 5 report the results of our work on two combustion and atmospheric chemistry problems (CH3CH2OO• isomerization and N2+N2 reaction) respectively. They both aim at constructing the PES for a reactive process. Once a PES is generated, the kinetic and dynamical data need to be calculated for a large number of initial conditions, and can be computed on HTC platforms. The assemblage of the computational workflows for the coupled use of HPC and HTC systems is also dealt with there.
Lanore, Vincent. "On Scalable Reconfigurable Component Models for High-Performance Computing." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1051/document.
Component-based programming is a programming paradigm which eases code reuse and separation of concerns. Some component models, which are said to be "reconfigurable", allow the modification at runtime of an application's structure. However, these models are not suited to High-Performance Computing (HPC) as they rely on non-scalable mechanisms. The goal of this thesis is to provide models, algorithms and tools to ease the development of component-based reconfigurable HPC applications. The main contribution of the thesis is the DirectMOD component model, which eases development and reuse of distributed transformations. In order to improve on this core model in other directions, we have also proposed: • the SpecMOD formal component model, which allows automatic specialization of hierarchical component assemblies and provides high-level software engineering features; • mechanisms for efficient fine-grain reconfiguration for AMR applications, an important application class in HPC. An implementation of DirectMOD, called DirectL2C, has been developed so as to implement a series of benchmarks to evaluate our approach. Experiments on HPC architectures show our approach scales. Moreover, a quantitative analysis of the benchmark codes shows that our approach is compact and eases reuse.
Boyer, Alexandre. "Contributions to Computing needs in High Energy Physics Offline Activities : Towards an efficient exploitation of heterogeneous, distributed and shared Computing Resources." Electronic Thesis or Diss., Université Clermont Auvergne (2021-...), 2022. http://www.theses.fr/2022UCFAC108.
Pushing the boundaries of science and providing more advanced services to individuals and communities continuously demand more sophisticated software, specialized hardware, and more computing power and storage. At the beginning of the 2020s, we are entering a heterogeneous and distributed computing era where resources will be limited and constrained. Grid communities need to adapt their approach: (i) applications need to support various architectures; (ii) workload management systems have to manage various computing paradigms and guarantee a proper execution of the applications, regardless of the constraints of the underlying systems. This thesis focuses on the latter point through the case of the LHCb experiment. The LHCb collaboration currently relies on an infrastructure involving 170 computing centers across the world, the World LHC Computing Grid, to process a growing amount of Monte Carlo simulations, reproducing the experimental conditions of the experiment. Despite its huge size, it will be unable to handle simulations coming from the next LHC runs in a reasonable time. In the meantime, national science programs are consolidating computing resources and encouraging the use of supercomputers, which provide a tremendous amount of computing power but pose greater integration challenges. In this thesis, we propose different approaches to supply distributed and shared computing resources with LHCb tasks. We developed methods to increase the number of computing resource allocations and their duration. This resulted in an improvement of the LHCb job throughput on a grid infrastructure (+40.86%). We also designed a series of software solutions to address highly-constrained environment issues that can be found in supercomputers, such as lack of external connectivity and software dependencies. We have applied those concepts to leverage computing power from four partitions of supercomputers ranked in the Top500.
Ho, Minh Quan. "Optimisation de transfert de données pour les processeurs pluri-coeurs, appliqué à l'algèbre linéaire et aux calculs sur stencils." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM042/document.
The upcoming Exascale target in High Performance Computing (HPC) and disruptive achievements in artificial intelligence have given rise to alternative, non-conventional many-core architectures that offer the energy efficiency typical of embedded systems while providing the same software ecosystem as classic HPC platforms. A key enabler of energy-efficient computing on many-core architectures is the exploitation of data locality, specifically the use of scratchpad memories in combination with DMA engines in order to overlap computation and communication. Such a software paradigm raises considerable programming challenges for both the vendor and the application developer. In this thesis, we tackle the memory transfer and performance issues, as well as the programming challenges, of memory- and compute-intensive HPC applications on the Kalray MPPA many-core architecture. With the first memory-bound use case of the lattice Boltzmann method (LBM), we provide generic and fundamental techniques for decomposing three-dimensional iterative stencil problems onto clustered many-core processors fitted with scratchpad memories and DMA engines. The developed DMA-based streaming and overlapping algorithm delivers a 33% performance gain over the default cache-based implementation. High-dimensional stencil computation suffers from a serious I/O bottleneck and limited on-chip memory space. We developed a new in-place LBM propagation algorithm, which reduces the memory footprint by half and yields 1.5 times higher performance-per-byte efficiency than the state-of-the-art out-of-place algorithm. On the compute-intensive side with dense linear algebra computations, we build an optimized matrix multiplication benchmark based on exploitation of scratchpad memory and efficient asynchronous DMA communication. 
These techniques are then extended to a DMA module of the BLIS framework, which allows us to instantiate an optimized and portable level-3 BLAS numerical library on any DMA-based architecture, in less than 100 lines of code. We achieve 75% peak performance on the MPPA processor with the matrix multiplication operation (GEMM) from the standard BLAS library, without having to write thousands of lines of laboriously optimized code for the same result
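As a rough illustration of the tiling idea behind such matrix multiplication kernels, the following Python sketch multiplies two matrices tile by tile. It is not the BLIS or MPPA code from the thesis: the function name `blocked_matmul` and the `tile` parameter are hypothetical, and each tile merely stands in for a block that a real implementation would stage in scratchpad memory through DMA transfers.

```python
def blocked_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the operands tile by tile.

    Hypothetical sketch: each (i0, j0, k0) tile triple stands for the data
    a DMA-based kernel would copy into scratchpad memory before computing.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile rows of C
        for j0 in range(0, m, tile):      # tile columns of C
            for k0 in range(0, k, tile):  # tiles along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b))
```

The tile size controls how much data is live at once; on a scratchpad architecture it would be chosen so that the three tiles fit in local memory while the next tiles are being fetched.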
Cordeiro, Daniel. "The impact of cooperation on new high performance computing platforms." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00690908.
Denoyelle, Nicolas. "De la localité logicielle à la localité matérielle sur les architectures à mémoire partagée, hétérogène et non-uniforme." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0201/document.
Over the years, the complexity of the memory hierarchy of High Performance Computing (HPC) systems has increased. Nowadays, large-scale machines typically embed several levels of caches and a distributed memory. Recently, on-chip memories and non-volatile PCIe-based flash have entered the HPC landscape. This memory architecture is a necessary evil for obtaining high performance, and it comes at the cost of careful task and data placement. Hardware-managed caches used to hide the tedious locality optimizations. Now, data locality, in local or remote memories, in fast or slow memory, in volatile or non-volatile memory, with small or large capacity, is entirely software manageable. This extra flexibility grants more freedom to application designers but with the drawback of making their work more complex and expensive. Indeed, when managing task and data placement, one has to account for several complex trade-offs between memory performance, size and features. This thesis has been jointly supervised by Atos Bull Technologies and Inria Bordeaux – Sud-Ouest. In this document, we detail contemporary HPC systems and characterize machine performance for several locality scenarios. We explain how programming language semantics affect data locality in the hardware, and thus application performance. Through joint work with the INESC-ID laboratory in Lisbon, we propose an insightful extension to the famous Roofline performance model in order to provide locality hints and improve application performance. We also present a modeling framework to map platform and application performance events to the hardware topology, in order to extract synthetic locality metrics. Finally, we propose an automatic locality policy selector, built on machine learning algorithms, to easily improve the task and data placement of applications.
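For readers unfamiliar with the Roofline model mentioned above, the sketch below computes the classic bound only, not the locality-aware extension proposed in the thesis. The function names and the machine numbers are assumptions chosen for illustration.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Classic Roofline bound: performance is capped either by the compute
    peak or by memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flop/byte) above which a kernel is compute-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    peak, bw = 1000.0, 100.0
    for intensity in (0.25, 20.0):
        print(intensity, roofline_gflops(peak, bw, intensity))
    print("ridge point:", ridge_point(peak, bw))
```

A kernel whose intensity lies below the ridge point is memory-bound, which is where data placement decisions of the kind studied in the thesis matter most.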
Le, Fevre Valentin. "Resilient scheduling algorithms for large-scale platforms." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN019.
This thesis focuses on a major problem for the HPC community: resilience. Computing platforms are growing ever larger in order to reach what we call exascale, i.e. a computing capacity of 10^18 FLOP/s, but they suffer numerous failures. Reducing the execution time and handling the errors are two linked problems: for instance, replication (computing redundancy) decreases the number of critical failures but also decreases the number of available resources. In particular, this thesis focuses on several "checkpoint/restart" mechanisms (saving the state of an application to restart from that save when a failure occurs): the first part investigates checkpointing on several levels, the use of additional resources to cope with system latency, and checkpointing in generic task graphs. The second part deals with optimal checkpointing strategies when coupled with replication (in linear task graphs, on heterogeneous platforms and with process duplication). The last part explores several scheduling problems linked to increasing disruptions in large-scale platforms.
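The trade-off behind checkpoint/restart can be illustrated with the well-known first-order Young/Daly approximation of the checkpointing period, sqrt(2 * C * MTBF). This is a textbook baseline, not one of the strategies developed in the thesis. The function names and the numeric values are assumptions, and the platform MTBF is taken as the node MTBF divided by the node count, which assumes independent failures.

```python
import math


def platform_mtbf(node_mtbf_s, n_nodes):
    """Mean time between failures of the whole platform, assuming
    independent node failures (an assumption of this sketch)."""
    return node_mtbf_s / n_nodes


def young_daly_period(checkpoint_cost_s, mtbf_s):
    """First-order Young/Daly approximation of the time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical values: 1e6 s node MTBF, 100 nodes, 50 s per checkpoint.
    mtbf = platform_mtbf(1_000_000.0, 100)
    print("platform MTBF (s):", mtbf)
    print("checkpoint period (s):", young_daly_period(50.0, mtbf))
```

The formula makes the scaling problem visible: as the node count grows, the platform MTBF shrinks and checkpoints must be taken more often, which eats into useful computation.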
Nguyen, The Tung. "Un environnement pour le calcul intensif pair à pair." Thesis, Toulouse, INPT, 2011. http://www.theses.fr/2011INPT0105/document.
The concept of peer-to-peer (P2P) has seen great developments in recent years in the domains of file sharing, video streaming and distributed databases. Recent advances in microprocessor architecture and networks make it possible to consider new applications such as distributed high performance computing. However, the implementation of this new type of application on P2P networks gives rise to numerous challenges such as heterogeneity, scalability and robustness. In addition, existing transport protocols like TCP and UDP are not well suited to this new type of application. This thesis aims at designing a decentralized and robust environment for the implementation of high performance computing applications on peer-to-peer networks. We are interested in applications in the domains of numerical simulation and optimization that rely on task-parallel models and that are solved via parallel or distributed iterative algorithms. Unlike existing solutions, our environment allows frequent direct communications between peers. The environment is based on a self-adaptive communication protocol that can reconfigure itself dynamically by choosing the most appropriate communication mode between any peers, according to algorithmic choices made at the application level or elements of context at the transport level, such as topology. We present and analyze computational results obtained on several testbeds such as GRID'5000 and PlanetLab for the obstacle problem and nonlinear network flow problems.
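To give a concrete idea of the kind of iterative algorithm involved, here is a minimal sketch of a synchronous projected Jacobi iteration for a one-dimensional obstacle problem with zero boundary values. The abstract does not specify the scheme used in the thesis, so the discretization, the obstacle function and all names below are assumptions made for illustration.

```python
def projected_jacobi(psi, f, h, iterations=5000):
    """Synchronous projected Jacobi for the 1D obstacle problem
    -u'' >= f, u >= psi, with u = 0 at both ends of the grid.

    Every interior point is updated from the previous iterate (the
    synchronous scheme) and then projected back above the obstacle.
    """
    n = len(psi)
    u = [max(p, 0.0) for p in psi]
    u[0] = u[-1] = 0.0
    for _ in range(iterations):
        new = u[:]
        for i in range(1, n - 1):
            relaxed = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = max(psi[i], relaxed)  # projection onto the constraint
        u = new
    return u


if __name__ == "__main__":
    n = 21
    h = 1.0 / (n - 1)
    xs = [i / (n - 1) for i in range(n)]
    # Hypothetical concave obstacle, negative at the boundaries.
    psi = [0.5 - 8.0 * (x - 0.5) ** 2 for x in xs]
    f = [0.0] * n
    u = projected_jacobi(psi, f, h)
    print([round(v, 3) for v in u])
```

Because each point only needs its neighbours' previous values, the grid can be split among peers that exchange boundary values at each sweep. An asynchronous variant would let peers proceed with whatever neighbour values they last received.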
Saillard, Emmanuelle. "Static/Dynamic Analyses for Validation and Improvements of Multi-Model HPC Applications." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0176/document.
Supercomputing plays an important role in several innovative fields, speeding up prototyping or validating scientific theories. However, supercomputers are evolving rapidly, now with millions of processing units, which raises the question of their programmability. Despite the emergence of more widespread and functional parallel programming models, developing correct and effective parallel applications remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions. Moreover, the evolution of programming models stresses the need for a convenient validation tool able to handle hybrid applications. Indeed, as current scientific applications mainly rely on the Message Passing Interface (MPI) parallel programming model, new hardware designed for Exascale with higher node-level parallelism clearly advocates for MPI+X solutions, with X a thread-based model such as OpenMP. But integrating two different programming models inside the same application can be error-prone, leading to complex bugs that are unfortunately mostly detected at runtime. In an MPI+X program, not only the correctness of MPI should be ensured but also its interactions with the multi-threaded model: for example, identical MPI collective operations cannot be performed by multiple non-synchronized threads. This thesis aims at developing a combination of static and dynamic analysis to enable an early verification of hybrid HPC applications. The first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Thanks to this analysis, the code is selectively instrumented, displaying an error and synchronously interrupting all processes if the actual scheduling leads to a deadlock situation.
Aubert, Pierre. "Calcul haute performance pour la détection de rayon Gamma." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLV058/document.
New-generation research experiments will add a huge data surge to the continuously increasing data production of current experiments. This increasing data rate causes upheavals at many levels, such as data storage, analysis, diffusion and conservation. The CTA project will become the foremost ground-based gamma-ray astronomy observatory from 2021. It will generate hundreds of petabytes of data by 2030, which will have to be stored, compressed and analyzed each year. This work addresses the problem of data analysis optimization using high performance computing techniques, via an efficient data format generator and very low-level programming to optimize the CPU pipeline and vectorize existing algorithms. It also introduces a fast compression algorithm for integers and finally presents a new analysis algorithm based on efficient picture comparison.
Hermellin, Emmanuel. "Modélisation et implémentation de simulations multi-agents sur architectures massivement parallèles." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT334/document.
Multi-Agent Based Simulations (MABS) represent a relevant solution for the engineering and the study of complex systems in numerous domains (artificial life, biology, economy, etc.). However, MABS sometimes require a lot of computational resources, which is a major constraint that restricts the possibilities of study for the considered models (scalability, real-time interaction, etc.). Among the available technologies for HPC (High Performance Computing), GPGPU (General-Purpose computing on Graphics Processing Units) proposes to use the massively parallel architectures of graphics cards as computing accelerators. However, while many areas benefit from GPGPU performance (meteorology, molecular dynamics, finance, etc.), Multi-Agent Systems (MAS) and especially MABS hardly enjoy the benefits of this technology: GPGPU is very little used and only a few works address it. In fact, GPGPU comes with a very specific development context, which requires a deep and non-trivial transformation process for multi-agent models. So, despite the existence of works that demonstrate the interest of GPGPU, this difficulty explains the low popularity of GPGPU in the MAS community. In this thesis, we show that among the works which aim to ease the use of GPGPU in an agent context, most do so through a transparent use of this technology. However, this approach requires abstracting some parts of the models, which greatly limits the scope of the proposed solutions. To handle this issue, and in contrast to existing solutions, we propose to use a hybrid approach (the execution of the simulation is shared between the processor and the graphics card) that focuses on accessibility and reusability through a modeling process that allows GPU programming to be used directly while simplifying its use. 
More specifically, this approach is based on a design principle, called GPU delegation of agent perceptions, which consists in making a clear separation between the agent behaviors, managed by the processor, and environmental dynamics, handled by the graphics card. One major idea underlying this principle is to identify agent computations which can be transformed into new structures (e.g. in the environment) in order to distribute the complexity of the code and modularize its implementation. The study of this principle and the different experiments conducted show the advantages of this approach from both a conceptual and a performance point of view. Therefore, we propose to generalize this approach and define a comprehensive methodology relying on GPU delegation, specifically adapted to the use of massively parallel architectures for MABS.
Pino, Munoz Daniel Humberto. "High-performance computing of sintering process at particle scale." Phd thesis, Ecole Nationale Supérieure des Mines de Saint-Etienne, 2012. http://tel.archives-ouvertes.fr/tel-00843105.
Georgiou, Yiannis. "Contributions for resource and job management in high performance computing." Grenoble, 2010. http://www.theses.fr/2010GRENM079.
High Performance Computing is characterized by the latest technological evolutions in computing architectures and by the increasing needs of applications for computing power. A particular middleware, called the Resource and Job Management System (RJMS), is responsible for delivering computing power to applications. The RJMS plays an important role in HPC since it holds a strategic place in the whole software stack, standing between the above two layers. However, the latest evolutions in the hardware and application layers have brought new levels of complexity to this middleware. Issues like scalability, management of topological constraints, energy efficiency and fault tolerance, among others, have to be particularly considered in order to provide better system exploitation from both the system and the user point of view. This dissertation provides a state of the art on the fundamental concepts and research issues of Resource and Job Management Systems. It provides a multi-level comparison (concepts, functionalities, performance) of some Resource and Job Management Systems in High Performance Computing. An important metric to evaluate the work of an RJMS on a platform is the observed system utilization. However, studies and logs of production platforms show that HPC systems in general suffer from significant un-utilization rates. Our study deals with these clusters' un-utilization periods by proposing methods to aggregate otherwise un-utilized resources for the benefit of the system or the application. More particularly, this thesis explores RJMS-level mechanisms: 1) for increasing the jobs' valuable computation rates in the highly volatile environments of a lightweight grid context, 2) for improving system utilization with malleability techniques and 3) for providing energy-efficient system management through the exploitation of idle computing machines. 
Experimentation and evaluation in this type of context are highly complex because of the interdependency of multiple parameters that have to be kept under control. In this thesis we have developed a methodology based upon real-scale controlled experimentation with submission of synthetic or real workload traces.
Fakih, Bilal. "Environnement décentralisé et protocole de communication pour le calcul intensif sur grille." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30179/document.
This thesis aims at designing an environment for the implementation of high performance computing applications on Grid platforms. We are interested in loosely synchronous applications and pleasingly parallel applications. For loosely synchronous applications, we are interested in particular in applications in the domains of numerical simulation that can be solved via parallel or distributed iterative methods, i.e., synchronous, asynchronous and hybrid iterative methods; for pleasingly parallel applications, we are interested in planning problems. Our thesis work aims at designing the decentralized environment GRIDHPC. GRIDHPC exploits all the computing resources (all the available cores of computing nodes) using OpenMP, as well as several types of networks of the grid platform, like Ethernet, Infiniband and Myrinet, using the reconfigurable multi network protocol RMNP. Note that RMNP can configure itself automatically and dynamically according to application requirements like schemes of computation (i.e., synchronous or asynchronous iterative schemes), elements of context like network topology, and the type of network (Ethernet, Infiniband or Myrinet), by choosing the best communication mode between computing nodes and the best network. We present and analyze a set of computational results obtained on the Grid5000 platform for the obstacle and planning problems.
Pasca, Bogdan Mihai. "Calcul flottant haute performance sur circuits reconfigurables." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2011. http://tel.archives-ouvertes.fr/tel-00654121.
Didelot, Sylvain. "Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications." Thesis, Versailles-St Quentin en Yvelines, 2014. http://www.theses.fr/2014VERS0029/document.
A recent trend in high performance computing shows a rising number of cores per compute node, while the total amount of memory per compute node remains constant. To scale parallel applications on such large machines, one of the major challenges is to keep memory consumption low. This thesis develops a multi-threaded communication layer over Infiniband which provides both good communication performance and low memory consumption. We target scientific applications parallelized using the MPI standard in pure mode or combined with a shared memory programming model. Starting with the observation that network endpoints and communication buffers are critical for the scalability of MPI runtimes, the first contribution proposes three approaches to control their usage. We introduce a scalable and fully-connected virtual topology for connection-oriented high-speed networks. In the context of multirail configurations, we then detail a runtime technique which reduces the number of network connections. We finally present a protocol for dynamically resizing network buffers over the RDMA technology. The second contribution proposes a runtime optimization to enforce the overlap potential of MPI communications, showing a 2x improvement factor on communications. The third contribution evaluates the performance of several MPI runtimes running a seismic modeling application in a hybrid context. On large compute nodes with up to 128 cores, the introduction of OpenMP in the MPI application saves up to 17% of memory. Moreover, we show a performance improvement with our multi-threaded communication layer, where the OpenMP threads concurrently participate in the MPI communications.
Bachmann, Etienne. "Imagerie ultrasonore 2D et 3D sur GPU : application au temps réel et à l'inversion de forme d'onde complète." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30133/document.
While the most important advances in ultrasound imaging have been closely linked to the quality of the instrumentation, the advent of computing science revolutionized this discipline by introducing growing possibilities in data processing to obtain a better picture. In addition, GPUs, which are the main components of graphics cards, deliver thanks to their architecture a significantly higher processing speed than processors, including for scientific computing. The goal of this work is to make the best use of this new computing tool by targeting two complementary applications. The first is to enable real-time imaging with better quality than other sonographic imaging techniques, thanks to the parallelization of the FTIM (Fast Topological IMaging) imaging process. The second is to introduce quantitative imaging, and more particularly the reconstruction of the wavespeed map of an unknown medium, using Full Waveform Inversion.
Dirand, Estelle. "Développement d'un système in situ à base de tâches pour un code de dynamique moléculaire classique adapté aux machines exaflopiques." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM065/document.
The exascale era will widen the gap between data generation rate and the time to manage their output and analysis in a post-processing way, dramatically increasing the end-to-end time to scientific discovery and calling for a shift toward new data processing methods. The in situ paradigm proposes to analyze data while still resident in the supercomputer memory to reduce the need for data storage. Several techniques already exist, by executing simulation and analytics on the same nodes (in situ), by using dedicated nodes (in transit) or by combining the two approaches (hybrid). Most of the in situ techniques target simulations that are not able to fully benefit from the ever growing number of cores per processor but they are not designed for the emerging manycore processors.Task-based programming models on the other side are expected to become a standard for these architectures but few task-based in situ techniques have been developed so far. This thesis proposes to study the design and integration of a novel task-based in situ framework inside a task-based molecular dynamics code designed for exascale supercomputers. We take benefit from the composability properties of the task-based programming model to implement the TINS hybrid framework. Analytics workflows are expressed as graphs of tasks that can in turn generate children tasks to be executed in transit or interleaved with simulation tasks in situ. The in situ execution is performed thanks to an innovative dynamic helper core strategy that uses the work stealing concept to finely interleave simulation and analytics tasks inside a compute node with a low overhead on the simulation execution time.TINS uses the Intel® TBB work stealing scheduler and is integrated into ExaStamp, a task-based molecular dynamics code. Various experiments have shown that TINS is up to 40% faster than state-of-the-art in situ libraries. 
Molecular dynamics simulations of up to 2 billion particles on up to 14,336 cores have shown that TINS is able to execute complex analytics workflows at a high frequency with an overhead smaller than 10%.
Ferreira, Leite Alessandro. "A user-centered and autonomic multi-cloud architecture for high performance computing applications." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112355/document.
Cloud computing has been seen as an option for executing high performance computing (HPC) applications. While traditional HPC platforms such as grids and supercomputers offer a stable environment in terms of failures, performance, and number of resources, cloud computing offers on-demand resources, generally with unpredictable performance, at low financial cost. Furthermore, in a cloud environment, failures are part of normal operation. To overcome the limits of a single cloud, clouds can be combined, forming a cloud federation, often with minimal additional costs for the users. A cloud federation can help both cloud providers and cloud users to achieve goals such as reducing the execution time, minimizing cost, increasing availability, and reducing power consumption, among others. Hence, cloud federation can be an elegant solution to avoid over-provisioning, thus reducing the operational costs in an average load situation and removing resources that would otherwise remain idle and waste power. However, cloud federation increases the range of resources available to the users. As a result, cloud or system administration skills may be demanded from the users, as well as considerable time to learn about the available options. In this context, some questions arise, such as: (a) which cloud resource is appropriate for a given application? (b) how can the users execute their HPC applications with acceptable performance and financial costs, without needing to re-engineer the applications to fit the clouds' constraints? (c) how can non-cloud specialists make the most of the features of the clouds, without being tied to a cloud provider? and (d) how can the cloud providers use the federation to reduce the power consumption of the clouds, while still being able to give service-level agreement (SLA) guarantees to the users? Motivated by these questions, this thesis presents an SLA-aware application consolidation solution for cloud federation.
Using a multi-agent system (MAS) to negotiate virtual machine (VM) migrations between the clouds, simulation results show that our approach could reduce power consumption by up to 46%, while trying to meet performance requirements. Using the federation, we developed and evaluated an approach to execute a huge bioinformatics application at zero cost. Moreover, we could decrease the execution time by 22.55% over the best single cloud execution. In addition, this thesis presents a cloud architecture called Excalibur to auto-scale cloud-unaware applications. Executing a genomics workflow, Excalibur could seamlessly scale the applications up to 11 virtual machines, reducing the execution time by 63% and the cost by 84% when compared to a user's configuration. Finally, this thesis presents a product line engineering (PLE) process to handle the variabilities of infrastructure-as-a-service (IaaS) clouds, and an autonomic multi-cloud architecture that uses this process to configure and to deal with failures autonomously. The PLE process uses an extended feature model (EFM) with attributes to describe the resources and to select them based on users' objectives. Experiments conducted with two different cloud providers show that, using the proposed model, the users could execute their application in a cloud federation environment without needing to know the variabilities and constraints of the clouds.
Dao, Van Toan. "Calcul à haute performance et simulations stochastiques : Etude de la reproductibiité numérique sur architectures multicore et manycore." Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC005/document.
The reproducibility of numerical experiments on high performance computing systems is sometimes overlooked. Moreover, the numerical methods used for rigorous parallelization of stochastic simulations are often unknown. Indeed, the results obtained for a stochastic simulation using high performance computing systems can differ from run to run with the same parameters and the same execution contexts, due to the impact of new architectures, accelerators, compilers, operating systems, or a change in the order of execution of floating-point arithmetic operations within the microprocessors as a result of parallelizing optimizations. In the case of non-repeatability of numerical experiments, how can we seriously develop a scientific application? What credit can be given to the parallel software thus developed? In this thesis, we synthesize the main causes of non-reproducibility for a parallel stochastic simulation using high performance computing systems. Unlike usual work on parallelism, we do not focus on improving performance, but on obtaining numerically repeatable results from one experiment to another. We present reproducibility and its contributions to the science of experimental and numerical computing. Furthermore, we propose some contributions, in particular: verifying the reproducibility and portability of top modern pseudo-random number generators, detecting the correlation between parallel streams issued from such generators, and repeating and reproducing the numerical results of independent parallel stochastic simulations.
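As an illustration of one cause named in this abstract, a changed order of floating-point operations, the following minimal Python sketch shows that regrouping a sum can change its result, and that giving each parallel stream its own seeded generator and combining partial results with an order-independent sum makes a run repeatable. The function names, the seed-derivation rule and the use of `math.fsum` are assumptions made for this example, not the method of the thesis. In particular, seeding streams this way does not guarantee that they are statistically uncorrelated.

```python
import math
import random

# Floating-point addition is not associative: the grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def stream_partial_sum(stream_id, n_samples, base_seed=2017):
    """Partial Monte Carlo sum for one stream (hypothetical helper).

    Each stream owns a generator seeded from (base_seed, stream_id), so its
    draws do not depend on how many workers run or in which order they finish.
    """
    rng = random.Random(base_seed * 1_000_003 + stream_id)
    return math.fsum(rng.random() for _ in range(n_samples))


def estimate_mean(n_streams, n_samples, completion_order=None):
    """Mean of all draws; completion_order mimics workers finishing in any order."""
    order = list(range(n_streams)) if completion_order is None else completion_order
    partials = [stream_partial_sum(s, n_samples) for s in order]
    # math.fsum tracks exact partial sums, so on IEEE-754 platforms the
    # result does not depend on the order of the inputs.
    return math.fsum(partials) / (n_streams * n_samples)


if __name__ == "__main__":
    print("regrouped sums equal:", left == right)
    print("repeatable estimate:",
          estimate_mean(4, 1000) == estimate_mean(4, 1000, [3, 1, 0, 2]))
```

The design choice here is to make the reduction itself insensitive to ordering rather than to force a fixed execution order, which leaves the workers free to finish in any order.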
Qu, Long. "Méthodes de préconditionnement pour la résolution de systèmes linéaires sur des machines massivement parallèles." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112053.
This thesis addresses a new class of preconditioners which aims at accelerating the solution of large sparse systems arising in scientific and engineering problems by using preconditioned iterative methods. To apply these preconditioners, the input matrix needs to be reordered with K-way nested dissection. We also introduce an overlapping technique that adapts the idea of overlapping subdomains from domain decomposition methods to nested dissection based methods to improve the convergence of these preconditioners. Results show that this overlapping technique improves the convergence rate of the Nested SSOR (NSSOR) and Nested Modified Incomplete LU with Rowsum property (NMILUR) preconditioners that we worked on. We also present the data distribution and parallel algorithms for implementing these preconditioners. Results show that on a 400x400x400 regular grid, the number of iterations with the Nested Filtering Factorization preconditioner (NFF) increases slightly while increasing the number of subdomains up to 2048. In terms of runtime performance on the Curie supercomputer, it scales up to 2048 cores and it is 2.6 times faster than the domain decomposition preconditioner Restricted Additive Schwarz (RAS) as implemented in PETSc.
Gama, Pinheiro Vinicius. "The management of multiple submissions in parallel systems : the fair scheduling approach." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM042/document.
We study the problem of scheduling in parallel and distributed systems with multiple users. New platforms for parallel and distributed computing offer very large computing power, which makes it possible to contemplate solving complex interactive applications. Nowadays, it is still difficult to use this power efficiently due to the lack of resource management tools. The work done in this thesis lies in this context: to analyze and develop efficient algorithms for managing computing resources shared among multiple users. We analyze scenarios with many submissions issued from multiple users over time. These submissions contain one or more jobs, and the set of submissions is organized in successive campaigns. No job from a campaign can start until all the jobs from the previous campaign are completed. Each user is interested in minimizing the sum of flow times of the campaigns. In the first part of this work, we define a theoretical model for Campaign Scheduling under restrictive assumptions and we show that, in the general case, it is NP-hard. For the single-user case, we show that a $\rho$-approximation scheduling algorithm for the (classic) parallel job scheduling problem is also a $\rho$-approximation for the Campaign Scheduling problem. For the general case with $k$ users, we establish a fairness criterion inspired by time sharing. Then, we propose FairCamp, a scheduling algorithm which uses campaign deadlines to achieve fairness among users between consecutive campaigns. We prove that FairCamp increases the flow time of each user by a factor of at most $k\rho$ compared with a machine dedicated to the user. We also prove that FairCamp is a $\rho$-approximation algorithm for the maximum stretch. We compare FairCamp to First-Come-First-Served (FCFS) by simulation. We show that, compared with FCFS, FairCamp reduces the maximum stretch by up to $3.4$ times.
The difference is significant in systems used by many ($k>5$) users. Our results show that, rather than just individual, independent jobs, campaigns of jobs can be handled by the scheduler efficiently and fairly.
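To make the flow time and stretch metrics used in this abstract concrete, here is a minimal Python sketch of the FCFS baseline on a single machine. It is only an illustration under stated assumptions: the `Campaign` class and `fcfs_single_machine` function are hypothetical names, a single machine is assumed, and the stretch of a campaign is taken to be its flow time divided by its total work (the time it would need on a dedicated machine). It does not implement FairCamp.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    user: str
    submit_time: float
    job_lengths: list  # processing times of the jobs in this campaign


def fcfs_single_machine(campaigns):
    """Run campaigns in submission order on one machine (assumed model).

    Returns a list of (user, flow_time, stretch) in execution order.
    """
    results = []
    clock = 0.0
    for c in sorted(campaigns, key=lambda c: c.submit_time):
        work = sum(c.job_lengths)
        start = max(clock, c.submit_time)   # wait for the machine if it is busy
        clock = start + work                # all jobs of the campaign complete
        flow_time = clock - c.submit_time   # completion time minus submission time
        stretch = flow_time / work          # slowdown versus a dedicated machine
        results.append((c.user, flow_time, stretch))
    return results


if __name__ == "__main__":
    demo = [Campaign("A", 0.0, [4.0, 4.0]), Campaign("B", 1.0, [1.0])]
    for user, flow, stretch in fcfs_single_machine(demo):
        print(user, flow, stretch)
```

In this small example, user B submits a one-unit campaign just after user A's eight-unit campaign and waits behind it, so B's stretch is 8 while A's is 1. This is the kind of imbalance a fairness-oriented scheduler tries to limit.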
Bahi, Mouad. "High Performance by Exploiting Information Locality through Reverse Computing." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00768574.
Egele, Romain. "Optimization of Learning Workflows at Large Scale on High-Performance Computing Systems." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG025.
In the past decade, machine learning has experienced exponential growth, propelled by abundant datasets, algorithmic advancements, and increased computational power. Simultaneously, high-performance computing (HPC) has evolved to meet rising computational demands, offering resources to tackle complex scientific challenges. However, machine learning is often a sequential process, making it difficult to scale on HPC systems. Machine learning workflows are built from modules offering numerous configurable parameters, from data augmentation policies to training procedures and model architectures. This thesis focuses on the hyperparameter optimization of learning workflows on large-scale HPC systems, such as Polaris at the Argonne Leadership Computing Facility. Key contributions include (1) asynchronous decentralized parallel Bayesian optimization, (2) extension to multi-objective optimization, (3) integration of early discarding, and (4) uncertainty quantification of deep neural networks. Furthermore, an open-source software package, DeepHyper, is provided, encapsulating the proposed algorithms to facilitate research and application. The thesis highlights the importance of scalable Bayesian optimization methods for the hyperparameter optimization of learning workflows, which is crucial for effectively harnessing the vast computational resources of modern HPC systems.
Mena, morales Valentin. "Approche de conception haut-niveau pour l'accélération matérielle de calcul haute performance en finance." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0018/document.
The need for resources in High Performance Computing (HPC) is generally met by scaling up server farms, to the detriment of the energy consumption of such a solution. Accelerating HPC applications on heterogeneous platforms, such as FPGAs or GPUs, offers a better architectural compromise, as they can reduce the energy consumption of a deployed system. Therefore, a change of programming paradigm is needed to support this heterogeneous acceleration, which translates into an increased level of programming complexity that software experts must tackle. This is most notably the case for developers in quantitative finance. Applications in this field are constantly evolving and increasing in complexity to stay competitive and comply with legislative changes. This puts even more pressure on the programmability of acceleration solutions. In this context, the use of high-level development and design flows, such as High-Level Synthesis (HLS) for programming FPGAs, is not enough. A domain-specific approach can help to reach performance requirements without impairing the programmability of accelerated applications. We propose in this thesis a high-level design approach that relies on OpenCL as a heterogeneous programming standard. More precisely, a recent implementation of OpenCL for Altera FPGAs is used.
In this context, four main contributions are proposed in this thesis: (1) an initial study of the integration of hardware computing cores to a software library for quantitative finance (QuantLib), (2) an exploration of different architectures and their respective performances, as well as the design of a dedicated architecture for the pricing of American options and their implied volatility, based on a high-level design flow, (3) a detailed characterization of an Altera OpenCL platform, from elemental operators, memory accesses, control overlays, and up to the communication links it is made of, (4) a proposed compilation flow that is specific to the quantitative finance domain, and relying on the aforementioned characterization and on the description of the considered financial applications (option pricing)
He, Guanlin. "Parallel algorithms for clustering large datasets on CPU-GPU heterogeneous architectures." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG062.
Clustering, which aims at achieving natural groupings of data, is a fundamental and challenging task in machine learning and data mining. Numerous clustering methods have been proposed in the past, among which k-means is one of the most famous and commonly used methods due to its simplicity and efficiency. Spectral clustering is a more recent approach that usually achieves higher clustering quality than k-means. However, classical algorithms of spectral clustering suffer from a lack of scalability due to their high complexities in terms of number of operations and memory space requirements. This scalability challenge can be addressed by applying approximation methods or by employing parallel and distributed computing. The objective of this thesis is to accelerate spectral clustering and make it scalable to large datasets by combining representatives-based approximation with parallel computing on CPU-GPU platforms. Considering different scenarios, we propose several parallel processing chains for large-scale spectral clustering. We design optimized parallel algorithms and implementations for each module of the proposed chains: parallel k-means on CPU and GPU, parallel spectral clustering on GPU using sparse storage format, parallel filtering of data noise on GPU, etc. Our various experiments reach high performance and validate the scalability of each module and the complete chains.
Chakode, Noumowe Rodrigue. "Environnement d'exécution pour des services de calcul à la demande sur des grappes mutualisées." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENM035/document.
This thesis studies resource management for on-demand computing services on a shared cluster. In this context, the aim was to propose tools that allocate resources automatically for executing on-demand user requests and share resources proportionally among those services, while maximizing their use. Funded by the Minalogic global business cluster through the Ciloe Project (http://ciloe.minalogic.net), this work targets organizations such as SMBs, which are not able to bear the cost of purchasing and maintaining a dedicated computing infrastructure. Firstly, we carried out an in-depth survey of the areas of on-demand computing and high performance computing. From this survey, we defined a virtualized architecture to enable dynamic execution of user requests thanks to a special resource manager. Finally, we proposed policies and algorithms which are flexible enough to offer a suitable tradeoff between equity and resource use. Having worked in a context of industrial collaboration, we developed a prototype of our proposal as a proof of concept. Based on open standards, this prototype relies on existing virtualization tools such as OpenNebula for allocating and manipulating virtual machines over the cluster's nodes. Using this prototype along with various workloads, we carried out experiments to evaluate our architecture and scheduling algorithms. Results have shown that our contributions achieve the expected goals while being reliable and efficient.
Chapuis, Guillaume. "Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-01012222.
Brunet, Elisabeth. "Une approche dynamique pour l'optimisation des communications concurrentes sur réseaux hautes performance." Thesis, Bordeaux 1, 2008. http://www.theses.fr/2008BOR13721/document.
The aim of this thesis is to optimize the communications of high performance applications in the context of cluster computing. Given the massive use of multicore architectures, it is now crucial to handle a large number of concurrent communication flows. We highlighted and analyzed the shortcomings of existing solutions. We therefore designed a new way to schedule communication flows by focusing on the activity of the network cards. Its novelty consists in decoupling the activity of applications from that of the network cards. Our model takes advantage of the delay that exists between the submission of communication requests and the moment when the network cards become idle in order to apply opportunistic optimizations. NewMadeleine implements this model, thus making it possible to exploit the latest generation of high speed networks. The approach of NewMadeleine is validated not only by synthetic tests but also by real applications.
Gueunet, Charles. "Calcul haute performance pour l'analyse topologique de données par ensembles de niveaux." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS120.
Topological Data Analysis requires efficient algorithms to deal with the continuously increasing size and level of detail of data sets. In this manuscript, we focus on three fundamental topological abstractions based on level sets: merge trees, contour trees and Reeb graphs. We propose three new efficient parallel algorithms for the computation of these abstractions on multi-core shared memory workstations. The first algorithm developed in the context of this thesis is based on multi-thread parallelism for the contour tree computation. A second algorithm revisits the reference sequential algorithm to compute this abstraction and is based on local propagations expressible as parallel tasks. In practice, this new algorithm is twice as fast in sequential mode as the reference algorithm designed in 2000 and offers speedups of one order of magnitude in parallel. A last algorithm, also relying on task-based local propagations, is presented, computing a more generic abstraction: the Reeb graph. Contrary to competing approaches, these methods provide the augmented version of these structures, hence enabling the full extent of level-set based analysis. The algorithms presented in this manuscript currently result in the fastest implementations available to compute these abstractions. This work has been integrated into the open-source platform the Topology Toolkit (TTK).
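For readers unfamiliar with merge trees, the following minimal Python sketch shows the classical sequential idea on a 1D scalar field: vertices are swept by increasing value and a union-find structure tracks the connected components of the sublevel sets. It is only an illustration under stated assumptions (a 1D field with distinct values, and the hypothetical function name `sublevel_merge_events`); it is not one of the parallel algorithms proposed in the thesis.

```python
def sublevel_merge_events(values):
    """Sweep a 1D scalar field by increasing value (distinct values assumed).

    Returns (minima, saddles) as lists of indices: a minimum starts a new
    sublevel-set component, a saddle joins two existing components.
    """
    parent = {}  # union-find forest over the vertices processed so far

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    minima, saddles = [], []
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        parent[i] = i
        # Components of the neighbours that were already swept (lower values).
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            minima.append(i)       # no lower neighbour: a new component is born
        elif len(roots) == 2:
            saddles.append(i)      # two distinct components merge here
        for r in roots:
            parent[r] = i          # the current vertex becomes the representative
    return minima, saddles


if __name__ == "__main__":
    print(sublevel_merge_events([1, 3, 0, 4, 2]))
```

On the field `[1, 3, 0, 4, 2]`, the sweep finds three minima (indices 2, 0 and 4) and two merge points (indices 1 and 3), which are the leaves and interior nodes of the corresponding merge tree.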
Plewa, Joseph-Marie. "Simulation 3D d'une décharge couronne pointe-plan, dans l'air : calcul haute performance, algorithmes de résolution de l'équation de Poisson et analyses physiques." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30184/document.
This work is devoted to the three-dimensional (3D) simulation of streamer corona discharges in air at atmospheric pressure using high-performance parallel computing. When a pulsed high voltage is applied between a tip and a plane in air, the strong electric field lines constricted around the tip induce the simultaneous propagation of several streamers, leading to a corona discharge with a tree structure. Only a true 3D electro-hydrodynamics simulation is able to reproduce this branching and to provide the orders of magnitude of the local deposited energy and the concentration of the species created during the discharge phase. However, such a 3D simulation, which requires a large amount of memory and a very long computation time, is currently feasible only with massively parallel computation. In the field of 3D electro-hydrodynamics simulations, special attention must be paid to the efficiency of the solvers used for 3D elliptic equations, because their contribution can exceed 80% of the global computation time. Therefore, a specific chapter is devoted to testing the performance of iterative and direct methods (such as SOR R&B, BiCGSTAB and MUMPS) in solving elliptic equations, using massively parallel computation and the MPI library. The calculations are performed on the EOS supercomputer of the CALMIP network, with a number of computing cores and mesh cells increasing up to 1800 and 800³ respectively (i.e. more than half a billion mesh cells). Performance is compared for the calculation of the geometric potential and under dynamic simulation conditions consisting of the propagation of an analytical space charge density characteristic of streamers. A complete 3D simulation of the streamer discharge must also involve a robust algorithm able to solve the coupled conservation equations of the charged particle densities, with the very sharp gradients characteristic of streamers.
In this manuscript, the MUSCL algorithm is tested under different propagation conditions of a cubic density (with uniform or non-uniform velocity field). The 3D code, designed to solve the complete electro-hydrodynamics model of the discharge (coupling the conservation equations, the Poisson equation and the chemical kinetics), is validated by comparing the 3D and 2D results under simulation conditions presenting a rotational symmetry around the propagation axis of a mono-filamentary streamer. Finally, the first results of the 3D simulations of the discharge phase with the propagation of one or several asymmetric streamers are presented and analyzed. These simulations make it possible to follow the tree structure of a corona discharge when a pulsed voltage is applied between a tip and a plane. The ignition of the tree structure is studied as a function of the initial position of the plasma spots.
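Among the elliptic solvers named in this abstract, red-black SOR (SOR R&B) is the simplest to sketch. The following pure-Python example applies it to a 2D Poisson problem with fixed boundary values. It is a small sequential illustration under stated assumptions (2D instead of 3D, a five-point stencil, the hypothetical names `red_black_sor` and `linear_test_problem`, and an arbitrary relaxation factor of 1.5), not the MPI implementation evaluated in the thesis.

```python
def red_black_sor(u, f, h, omega=1.5, sweeps=200):
    """Solve -laplace(u) = f on a square grid with fixed boundary values.

    u: list of lists holding the boundary values and an initial interior
       guess (updated in place); f: source term on the same grid;
    h: grid spacing; omega: relaxation factor (assumed value).
    """
    n = len(u)
    for _ in range(sweeps):
        # Points of one colour depend only on points of the other colour,
        # so each half-sweep could be updated in parallel.
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1]
                                           + h * h * f[i][j])
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


def linear_test_problem(n=9):
    """Laplace problem whose exact solution is u(x, y) = x + y."""
    h = 1.0 / (n - 1)
    exact = [[i * h + j * h for j in range(n)] for i in range(n)]
    on_edge = lambda i, j: i in (0, n - 1) or j in (0, n - 1)
    u = [[exact[i][j] if on_edge(i, j) else 0.0 for j in range(n)]
         for i in range(n)]
    f = [[0.0] * n for _ in range(n)]
    return u, f, h, exact


if __name__ == "__main__":
    u, f, h, exact = linear_test_problem()
    red_black_sor(u, f, h)
    error = max(abs(u[i][j] - exact[i][j]) for i in range(9) for j in range(9))
    print("max error:", error)
```

The red-black colouring is what makes the method attractive on parallel machines: within a half-sweep no point reads a value written in the same half-sweep, so a domain-decomposed version only needs to exchange boundary layers between colours.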