Dissertations / Theses on the topic 'Programmation (informatique) – Performances'
Consult the top 50 dissertations / theses for your research on the topic 'Programmation (informatique) – Performances.'
Merlin, Armelle. "Modèles opérationnels communicants : performances et algèbres de chemins." Orléans, 2004. http://www.theses.fr/2004ORLE2052.
Pérache, Marc. "Contribution à l'élaboration d'environnements de programmation dédiés au calcul scientifique hautes performances." Bordeaux 1, 2006. http://www.theses.fr/2006BOR13238.
Bourgoin, Mathias. "Abstractions performantes pour cartes graphiques." Paris 6, 2013. http://www.theses.fr/2013PA066627.
Graphics Processing Units (GPUs) are complex devices with many computation units. Dedicated to display management and 3D processing, they are very efficient, but also highly specialized. In recent years, it has become possible to divert their use so that they perform general computations normally carried out by the CPU. This programming model, GPGPU (General-Purpose GPU) programming, is mainly based on two frameworks: Cuda and OpenCL. Both are very low-level and demand explicit management of hardware parameters such as memory or the placement of computations on the various computation units. The goal of this thesis is the study of higher-level abstractions for GPGPU programming, in order to make it more accessible and safer. After an introduction to the context of GPGPU programming, we present two programming languages dedicated to it, SPOC and Sarek. Through their operational semantics, we discuss their properties and the guarantees they offer. Then, we present an implementation of these languages with OCaml through the SPOC library and the domain-specific language Sarek. Performance tests show that our solution achieves a high level of performance on simple examples, as well as on the translation of a realistic numerical application from Fortran and Cuda to OCaml. We also show how our solutions allow us to define algorithmic skeletons that offer further abstraction. Through an example, we present how these skeletons ease GPGPU programming and offer additional automatic optimizations. Finally, we discuss how current hardware and software evolution can help provide a unified solution for GPGPU programming.
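The skeleton abstraction described above can be illustrated with a toy sketch (in Python rather than the thesis' OCaml, with made-up names): a `gpu_map` skeleton hides how work is split across compute units, which is exactly the kind of detail Cuda and OpenCL force the programmer to manage by hand.

```python
# Toy illustration of the algorithmic-skeleton idea behind SPOC/Sarek:
# a "map" skeleton abstracts away how work is split across compute units.
# (Names are hypothetical; the real SPOC/Sarek are OCaml libraries.)
from concurrent.futures import ThreadPoolExecutor

def gpu_map(kernel, xs, units=4):
    """Apply `kernel` elementwise, splitting xs into one chunk per unit."""
    chunk = (len(xs) + units - 1) // units
    chunks = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    with ThreadPoolExecutor(max_workers=units) as pool:
        parts = pool.map(lambda c: [kernel(x) for x in c], chunks)
    return [y for part in parts for y in part]

result = gpu_map(lambda x: 2.0 * x, [1.0, 2.0, 3.0, 4.0, 5.0])
```

The caller only states *what* to apply to each element; the skeleton decides the partitioning, mirroring how higher-level GPGPU abstractions leave placement to the runtime.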
Aouad, Lamine. "Contribution à l'algorithmique matricielle et évaluation de performances sur les grilles de calcul, vers un modèle de programmation à grande échelle." Dir. Serge Petiton. Villeneuve d'Ascq : Université des sciences et technologies de Lille, 2007. https://iris.univ-lille1.fr/dspace/handle/1908/199.
Order number (Lille 1): 3775. Abstract in French and English. Title from the title page of the digitized document. Bibliography pp. [121]-133.
Clet-Ortega, Jérôme. "Exploitation efficace des architectures parallèles de type grappes de NUMA à l’aide de modèles hybrides de programmation." Thesis, Bordeaux 1, 2012. http://www.theses.fr/2012BOR14514/document.
Modern computing servers usually consist of clusters of computers with several multi-core CPUs featuring a highly hierarchical hardware design. The major challenge for implementations of programming models is to exploit these servers efficiently. Combining two types of models, such as MPI and OpenMP, is a current trend to reach this goal. However, these programming models were not designed to work together, which leads to performance issues. In this thesis, we propose to assist programmers who develop hybrid applications. We rely on an analysis of the computing system architecture in order to set the number of processes and threads. Rather than the classical hybrid approach, that is to say creating one multithreaded MPI process per node, we automatically evaluate alternative solutions, with several multithreaded processes per node, better suited to modern computing systems.
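The placement rule argued for above can be sketched as a small function (the machine description is hypothetical; a real implementation would query the hardware topology, e.g. with hwloc): one multithreaded MPI process per NUMA domain rather than per node.

```python
# Sketch of a topology-driven hybrid MPI+OpenMP layout: instead of one
# process per node, create one multithreaded process per NUMA domain.
# (Machine parameters are supplied by hand here; real tools query them.)
def hybrid_layout(nodes, numa_per_node, cores_per_numa):
    processes = nodes * numa_per_node      # one MPI process per NUMA domain
    threads_per_process = cores_per_numa   # one OpenMP thread per core
    return processes, threads_per_process

# A 4-node cluster, 2 NUMA domains per node, 8 cores per domain:
procs, threads = hybrid_layout(4, 2, 8)    # -> (8, 8)
```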
Aouad, Lamine. "Contribution à l'algorithmique matricielle et évaluation de performances sur les grilles de calcul, vers un modèle de programmation à grande échelle." Lille 1, 2005. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/2005/50376-2005-Aouad.pdf.
Rochange, Christine. "Evaluation des performances d'architecture multiprocesseurs à mémoire logiquement partagée." Toulouse 3, 1993. http://www.theses.fr/1993TOU30215.
Benslimane, Djamal. "Etudes de l'apport des techniques de parallélisme dans l'amélioration des performances des systèmes à base de règles de production." Clermont-Ferrand 2, 1990. http://www.theses.fr/1990CLF21287.
Aline, Michel. "Evaluation et optimisation de performances en délai en technologie CMOS submicronique." Montpellier 2, 2001. http://www.theses.fr/2001MON20075.
Vienne, Jérôme. "Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband." Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM043.
Manufacturers of computer clusters require tools to assist them in making better architectural design decisions. To address this need, this thesis focuses on the specific issues of estimating computation times and InfiniBand network congestion. These two problems are often dealt with globally; however, an overall approach does not explain the reasons for performance loss related to architectural choices, so our approach was to conduct a more detailed study. In this thesis we focus on the following: 1) the estimation of computation time in a grid, and 2) the estimation of communication times over InfiniBand networks. To evaluate the computation time, the proposed approach is based on a static or semi-static analysis of the source code, cutting it into blocks and then micro-benchmarking these blocks on the targeted architecture. To estimate the communication time, a bandwidth-sharing model for InfiniBand networks has been developed, allowing one to predict the impact of concurrent communications. This model was then incorporated into a simulator and validated on a set of synthetic communication graphs and on the Socorro application.
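A minimal sketch of the bandwidth-sharing idea (the thesis' actual InfiniBand model is more detailed): if k communications traverse the same link concurrently, each gets roughly 1/k of the bandwidth, so the predicted transfer time scales with k.

```python
# Naive fair-share model of link contention: k concurrent flows on one
# link each receive 1/k of the bandwidth. Numbers below are illustrative.
def predict_comm_time(msg_bytes, link_bw_bytes_per_s, concurrent_flows):
    share = link_bw_bytes_per_s / concurrent_flows
    return msg_bytes / share

# 1 MiB message over a 1 GiB/s link, alone vs. sharing with 3 other flows:
alone  = predict_comm_time(2**20, 2**30, 1)   # ~0.98 ms
shared = predict_comm_time(2**20, 2**30, 4)   # 4x slower under contention
```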
Geneves, Sylvain. "Etude de performances sur processeurs multicoeur : environnement d'exécution événementiel efficace et étude comparative de modèles de programmation." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00842012.
Gobert, Daniel. "Incidence des activités de programmation en logo sur les performances en géométrie au cycle moyen et en sixième." Paris 7, 1991. http://www.theses.fr/1991PA077038.
Riyanto. "Simulation, optimisation, et analyse de performances pour les systèmes industriels d'acquisition d'images." Toulouse, INPT, 1997. http://www.theses.fr/1997INPT107H.
Zeaiter, Diana. "Prédiction de l'insatisfaction des utilisateurs liée aux performances des applications de l'Internet." Paris 6, 2012. http://www.theses.fr/2012PA066683.
Network disruptions can adversely impact a user's web browsing, cause video and audio interruptions, or render web sites and services unreachable. Such problems are frustrating to Internet users, who are oblivious to the underlying problems but completely exposed to the service degradations. This thesis develops a methodology to automatically predict user dissatisfaction with network application performance, following an empirical approach. We design HostView to collect network performance data annotated with user feedback at the end-hosts. Our first contribution is to present the results of a survey conducted with 400 computer scientists to collect their perspectives on privacy issues and willingness to provide feedback. Guided by the survey results, we implement a first prototype of HostView to evaluate the CPU overhead of candidate techniques for collecting network performance data. Then, we implement a second prototype of HostView to tune our algorithm for collecting user feedback so as to minimize user annoyance. We recruit users in a large-scale release of HostView. Our user population connects from different networking environments (e.g., work, home, or coffee shop), so we investigate whether network performance depends on the networking environment. Our third contribution is to show that for most users, RTTs and download data rates differ significantly across networking environments: the mix of applications determines data rates, but it is the environment that determines RTTs. Finally, our fourth contribution is to develop predictors of user dissatisfaction with network application performance. Our predictors consistently achieve true positive rates above 0.9.
Vienne, Jérôme. "Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband." Phd thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00728156.
Full textDrebes, Andi. "Dynamic optimization of data-flow task-parallel applications for large-scale NUMA systems." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066330/document.
Within the last decade, microprocessor development reached a point at which higher clock rates and more complex micro-architectures became less energy-efficient, such that power consumption and energy density were pushed beyond reasonable limits. As a consequence, the industry has shifted to more energy-efficient multi-core designs, integrating multiple processing units (cores) on a single chip. The number of cores is expected to grow exponentially, and future systems are expected to integrate thousands of processing units. In order to provide sufficient memory bandwidth in these systems, main memory is physically distributed over multiple memory controllers with non-uniform memory access (NUMA). Past research has identified programming models based on fine-grained, dependent tasks as a key technique to unleash the parallel processing power of massively parallel general-purpose computing architectures. However, the execution of task-parallel programs on architectures with non-uniform memory access, and the dynamic optimizations needed to mitigate NUMA effects, have received little attention. In this thesis, we explore the main factors affecting performance and data locality of task-parallel programs, and propose a set of transparent, portable and fully automatic on-line mechanisms for mapping tasks to cores and data to memory controllers in order to improve data locality and performance. Placement decisions are based on information about point-to-point data dependences, readily available in the run-time systems of modern task-parallel programming frameworks. The experimental evaluation of these techniques is conducted on our implementation in the runtime of the OpenStream language and a set of high-performance scientific benchmarks. Finally, we designed and implemented Aftermath, a tool for performance analysis and debugging of task-parallel applications and run-times.
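The dependence-driven placement idea can be sketched greedily (names and data format are hypothetical; the thesis implements this inside the OpenStream runtime): run a task on the NUMA node that already holds most of its input bytes.

```python
# Greedy sketch of dependence-driven task placement on a NUMA machine:
# choose the node holding the largest share of the task's input data.
def place_task(input_deps, data_location):
    """input_deps: {buffer: size_in_bytes}; data_location: {buffer: node id}."""
    bytes_per_node = {}
    for buf, size in input_deps.items():
        node = data_location[buf]
        bytes_per_node[node] = bytes_per_node.get(node, 0) + size
    # Node with the most local input bytes wins; its accesses stay local.
    return max(bytes_per_node, key=bytes_per_node.get)

loc = {"a": 0, "b": 1, "c": 1}
node = place_task({"a": 4096, "b": 8192, "c": 1024}, loc)  # -> node 1
```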
Saidani, Tarik. "Optimisation multi-niveau d’une application de traitement d’images sur machines parallèles." Thesis, Paris 11, 2012. http://www.theses.fr/2012PA112268/document.
This thesis aims to define a design methodology for high-performance applications on future embedded processors. These architectures require efficient usage of their different levels of parallelism (fine-grain, coarse-grain) and good handling of inter-processor communications and memory accesses. In order to study this methodology, we used a target processor representative of this type of emerging architecture, the Cell BE processor. We also chose a low-level image processing application, the Harris points-of-interest detector, which is representative of typical, highly parallel low-level image processing. We studied several parallelisation schemes of this application and established different optimisation techniques by adapting the software to the specific SIMD units of the Cell processor. We also developed a library named CELL MPI that allows efficient communication and synchronisation over the processing elements, using a simplified and implicit programming interface. This work allowed us to develop a methodology that simplifies the design of a parallel algorithm on the Cell processor. We designed a parallel programming tool named SKELL BE which is based on algorithmic skeletons. This programming model provides an original solution based on a meta-programming code generator. Using SKELL BE, we can obtain applications with very high performance that use the Cell architecture efficiently compared to other tools on the market.
Coquereau, Albin. "[ErgoFast] Amélioration de performances du solveur SMT Alt-Ergo grâce à l’intégration d’un solveur SAT efficace." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLY007.
Automatic SMT (Satisfiability Modulo Theories) solvers are more and more used in industry and in the academic world. The reason for this success lies in the expressiveness of the input languages of these solvers (first-order logic with predefined theories) and in their increasing efficiency. The speed of SMT solvers is mainly tied to the decision procedures they implement (SAT solvers, Simplex, etc.). The data structures used and the memory management mechanisms have an immediate impact on performance; the programming language and the code optimizations available in the compiler are also very important. In the VALS team at LRI, we develop the SMT solver Alt-Ergo. This tool is programmed in OCaml and is mainly used to prove logical formulas coming from program proof environments such as Why3, Spark, Frama-C or Atelier B. Its direct competitors are Z3 (Microsoft), CVC4 (New York Univ. and Univ. of Iowa) and Yices2 (SRI). In spite of our efforts in the design and optimization of the implemented decision procedures, Alt-Ergo turns out to be slower than its competitors on certain benchmarks. The reasons are multiple; we identified three important causes. First, the data structures used in the solver: for safety reasons, the largest part of Alt-Ergo is developed in a purely functional programming style with persistent structures, whose efficiency is generally worse than that of imperative structures. Second, memory management by the OCaml garbage collector, which, compared with manual management, causes numerous movements of memory blocks and probably too many cache misses; since the difference between a cache access and a RAM access is on the order of 150 clock cycles, maximal use of the cache is very important for performance. Finally, the lack of optimizations in the OCaml compiler: indeed, we noticed that the performance gap between Alt-Ergo and some of its competitors (written mainly in C or C++) was strongly reduced when we recompiled them with a lower compiler optimization level.
Pilla, Laércio L. "Équilibrage de charge prenant en compte la topologie des plates-formes de calcul parallèle pour la portabilité des performances." Phd thesis, Université de Grenoble, 2014. http://tel.archives-ouvertes.fr/tel-00981136.
Gamatié, Abdoulaye. "Modélisation polychrone et évaluation de systèmes temps réel." Phd thesis, Université Rennes 1, 2004. http://tel.archives-ouvertes.fr/tel-00879359.
Martin, Alexis. "Infrastructure pour la gestion générique et optimisée des traces d’exécution pour les systèmes embarqués." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM001/document.
Validation is a critical aspect of systems development. It is a major concern for embedded systems, whose autonomous behavior must be assessed under technical and physical constraints. The growth in embedded systems complexity in recent years precludes complex and costly development processes such as formal methods, so post-conception validation must be applied. Execution traces are effective for validation and understanding, as they capture a system's behavior during execution. However, trace analysis tools face two major challenges: first, the management of huge execution traces; second, the ability to retrieve relevant metrics from the low-level information the trace contains. This thesis was done as part of the SoC-TRACE project and presents three contributions. Our first contribution is the definition of a generic execution trace format that expresses semantics. Our second contribution is a workflow-based infrastructure for generic and automatic trace analysis. This infrastructure addresses the problem of huge-trace management using streaming mechanisms, and allows modular, configurable and automatic analyses. Our third contribution is the definition of a generic performance analysis for Linux systems, providing methods and tools for trace recording as well as an analysis workflow to obtain unified performance profiles. We validate our contributions on traces from use cases given by STMicroelectronics, a partner of the project, and on traces recorded from benchmark executions. Our trace format with semantics allowed us to automatically bring out execution problems. Using streaming mechanisms, we have been able to analyze traces of several hundred gigabytes. Our generic analysis method automatically highlights, without prior knowledge, the internal behavior of benchmark programs; it points out similar execution behavior of benchmarks across different machines and architectures and shows their impact on execution.
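The streaming idea behind the second contribution can be sketched as follows (the event format is hypothetical): an analysis consumes events one at a time, so memory use stays bounded even when the trace itself is far larger than RAM.

```python
# Streaming trace analysis sketch: compute per-CPU busy time from a
# (possibly huge) stream of (timestamp, cpu, state) events without ever
# materializing the whole trace in memory.
def stream_cpu_busy(events):
    """events: iterable of (timestamp, cpu, state); returns busy time per cpu."""
    busy, last = {}, {}
    for ts, cpu, state in events:
        # Close the previous interval for this CPU if it was busy.
        if cpu in last and last[cpu][1] == "busy":
            busy[cpu] = busy.get(cpu, 0) + ts - last[cpu][0]
        last[cpu] = (ts, state)
    return busy

trace = [(0, 0, "busy"), (5, 0, "idle"), (7, 0, "busy"), (9, 0, "idle")]
profile = stream_cpu_busy(iter(trace))   # -> {0: 7}
```

Because `events` is only iterated once, the same function works on a generator reading events lazily from disk.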
Jurkowiak, Bernard. "Programmation haute performance pour la résolution des problèmes SAT et CSP." Amiens, 2004. http://www.theses.fr/2004AMIE0410.
Habel, Rachid. "Programmation haute performance pour architectures hybrides." Thesis, Paris, ENMP, 2014. http://www.theses.fr/2014ENMP0025/document.
Clusters of multicore/GPU nodes connected by a fast network offer very high theoretical peak performance, reaching tens of teraflops. Unfortunately, efficient programming of such architectures remains challenging because of their complexity and the diversity of existing programming models. The purpose of this thesis is to improve the programmability of dense scientific applications on hybrid architectures in three ways: reducing execution times, processing larger data sets, and reducing programming effort. We propose DSTEP, a directive-based programming model expressing both data and computation distribution. A large set of distribution types are unified in a "dstep distribute" directive, and the replication of some distributed elements can be expressed using a "halo". The "dstep gridify" directive expresses both the computation distribution and the scheduling constraints of loop iterations. We define a distribution model and demonstrate the correctness of the code transformation from the sequential domain to the parallel domain. From the distribution model, we derive a generic compilation scheme transforming DSTEP-annotated input programs into parallel hybrid ones. We have implemented such a tool as a compiler integrated into the PIPS compilation workbench, together with a library offering the runtime functionality, especially communication. Our solution is validated on scientific programs from the NAS Parallel Benchmarks and PolyBench, as well as on an industrial signal processing application.
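A pure-Python sketch of the block distribution with halos that a "dstep distribute" directive expresses (this illustrates the concept only, not DSTEP syntax): each process gets a contiguous block plus replicated border elements from its neighbors.

```python
# Block distribution with halos: split an array into nprocs blocks, each
# extended by `halo` replicated neighbor elements (clamped at the edges),
# so stencil computations can read borders without remote accesses.
def distribute(data, nprocs, halo=1):
    n = len(data)
    size = (n + nprocs - 1) // nprocs
    blocks = []
    for p in range(nprocs):
        lo, hi = p * size, min((p + 1) * size, n)
        blocks.append(data[max(lo - halo, 0):min(hi + halo, n)])
    return blocks

blocks = distribute(list(range(8)), 2, halo=1)
# -> [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7]]
```

Elements 3 and 4 appear in both blocks: that overlap is the halo, kept consistent by communication in a real distributed run.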
Henry, Sylvain. "Modèles de programmation et supports exécutifs pour architectures hétérogènes." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00948309.
Abdelfeteh, Sadok. "Formulation de matériaux de construction à base de sous-produits industriels avec des méthodes issues de l’intelligence artificielle." Thesis, Lille 1, 2016. http://www.theses.fr/2016LIL10077/document.
The environmental issue has become a major concern for governments and industry, and effective waste management is among the priority actions for achieving a green circular economy. Such management first requires maximal recovery of waste, given the large tonnages produced across sectors. The field of civil engineering is particularly concerned: the valorization of alternative materials in civil engineering has grown significantly in recent years. However, this practice faces some limitations, including the lack of clear regulations and especially the lack of tools and methods suited to the design of materials incorporating alternative materials. In this context, the present work focuses on the development of a mix-design method for building materials based on industrial by-products. This hybrid method combines Genetic Algorithms (GA) as multi-objective optimization tools and Genetic Programming (GP), in its two versions, classical GP and MGGP (MultiGene Genetic Programming), as machine-learning modeling tools for complex problems. Specific studies were also carried out to demonstrate the benefits and weaknesses of these innovative tools for applications in civil engineering. Finally, the proposed method was tested on two case studies (design of high-performance concrete and of mortars made of alternative materials) and validated by laboratory tests. The results are conclusive and promising for generalizing the method to other civil engineering applications.
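The GA component can be illustrated with a minimal single-objective toy (the thesis uses multi-objective GAs and genetic programming on real mix-design data; the fitness function and all parameters below are made up): the classic loop of selection, crossover and mutation over candidate mix proportions.

```python
# Minimal elitist GA sketch on a made-up mix-design fitness: find
# proportions close to a target mix (0.5, 0.3, 0.2). Illustrative only.
import random

def fitness(mix):
    target = (0.5, 0.3, 0.2)
    return -sum((m - t) ** 2 for m, t in zip(mix, target))

def evolve(pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    pop = [tuple(rng.random() for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]             # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Averaging crossover plus small Gaussian mutation.
            child = tuple((x + y) / 2 + rng.gauss(0, 0.02)
                          for x, y in zip(a, b))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()   # converges near the target mix
```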
Chassin de Kergommeaux, Jacques. "Implémentation et évaluation d'un système logique parallèle." Phd thesis, Grenoble 1, 1989. http://tel.archives-ouvertes.fr/tel-00122736.
Le Folgoc, Lionel. "Personal data server engine : design and performance considerations." Versailles-St Quentin en Yvelines, 2012. http://www.theses.fr/2012VERS0052.
Mass-storage Secure Portable Tokens are emerging and provide a real breakthrough in the management of sensitive data. They can embed personal data and metadata referencing documents stored encrypted in the Cloud, and can manage them under the holder's control. Efficient embedded database techniques are very challenging to design due to conflicting constraints of NAND Flash and embedded systems. In this thesis, we propose an alternative model relying on two key concepts: serialization and stratification of the complete database. A database organized fully sequentially precludes random writes and their negative side effects on Flash write cost. Global stratification then solves the scalability issue of a serialized design and maintains acceptable performance, without abandoning the benefits of serialization in terms of random writes. We show the effectiveness of this approach through a comprehensive performance study.
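The serialization concept can be sketched as an append-only store (a toy model, not the thesis' engine): every update is a new sequential record, so no Flash page is ever rewritten in place, and lookups scan from the newest record backwards.

```python
# Append-only sketch of a fully serialized database: updates and deletes
# are new records appended sequentially (cheap on NAND Flash); a read
# returns the newest version of a key.
class SerializedStore:
    def __init__(self):
        self.log = []                    # models sequentially written pages

    def put(self, key, value):
        self.log.append((key, value))    # never rewrites earlier records

    def get(self, key):
        for k, v in reversed(self.log):  # newest version wins
            if k == key:
                return v
        return None

db = SerializedStore()
db.put("doc1", "v1")
db.put("doc2", "x")
db.put("doc1", "v2")                     # supersedes v1 without rewriting it
```

The backward scan is what stratification attacks in the thesis: without it, lookup cost grows linearly with the log.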
Perez-Seva, Jean-Paul. "Les optimisations d'algorithmes de traitement de signal sur les architectures modernes parallèles et embarquées." Phd thesis, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00610865.
Denis, Alexandre. "Contribution à la conception d'une plate-forme haute performance d'intégration d'exécutifs communicants pour la programmation des grilles de calcul." Rennes 1, 2003. https://tel.archives-ouvertes.fr/tel-00009595.
Le Louët, Guillaume. "Maîtrise énergétique des centres de données virtualisés : d'un scénario de charge à l'optimisation du placement des calculs." Phd thesis, Ecole des Mines de Nantes, 2014. http://tel.archives-ouvertes.fr/tel-01044650.
Abouchi, Nacer. "Analyse et mesure de performance des réseaux de communication par simulation." Lyon, INSA, 1990. http://www.theses.fr/1990ISAL0057.
The behaviour of communication networks, whether local or wide area, can only be partially modelled using existing mathematical methods. The analysis of wide area network performance and behaviour as a function of adaptive routing techniques is still poorly mastered; likewise, for local area networks, it is useful to study their access methods quantitatively. Discrete event simulation is a solution which can take all the specifications of a network into consideration without simplification. In the first part, after describing the roles of system modelling and simulation, particularly for communication systems, we introduce the principal criteria to study in order to correctly choose modelling and simulation tools; a comparative study of the most widely used tools is also included. The second part presents the simulation models we designed to represent communication networks, local and wide area. The third part is dedicated to OSIRES, the network simulation tool we developed. Our study is guided by the analysis of different deterministic or adaptive routing algorithms, either found in existing networks or proposed in the literature. In the last part, the local area network access techniques proposed by ISO are analysed and compared. Finally, we conclude this thesis with perspectives and possible extensions.
Khizakanchery, Natarajan Surya Narayanan. "Modeling performance of serial and parallel sections of multi-threaded programs in many-core era." Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S015/document.
This thesis work was done in the general context of the ERC-funded Defying Amdahl's Law (DAL) project, which aims at exploring the micro-architectural techniques that will enable high performance on future many-core processors. The project envisions that despite future huge investments in the development of parallel applications and their porting to parallel architectures, most applications will still exhibit a significant amount of sequential code sections; hence, we should still focus on improving the performance of the serial sections of an application. In this thesis, the research primarily focuses on studying the differences between parallel and serial sections of existing multi-threaded (MT) programs and exploring the design space with respect to the processor core requirements for serial and parallel sections in future many-cores, with the area-performance tradeoff as a primary goal.
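Amdahl's law, which the project name references, bounds achievable speedup by the serial fraction of a program; a one-line model makes the point:

```python
# Amdahl's law: with serial fraction s, speedup on n cores is
# 1 / (s + (1 - s) / n), which tends to 1/s as n grows.
def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# With only 5% serial code, even 1024 cores yield less than 20x:
s = amdahl_speedup(0.05, 1024)
```

This is why the abstract argues that serial sections still deserve micro-architectural attention even in a many-core era.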
Serrano, Manuel. "Vers une compilation portable et performante des langages fonctionnels." Le Chesnay : INRIA, 1995. http://catalogue.bnf.fr/ark:/12148/cb370188956.
Castañeda Retiz, Martha Rosa. "Étude quantitative des mécanismes d'équilibrage de charge dans les systèmes de programmation pour le calcul parallèle." Grenoble INPG, 1999. https://theses.hal.science/tel-00004815.
The aim of this thesis is the performance evaluation of load-balancing mechanisms. The development of load-balancing techniques is necessary to use a parallel architecture effectively. We study the problem of dynamically scheduling a parallel application, starting with an analysis of the functionalities of a generic scheduler and its implementation in Athapascan, a multithreaded system for parallel applications. The structure of the scheduler accommodates different load-balancing algorithms, and we propose a methodology for their quantitative evaluation. We built a benchmark of synthetic algorithms with dynamic and random features and studied the combined effects of all the parameters of the scheduler and of the synthetic load. A factorial design was chosen because it permits a global view of the influence of the different parameters. Our benchmark was carried out on an IBM SP1 computer. We used two different methods to analyze our results: principal component analysis (PCA) and multilinear regression. The linear models obtained from the analysis make it possible to understand the behavior of every scheduler and the influence of its parameters with respect to the synthetic load.
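A toy comparison of two balancing strategies on random task loads illustrates the kind of trade-off such a benchmark quantifies (strategies and workload are illustrative, not Athapascan's actual algorithms): static round-robin assignment versus a greedy "least-loaded worker" rule.

```python
# Compare makespans of static round-robin vs. greedy least-loaded
# assignment (LPT order) on a synthetic random workload.
import random

def makespan(assignments, nworkers):
    loads = [0.0] * nworkers
    for w, cost in assignments:
        loads[w] += cost
    return max(loads)

def round_robin(tasks, nworkers):
    return [(i % nworkers, c) for i, c in enumerate(tasks)]

def greedy(tasks, nworkers):
    loads, out = [0.0] * nworkers, []
    for c in sorted(tasks, reverse=True):      # biggest tasks first
        w = loads.index(min(loads))            # least-loaded worker
        out.append((w, c))
        loads[w] += c
    return out

rng = random.Random(0)
tasks = [rng.expovariate(1.0) for _ in range(200)]
rr = makespan(round_robin(tasks, 8), 8)
gr = makespan(greedy(tasks, 8), 8)             # typically much smaller
```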
Cojean, Terry. "Programmation des architectures hétérogènes à l'aide de tâches divisibles ou modulables." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0041/document.
Hybrid computing platforms equipped with accelerators are now commonplace in high performance computing. Due to this evolution, researchers have concentrated their efforts on conceiving tools to ease the programming of applications able to use all the computing units of such machines. The StarPU runtime system, developed in the STORM team at INRIA Bordeaux, was conceived as a target for parallel language compilers and specialized libraries (linear algebra, Fourier transforms, ...). To provide applications with portability of code and performance, StarPU schedules dynamic task graphs efficiently on all the heterogeneous computing units of the machine. One of the most difficult aspects when expressing an application as a graph of tasks is choosing the granularity of the tasks, which typically goes hand in hand with the size of the blocks used to partition the problem's data. Small granularities do not allow efficient use of accelerators such as GPUs, which require a small number of tasks with massive internal data parallelism to reach peak performance. Conversely, CPUs typically exhibit optimal performance with a large number of smaller-grained tasks. The choice of task granularity not only depends on the type of computing unit on which a task will be executed; it also influences the quantity of parallelism available in the system: too many small tasks may flood the runtime system with overhead, whereas too few large tasks may create a parallelism deficiency. Currently, most approaches rely on finding a compromise granularity which makes optimal use of neither CPU nor accelerator resources. The objective of this thesis is to solve this granularity problem by aggregating resources, viewing them not as many small resources but as fewer, larger ones collaborating on the execution of the same task. A theoretical machine and scheduling model able to represent this process has existed for several decades: parallel tasks. The main contributions of this thesis are to make practical use of this model by implementing a parallel-task mechanism inside StarPU, and to implement and study parallel-task schedulers from the literature. The model is validated by improving the programming and optimizing the execution of numerical applications on modern computing machines.
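The moldable (parallel) task idea can be sketched numerically (the speedup model and its parameters are made up): a task can run on p aggregated cores with imperfect scaling, and the scheduler picks the core count that minimizes overall time.

```python
# Moldable-task sketch: pick between running tasks concurrently on few
# cores each, or sequentially each using all cores, under a simple
# (hypothetical) efficiency-decay speedup model.
def exec_time(work, p, efficiency=0.85):
    """Time of one task on p cores with imperfect scaling."""
    return work / (p * efficiency ** (p - 1))

def best_split(ntasks, work, total_cores):
    concurrent = exec_time(work, total_cores // ntasks)   # all tasks at once
    sequential = ntasks * exec_time(work, total_cores)    # one after another
    if concurrent <= sequential:
        return "concurrent", concurrent
    return "sequential", sequential

choice, t = best_split(ntasks=4, work=100.0, total_cores=16)
```

With decaying efficiency, fewer cores per task wins here; a GPU-like speedup curve would flip the decision, which is the heterogeneity the thesis exploits.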
Truong, Phuoc Hoa. "Optimisation des performances de la machine synchrone à réluctance variable : approches par la conception et par la commande." Thesis, Mulhouse, 2016. http://www.theses.fr/2016MULH8861/document.
The main objective of our work is to develop methods for performance optimization of the SynRM (synchronous reluctance machine) in terms of both design and control. The first part is devoted to control of the SynRM taking into account saturation, cross coupling and iron losses. Two control strategies to improve the machine's steady-state performance are presented: optimal-efficiency control and maximum-torque-per-ampere control. The second part of this work focuses on the control of the non-sinusoidal SynRM to reduce torque ripple. Optimal stator currents were obtained with two objectives: constant electromagnetic torque and minimum ohmic losses. An original formula is presented for the case where the homopolar current is considered. Torque and speed control based on artificial neural networks are then proposed to obtain the optimal currents online, in real time. The third part deals with design optimization of the SynRM by the finite element method: with the JMAG software, the rotor flux barriers of the SynRM were optimized to maximize the average torque, power factor and efficiency of the machine. Finally, all the approaches based on neural networks were validated by experimental tests, and comparisons with conventional methods demonstrate the validity of the proposed methods.
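The maximum-torque-per-ampere strategy mentioned above can be sketched numerically for an idealized, unsaturated machine (all parameter values below are made up): with torque T = 1.5 p (Ld - Lq) id iq and a fixed current magnitude I, torque is maximized when id = iq = I/sqrt(2).

```python
# MTPA sketch for an idealized SynRM (no saturation or iron losses,
# unlike the thesis' models; parameters are illustrative).
import math

def torque(p, Ld, Lq, i_d, i_q):
    """Electromagnetic torque of an unsaturated SynRM in the dq frame."""
    return 1.5 * p * (Ld - Lq) * i_d * i_q

def mtpa_currents(i_mag):
    """Split a current magnitude equally between the d and q axes."""
    return i_mag / math.sqrt(2), i_mag / math.sqrt(2)

p, Ld, Lq, I = 2, 0.30, 0.06, 10.0
i_d, i_q = mtpa_currents(I)
t_mtpa = torque(p, Ld, Lq, i_d, i_q)

# Any other split of the same current magnitude gives less torque:
theta = math.radians(30)
t_other = torque(p, Ld, Lq, I * math.cos(theta), I * math.sin(theta))
```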
Li, Chong. "Un modèle de transition logico-matérielle pour la simplification de la programmation parallèle." Phd thesis, Université Paris-Est, 2013. http://tel.archives-ouvertes.fr/tel-00952082.
Full text
Dollinger, Jean-François. "A framework for efficient execution on GPU and CPU+GPU systems." Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD019/document.
Full text
Technological limitations faced by semiconductor manufacturers in the early 2000s restricted the increase in performance of sequential computation units. Nowadays, the trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate execution time prediction method for parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources at disposal on a system. The first technique consists in jointly using CPU and GPU for executing a code. In order to achieve higher performance, it is mandatory to consider load balance, in particular by predicting execution time. The runtime uses the profiling results, and the scheduler computes the execution times and adjusts the load distributed to the processors. The second technique puts CPU and GPU in competition: instances of the considered code are simultaneously executed on CPU and GPU. The winner of the competition notifies its completion to the other instance, implying the termination of the latter
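The load-balancing idea in the first technique can be sketched in a few lines: given predicted per-iteration times on each device, split the iteration space so both finish together. This is only an illustrative sketch of the principle (the function name and timings are assumptions, not the thesis's actual framework).

```python
def balanced_split(n_iters, t_cpu, t_gpu):
    """Split n_iters loop iterations between CPU and GPU so both devices
    finish at the same time, given predicted per-iteration times (s).
    Solves n_cpu * t_cpu == n_gpu * t_gpu with n_cpu + n_gpu == n_iters.
    Returns (cpu_share, gpu_share)."""
    n_gpu = round(n_iters * t_cpu / (t_cpu + t_gpu))
    return n_iters - n_gpu, n_gpu

# With the GPU predicted 4x faster per iteration, it receives 4x the work.
n_cpu, n_gpu = balanced_split(1_000_000, t_cpu=8e-6, t_gpu=2e-6)
assert n_cpu + n_gpu == 1_000_000
```

The quality of such a split depends entirely on the accuracy of the time predictions, which is why the thesis couples it with offline profiling and online prediction.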
Brodu, Etienne. "Fluxional compiler : Seamless shift from development productivity to performance efficiency, in the case of real-time web applications." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI061/document.
Full text
Most of the now-popular web services started as small projects created by a few individuals, and grew exponentially. The Internet supports this growth because it extends the reach of our communications worldwide, while reducing their latency. During its development, an application must grow exponentially, otherwise it risks being outpaced by the competition. In the beginning, it is important to verify quickly that the service can respond to user needs: fail fast. Languages like Ruby or Java became popular because they propose a productive approach to iterate quickly on user feedback. A web application that correctly responds to user needs can become viral. Eventually, the application needs to be efficient to cope with the traffic increase. But it is difficult for an application to be at once productive and efficient. When the user base becomes too important, it is often necessary to switch the development approach from productivity to efficiency. No platform conciliates these two objectives, so this implies rewriting the application into an efficient execution model, such as a pipeline. This is a risk, as it is a huge and uncertain amount of work. To avoid this risk, this thesis proposes to maintain the productive representation of an application alongside the efficient one. Javascript is a productive language with a significant community. It is the most widely deployed execution engine, as it is present in every browser, and on some servers as well with Node.js. It is now considered the main language of the web, ousting Ruby or Java. Moreover, the Javascript event-loop is similar to a pipeline: both execution models process a stream of requests by chaining independent functions. However, the event-loop supports the needs of development productivity with its global memory, while the pipeline representation allows an efficient execution through parallelization.
This thesis studies an equivalence able to transform an implementation from one representation to the other. With this equivalence, the development team can follow the two approaches concurrently. It can continuously iterate the development to take advantage of both objectives. This thesis presents a compiler that identifies the pipeline underlying a Javascript application, and isolates its stages into fluxions. The name fluxion is a contraction of function and flux: a fluxion executes a function for each datum on a stream. Fluxions are independent, and can be moved from one machine to another so as to cope with increasing traffic. The development team can begin with the productivity of the event-loop representation, and, with the transformation, progressively iterate toward the efficiency of the pipeline representation
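The fluxion idea, a function applied to each datum of a stream, with independent stages chained into a pipeline, can be sketched as follows. This is a hedged, single-process sketch of the concept only (in the thesis each fluxion can be moved to another machine; the stage names here are invented):

```python
from functools import reduce

def fluxion(fn):
    """A fluxion: an independent stage applying fn to each datum of a
    stream. Here stages are plain chained generators; in the thesis
    they are isolated units that can be distributed across machines."""
    def stage(stream):
        for datum in stream:
            yield fn(datum)
    return stage

# Hypothetical three-stage request pipeline (names are illustrative).
parse   = fluxion(lambda req: req.strip().lower())
route   = fluxion(lambda path: {"path": path})
respond = fluxion(lambda ctx: f"200 OK {ctx['path']}")

def pipeline(stream, *stages):
    """Chain the stages over the stream, first stage innermost."""
    return reduce(lambda s, stage: stage(s), stages, stream)

requests = ["  /Home ", "/About"]
print(list(pipeline(requests, parse, route, respond)))
# ['200 OK /home', '200 OK /about']
```

Because each stage only communicates through the stream, stages can run concurrently, which is precisely the parallelization opportunity that the event-loop's shared global memory hides.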
Taïani, François. "Programmation des grands systèmes distribués: quelques mécanismes, abstractions, et outils." Habilitation à diriger des recherches, Université Rennes 1, 2011. http://tel.archives-ouvertes.fr/tel-00643729.
Full text
Masliah, Ian. "Méthodes de génération automatique de code appliquées à l’algèbre linéaire numérique dans le calcul haute performance." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS285/document.
Full text
Parallelism in today's computer architectures is ubiquitous, whether in supercomputers, workstations or portable devices such as smartphones. Exploiting these systems efficiently for a specific application requires a multidisciplinary effort concerning Domain Specific Languages (DSLs), code generation and optimization techniques, and application-specific numerical algorithms. In this PhD thesis, we present a method of high-level programming that takes into account the features of heterogeneous architectures and the properties of matrices to build a generic dense linear algebra solver. Our programming model supports both implicit and explicit data transfers to and from General-Purpose Graphics Processing Units (GPGPUs) and Integrated Graphic Processors (IGPs). As GPUs have become an asset in high performance computing, incorporating their use in general solvers is an important issue. Recent architectures such as IGPs also require further knowledge to program them efficiently. Our methodology aims at simplifying development on parallel architectures through the use of high-level programming techniques. As an example, we developed a least-squares solver based on semi-normal equations in mixed precision that cannot be found in current libraries. This solver achieves performance similar to that of other mixed-precision algorithms. We extend our approach to a new multistage programming model that alleviates the interoperability problems between the CPU and GPU programming models. Our multistage approach is used to automatically generate GPU code for CPU-based element-wise expressions and parallel skeletons while allowing for type-safe program generation. We illustrate that this work can be applied to recent architectures and algorithms. The resulting code has been incorporated into a C++ library called NT2. Finally, we investigate how to apply high-level programming techniques to batched computations and tensor contractions.
We start by explaining how to design a simple data container using modern C++14 programming techniques. Then, we study the issues around batched computations, memory locality and code vectorization to implement a highly optimized matrix-matrix product for small sizes using SIMD instructions. By combining a high-level programming approach and advanced parallel programming techniques, we show that we can outperform state-of-the-art numerical libraries
Thévenoux, Laurent. "Synthèse de code avec compromis entre performance et précision en arithmétique flottante IEEE 754." Perpignan, 2014. http://www.theses.fr/2014PERP1176.
Full text
Numerical accuracy and execution time of programs using floating-point arithmetic are major challenges in many computer science applications, and the improvement of these criteria is the subject of many research works. However, we notice that improving accuracy decreases performance, and conversely. Indeed, accuracy improvement techniques, such as expansions or compensations, increase the number of computations that a program has to execute, and the more computations are added, the more performance decreases. This thesis presents an accuracy improvement method that takes this negative effect on performance into account. We automate the error-free transformations of elementary floating-point operations because they present a high potential for parallelism. Moreover, we propose transformation strategies allowing partial improvement of programs, in order to control the impact on execution time more precisely. Tradeoffs between accuracy and performance are then ensured by code synthesis. We also present several experimental results obtained with tools implementing all the contributions of our work
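The error-free transformations mentioned here are standard building blocks of compensated algorithms. The classic example is TwoSum (due to Knuth), which recovers the exact rounding error of a floating-point addition using only further floating-point additions; the sketch below illustrates the technique, not the thesis's own code synthesis tool.

```python
def two_sum(a, b):
    """Error-free transformation of a floating-point addition (Knuth's
    TwoSum). Returns (s, e) with s = fl(a + b) and a + b = s + e exactly,
    for any two IEEE 754 doubles (no branch on magnitudes needed)."""
    s = a + b
    b_virtual = s - a          # the part of b actually absorbed into s
    a_virtual = s - b_virtual  # the part of a actually absorbed into s
    b_round = b - b_virtual    # rounding error contributed by b
    a_round = a - a_virtual    # rounding error contributed by a
    e = a_round + b_round
    return s, e

# The rounding error of 1.0 + 2**-60 (far below 1 ulp) is recovered exactly:
s, e = two_sum(1.0, 2.0**-60)
assert s == 1.0 and e == 2.0**-60
```

The six operations of TwoSum are independent enough to pipeline well, which is the parallelism potential the thesis exploits; the trade-off is that each protected addition now costs six floating-point operations instead of one.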
Li, Zheng. "Support architectural pour l'environnement de parallélisation CAPSULE." Paris 11, 2010. http://www.theses.fr/2010PA112328.
Full text
With the advent of multi-cores, there is a strong incentive to parallelize applications. However, there is no consensus on how to easily parallelize such programs, and the issue is now becoming critical. For this purpose, the ALCHEMY group of INRIA proposed CAPSULE, a conditional parallelization approach. By delegating component spawning and mapping decisions to the architecture or runtime, CAPSULE both simplifies the task of the programmer and achieves a better exploitation of hardware resources. This thesis presents a pure software implementation of CAPSULE, based on POSIX threads but providing simple primitives to the programmer. The experimental results show that the software implementation is well suited to shared-memory multi-cores with few cores. The thesis also shows that the conditional division approach can improve both the performance and the stability of the real-time execution of parallel programs in embedded systems by maximally using the available cores. If multi-cores become "many-cores", they will likely have to transition to physically distributed memory architectures. We have found that a large number of cores with a physically distributed memory favors a hardware support over a software support for probing and division. We have proposed a hardware support in which division control is based only on local information and is not centralized. Our experiments show that this hardware-supported localized approach outperforms central division schemes and is more scalable
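The probing-and-division idea, split a task only when a hardware context appears to be free, otherwise continue sequentially, can be sketched as below. This is a loose, thread-based sketch of the concept only: the `probe`/`divide` names echo the CAPSULE idea, but this Python API is illustrative and is not CAPSULE's actual interface.

```python
import threading

class ConditionalDivider:
    """Conditional parallelization sketch: division is granted only while
    worker slots remain, so parallelism adapts to available resources."""
    def __init__(self, max_workers):
        self.slots = threading.Semaphore(max_workers)

    def probe(self):
        # Non-blocking: succeeds only if a worker slot is available.
        return self.slots.acquire(blocking=False)

    def divide(self, fn):
        # Caller must have probed successfully; the slot is released
        # when the spawned work finishes.
        def run():
            try:
                fn()
            finally:
                self.slots.release()
        t = threading.Thread(target=run)
        t.start()
        return t

div = ConditionalDivider(max_workers=2)

def fib(n):
    if n < 2:
        return n
    if div.probe():                        # a context seems free: divide
        out = {}
        t = div.divide(lambda: out.__setitem__("a", fib(n - 1)))
        b = fib(n - 2)
        t.join()
        return out["a"] + b
    return fib(n - 1) + fib(n - 2)         # otherwise stay sequential

assert fib(10) == 55
```

Because the probe never blocks, the recursion degrades gracefully to sequential execution when all contexts are busy, which is exactly what makes the approach attractive for stable real-time behavior.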
Garcia, Pinto Vinicius. "Stratégies d'analyse de performance pour les applications basées sur tâches sur plates-formes hybrides." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM058/document.
Full text
Programming paradigms in High-Performance Computing have been shifting toward task-based models that are capable of adapting readily to heterogeneous and scalable supercomputers. The performance of task-based applications heavily depends on the runtime scheduling heuristics and on its ability to exploit computing and communication resources. Unfortunately, the traditional performance analysis strategies are unfit to fully understand task-based runtime systems and applications: they expect a regular behavior with communication and computation phases, while task-based applications demonstrate no clear phases. Moreover, the finer granularity of task-based applications typically induces a stochastic behavior that leads to irregular structures that are difficult to analyze. In this thesis, we propose performance analysis strategies that exploit the combination of application structure, scheduler, and hardware information. We show how our strategies can help to understand performance issues of task-based applications running on hybrid platforms. Our performance analysis strategies are built on top of modern data analysis tools, enabling the creation of custom visualization panels that allow understanding and pinpointing performance problems incurred by bad scheduling decisions and incorrect runtime system and platform configuration. By combining simulation and debugging, we are also able to build a visual representation of the internal state and the estimations computed by the scheduler when scheduling a new task. We validate our proposal by analyzing traces from a Cholesky decomposition implemented with the StarPU task-based runtime system and running on hybrid (CPU/GPU) platforms. Our case studies show how to enhance the task partitioning among the multi-(GPU, core) to get closer to theoretical lower bounds, how to improve MPI pipelining in multi-(node, core, GPU) to reduce the slow start in distributed nodes, and how to upgrade the runtime system to increase MPI bandwidth.
By employing simulation and debugging strategies, we also provide a workflow to investigate, in depth, assumptions concerning the scheduler decisions. This allows us to suggest changes to improve the runtime system scheduling and prefetch mechanisms
Ejjaaouani, Ksander. "Conception du modèle de programmation INKS pour la séparation des préoccupations algorithmiques et d’optimisation dans les codes de simulation numérique : application à la résolution du système Vlasov/Poisson 6D." Thesis, Strasbourg, 2019. http://www.theses.fr/2019STRAD037.
Full text
The InKS programming model aims to improve the readability, portability and maintainability of simulation codes, as well as boosting developer productivity. To fulfill these objectives, InKS proposes two languages, each dedicated to a specific concern. First, InKS PIA provides concepts to express simulation algorithms with no concern for optimization. Once this foundation is set, InKS PSO enables optimization specialists to reuse the algorithm in order to specify the optimization part. The model makes it possible to write numerous versions of the optimizations, typically one per architecture, from a single algorithm. This strategy limits the rewriting of code for each new optimization specification, boosting developer productivity. We have evaluated the InKS programming model by using it to implement the 6D Vlasov-Poisson solver and compared our version with a Fortran one. This evaluation highlighted that, in addition to the separation of concerns, the InKS approach is no more complex than traditional ones while offering the same performance. Moreover, using the algorithm, it is able to generate valid code for the non-critical parts of the code, leaving optimization specialists more time to focus on optimizing the computation-intensive parts
Vömel, Christof. "Contributions à la recherche en calcul scientifique haute performance pour les matrices creuses." Toulouse, INPT, 2003. http://www.theses.fr/2003INPT003H.
Full text
Peretti, Pezzi Guilherme. "High performance hydraulic simulations on the grid using Java and ProActive." Nice, 2011. http://www.theses.fr/2011NICE4118.
Full text
Optimizing water distribution is a crucial challenge that has already been targeted by numerous modeling tools. Useful models, implemented decades ago, need to evolve toward more recent formalisms and computing environments. This thesis presents the redesign of a legacy hydraulic simulation software (IRMA), written in FORTRAN, that has been used for more than 30 years by the Société du Canal de Provence to design and maintain water distribution networks. IRMA was developed mainly for handling irrigation networks, using Clément's probabilistic demand estimation model, and today manages more than 6,000 km of pressurized water networks. The increasing complexity and size of the networks highlighted the need to modernize IRMA and rewrite it in a more current language (Java). This thesis presents the simulation model implemented in IRMA, including the head-loss equations, the linearization methods, the topology analysis algorithms, the equipment modeling and the construction of the linear system. Some new types of simulation are presented: peak demand with probabilistic consumption estimation (Clément flow), pump sizing (indexed characteristics), pipe diameter optimization, and pressure-dependent consumption. The new solution adopted for solving the linear system is described, along with a comparison with existing Java solvers. The results are first validated by comparing the results obtained with the old FORTRAN version and the new solution, for all the networks maintained by the Société du Canal de Provence.
A second validation is performed by comparing results obtained with a standard and well-known simulation tool (EPANET). Concerning the performance of the new solution, sequential time measurements are presented and compared with the old FORTRAN version. Finally, two use cases demonstrate the ability to run distributed simulations on a grid infrastructure, using the ProActive solution. The new solution has already been deployed in a production environment and clearly demonstrates its efficiency, with a significant reduction in computation time, an improvement in result quality and an easier integration into the information system of the Société du Canal de Provence, notably its spatial database
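The Clément probabilistic demand model mentioned above estimates the peak flow of an on-demand irrigation network from the probability that each hydrant is open. A hedged sketch of the first Clément formula follows (the formula is standard in irrigation network design; the function name, quantile default and all values below are illustrative assumptions, not IRMA's actual code):

```python
import math

def clement_design_flow(hydrants, U=1.645):
    """First Clement formula: probabilistic peak demand of an on-demand
    irrigation network. `hydrants` is a list of (discharge, probability)
    pairs; U is the standard-normal quantile for the chosen operating
    quality (1.645 for roughly 95%). Demand is the mean total discharge
    plus U standard deviations of the (independent) hydrant draws."""
    mean = sum(d * p for d, p in hydrants)
    var = sum(d * d * p * (1.0 - p) for d, p in hydrants)
    return mean + U * math.sqrt(var)

# 50 identical hydrants of 10 l/s, each open 20% of the time:
q = clement_design_flow([(10.0, 0.2)] * 50)
assert q < 500.0  # well below the 500 l/s sum-of-all-hydrants worst case
```

This statistical sizing is what lets a network serve thousands of hydrants with pipes far smaller than the worst-case simultaneous demand would require.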
Loussert, Arthur. "Understanding and Guiding the Computing Resource Management in a Runtime Stacking Context." Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0451.
Full text
With the advent of multicore and manycore processors as building blocks of HPC supercomputers, many applications shift from relying solely on a distributed programming model (e.g., MPI) to mixing distributed and shared memory models (e.g., MPI+OpenMP). This leads to a better exploitation of shared-memory communications and reduces the overall memory footprint. However, this evolution has a large impact on the software stack, as application developers typically mix several programming models to scale over a large number of multicore nodes while coping with their hierarchical depth. One side effect of this programming approach is runtime stacking: mixing multiple models requires various runtime libraries to be alive at the same time. Dealing with different runtime systems may lead to a large number of execution flows that do not efficiently exploit the underlying resources. We first present a study of runtime stacking. It introduces stacking configurations and categories to describe how stacking can appear in applications. We explore runtime-stacking configurations (spatial and temporal), focusing on thread/process placement on hardware resources from different runtime libraries. We build this taxonomy based on an analysis of state-of-the-art runtime stacking and programming models. We then propose algorithms to detect the misuse of compute resources when running a hybrid parallel application. We have implemented these algorithms inside a dynamic tool, called the Overseer. This tool monitors applications and reports resource usage to the user with respect to the application timeline, focusing on the overloading and underloading of compute resources. Finally, we propose a second external tool called the Overmind, which monitors thread/process management and (re)maps threads and processes to the underlying cores, taking into account the hardware topology and the application behavior.
By capturing a global view of resource usage, the Overmind adapts the process/thread placement and aims at taking the best decision to enhance the use of each compute node inside a supercomputer. We demonstrate the relevance of our approach and show that our low-overhead implementation is able to achieve good performance even when running with configurations that would otherwise have ended up with bad resource usage
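The overloading/underloading detection performed by a tool like the Overseer can be illustrated with a minimal sketch: given a thread-to-core mapping, report cores with more than one runnable thread and cores with none. This is an illustrative simplification, not the Overseer's actual implementation or API.

```python
from collections import Counter

def classify_cores(pinning, n_cores):
    """Given a mapping thread_name -> core index, return (overloaded,
    idle): cores hosting more than one runnable thread, and cores
    hosting none. Both are symptoms of runtime-stacking misuse."""
    load = Counter(pinning.values())
    overloaded = sorted(c for c, n in load.items() if n > 1)
    idle = sorted(set(range(n_cores)) - set(load))
    return overloaded, idle

# Hypothetical MPI+OpenMP run on a 4-core node: both runtimes pinned
# their threads starting from core 0, leaving cores 2-3 unused.
pin = {"mpi_rank0": 0, "omp_t0": 0, "omp_t1": 1, "omp_t2": 1}
over, idle = classify_cores(pin, 4)
assert over == [0, 1] and idle == [2, 3]
```

Exactly this kind of simultaneous overload-plus-idle pattern arises when two stacked runtimes each assume they own the whole node, and it is what a remapping tool like the Overmind would correct.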
Sergent, Marc. "Passage à l'echelle d'un support d'exécution à base de tâches pour l'algèbre linéaire dense." Thesis, Bordeaux, 2016. http://www.theses.fr/2016BORD0372/document.
Full text
The ever-increasing architectural complexity of supercomputers emphasizes the need for high-level parallel programming paradigms to design efficient, scalable and portable scientific applications. Among such paradigms, the task-based programming model abstracts away much of the architecture complexity by representing an application as a Directed Acyclic Graph (DAG) of tasks. In particular, the Sequential-Task-Flow (STF) model decouples the sequential task submission step from the parallel task execution step. While this model allows for further optimizations on the DAG of tasks at submission time, there is a key concern about the performance hindrance of sequential task submission when scaling. This thesis focuses on studying the scalability of the STF-based StarPU runtime system (developed at Inria Bordeaux in the STORM team) for large-scale 3D simulations of the CEA which use dense linear algebra solvers. To that end, we collaborated with the HiePACS team of Inria Bordeaux on the Chameleon software, a collection of linear algebra solvers on top of task-based runtime systems, to produce an efficient and scalable dense linear algebra solver on top of StarPU, scaling up to 3,000 cores and 288 GPUs of CEA-DAM's TERA-100 cluster
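In the STF model, tasks are submitted sequentially with the data they access and an access mode, and the runtime infers the DAG edges from those accesses. The toy sketch below illustrates the inference principle only; it is a drastic simplification of what a system like StarPU actually does (the task names evoke Cholesky kernels but the code is illustrative).

```python
class STF:
    """Toy Sequential-Task-Flow runtime: tasks are submitted in program
    order with (data, mode) accesses; dependencies are inferred from
    them (every access serialises on the last writer of the data, and a
    write also serialises on the readers since that write)."""
    def __init__(self):
        self.tasks = []          # list of (name, sorted dep indices)
        self.last_writer = {}    # data -> task index of last writer
        self.readers = {}        # data -> reader indices since last write

    def submit(self, name, accesses):
        idx = len(self.tasks)
        deps = set()
        for data, mode in accesses:
            if data in self.last_writer:
                deps.add(self.last_writer[data])     # read/write-after-write
            if mode == "W":
                deps.update(self.readers.get(data, ()))  # write-after-read
                self.last_writer[data] = idx
                self.readers[data] = set()
            else:
                self.readers.setdefault(data, set()).add(idx)
        self.tasks.append((name, sorted(deps - {idx})))

stf = STF()
stf.submit("potrf_A", [("A", "W")])
stf.submit("trsm_AB", [("A", "R"), ("B", "W")])
stf.submit("syrk_BC", [("B", "R"), ("C", "W")])
# trsm depends on potrf (reads A); syrk depends on trsm (reads B).
assert [deps for _, deps in stf.tasks] == [[], [0], [1]]
```

Because submission is sequential, this inference loop itself becomes the bottleneck at scale, which is precisely the concern the thesis studies.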
Collet, Julien. "Exploration of parallel graph-processing algorithms on distributed architectures." Thesis, Compiègne, 2017. http://www.theses.fr/2017COMP2391/document.
Full text
With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graph-processing algorithms, whose parallelization and complexity are highly data-dependent, hampers usability. Moreover, the difficult evaluation of the performance behavior of these applications complicates the assessment of the relevance of the chosen architecture. With this in mind, this thesis deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state-of-the-art graph-processing framework. Two use cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and the cluster nodes used. A further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use cases previously developed, and highlights the particular relevance, in this applicative context, of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture than higher-end systems. This thesis then explores how performance studies are helpful in cluster design for graph processing. In particular, studying the throughput performance of a graph-processing system gives fruitful insights for further node architecture improvements.
Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph processing. Finally, hardware improvements for next generations of graph-processing servers are proposed and evaluated. A flash-based victim-swap mechanism is proposed to mitigate unwanted memory-overload situations. Then, the relevance of ARM-based microservers for graph processing is investigated with a port of GraphLab on an NVIDIA TX2-based architecture
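Frameworks in the GraphLab family express computations as vertex programs in a gather-apply-scatter (GAS) style. The sketch below illustrates that style with a synchronous, single-machine PageRank; it is a conceptual illustration only, not GraphLab code or its API.

```python
def pagerank_gas(edges, n, d=0.85, iters=20):
    """PageRank in the gather-apply-scatter style of vertex-centric
    frameworks (synchronous sketch): each vertex gathers the rank mass
    of its in-neighbours, then applies the damping update. Assumes every
    vertex has at least one outgoing edge."""
    out_deg = [0] * n
    in_nbrs = [[] for _ in range(n)]
    for src, dst in edges:
        out_deg[src] += 1
        in_nbrs[dst].append(src)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Gather incoming rank/out_degree, then apply the damping update.
        rank = [(1 - d) / n + d * sum(rank[s] / out_deg[s] for s in in_nbrs[v])
                for v in range(n)]
    return rank

# Tiny 3-vertex cycle: by symmetry every vertex keeps rank 1/3.
r = pagerank_gas([(0, 1), (1, 2), (2, 0)], 3)
assert all(abs(x - 1 / 3) < 1e-9 for x in r)
```

Because the gather step touches only a vertex's neighbourhood, such programs distribute naturally over a partitioned graph, which is what makes GAS frameworks a good fit for the commodity clusters this thesis evaluates.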