Dissertations / Theses on the topic 'Tolérance aux pannes byzantine'
Consult the top 50 dissertations / theses for your research on the topic 'Tolérance aux pannes byzantine.'
Shoker, Ali. "Byzantine fault tolerance from static selection to dynamic switching." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1924/.
Byzantine Fault Tolerance (BFT) is becoming crucial with the revolution of online applications and the increasing number of innovations in computer technologies. Although dozens of BFT protocols have been introduced in the previous decade, their adoption by practitioners has been disappointing. To some extent, this indicates that existing protocols are, perhaps, not yet convincing or satisfactory enough. The problem is that researchers are still trying to establish 'the best protocol' using traditional methods, e.g., through designing new protocols. However, theoretical and experimental analyses demonstrate that it is hard to achieve one-size-fits-all BFT protocols. Indeed, we believe that looking for smarter tactics like 'fasten fragile sticks with a rope to achieve a solid stick' is necessary to circumvent the issue. In this thesis, we introduce the first BFT selection model and algorithm that automate and simplify the election process of the 'preferred' BFT protocol among a set of candidate ones. The selection mechanism operates in three modes: Static, Dynamic, and Heuristic. For the two latter modes, we present a novel BFT system, called Adapt, that reacts to any potential changes in the system conditions and switches dynamically between existing BFT protocols, i.e., seeking adaptation. The Static mode allows BFT users to choose a single BFT protocol only once. This is quite useful in Web Services and Clouds where BFT can be sold as a service (and signed in the SLA contract). This mode is basically designed for systems whose states do not fluctuate much. In this mode, an evaluation process is in charge of matching the user preferences against the profiles of the nominated BFT protocols, considering both reliability and performance. The elected protocol is the one that achieves the highest evaluation score. The mechanism is well automated via mathematical matrices, and produces selections that are reasonable and close to reality.
Some systems, however, may experience fluctuating conditions, like variable contention or message payloads. In this case, the Static mode will not be efficient since a chosen protocol might not fit the new conditions. The Dynamic mode solves this issue. Adapt combines a collection of BFT protocols and switches between them, thus adapting to the changes of the underlying system state. Consequently, the 'preferred' protocol is always selected for each system state. This yields an optimal quality of service, i.e., reliability and performance. Adapt monitors the system state through its Event System, and uses a Support Vector Regression method to conduct run-time predictions of the performance of the protocols (e.g., throughput, latency, etc.). Adapt also operates in a Heuristic mode. Using predefined heuristics, this mode optimizes user preferences to improve the selection process. The evaluation of our approach shows that selecting the 'preferred' protocol is automated and close to reality in the Static mode. In the Dynamic mode, Adapt always achieves the optimal performance among available protocols. The evaluation demonstrates that the overall system performance can be improved significantly too. Other cases show that it is not always worthwhile to switch between protocols. This is made possible through conducting predictions with high accuracy, which can reach more than 98% in many cases. Finally, the thesis shows that Adapt can be smarter through using heuristics.
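The Static mode described above (matching user preferences against protocol profiles and electing the highest score) can be illustrated with a minimal sketch. This is not the thesis's actual evaluation matrices: the protocol names, the two criteria columns, the normalisation to [0, 1] and the weighted-sum scoring are all assumptions made for illustration.

```python
def select_protocol(profiles, weights):
    """Return (name, score) of the profile with the highest weighted score.

    profiles: dict mapping a protocol name to a tuple of criterion values.
    weights:  tuple of user preference weights, one per criterion.
    Both layouts are hypothetical, not taken from the thesis.
    """
    scores = {
        name: sum(w * v for w, v in zip(weights, values))
        for name, values in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical profiles: (reliability, performance), each normalised to [0, 1].
profiles = {
    "protocol_a": (0.9, 0.4),
    "protocol_b": (0.6, 0.8),
    "protocol_c": (0.5, 0.9),
}

# A user who cares mostly about reliability.
best, score = select_protocol(profiles, (0.7, 0.3))
print(best, round(score, 2))
```

Changing the weights changes the elected protocol, which is the point of matching preferences against profiles rather than fixing one protocol for all users.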
Leduc, Guilain. "Performance et sécurité d'une Blockchain auto-adaptative et innovante." Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0220.
Research on blockchain application frameworks rarely offers performance evaluation. This thesis proposes a comprehensive methodology to help software integrators better understand and measure the influence of configuration parameters on the overall quality of long-term service performance. In order to improve performance, the new adaptive consensus protocol Sabine (Self-Adaptive BlockchaIn coNsEnsus) is proposed to dynamically modify one of these parameters in the PBFT consensus. The configuration parameter of this consensus is the number of validators involved, which results from a trade-off between security and performance. The Sabine protocol maximises this number provided that the output rate matches the input rate. Sabine is evaluated and validated in real-world settings, and the results show that Sabine has an acceptable relative error between the requested and committed transaction rates. Two new validator selection algorithms are proposed that reverse the random paradigm of current protocols to select the nodes leading to better performance. The first is based on a reputation system that rewards the fastest nodes. The second selects the closest nodes by imposing a continuous rotation of the selection. These two algorithms have been simulated and their impact on decentralisation discussed. This selection, associated with Sabine, improves security by giving the system more margin to increase the number of validators. This work opens the way to more reactive chains, with less latency and more throughput.
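The trade-off described above can be sketched as a simple feedback step: grow the validator set while the committed rate keeps up with the requested rate, and shrink it otherwise. This is only an illustration, not the Sabine algorithm itself; the function names, the one-validator step, the 5% tolerance and the bounds are assumptions. The only standard fact used is that PBFT tolerates f Byzantine validators when n >= 3f + 1.

```python
def max_faults(n_validators):
    """Byzantine faults tolerated by PBFT with n validators (n >= 3f + 1)."""
    return (n_validators - 1) // 3


def adjust_validators(current, input_rate, output_rate,
                      n_min=4, n_max=100, tolerance=0.05):
    """One hypothetical control step on the number of validators.

    If the committed (output) rate falls behind the requested (input) rate
    by more than `tolerance`, remove one validator to regain performance;
    otherwise add one to gain security. The result is clamped to
    [n_min, n_max]; n_min=4 is the smallest set tolerating one fault.
    """
    if output_rate < input_rate * (1 - tolerance):
        proposed = current - 1
    else:
        proposed = current + 1
    return max(n_min, min(n_max, proposed))


n = 10
n = adjust_validators(n, input_rate=100, output_rate=80)   # falling behind
print(n, max_faults(n))
```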
Farina, Giovanni. "Tractable Reliable Communication in Compromised Networks." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS310.
Reliable communication is a fundamental primitive in distributed systems prone to Byzantine (i.e. arbitrary, and possibly malicious) failures to guarantee the integrity, delivery, and authorship of the messages exchanged between processes. Its practical adoption strongly depends on the system assumptions. Several solutions implementing such a primitive have been proposed in the literature, but some lack scalability and/or demand topological network conditions that are computationally hard to verify. This thesis aims to investigate and address some of the open problems and challenges in implementing such a communication primitive. Specifically, we analyze how a reliable communication primitive can be implemented in 1) a static distributed system where a subset of processes is compromised, 2) a dynamic distributed system where part of the processes is Byzantine faulty, and 3) a static distributed system where every process can be compromised and recover. We define several more efficient protocols and we characterize alternative network conditions guaranteeing their correctness.
Solat, Siamak. "Novel fault-tolerant, self-configurable, scalable, secure, decentralized, and high-performance distributed database replication architecture using innovative sharding to enable the use of BFT consensus mechanisms in very large-scale networks." Electronic Thesis or Diss., Université Paris Cité, 2023. http://www.theses.fr/2023UNIP7025.
This PhD thesis consists of 6 Chapters. In the first Chapter, as an introduction, we provide an overview of the general goals and motives of decentralized and permissionless networks, as well as the obstacles they face. In the introduction, we also refer to the irrational and illogical solution, known as "permissioned blockchain", that has been proposed to improve the performance of networks similar to Bitcoin. This matter is detailed in Chapter 5. In Chapter 2, we clarify the systems on which the proposed idea, Parallel Committees, is based. We detail the indispensable features and essential challenges in replication systems. Then in Chapter 3, we discuss in detail the low performance and scalability limitations of replication systems that use consensus mechanisms to process transactions, and how these issues can be improved using the sharding technique. We describe the most important challenges in the sharding of distributed replication systems, an approach that has already been implemented in several blockchain-based replication systems; although it has shown remarkable potential to improve performance and scalability, current sharding techniques have several significant scalability and security issues. We explain why most current sharding protocols use a random assignment approach for allocating and distributing nodes between shards for security reasons. We also detail how a transaction is processed in a sharded replication system, based on current sharding protocols. We describe how a shared ledger across shards imposes additional scalability limitations and security issues on the network and explain why cross-shard or inter-shard transactions are undesirable and more costly, due to the problems they cause, including atomicity failure and state transition challenges, along with a review of proposed solutions.
We also review some of the most considerable recent works that utilize sharding techniques for replication systems. This part of the work has been published as a peer-reviewed book chapter in "Building Cybersecurity Applications with Blockchain Technology and Smart Contracts" (Springer, 2023). In Chapter 4, we propose a novel sharding technique, Parallel Committees, supporting both processing and storage/state sharding, to improve the scalability and performance of distributed replication systems that use a consensus to process clients' requests. We introduce an innovative and novel approach of distributing nodes between shards, using a public key generation process that simultaneously mitigates Sybil attack and serves as a proof-of-work mechanism. Our approach effectively reduces undesirable cross-shard transactions that are more complex and costly to process than intra-shard transactions. The proposed idea has been published as peer-reviewed conference proceedings in the IEEE BCCA 2023. We then explain why we do not make use of a blockchain structure in the proposed idea, an issue that is discussed in great detail in Chapter 5. This clarification has been published in the Journal of Software (JSW), Volume 16, Number 3, May 2021. And, in the final Chapter of this thesis, Chapter 6, we summarize the important points and conclusions of this research
Drid, Hamza. "Tolérance aux pannes dans les réseaux optiques de type WDM." Rennes 1, 2010. http://www.theses.fr/2010REN1S031.
Survivability in optical networks is an important issue due to the huge bandwidth offered by optical technology. Survivability means that the network has the ability to maintain an acceptable service level even after failures occur within the network. In this thesis, we study survivability in optical networks. Our work has two main parts. The first part addresses survivability in networks composed of a single domain. First, we study and classify the various survivability mechanisms proposed in the literature. Then we focus on p-cycle design. The major challenge of p-cycle design resides in finding an optimal set of p-cycles protecting the network for a given working capacity. In our thesis we propose a novel heuristic approach, which computes an efficient set of p-cycles protecting the network in one step. Our heuristic approach takes into consideration two main criteria: the redundancy and the number of p-cycles involved in the solution. The mechanisms studied in the first part are typically intended for single-domain protection, because they assume that each node in the network may have a complete view of the physical topology of the network. Such an assumption is not realistic in the case of large networks, such as multi-domain networks. Few works have focused on survivability in multi-domain optical networks. The second part of this thesis describes and evaluates existing solutions and compares their performance. We also propose a solution based on p-cycles and topology aggregation which overcomes the different problems of the existing solutions.
Delbé, Christian. "Tolérance aux pannes pour objets actifs asynchrones : modèle, protocole et expérimentations." Phd thesis, Université de Nice Sophia-Antipolis, 2007. http://tel.archives-ouvertes.fr/tel-00207953.
Jafar, Samir. "Programmation des systèmes parallèles distribués : tolérance aux pannes, résilience et adaptabilité." Phd thesis, Grenoble INPG, 2006. http://tel.archives-ouvertes.fr/tel-00085169.
In this work, the execution state of a parallel program is represented as a dynamic data-flow graph built at run time. This description of parallelism is independent of the number of resources and is therefore exploited to solve the problems raised by the dynamic nature of the platforms considered. Defining portable formats for representing the graph nodes solves the heterogeneity problems. Saving the data-flow graph of an application while it executes on a platform provides checkpoints for that application. A restart is then possible on a different type or number of processes. Two checkpoint/restart methods, with a formal analysis of their complexity, are presented: SEL (Systematic Event Logging) and TIC (Theft-Induced Checkpointing). Experimental measurements of a prototype on representative applications show that the run-time overhead can be amortized, making scalable fault-tolerant executions conceivable.
Lahrach, Farid. "Tolérance aux pannes des circuits FPGAs à base de mémoire SRAM." Thesis, Troyes, 2016. http://www.theses.fr/2016TROY0028.
Nowadays, SRAM-based FPGAs are omnipresent in embedded electronic applications. Consequently, these circuits have become the key player in overall System-on-Chip (SoC) yield enhancement. However, faults are increasingly pronounced in these emergent technologies, from permanent faults arising from circuit processing at nanometer scales to transient soft errors arising from high-energy particle hits. Fault tolerance of SRAM-based FPGAs is therefore an important system metric to ensure the dependability of embedded applications. The first part of this thesis presents a comprehensive technique to cope with multiple faults in applications implemented in SRAM-based FPGAs without incurring substantial area, power, or performance penalties. This approach has three main benefits compared to redundancy-based fault tolerance: its very low overhead, the option for run-time management, and its complete flexibility. Run-time management can be a very valuable feature of a system, particularly for mission-critical applications. This fault-tolerance approach handles run-time problems online, minimizing the amount of system downtime and eliminating the need for outside intervention. The last part of this thesis is oriented toward the test and diagnosis of the configuration memory array of SRAM-based FPGAs. New fault models in configuration frames and March algorithms are proposed. These tests have the advantage of a fast implementation and of achieving high fault coverage.
Delbé, Christian. "Tolérance aux pannes pour objets actifs asynchrones : protocole, modèle et expérimentations." Nice, 2007. http://www.theses.fr/2007NICE4002.
The primary objective of this thesis is to propose a rollback-recovery fault-tolerance protocol for the ASP (Asynchronous Sequential Processes) model of communicating asynchronous active objects and for its Java implementation, ProActive. This thesis generalizes the problem raised by the development of this protocol: we study the recovery of a distributed application from an inconsistent global state. We therefore first propose a checkpointing protocol and its implementation that do not assume that global states are consistent. We show through realistic experiments using communicating distributed applications that our solution and its implementation perform well. We also contribute more generally to the study of recovery from an inconsistent global state by formally defining a new recoverability condition, P-consistency, based on the notion of event promise. This definition fits into an event-based formalism able to take into account the semantics of any system; it is therefore applicable in a general setting. In particular, by applying this formalism to the ASP model, we prove the correctness of our protocol by showing that the global states formed during execution are always recoverable. Finally, we contribute more specifically to the field of grid computing by proposing an extension of our protocol and its implementation suited to this context. This extension relies on the automatic formation of recovery groups when the application is deployed. It allows stable storage to be distributed independently and confines the effects of a failure to the affected group alone.
Abdallah, Maha. "Gestion transactionnelle dictatoriale : de la haute performance à la tolérance aux pannes." Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS0016.
Chouikhi, Samira. "Tolérance aux pannes dans un réseau de capteurs sans fil multi-canal." Thesis, Paris Est, 2016. http://www.theses.fr/2016PESC1031/document.
Developments in Micro-Electro-Mechanical Systems (MEMS), combined with the emergence of new information and communication technologies, have allowed data sensing, processing and transmission to be integrated in a single tiny device, the wireless sensor. Consequently, the networks formed by these sensors offer many advantages compared with traditional networks, in particular in terms of deployment simplicity and cost. This has led to the development of a wide range of Wireless Sensor Network applications in the domains of health, environment, industry, infrastructure, space activities, or even military activities, and in many other domains. However, new challenges arise from the particular characteristics of these networks. In fact, many applications of this type of network are critical and require that the correct functioning of the network be maintained as long as possible. However, the environments in which these networks are deployed make network maintenance very complicated or even impossible; hence the need to integrate self-correction mechanisms that overcome the problems that arise without human intervention. In this context, we focused our study on the techniques and mechanisms implemented to improve fault tolerance in wireless sensor networks. First, we proposed centralized and distributed approaches for connectivity restoration and channel reallocation in a multi-channel communication context after the failure of a critical node. After formulating the problem as a multi-objective optimization problem, we proposed algorithms based on graph coloring and Steiner tree heuristics, which are well known in graph theory for solving this type of problem. In the second part of this thesis, we studied a particular application case, precision agriculture, and we proposed a distributed solution for failure recovery in wireless sensor networks.
Marin, Olivier-Gilles. "L'architecture logicielle DARX : adaptation de la résistance aux pannes aux systèmes multi-agents." Le Havre, 2003. http://www.theses.fr/2003LEHA0008.
Distributed applications are very sensitive to both host and process failures. This is all the truer for multi-agent systems, which are likely to deploy multitudes of agents on a great number of locations. However, fault tolerance involves costly mechanisms; it is thus advisable to apply it wisely. This thesis work relates to the dynamic adaptation of fault tolerance within multi-agent platforms. The aim of this research is twofold: (1) to provide effective methods for ensuring fail-proof multi-agent computations, (2) to develop a framework for the design of scalable applications, in terms of the number of hosts as well as the number of processes/agents. The DARX framework strives to achieve this twofold objective by providing transparent, agent-specific replication support which adapts to the computation context.
Yu, Lei. "Management et tolérance aux pannes des services sur grilles informatiques pour l'intégration d'applications." Châtenay-Malabry, Ecole centrale de Paris, 2008. http://www.theses.fr/2008ECAP1072.
Grid computing is analogous to the power grid in that computing resources will be provided in the same way as gas and electricity are provided to us now. Along with the deployment of more and more heterogeneous clusters, the problem has emerged of requiring middleware to leverage existing IT infrastructure in order to optimize compute resources and manage data and computing workloads. Grid computing has become an increasingly popular solution to optimize resource allocation and integrate variable computing resources in heavily loaded IT environments. Several research efforts have been conducted to support the thesis that the Grid service-oriented architecture is a suitable solution for integrating legacy scientific applications in a grid environment, and that this structure can be used to build a scalable, robust and distributed integration system. A new approach for application integration is proposed, applying WS-Resource to wrap legacy applications into Grid services. Then a centralized meta-scheduler is implemented and a new scheduling algorithm, MWL, is proposed. With the meta-scheduler and MWL, jobs can be scheduled and mapped to the resources which have the minimum workload. In order to maintain job state in WS-Resource, WS-Resource properties are defined and are used to provide information for implementing more effective job scheduling (e.g. MCT). For large-scale application integration, a distributed, scalable and robust scheduling structure is proposed. In this structure, a two-step solution is described to solve the fault-tolerance issues: the scheduling algorithm level and the failure detection mechanism. The DDFT algorithm is a robust scheduling algorithm that ensures job submission and mapping even if a scheduler or connection fails. Moreover, a series of algorithms is proposed to detect the failed scheduler or connection and automatically reconstruct the scheduling structure.
Finally, a simulator based on SimGrid is developed. This simulator can be used to simulate different topologies of distributed scheduling systems.
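The idea of mapping each job to the resource with the minimum workload can be sketched as a greedy scheduler. This is a generic illustration, not the MWL algorithm from the thesis: the job costs, the resource names and the definition of workload as a sum of assigned costs are assumptions.

```python
import heapq


def schedule_min_workload(jobs, resources):
    """Greedily assign each job to the currently least-loaded resource.

    jobs:      iterable of (job_name, cost) pairs, processed in order.
    resources: iterable of resource names, all starting with zero load.
    Returns a dict mapping job_name to resource name. Ties on load are
    broken by resource name, because heap entries are (load, name) tuples.
    """
    heap = [(0.0, name) for name in sorted(resources)]
    heapq.heapify(heap)
    assignment = {}
    for job, cost in jobs:
        load, name = heapq.heappop(heap)      # least-loaded resource
        assignment[job] = name
        heapq.heappush(heap, (load + cost, name))
    return assignment


# Hypothetical jobs and resources.
jobs = [("j1", 5), ("j2", 3), ("j3", 2), ("j4", 4)]
assignment = schedule_min_workload(jobs, ["r1", "r2"])
print(assignment)
```

A heap keeps each assignment at logarithmic cost in the number of resources, which matters only when the pool is large; a linear scan would behave the same on small pools.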
Voge, Marie-Emilie. "Optimisation des réseaux de télécommunications : Réseaux multiniveaux, Tolérance aux pannes et Surveillance du trafic." Phd thesis, Université de Nice Sophia-Antipolis, 2006. http://tel.archives-ouvertes.fr/tel-00171565.
We are interested in backbone networks as well as access networks. In the first chapter, we briefly present access networks as well as IP/WDM multilayer networks and the MPLS architecture that we consider for backbone networks. These networks consist of a physical layer on which a virtual layer is routed. In turn, user requests are routed on the virtual layer. We also discuss fault tolerance in multilayer networks, which motivates two problems that we studied.
The second chapter is devoted to the design of virtual networks. We first model a problem that takes fault tolerance into account, then study one of its sub-problems, grooming. Our objective is to minimize the number of virtual links, or tubes, to install in order to route an arbitrary set of requests when the physical layer is a directed path.
The third chapter deals with the shared risk groups (SRRG) induced by the stacking of layers within a multilayer network. Using a model based on colored graphs, we study the connectivity and failure vulnerability of these networks.
The subject of the fourth chapter is the problem of placing traffic measurement instruments in an operator's access network. We consider both passive and active measurements. Traffic monitoring has many applications, in particular failure detection and network performance evaluation.
Bernard, Thibault. "Marches aléatoires et mot circulant, adaptativité et tolérance aux pannes dans les environnements distribués." Phd thesis, Université de Reims - Champagne Ardenne, 2006. http://tel.archives-ouvertes.fr/tel-00143600.
[...] algorithms rely mainly on the three fundamental properties of random walks (hitting, cover, meeting). We provide a method that evaluates the time elapsed before these three properties hold. This allows us to evaluate the complexity of our algorithms. Second, we propose using a randomly circulating token in the form of a circulating word to collect topological information on the token. This information enables the construction and maintenance of a spanning structure of the communication network. We then used this structure to design a fault-tolerant token circulation algorithm for dynamic environments. This algorithm has the particularity of being completely decentralized. Finally, we propose adapting our token circulation to offer a solution to the resource allocation problem in ad hoc networks.
Voge, Marie-Émilie. "Optimisation des réseaux de télécommunications : réseaux multiniveaux, tolérance aux pannes et surveillance du trafic." Nice, 2006. http://www.theses.fr/2006NICE4085.
This thesis is devoted to optimization problems arising in telecommunication networks. We tackle these problems from two main points of view. On the one hand, we study their complexity and approximability properties. On the other hand, we propose heuristic methods, approximation algorithms or even exact algorithms that we compare with mixed integer linear programming formulations on specific instances. We are interested in backbone networks as well as access networks. In the first chapter, we briefly present access networks and IP/WDM multilayer backbone networks using the MPLS architecture. These networks are composed of a physical layer on which a virtual layer is routed. In turn, the users' requests are routed on the virtual layer. We also present multilayer network survivability issues motivating two of the questions we have studied. The second chapter is dedicated to the design of virtual networks. First we propose a mixed integer linear programming formulation with network survivability constraints. Then we study a sub-problem, the grooming problem. Our objective is to minimize the number of virtual links needed to route a given set of requests when the physical layer is a directed path. The third chapter deals with Shared Risk Resource Groups (SRRG) induced by stacking up network layers in multilayer networks. Using the colored graph model, we study the connectivity and failure vulnerability of these networks. The positioning of active and passive traffic measurement points in the access network of an internet service provider is the subject of the fourth chapter.
Shou, Yanbo. "Cryptographie sur les courbes elliptiques et tolérance aux pannes dans les réseaux de capteurs." Thesis, Besançon, 2014. http://www.theses.fr/2014BESA2015/document.
The emergence of embedded systems has enabled the development of wireless sensor networks in different domains. However, security remains an open problem. The vulnerability of sensor nodes is mainly due to their lack of resources. In fact, the processing unit does not have enough power or memory to handle complex security mechanisms. Cryptography is a widely used solution to secure networks. Compared with symmetric cryptography, asymmetric cryptography requires more complicated computations, but it offers more sophisticated key distribution schemes and digital signatures. In this thesis, we try to optimize the performance of ECC, an asymmetric cryptosystem known for its robustness and for its use of shorter keys than RSA. We propose to use parallelism techniques to accelerate the computation of scalar multiplication, which is recognized as the most computationally expensive operation on elliptic curves. The test results have shown that our solution provides a significant gain despite an increase in energy consumption. The second part of our contribution is the application of fault tolerance in our parallel architecture. We use redundant nodes for fault detection and computation recovery. Thus, by using ECC and fault tolerance, we propose an efficient and reliable security solution for embedded systems.
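For readers unfamiliar with the operation being accelerated, scalar multiplication computes kP by repeated point doubling and addition. The sketch below is the standard sequential double-and-add method, not the parallel scheme proposed in the thesis. The tiny curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) is a common textbook example chosen for illustration and offers no security; the function names are hypothetical.

```python
P_MOD = 17   # field prime (toy value)
A = 2        # curve coefficient a in y^2 = x^3 + a*x + b (b is not needed below)


def point_add(p, q):
    """Add two points on the curve; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # p + (-p) = infinity
    if p == q:                                        # tangent slope (doubling)
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:                                             # chord slope (addition)
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point by scanning the bits of k from least significant."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)            # double at every bit
        k >>= 1
    return result


G = (5, 1)
print(scalar_mult(2, G))   # (6, 3)
```

The loop performs one doubling per bit of k and one addition per set bit, which is why this operation dominates the cost of ECC on constrained nodes. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.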
Lahoud, Samer. "Routage et allocation de flots avec tolérance aux pannes dans les réseaux Internet nouvelle génération." Télécom Bretagne, 2006. http://www.theses.fr/2006TELB0011.
Huc, Florian. "Conception de Réseaux Dynamiques Tolérants aux Pannes." Phd thesis, Université de Nice Sophia-Antipolis, 2008. http://tel.archives-ouvertes.fr/tel-00472781.
Lemarinier, Pierre. "Fiabilité et traitement de la volatilité dans les systèmes de calcul global." Paris 11, 2006. http://www.theses.fr/2006PA112258.
Distributed computing systems gather more and more processors and are thus subject to higher failure frequencies. Message-passing applications are now generally written using the MPI interface. A number of automatic and transparent fault-tolerant protocols for message-passing libraries have been proposed and implemented. All these protocols rely on checkpoint/restart mechanisms, coordinated or not. However, no comparison of these protocols has been presented yet in terms of their cost on the initial performance of MPI applications. We present the first comparison between the different kinds of fault-tolerant protocols. The first part describes five protocols in a common model: a remote pessimistic message logging protocol, a sender-based pessimistic message logging protocol, a causal message logging protocol, a non-blocking coordinated checkpoint protocol and finally a blocking coordinated checkpoint protocol. The second part of this thesis presents the implementation of the first four protocols in the MPICH library and of the fifth protocol in the MPICH2 library. Then we summarize the experimental results we obtained for the pessimistic protocol implementations and detail the performance measurements of the causal implementation and the coordinated checkpoint implementations, using micro-benchmarks and NAS applications on different computing systems.
Aliouat, Makhlouf. "Reprise de processus dans un environnement distribué après pannes matérielles transitoires ou permanentes." Phd thesis, Grenoble INPG, 1986. http://tel.archives-ouvertes.fr/tel-00320133.
Pucel, Xavier. "A unified point of view on diagnosability." Toulouse, INSA, 2008. http://eprint.insa-toulouse.fr/archive/00000237/.
Full textThe problem of model-based fault diagnosis in complex systems has received an increasing interest over the past decades. Experience has proved that it needs to be taken into account during the system design stage, by means of diagnosability analysis. Diagnosability is the ability of a system to exhibit different symptoms for a set of anticipated fault situations. Several approaches for diagnosability have been developed using different modelling formalisms. , yet the reasoning for diagnosability analysis is very similar in all these approaches. This thesis provides a comparison of these and a unified definition of diagnosability. An original approach for diagnosability analysis, based on partial fault modes, is described and implemented in the context of service oriented architecture, more precisely on web services. An original generalization of the definition of diagnosability to any set of system states is presented, that accounts for many kinds of properties, like repair preconditions or quality of service. This work opens perspectives for model independent diagnosability reasoning, diagnosability based on other types of models, and in integrating diagnosis into a general purpose supervision tool. Model-based diagnosis and diagnosability of software systems is still a young applicative domain, and opens many connections with the software safety engineering domain
Fuguet, Tortolero César. "Introduction de mécanismes de tolérance aux pannes franches dans les architectures de processeur « many-core » à mémoire partagée cohérente." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066462/document.
The ever-increasing performance demands of applications such as cryptography, scientific simulation, network packet dispatching, signal processing or even general-purpose computing have made many-core architectures a necessary trend in processor design. These architectures can have hundreds or thousands of processor cores, so as to provide high computational throughput with reasonable power consumption. However, their high transistor density makes many-core architectures more prone to hardware failures. Fabrication process variability and the stress factors on transistors are increasing, which impacts both manufacturing yield and lifetime. A potential solution to this problem is the introduction of fault-tolerance mechanisms allowing the processor to function in a degraded mode despite the presence of defective internal components. We propose a complete in-the-field, reconfiguration-based recovery mechanism for permanent failures in shared-memory many-core processors. This mechanism is based on firmware (stored in distributed on-chip read-only memories) executed at each hardware reset by the internal processor cores without any external intervention. It consists of distributed software procedures, which locate the faulty components (cores, memory banks, and network-on-chip routers), reconfigure the hardware architecture, and provide a description of the functional hardware infrastructure to the operating system. Our proposal is evaluated using a cycle-accurate SystemC virtual prototype of an existing many-core architecture. We evaluate both its latency and its silicon cost.
Bouguerra, Mohamed Slim. "Tolérance aux pannes dans des environnements de calcul parallèle et distribué : optimisation des stratégies de sauvegarde/reprise et ordonnancement." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00910358.
Bouguerra, Mohamed Slim. "Tolérance aux pannes dans des environnements de calcul parallèle et distribué : optimisation des stratégies de sauvegarde/reprise et ordonnancement." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENM023/document.
The parallel computing platforms available today are increasingly large. Typically, the emerging parallel platforms will be composed of several million CPU cores running up to a billion threads. This intensive growth in the number of parallel threads will make applications subject to more and more failures. Consequently, it is necessary to develop efficient strategies providing safe and reliable completion for HPC parallel applications. Checkpointing is one of the most popular and efficient techniques for developing fault-tolerant applications in such a context. However, checkpoint operations are costly in terms of time, computation and network communications. This will certainly affect the global performance of the application. In the first part of this thesis, we propose a performance model that formally expresses the checkpoint scheduling problem. Two variants of the problem are considered. In the first variant, the objective is the minimization of the expected completion time. Under this model we prove that when the failure rate and the checkpoint cost are constant, the optimal checkpoint strategy is necessarily periodic. For the general problem, when the failure rate and the checkpoint cost are arbitrary, we provide a numerical solution. In the second variant of the problem, we exhibit the tradeoff between the impact of the checkpoint operations and the computation lost due to failures. In particular, we prove that the checkpoint scheduling problem is NP-hard even in the simple case of a uniform failure distribution. We also present a dynamic programming scheme for determining the optimal checkpointing times in all the variants of the problem. In the second part of this thesis, we design several fault-tolerant scheduling algorithms that minimize the application makespan and at the same time maximize the application reliability.
In this part we mainly point out that the growth rate of the failure distribution determines the relationship between the two objectives. More precisely, we show that when the failure rate is decreasing the two objectives are antagonistic. On the other hand, when the failure rate is increasing both objectives are congruent. Finally, we provide approximation algorithms for both failure rate cases.
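As a rough illustration of the periodic strategy mentioned in the abstract above, the sketch below uses Young's classical first-order approximation of the checkpoint period under a constant failure rate, T ≈ sqrt(2 · C · MTBF). This is a well-known textbook approximation and not the performance model developed in the thesis; the function name and the sample values are hypothetical.

```python
import math


def young_checkpoint_period(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint period.

    Assumes a constant failure rate (exponentially distributed failures)
    and a constant checkpoint cost that is small compared with the MTBF.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical values: 60 s per checkpoint, one failure every 24 h on average.
period = young_checkpoint_period(60.0, 24 * 3600.0)
print(f"checkpoint roughly every {period / 60:.1f} minutes")
```

The approximation only makes sense when the checkpoint cost is much smaller than the mean time between failures; for arbitrary failure rates the abstract indicates that a numerical solution is needed instead.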
Fuguet, Tortolero César. "Introduction de mécanismes de tolérance aux pannes franches dans les architectures de processeur « many-core » à mémoire partagée cohérente." Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066462.
Perronne, Lucas. "Vers des protocoles de tolérance aux fautes byzantines efficaces et robustes." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM075/document.
Over the last decade, Cloud computing has brought about an important paradigm shift in numerous information systems. This new paradigm is mainly illustrated by the relocation of entire IT infrastructures out of companies' premises. The use of local servers has thus been replaced by remote ones, rented from dedicated providers such as Google, Amazon and Microsoft. In order to ensure the sustainability of this economic model, it appears necessary to provide several guarantees to users, related to the security, availability, or even reliability of the proposed resources. Such quality of service (QoS) factors allow providers and users to reach an agreement on the expected level of dependability. In practice, the proposed servers must occasionally cope with arbitrary faults (also called Byzantine faults), such as incorrect or corrupted messages, server crashes, or even network failures. Nevertheless, the Cloud computing environment has encouraged the emergence of technologies such as virtualization and state machine replication. These technologies allow cloud providers to deal efficiently with the occurrence of faults through the implementation of fault tolerance protocols. Byzantine Fault Tolerance (BFT) is a research area involving state machine replication concepts, which aims at ensuring the continuity and reliability of hosted services in the presence of any kind of arbitrary behavior. In order to handle such threats, numerous protocols have been proposed. These protocols must be efficient in order to counterbalance the extra cost of replication, and robust in order to lower the impact of Byzantine behaviors on system performance. We first noticed that tackling both these concerns at the same time is difficult: current protocols are either designed to be efficient at the expense of their robustness, or robust at the expense of their efficiency.
We tackle this specific problem in this thesis, our goal being to provide the tools required to design BFT protocols that are both efficient and robust. Our focus is mainly on two types of denial-of-service attacks involving request management. The first one is caused by the partial corruption of a request transmitted by a client. The second one is caused by the intentional dropping of a request upon receipt. In order to deal efficiently with both these Byzantine behaviors, several mechanisms have been integrated into robust BFT protocols. In practice, these mechanisms involve high overheads, and thus lead to a significant performance drop for robust protocols compared to efficient ones. This assessment leads to our first contribution: the definition of several generic design principles, applicable to numerous existing BFT protocols, which aim at reducing these overheads while maintaining the same level of robustness. The second contribution introduces ER-PBFT, a new protocol implementing these design principles on PBFT, the reference in terms of Byzantine fault tolerance. We demonstrate the efficiency of our new robustness policy, both in fault-free scenarios and in the presence of Byzantine behaviors. The third contribution presents ER-COP, a new BFT protocol dedicated to both efficiency and robustness, implementing our design principles on COP, the BFT protocol that currently provides the best performance in a fault-free environment. We evaluate the additional cost introduced by our robustness policy, and we demonstrate ER-COP's ability to handle Byzantine behaviors.
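For readers unfamiliar with the resilience bound behind PBFT-style protocols, the sketch below computes the standard figures: tolerating f Byzantine replicas requires at least n = 3f + 1 replicas, and with n = 3f + 1 a quorum contains 2f + 1 replicas. These are the classical PBFT bounds and say nothing specific about ER-PBFT or ER-COP; the function names are hypothetical.

```python
def min_replicas(f: int) -> int:
    """Minimum number of replicas needed to tolerate f Byzantine replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Quorum size used by PBFT when running exactly 3f + 1 replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def max_byzantine_faults(n_replicas: int) -> int:
    """Largest f such that n_replicas >= 3f + 1."""
    if n_replicas < 1:
        raise ValueError("at least one replica is required")
    return (n_replicas - 1) // 3


# Hypothetical deployment sizes, printed as (n, tolerated f, quorum).
for n in (4, 7, 10):
    f = max_byzantine_faults(n)
    print(n, f, quorum_size(f))
```

Any two quorums of 2f + 1 replicas out of 3f + 1 intersect in at least f + 1 replicas, so at least one correct replica belongs to both, which is what makes the bound work.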
Zhang, Zhen. "Détection des pannes franches et reconfiguration automatique dans un micro-réseau intégré sur puce." Paris 6, 2011. http://www.theses.fr/2011PA066430.
Makassikis, Constantinos. "Conception d'un modèle et de frameworks de distribution d'applications sur grappes de PCs avec tolérance aux pannes à faible coût." Electronic Thesis or Diss., Nancy 1, 2011. http://www.theses.fr/2011NAN10011.
PC clusters are distributed architectures whose adoption is spreading as a result of their low cost but also their extensibility in terms of nodes. In particular, the increase in the number of nodes is responsible for an increase in fail-stop failures, which jeopardize distributed applications. The absence of efficient and portable solutions limits their use to applications that are non-critical or have no time constraints. MoLOToF is a model for application-level fault tolerance based on checkpointing. To ease the addition of fault tolerance, it proposes to structure applications using fault-tolerant skeletons, as well as collaborations between the programmer and the fault tolerance system to gain efficiency. The application of MoLOToF to the SPMD and Master-Worker families of parallel algorithms led to the FT-GReLoSSS and ToMaWork frameworks respectively. Each framework provides fault-tolerant skeletons suited to the targeted family of algorithms and an original implementation. FT-GReLoSSS uses C++ on top of MPI, while ToMaWork uses Java on top of a virtual shared memory system provided by JavaSpaces technology. The evaluation of the frameworks reveals a reasonable development-time overhead and negligible runtime overheads in the absence of fault tolerance. Experiments on up to 256 nodes of a dual-core PC cluster demonstrate the better efficiency of FT-GReLoSSS' fault tolerance solution compared to existing system-level solutions (LAM/MPI and DMTCP).
Nolot, Florent. "Stabilisation des horloges de phases dans les systèmes distribués." Amiens, 2002. http://www.theses.fr/2002AMIE0205.
Diouri, Mohammed El Mehdi. "Efficacité énergétique dans le calcul très haute performance : application à la tolérance aux pannes et à la diffusion de données." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2013. http://tel.archives-ouvertes.fr/tel-00881094.
Makassikis, Constantinos. "Conception d'un modèle et de frameworks de distribution d'applications sur grappes de PCs avec tolérance aux pannes à faible coût." Phd thesis, Université Henri Poincaré - Nancy I, 2011. http://tel.archives-ouvertes.fr/tel-00591083.
Ahmadi, Sajjad. "Contribution à l'étude de la tolérance de pannes de convertisseurs multiniveaux en pont en H." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0026.
Ensuring service continuity in safety-critical applications is indispensable. In some of these applications, multilevel inverters play a vital role. Hence, employing a multilevel converter with a fault-tolerant feature is of great importance. In this regard, a fault-tolerant five-level Neutral Point Clamped (NPC) inverter is proposed in this research work. The proposed fault diagnosis algorithm is based on failure mode analysis, which is a logic-based approach. The realization of this strategy does not require any component modeling or complicated calculations. Although switches are more fragile than clamping diodes, clamping diodes are also subject to breakdown. Hence, identification of a defective clamping diode is also studied in this research work. Moreover, for the fault detection procedure, a voltage quantifier is proposed to avoid any misdiagnosis arising from measurement errors and voltage drops in the circuit. Following fault diagnosis, the proposed fault-tolerant strategy aims to restore the rated voltage and current at the inverter terminal in the presence of an open-circuit fault in a switch or in a clamping or anti-parallel diode. Compared with healthy operation, the harmonic content of the terminal voltage and current is not increased. The proposed fault-tolerant structure does not include any contactor or bidirectional switch, which allows fast triggering of fault-tolerant operation. Simulation and experimental results are presented to validate the effectiveness of the proposed approaches. A fault is detected in 20 µs and localized between 20 and 60 µs after occurrence, depending on the faulty semiconductor (switch or clamping diode).
Da Penha Coelho, Alexandre Augusto. "Tolérance aux fautes et fiabilité pour les réseaux sur puce 3D partiellement connectés." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAT054.
Networks-on-Chip (NoC) have emerged as a viable solution for the communication challenges in highly complex Systems-on-Chip (SoC). The NoC architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication challenges such as wiring complexity, communication latency, and bandwidth. Furthermore, the combined benefits of 3D IC and NoC schemes provide the possibility of designing a high-performance system in a limited chip area. The major advantages of Three-Dimensional Networks-on-Chip (3D-NoCs) are a considerable reduction in the average wire length and wire delay, resulting in lower power consumption and higher performance. However, 3D-NoCs suffer from some reliability issues such as the process variability of 3D-IC manufacturing. In particular, the low yield of vertical connections significantly impacts the design of three-dimensional die stacks with a large number of Through Silicon Vias (TSVs). Equally concerning, advances in integrated circuit manufacturing technologies are resulting in a potential increase in their sensitivity to the effects of radiation present in the environment in which they will operate. In fact, the increasing number of transient faults has become, in recent years, a major concern in the design of critical SoCs. As a result, the evaluation of the sensitivity of circuits and applications to events caused by energetic particles present in the real environment is a major concern that needs to be addressed. This thesis therefore presents contributions in two important areas of reliability research: the design and implementation of deadlock-free fault-tolerant routing schemes for the emerging three-dimensional Networks-on-Chip, and the design of fault injection frameworks able to emulate single and multiple transient faults in HDL-based circuits.
The first part of this thesis addresses the issues of transient and permanent faults in the architecture of 3D-NoCs and introduces a new resilient route computation unit as well as a new runtime fault-tolerant routing scheme. A novel resilient mechanism is introduced in order to tolerate transient faults occurring in the route computation unit (RCU), which is the most important logical element in NoC routers. Failures in the RCU can provoke misrouting, which may lead to severe effects such as deadlocks or packet loss, corrupting the operation of the entire chip. By combining a reliable fault detection circuit leveraging circuit-level double-sampling with a cost-effective rerouting mechanism, we develop a full fault-tolerance solution that can efficiently detect and correct such fatal errors before the affected packets leave the router. Also in the first part of this thesis, a novel fault-tolerant routing scheme for vertically-partially-connected 3D Networks-on-Chip called FL-RuNS is presented. Thanks to an asymmetric distribution of virtual channels, FL-RuNS can guarantee 100% packet delivery under an unconstrained set of runtime and permanent vertical link failures. With the aim of emulating radiation effects on new SoC designs, the second part of this thesis addresses fault injection methodologies by introducing two frameworks named NETFI-2 (Netlist Fault Injection) and NoCFI (Networks-on-Chip Fault Injection). NETFI-2 is a fault injection methodology able to emulate transient faults such as Single Event Upsets (SEU) and Single Event Transients (SET) in an HDL-based (Hardware Description Language) design. Extensive experiments performed on two case studies are presented to demonstrate the features and advantages of NETFI-2. Finally, in the last part of this work, we present NoCFI as a novel methodology to inject multiple faults such as MBUs and SEMT in a Networks-on-Chip architecture.
NoCFI combines the ASIC design flow, in order to extract layout information, and the FPGA design flow, in order to emulate multiple transient faults.
Andrei, Geanina. "Contribution à la commande tolerante aux pannes dans la conduite du vol." Thesis, Toulouse, INSA, 2010. http://www.theses.fr/2010ISAT0033/document.
This thesis uses the nonlinear inverse control technique to synthesize control laws for dealing with two types of failures aboard transport airplanes. The first type of failure affects some actuators without compromising the overall controllability of the airplane: this situation arises particularly in the case of an isolated fault whose effects can theoretically be compensated thanks to the redundancy of actuators in terms of their effects on the flight dynamics. The problem that arises is the reallocation of control surfaces so as to maintain, when possible, a standard behaviour in both equilibrium and manoeuvre situations. The case of an aileron failure is studied here, and a roll manoeuvre is considered as the standard response of the aircraft. At this level, solving this problem leads us to combine the nonlinear inverse control technique with classical Mathematical Programming, run online, in order to take account of all the material and structural constraints that must be respected to ensure the safety of the aircraft. The second type of failure considered affects an entire chain of command, leading to catastrophic situations where the aircraft is no longer controllable in the classical sense and the commercial flight must be interrupted immediately to find a solution for the survival of the people on board through an emergency landing. In this thesis we consider the case of total loss of power for the aerodynamic actuators, and we develop a flight safety strategy based on a sequence of flight phases, each associated with limited control objectives based on the few opportunities offered by the only remaining actuators, the aircraft engines. Here too, the nonlinear inverse control technique plays an important role in the synthesis of the control laws essential to put the airplane in acceptable conditions for landing.
Belloum, Adam Scander. "Étude d'un système multiprocesseurs reconfigurable dédié aux traitements d'images basé sur les processeurs de signaux." Compiègne, 1996. http://www.theses.fr/1996COMPD877.
Krawezik, Géraud. "Contribution à l'étude de la programmation des machines parallèles complexes." Paris 11, 2004. http://www.theses.fr/2004PA112160.
The goal of this thesis is to study the programming of complex parallel machines, which are used to solve large-scale numerical problems. It concentrates on three main points. The first is the study of standard parallel languages and their respective efficiency. We then study a fault-tolerant parallel programming library and its runtime. Lastly, we consider future languages, presenting existing ones and their common characteristics before presenting the definition of a new language. In the first part, we show that on shared memory machines OpenMP enables the user to obtain better performance than MPI, which is now the standard for parallel programming. But this comes at the price of a high programming effort, which goes against the ease of use intended by OpenMP. In the next part, we present MPICH-V, an automatic fault-tolerant implementation, and especially its runtime, through three characteristics that are important for this part: remote launching, the connection between nodes, and the handling of fault detection. In each case we discuss the possible technical choices before extending them to a grid environment. Lastly, we present upcoming parallel languages with different examples of their usage, before presenting our own, based on a shared memory mechanism and programmed communications.
Pley, Julien. "Protocoles d'accord pour la gestion d'une grille de calcul dynamique." Rennes 1, 2007. ftp://ftp.irisa.fr/techreports/theses/2007/pley.pdf.
We present a middleware for dynamic grids where the federated resources are provided by different institutions. Within a domain, the resources interact in a synchronous manner. Interactions between resources belonging to different domains are asynchronous. Every machine or domain can join or leave the grid at any time, due to failures or on purpose. We propose a fault-tolerant solution which takes advantage of this hierarchical structure to solve the grid membership problem and the load-balanced task allocation problem. Each service is the composition of a synchronous protocol and an asynchronous agreement protocol which is always a variation of the fundamental Consensus problem. We define the "insensitivity to erroneous suspicions", a new metric to compare different Consensus protocols based on Diamond S failure detectors.
Haddar, Mohamed Amine. "Codage d’algorithmes distribués d’agents mobiles à l’aide de calculs locaux." Thesis, Bordeaux 1, 2011. http://www.theses.fr/2011BOR14429/document.
Today, distributed systems must satisfy increasingly new requirements for quality of service and cope with the emergence of new applications such as Grid Computing, which generally results in requirements of dynamicity and mobility. While satisfactory solutions exist for static distributed environments, they are inadequate when the system becomes dynamic (mobility, evolution, component changes). Indeed, the design of distributed algorithms is traditionally based on the assumption of a network whose topology is static. Our goal in this thesis is to define and study a model based on mobile agents to implement and execute distributed algorithms encoded by local computations. This model must take into account failures that can alter the operation of the distributed system. It should also improve performance compared with the classical models (message passing systems).
Moataz, Fatima Zahra. "Vers des réseaux optiques efficaces et tolérants aux pannes : complexité et algorithmes." Thesis, Nice, 2015. http://www.theses.fr/2015NICE4077/document.
We study in this thesis optimization problems with application in optical networks. The problems we consider are related to fault-tolerance and efficient resource allocation and the results we obtain are mainly related to the computational complexity of these problems. The first part of this thesis is devoted to finding paths and disjoint paths. Finding a path is crucial in all types of networks in order to set up connections and finding disjoint paths is a common approach used to provide some degree of protection against failures in networks. We study these problems under different settings. We first focus on finding paths and node or link-disjoint paths in networks with asymmetric nodes, which are nodes with restrictions on their internal connectivity. Afterwards, we consider networks with star Shared Risk Link Groups (SRLGs) which are groups of links that might fail simultaneously due to a localized event. In these networks, we investigate the problem of finding SRLG-disjoint paths. The second part of this thesis focuses on the problem of Routing and Spectrum Assignment (RSA) in Elastic Optical Networks (EONs). EONs are proposed as the new generation of optical networks and they aim at an efficient and flexible use of the optical resources. RSA is the key problem in EONs and it deals with allocating resources to requests under multiple constraints. We first study the static version of RSA in tree networks. Afterwards, we examine a dynamic version of RSA in which a non-disruptive spectrum defragmentation technique is used. Finally, we present in the appendix another problem that has been studied during this thesis.
García-Gutiérrez, Luis Antonio. "Développement d'un contrôle actif tolérant aux défaillances appliqué aux systèmes PV." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30071.
This work contributes an active fault-tolerant control (AFTC) for photovoltaic (PV) systems. The fault detection and diagnosis (FDD) methodology is based on the analysis of a model that is compared with real-time measurements. We use a high-granularity PV array model in the FDD tool to allow faults to be detected in complex conditions. Firstly, the research focuses on fault detection in complex shadow conditions. A real-time approach is presented to emulate the electrical characteristics of PV modules under complex shadow conditions. Using a precise emulator approach is a real challenge when studying the high non-linearity and complexity of PV systems under partial shading. The real-time emulation was validated with simple experimental results under failure conditions to design specific fault-detection algorithms in a first sample. The second part of the research addresses the FDD method for grid-connected DC/DC and DC/AC power converters. Initial results allowed us to validate the system's recovery to normal operating points after a fault with this complete AFTC approach. Emulations based on the simulation of distributed power converters, model-based fault detection methodologies, and a hybrid diagnoser are then presented.
Lodygensky, Oleg. "Contribution aux infrastructures de calcul global : délégation inter plates-formes, intégration de services standards et application à la physique des hautes énergies." Phd thesis, Université Paris Sud - Paris XI, 2006. http://tel.archives-ouvertes.fr/tel-00147815.
The worlds of commerce, industry and research have clearly understood the benefits and stakes of this revolution and are investing massively in research and development around these new technologies, known as "grids", which designate global computing resources and open up a new approach. One of the disciplines around grids concerns computing. It is the subject of the work presented here.
On the campus of Université Paris-Sud, in Orsay, a synergy has arisen between the Laboratoire de Recherche en Informatique (LRI) on the one hand and the Laboratoire de l'Accélérateur Linéaire (LAL) on the other, in order to carry out joint work on grid infrastructures that open new avenues of investigation for the former and new working methods for the latter.
The work presented in this manuscript is the result of this multidisciplinary collaboration. It is based on XtremWeb, the global computing research and production platform developed at LRI. We begin by presenting a state of the art of large-scale distributed systems, their fundamental principles and their service-based architecture.
We then introduce XtremWeb and detail the modifications we had to make, both to its architecture and to its implementation, in order to better meet the requirements and needs of this type of platform. We then present two studies around this platform that make it possible, on the one hand, to generalize the use of inter-grid resources and, on the other hand, to use on a grid services that were not designed for that purpose. Finally, we present the use of our platform by the high-energy physics research community, a major consumer of computing resources, together with the problems to be solved and the benefits to be gained.
Durand, Anaïs. "Algorithmes distribués efficaces adaptés à un contexte incertain." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM037/document.
Distributed systems are becoming increasingly large and complex, while their usage extends to various domains (e.g., communication, home automation, monitoring, cloud computing). Thus, distributed systems are executed in diverse contexts. In this thesis, we focus on uncertain contexts, i.e., contexts that are not completely known a priori or are unsettled. More precisely, we consider two main kinds of uncertainty: processes that are not completely identified and the presence of faults. The absence of identification is frequent in large networks composed of massively produced and deployed devices. In addition, anonymity is often required for security and privacy. Similarly, large networks are exposed to faults (e.g., process crashes, wireless connection drops), but the service must remain available. This thesis is composed of four main contributions. First, we study the leader election problem in unidirectional rings of homonym processes, i.e., processes are identified but their ID is not necessarily unique. Then, we propose a silent self-stabilizing leader election algorithm for arbitrary connected networks. This is the first algorithm under such conditions that stabilizes in a polynomial number of steps. The third contribution is a new stabilizing property designed for dynamic networks that ensures fast and gradual convergence after topological changes. We illustrate this property with a clock synchronization algorithm. Finally, we consider the issue of concurrency in resource allocation problems. In particular, we study the level of concurrency that can be achieved in a wide class of resource allocation problems, i.e., local resource allocation.
Hoarau, William. "Injection de fautes dans les systèmes distribués." Paris 11, 2008. http://www.theses.fr/2008PA112152.
In large-scale distributed systems, the occurrence of faults is unavoidable. Being able to control faults (such as the crash of a process) is an important tool for deploying reliable distributed systems. In this thesis, we present FAIL (for Fault Injection Language), a language that makes it easy to express complex fault scenarios. It is possible to design probabilistic scenarios (for quantitative tests) as well as deterministic, reproducible ones. We then present FAIL-FCI (FAIL Cluster Implementation), our fault injector, which consists of a compiler, a runtime library, and a middleware platform for distributed fault injection. FCI can be interfaced with various programming languages and does not require source code modification. We also present various tests we conducted on several distributed applications.
Mouafo, Tchinda Yves. "Robustesse des applications temps-réel multicoeurs : techniques de construction d'un ordonnancement équitable tolérant aux pannes matérielles." Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2017. http://www.theses.fr/2017ESMA0015/document.
This thesis proposes several techniques to build a valid schedule with a Pfair algorithm for multicore real-time systems despite permanent processor failures. Depending on the nature of the tasks, additional time may or may not be allocated to recover the lost execution. First, we consider a single core failure. We show that if no additional time is allocated, using a single core more than the required minimum provides a valid schedule: this is the Limited Hardware Redundancy Technique. However, if full recovery is mandatory, we propose three techniques: the Substitute Subtasks Technique, which increases the WCET to provide additional time that can be used to recover the lost time; the Constrain and Release Technique, which creates a time margin between each task's deadline and the following period that can be used to recover the lost execution; and the Aperiodic Flow Technique, which reschedules the lost execution within the idle time units. These techniques are then combined to adapt the scheduling behaviour to the nature of the impacted tasks. Finally, the case of the failure of several cores is studied. To adapt the system load to the number of remaining functional cores, we either use the criticality mode change, which modifies the temporal parameters of some tasks, or discard some tasks according to their importance.
Khorguani, Ana. "Gestion de données persistantes efficace pour des serveurs hybrides avec mémoire non-volatile." Electronic Thesis or Diss., Université Grenoble Alpes, 2023. http://www.theses.fr/2023GRALM069.
Non-volatile main memory (NVMM) technologies are a great opportunity to build fast fault-tolerant programs, as they provide persistent storage in main memory. However, since the processor caches remain volatile, solutions are needed to recover a consistent state from NVMM after a crash. In this thesis, we propose fast checkpointing approaches in NVMM to make multi-threaded programs fault tolerant. We focus on achieving high failure-free performance by flushing persistent data structures to NVMM periodically. Our first work, ResPCT, considers a memory architecture where a single copy of the data is saved directly in NVMM. ResPCT uses In-Cache-Line logging to efficiently track modifications during failure-free execution and to restore a consistent state after a crash. The ResPCT API enables programmers to position restart points in their program, which simplifies the identification of the persistent program state and can also help improve performance. Experiments with representative benchmarks and applications show that ResPCT outperforms state-of-the-art solutions. Our evaluation of ResPCT shows that relying on a single copy of the data in NVMM can limit performance. Therefore, in the second part of our study, we consider an alternative approach that integrates DRAM into the memory architecture. The proposed system maintains a working copy of the data in DRAM, enabling the program to operate on this copy rather than on the slower version stored in NVMM. In this work, we compare several techniques to write data from DRAM to NVMM during checkpoints. Our results show that even though some techniques have advantages over others, choosing the most suitable method for transferring modifications from DRAM to NVMM depends on the specific characteristics of the applications.
Hanna, Fouad. "Etude et développement du nouvel algorithme distribué de consensus FLC permettant de maintenir la cohérence des données partagées et tolérant aux fautes." Thesis, Besançon, 2016. http://www.theses.fr/2016BESA2051.
Nowadays, collaborative work has taken a very important place in many fields, particularly in the medical telediagnosis field. The consistency of shared data is a key issue in this type of application. Moreover, it is essential to use a consensus algorithm to ensure data consistency in collaborative platforms. We present here our new consensus algorithm, FLC, which helps to ensure data consistency in asynchronous collaborative distributed systems. Our algorithm is fault tolerant and aims to improve the performance of consensus in general and particularly in the case of process crashes. The new algorithm uses the leader oracle to circumvent the impossibility result of the FLP theorem. It is decentralized and considers the crash-stop failure model. The FLC algorithm is based on two main ideas. The first is to perform, at the beginning of each round, a simple election phase guaranteeing the existence of only one leader per round. The second is to take advantage of system stability and, more particularly, of the fact that the leader does not crash between two consecutive consensus runs. The performance of our algorithm was analyzed and compared to the best-known algorithms in the domain. The results obtained by simulation, using the Neko platform, demonstrated that our algorithm gave the best performance when using a multicast network in the best-case scenario and in situations where the algorithm undergoes one or more crashes of coordinator/leader processes.
Franca, Rezende Tuanir. "Leaderless state-machine replication : from fail-stop to Byzantine failures." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAS016.
Modern distributed services are expected to be highly available, as our societies have been growing increasingly dependent on them. The common way to achieve high availability is through the replication of data in multiple service replicas. In this way, the service remains operational in case of failures, as clients can be relayed to other working replicas. In distributed systems, the classic technique to implement such fault-tolerant services is called State-Machine Replication (SMR), where a service is defined as a deterministic state machine and each replica keeps a local copy of the machine. To guarantee that the service remains consistent, replicas coordinate with each other and agree on the order of transitions to be applied to their copies of the state machine. The replication performed by modern Internet services spans several geographical locations (geo-replication). This allows for increased availability and low latency, since clients can communicate with the geographically closest replica. Due to their reliance on a leader replica, classical SMR protocols offer limited scalability and availability in this setting. To solve this problem, recent protocols follow instead a leaderless approach, in which each replica is able to make progress using a quorum of its peers. These new leaderless protocols are complex, and each one presents an ad hoc approach to leaderlessness. The first contribution of this thesis is a framework that captures the essence of Leaderless State-Machine Replication (Leaderless SMR) and the formalization of some of its limits. Due to the increasingly sensitive nature of replicated services, tolerating simple benign failures is no longer enough. Recent research is headed towards developing protocols that support arbitrary behavior of some replicas (Byzantine failures) and that also thrive in a geo-replicated environment.
One example of this new type of sensitive replicated service, and the focus of a lot of research, is the blockchain. Blockchains are powered by Byzantine replication protocols adapted to work over hundreds or even thousands of replicas. When the membership control over such replicas is open, that is, anyone can run a replica, we say the blockchain is permissionless. In the converse case, when the membership is controlled by a set of known entities such as companies, we say the blockchain is permissioned. When such Byzantine protocols follow the classic leader-driven approach, they suffer from scalability and availability issues, similarly to their non-Byzantine counterparts. In the second part of this thesis, we adapt our framework to support Byzantine failures and present the first framework for Byzantine Leaderless SMR. Furthermore, we show that, when properly instantiated, it makes it possible to sidestep the scalability problems of leader-driven Byzantine SMR protocols for permissioned blockchains.
Maurer, Alexandre. "Communication fiable dans les réseaux multi-sauts en présence de fautes byzantines." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066347/document.
As modern networks grow larger and larger, they become more likely to fail. Indeed, their nodes can be subject to attacks, failures, memory corruption, and so on. In order to encompass all possible types of failures, we consider the most general model of failure: the Byzantine model, where the failing nodes have an arbitrary (and thus potentially malicious) behavior. Such failures are extremely dangerous, as one single Byzantine node, if not neutralized, can potentially lie to the entire network. We consider the problem of reliably exchanging information in a multihop network despite such Byzantine failures. Solutions exist but require a dense network, where each node has a large number of neighbors. In this thesis, we propose solutions for sparse networks, such as the grid, where each node has at most 4 neighbors. In the first part, we accept that some correct nodes fail to communicate reliably. In exchange, we propose quantitative solutions that tolerate a large number of Byzantine failures and significantly outperform previous solutions in sparse networks. In the second part, we propose algorithms that ensure reliable communication between all correct nodes, provided that the Byzantine nodes are sufficiently distant from each other. Finally, we generalize existing results to new contexts: dynamic networks and networks with an unbounded diameter.
Tixeuil, Sébastien. "Auto-stabilisation Efficace." Phd thesis, Université Paris Sud - Paris XI, 2000. http://tel.archives-ouvertes.fr/tel-00124843.
We developed the concept of transient failure detectors: oracles, called by the processors of the system, that indicate in constant time whether transient failures have occurred. Our implementation makes it possible to classify classical problems according to the specific resources needed to detect an error. For static tasks, a natural follow-up was to show that a condition on the code executed locally by each processor can be sufficient to guarantee self-stabilization of the whole system, independently of the execution assumptions and of the topology of the communication graph. Because the algorithm is not modified, it necessarily incurs no overhead. Dually, we developed synchronization tools for building self-stabilizing algorithms for dynamic specifications with a constant memory overhead, that is, one independent of the network size. Moreover, one of the algorithms presented is instantaneously stabilizing. Finally, we presented a general technique for systematically reducing the cost of communications while guaranteeing a bounded retransmission delay, and we gave a general framework and implementation tools for writing self-stabilizing algorithms in this context.