Dissertations / Theses on the topic 'Fault resilience'

Consult the top 40 dissertations / theses for your research on the topic 'Fault resilience.'


You can also download the full text of each publication as a PDF and read its abstract online whenever one is available in the metadata.


1

Wilkes, Charles Thomas. "Programming methodologies for resilience and availability." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/8308.

2

Nascimento, Flávia Maristela Santos. "A SIMULATION-BASED FAULT RESILIENCE ANALYSIS FOR REAL-TIME SYSTEMS." Escola Politécnica / Instituto de Matemática, 2009. http://repositorio.ufba.br/ri/handle/ri/21461.

Abstract:
Real-time systems have been widely used in the context of mechatronic systems since, in order to control real-world entities, both their logical and their timing requirements must be taken into account. In such systems, fault tolerance mechanisms must be implemented, since failures may lead to considerable losses; an error in a flight control system, for example, may cost human lives. Several fault-tolerant scheduling approaches for real-time systems have been derived. However, most of them restrict the system and/or fault model in particular ways, or are tightly coupled to the system recovery model or to the scheduling policy. Moreover, there is no formal metric that allows existing approaches to be compared from the viewpoint of fault resilience. The main goal of this work is to fill this gap by providing a fault resilience metric for real-time systems that is as independent as possible of the system and/or fault models. To this end, a simulation-based analysis was developed to compute the resilience of all tasks in a system by simulating specific time intervals. Statistical inference techniques are then used to infer the resilience of the system. The results showed that the derived metric can be used to compare, for example, two scheduling policies for real-time systems from the standpoint of fault resilience, which demonstrates that the developed approach is reasonably independent of the system model.
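
As an illustration only, the following Python sketch shows the flavor of such a simulation-based resilience estimate: a fixed-priority task set is simulated over an interval while transient faults waste processor slots, and the fraction of runs without a deadline miss serves as a crude resilience measure. The task set, fault model, and metric here are simplifying assumptions for illustration, not the thesis's actual formulation.

```python
import random

# Illustrative task set: (period, wcet); deadlines equal periods.
TASKS = [(10, 2), (20, 4), (40, 8)]

def deadlines_met(tasks, horizon, fault_rate, seed):
    """One Monte Carlo run of rate-monotonic scheduling over [0, horizon).
    A transient fault wastes the current processor slot (lost progress).
    Returns True if every job whose deadline lies in the window finished."""
    rng = random.Random(seed)
    jobs = []                                   # [release, deadline, remaining, period]
    for period, wcet in tasks:
        for t in range(0, horizon, period):
            jobs.append([t, t + period, wcet, period])
    for t in range(horizon):
        ready = [j for j in jobs if j[0] <= t and j[2] > 0]
        if any(j[1] <= t for j in ready):       # a ready job missed its deadline
            return False
        if ready:
            job = min(ready, key=lambda j: j[3])    # smallest period first
            if rng.random() >= fault_rate:          # a fault wastes the slot
                job[2] -= 1
    return all(j[2] == 0 for j in jobs if j[1] <= horizon)

def resilience(tasks, fault_rate, runs=500, horizon=200):
    """Fraction of simulated intervals without any deadline miss."""
    return sum(deadlines_met(tasks, horizon, fault_rate, s) for s in range(runs)) / runs

for rate in (0.0, 0.05, 0.2):
    print(f"fault rate {rate:.2f}: estimated resilience {resilience(TASKS, rate):.2f}")
```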
3

Pai, Raikar Siddhesh Prakash Sunita. "Network Fault Resilient MPI for Multi-Rail Infiniband Clusters." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1325270841.

4

Monge, Solano Ignacio, and Enikő Matók. "Developing for Resilience: Introducing a Chaos Engineering tool." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20808.

Abstract:
Software complexity continues to accelerate as new tools, frameworks, and technologies become available. This, in turn, increases its fragility and liability. Despite the amount of investment to test and harden their systems, companies still pay the price of failure. To withstand this fast-paced development environment and ensure software availability, large-scale systems must be built with resilience in mind. Chaos Engineering is a new practice that aims to address some of these challenges. In this thesis, the methodology, requirements, and iterations of the system design and architecture for a chaos engineering tool are presented. In a matter of only a couple of months and the working hours of two engineers, it was possible to build a tool that is able to shed light on the attributes that make the targeted system resilient, as well as the weaknesses in its failure handling mechanisms. This tool greatly reduces the otherwise manual testing labor and allows software engineering teams to find potentially costly failures. These results demonstrate the benefits that many companies could see in their return on investment by adopting the practice of Chaos Engineering.
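
For readers unfamiliar with the practice, a chaos experiment follows a simple loop: verify the system's steady state, inject a fault, re-verify, and roll back. The sketch below illustrates that loop in Python; the health-check URL, the Docker-paused replica, and the thresholds are invented placeholders, not details of the tool built in this thesis.

```python
import subprocess
import time
import urllib.request

def steady_state_ok(url="http://localhost:8080/health", timeout=2.0):
    """Probe the steady-state hypothesis: the service answers quickly."""
    try:
        t0 = time.time()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and (time.time() - t0) < timeout
    except OSError:
        return False

def inject_fault(container="api-replica-1"):
    """Inject a fault: pause one replica (assumes a Docker-based target)."""
    subprocess.run(["docker", "pause", container], check=True)

def rollback(container="api-replica-1"):
    subprocess.run(["docker", "unpause", container], check=True)

def run_experiment():
    assert steady_state_ok(), "system unhealthy before the experiment"
    inject_fault()
    try:
        time.sleep(5)                      # let the failure propagate
        resilient = steady_state_ok()      # hypothesis: service still up
    finally:
        rollback()                         # always restore the target
    print("hypothesis held" if resilient else "weakness found")

if __name__ == "__main__":
    run_experiment()
```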
5

Souto, Laiz. "Data-driven approaches for event detection, fault location, resilience assessment, and enhancements in power systems." Doctoral thesis, Universitat de Girona, 2021. http://hdl.handle.net/10803/671402.

Abstract:
This thesis presents the study and development of distinct data-driven techniques to support event detection, fault location, and resilience assessment towards enhancements in power systems. It is divided into three main parts, as follows. The first part investigates improvements in power system monitoring and event detection methods, with a focus on dimensionality reduction techniques in wide-area monitoring systems. The second part focuses on contributions to fault location tasks in power distribution networks, relying on information about the network topology and its electrical parameters for short-circuit simulations over a range of scenarios. The third part assesses enhancements in power system resilience to high-impact, low-probability events associated with extreme weather conditions and human-made attacks, relying on information about the system topology combined with simulations of representative scenarios for impact assessment and mitigation. Overall, the proposed data-driven algorithms contribute to event detection, fault location, and resilience assessment, relying on electrical measurements recorded by intelligent electronic devices, historical data of past events, and representative scenarios, together with information about the network topology, electrical parameters, and operating status. The validation of the algorithms, implemented in MATLAB, is based on computer simulations using network models implemented in OpenDSS and Simulink.
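
As a rough illustration of the dimensionality-reduction idea in the first part, the sketch below learns a principal subspace from ambient (normal-operation) measurements and flags samples whose energy outside that subspace is anomalous. The PCA formulation, synthetic data, and 3-sigma threshold are generic assumptions, not the thesis's exact method.

```python
import numpy as np

def fit_pca_basis(X, k):
    """X: (time x channels) matrix of normal-operation data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                       # dominant k-dimensional subspace

def residual_energy(x, mean, basis):
    """Energy of a new sample outside the learned subspace."""
    xc = x - mean
    proj = basis @ (basis.T @ xc)
    return float(np.sum((xc - proj) ** 2))

# Train on ambient data, flag samples whose residual is anomalous.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 40))
ambient = low_rank + 0.1 * rng.normal(size=(500, 40))   # correlated channels + noise
basis = fit_pca_basis(ambient, k=5)
mean = ambient.mean(axis=0)
baseline = [residual_energy(x, mean, basis) for x in ambient]
threshold = np.mean(baseline) + 3 * np.std(baseline)

event_sample = ambient[0] + 10.0          # a crude disturbance on all channels
print(residual_energy(event_sample, mean, basis) > threshold)   # expected: True
```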
6

Bentria, Dounia. "Combining checkpointing and other resilience mechanisms for exascale systems." Thesis, Lyon, École normale supérieure, 2014. http://www.theses.fr/2014ENSL0971/document.

Abstract:
In this thesis, we are interested in scheduling and optimization problems in probabilistic contexts. The contributions of this thesis come in two parts. The first part is dedicated to the optimization of different fault-tolerance mechanisms for very-large-scale machines that are subject to a probability of failure, and the second part is devoted to the optimization of the expected sensor data acquisition cost when evaluating a query expressed as a tree of disjunctive Boolean operators applied to Boolean predicates. In the first chapter, we present the related work of the first part and then introduce some new general results that are useful for resilience on exascale systems. In the second chapter, we study a unified model for several well-known checkpoint/restart protocols. The proposed model is generic enough to encompass both extremes of the checkpoint/restart space, from coordinated approaches to a variety of uncoordinated checkpoint strategies. We propose a detailed analysis of several scenarios, including some of the most powerful currently available HPC platforms, as well as anticipated exascale designs. In the third, fourth, and fifth chapters, we study the combination of different fault-tolerance mechanisms (replication, fault prediction, and detection of silent errors) with the traditional checkpoint/restart mechanism. We evaluated several models using simulations. Our results show that these models are useful for a set of application models in the context of future exascale systems. In the second part of the thesis, we study the problem of minimizing the expected sensor data acquisition cost when evaluating a query expressed as a tree of disjunctive Boolean operators applied to Boolean predicates. The problem is to determine the order in which predicates should be evaluated so as to shortcut part of the query evaluation and minimize the expected cost. In the sixth chapter, we present the related work of the second part, and in the seventh chapter, we study the problem for queries expressed in disjunctive normal form. We consider the more general case where each data stream can appear in multiple predicates, and we study two models: the model where each predicate can access a single stream and the model where each predicate can access multiple streams.
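
For context, checkpoint/restart models of this kind generalize a classical first-order result, the Young/Daly checkpointing period, which balances checkpoint cost C against platform MTBF \mu. This is the standard baseline that unified models refine, not the thesis's own formula:

```latex
% First-order waste of periodic checkpointing with period W,
% checkpoint cost C, and platform MTBF \mu:
\text{Waste}(W) \;\approx\; \frac{C}{W} + \frac{W}{2\mu},
\qquad
\frac{\mathrm{d}}{\mathrm{d}W}\,\text{Waste}(W) = 0
\;\Longrightarrow\;
W_{\text{opt}} \approx \sqrt{2\,C\,\mu}.
```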
7

Raja, Chandrasekar Raghunath. "Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417733721.

8

Teixeira, André. "Toward Cyber-Secure and Resilient Networked Control Systems." Doctoral thesis, KTH, Reglerteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-154204.

Abstract:
Resilience is the ability to maintain acceptable levels of operation in the presence of abnormal conditions. It is an essential property in industrial control systems, which are the backbone of several critical infrastructures. The trend towards using pervasive information technology systems, such as the Internet, results in control systems becoming increasingly vulnerable to cyber threats. Traditional cyber security does not consider the interdependencies between the physical components and the cyber systems. On the other hand, control-theoretic approaches typically deal with independent disturbances and faults, thus they are not tailored to handle cyber threats. Theory and tools to analyze and build control system resilience are, therefore, lacking and need to be developed. This thesis contributes towards a framework for analyzing and building resilient control systems. First, a conceptual model for networked control systems with malicious adversaries is introduced. In this model, the adversary aims at disrupting the system behavior while remaining undetected by an anomaly detector. The adversary is constrained in terms of the available model knowledge, disclosure resources, and disruption capabilities. These resources may correspond to the anomaly detector's algorithm, sniffers of private data, and spoofers of control commands, respectively. Second, we address security and resilience from the perspective of risk management, where the notion of risk is defined in terms of a threat's scenario, impact, and likelihood. Quantitative tools to analyze risk are proposed. They take into account both the likelihood and impact of threats. Attack scenarios with high impact are identified using the proposed tools; e.g., zero-dynamics attacks are analyzed in detail. The problem of revealing attacks is also addressed. Their stealthiness is characterized, and how to detect them by modifying the system's structure is also described. As our third contribution, we propose distributed fault detection and isolation schemes to detect physical and cyber threats on interconnected second-order linear systems. A distributed scheme based on unknown input observers is designed to jointly detect and isolate threats that may occur on the network edges or nodes. Additionally, we propose a distributed scheme based on local models and measurements that is resilient to changes outside the local subsystem. The complexity of the proposed methods is decreased by reducing the number of monitoring nodes and by characterizing the minimum amount of model information and measurements needed to achieve fault detection and isolation. Finally, we tackle the problem of distributed reconfiguration under sensor and actuator faults. In particular, we consider a control system with redundant sensors and actuators cooperating to recover from the removal of individual nodes. The proposed scheme minimizes a quadratic cost while satisfying a model-matching condition, which maintains the nominal closed-loop behavior after faults. Stability of the closed-loop system under the proposed scheme is analyzed.
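
A minimal numerical illustration of the detector such an adversary must evade is a Luenberger-observer residual test on a linear system; the matrices, observer gain, alarm threshold, and additive sensor attack below are toy assumptions, not the thesis's models:

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.2]])            # observer gain (A - L C is stable here)

def run(steps=50, attack_at=25, bias=0.5):
    x = np.zeros((2, 1))                # plant state
    xhat = np.zeros((2, 1))             # observer state
    alarms = []
    for k in range(steps):
        y = C @ x
        if k >= attack_at:
            y = y + bias                # additive sensor attack
        r = y - C @ xhat                # residual
        alarms.append(abs(float(r)) > 0.1)
        x = A @ x
        xhat = A @ xhat + L @ r         # Luenberger observer update
    return alarms

print(run()[20:30])   # alarms start firing once the attack begins at k = 25
```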


9

Zounon, Mawussi. "On numerical resilience in linear algebra." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0038/document.

Abstract:
As the computational power of high performance computing (HPC) systems continues to increase through huge numbers of cores or specialized processing units, HPC applications are increasingly prone to faults. This study covers a new class of numerical fault tolerance algorithms at application level that does not require extra resources, i.e., computational units or computing time, when no fault occurs. Assuming that a separate mechanism ensures fault detection, we propose numerical algorithms to extract relevant information from available data after a fault. After data extraction, well-chosen parts of the missing data are regenerated through interpolation strategies to constitute meaningful inputs to numerically restart the algorithm. We have designed these methods, called Interpolation-restart techniques, for numerical linear algebra problems such as the solution of linear systems or eigen-problems, which are the innermost numerical kernels in many scientific and engineering applications and often among the most time-consuming parts. In the framework of Krylov subspace linear solvers, the lost entries of the iterate are interpolated using the available entries on the still-alive nodes to define a new initial guess before restarting the Krylov method. In particular, we consider two interpolation policies that preserve key numerical properties of well-known linear solvers, namely the monotonic decrease of the A-norm of the error of the conjugate gradient method and the residual norm decrease of GMRES. We assess the impact of the fault rate and the amount of lost data on the robustness of the resulting linear solvers. For eigensolvers, we revisited state-of-the-art methods for solving large sparse eigenvalue problems, namely the Arnoldi methods, subspace iteration methods and the Jacobi-Davidson method, in the light of Interpolation-restart strategies. For each considered eigensolver, we adapted the Interpolation-restart strategies to regenerate as much spectral information as possible. Through intensive experiments, we illustrate the qualitative numerical behavior of the resulting schemes when the number of faults and the amount of lost data are varied, and we demonstrate that they exhibit a numerical robustness close to that of fault-free calculations. In order to assess the efficiency of our numerical strategies, we have considered an actual fully-featured parallel sparse hybrid (direct/iterative) linear solver, MaPHyS, and we proposed numerical remedies to design a resilient version of the solver. The solver being hybrid, we focus in this study on the iterative solution step, which is often the dominant step in practice. The numerical remedies we propose are twofold. Whenever possible, we exploit the natural data redundancy between processes of the solver to perform an exact recovery through clever copies over processes. Otherwise, data that has been lost and is not available anymore on any process is recovered through Interpolation-restart strategies. These numerical remedies have been implemented in the MaPHyS parallel solver so that we can assess their efficiency on a large number of processing units (up to 12,288 CPU cores) for solving large-scale real-life problems.
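
The linear-interpolation policy behind Interpolation-restart can be pictured compactly: the entries of the iterate lost with a failed node are recomputed so that they are consistent with the surviving entries, and the solver restarts from the repaired vector. In the sketch below the matrix, partitioning, and "current iterate" are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
A = np.eye(n) * 4 + rng.normal(scale=0.1, size=(n, n))
A = (A + A.T) / 2                        # symmetric, diagonally dominant toy system
b = rng.normal(size=n)

# Pretend this is the iterate of a Krylov solver when a node fails.
x = np.linalg.solve(A, b) + rng.normal(scale=0.05, size=n)

lost = np.arange(10, 20)                 # entries lost with the failed node
kept = np.setdiff1d(np.arange(n), lost)

# Linear interpolation: solve A[lost, lost] x_lost = b[lost] - A[lost, kept] x_kept,
# which makes the repaired entries consistent with the surviving ones.
x_rep = x.copy()
x_rep[lost] = np.linalg.solve(A[np.ix_(lost, lost)],
                              b[lost] - A[np.ix_(lost, kept)] @ x[kept])

print(np.linalg.norm(b - A @ x_rep))     # residual after repair
# ...then restart CG/GMRES with x_rep as the new initial guess.
```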
10

Rink, Norman Alexander, and Jeronimo Castrillon. "Comprehensive Backend Support for Local Memory Fault Tolerance." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-215785.

Abstract:
Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to transient faults. Applications can be protected against faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to program representations at abstraction levels higher than machine instructions are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, a large proportion of a program’s memory accesses are introduced by the compiler backend. This report presents a backend that protects these accesses against faults in the memory system. It is demonstrated that the presented backend can detect all single bit flips in memory that would be missed by an error detection scheme that operates on the LLVM intermediate representation of programs. The presented compiler backend is obtained by modifying the LLVM backend for the x86 architecture. On a subset of SPEC CINT2006 the runtime overhead incurred by the backend modifications amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86_64. To achieve comprehensive detection of memory faults, the modified backend implements an adjusted calling convention that leaves library function calls transparent and intact.
11

Liu, Jiaqi. "Handling Soft and Hard Errors for Scientific Applications." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1483632126075067.

12

Jamal, Aygul. "A parallel iterative solver for large sparse linear systems enhanced with randomization and GPU accelerator, and its resilience to soft errors." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS269/document.

Abstract:
In this PhD thesis, we address three challenges faced by linear algebra solvers in the perspective of future exascale systems: accelerating convergence using innovative techniques at the algorithm level, taking advantage of GPU (Graphics Processing Units) accelerators to enhance the performance of computations on hybrid CPU/GPU systems, and evaluating the impact of errors in the context of an increasing level of parallelism in supercomputers. We are interested in studying methods that enable us to accelerate the convergence and execution time of iterative solvers for large sparse linear systems. The solver specifically considered in this work is the parallel Algebraic Recursive Multilevel Solver (pARMS), which is a distributed-memory parallel solver based on Krylov subspace methods. First, we integrate a randomization technique referred to as Random Butterfly Transformations (RBT) that has been successfully applied to remove the cost of pivoting in the solution of dense linear systems. Our objective is to apply this method in the ARMS preconditioner to solve more efficiently the last Schur complement system in the application of the recursive multilevel process in pARMS. The experimental results show an improvement of the convergence and the accuracy. Due to memory concerns for some test problems, we also propose to use a sparse variant of RBT followed by a sparse direct solver (SuperLU), resulting in an improvement of the execution time. Then we explain how a non-intrusive approach can be applied to implement GPU computing into the pARMS solver, more especially for the local preconditioning phase that represents a significant part of the time to compute the solution. We compare the CPU-only and hybrid CPU/GPU variants of the solver on several test problems coming from physical applications. The performance results of the hybrid CPU/GPU solver using the ARMS preconditioning combined with RBT, or the ILU(0) preconditioning, show a performance gain of up to 30% on the test problems considered in our experiments. Finally, we study the effect of soft fault errors on the convergence of the commonly used flexible GMRES (FGMRES) algorithm, which is also used to solve the preconditioned system in pARMS. The test problem in our experiments is an elliptic PDE problem on a regular grid. We consider two types of preconditioners: an incomplete LU factorization with dual threshold (ILUT), and the ARMS preconditioner combined with RBT randomization. We consider two soft fault error modeling approaches where we perturb the matrix-vector multiplication and the application of the preconditioner, and we compare their potential impact on the convergence of the solver.
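
A small experiment in the spirit of the last part can be set up by wrapping the matrix-vector product so that it is occasionally corrupted, then watching how GMRES behaves; the 1-D Laplacian test matrix and the one-entry corruption model below are assumptions for illustration, not the thesis's fault model:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 100
A = np.diag(np.full(n, 2.0)) \
    + np.diag(np.full(n - 1, -1.0), 1) \
    + np.diag(np.full(n - 1, -1.0), -1)          # 1-D Laplacian
b = np.ones(n)
rng = np.random.default_rng(0)

def faulty_matvec(x, rate=0.05):
    """Matrix-vector product hit by occasional transient soft faults."""
    y = A @ x
    if rng.random() < rate:
        y[rng.integers(n)] *= 1 + 1e3            # corrupt one entry
    return y

Aop = LinearOperator((n, n), matvec=faulty_matvec)
x_faulty, info = gmres(Aop, b)                   # info != 0 if convergence stalls
x_clean = np.linalg.solve(A, b)
print(info, np.linalg.norm(x_faulty - x_clean))
```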
13

Stoicescu, Miruna. "Architecting Resilient Computing Systems : a Component-Based Approach." Thesis, Toulouse, INPT, 2013. http://www.theses.fr/2013INPT0120/document.

Abstract:
Evolution during service life is mandatory, particularly for long-lived systems. Dependable systems, which continuously deliver trustworthy services, must evolve to accommodate changes, e.g., new fault tolerance requirements or variations in available resources. The addition of this evolutionary dimension to dependability leads to the notion of resilient computing. Among the various aspects of resilience, we focus on adaptivity. Dependability relies on fault-tolerant computing at runtime, applications being augmented with fault tolerance mechanisms (FTMs). As such, on-line adaptation of FTMs is a key challenge towards resilience. In related work, on-line adaptation of FTMs is most often performed in a preprogrammed manner or consists in tuning some parameters. Besides, FTMs are replaced monolithically: all the envisaged FTMs must be known at design time and deployed from the beginning. However, dynamics occurs along multiple dimensions, and developing a system for the worst-case scenario is impossible. According to runtime observations, new FTMs can be developed off-line but integrated on-line. We denote this ability as agile adaptation, as opposed to the preprogrammed one. In this thesis, we present an approach for developing flexible fault-tolerant systems in which FTMs can be adapted at runtime in an agile manner through fine-grained modifications for minimizing impact on the initial architecture. We first propose a classification of a set of existing FTMs based on criteria such as fault model, application characteristics and necessary resources. Next, we analyze these FTMs and extract a generic execution scheme which pinpoints the common parts and the variable features between them. Then, we demonstrate the use of state-of-the-art tools and concepts from the field of software engineering, such as component-based software engineering and reflective component-based middleware, for developing a library of fine-grained adaptive FTMs. We evaluate the agility of the approach and illustrate its usability through two examples of integration of the library: first, in a design-driven development process for applications in pervasive computing and, second, in a toolkit for developing applications for WSNs.
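
The fine-grained, component-based view of an FTM can be sketched as a composition of independently replaceable parts, so that adaptation swaps one component instead of the whole mechanism. The detector/recovery split and class names below are illustrative assumptions, not the thesis's actual component model:

```python
# Illustrative component-based FTM: detection and recovery vary independently,
# so an agile runtime change replaces only the part that differs.

class CrashDetector:
    def detect(self, replies):
        return [r is None for r in replies]   # missing reply = suspected crash

class OmissionDetector:
    def __init__(self, retries=2):
        self.retries = retries                # tolerate a few lost messages
    def detect(self, replies):
        return [r is None for r in replies]   # same signal, different policy

class RetryRecovery:
    def recover(self, call, *args):
        return call(*args)                    # naive re-execution

class FTM:
    """Composition point for the fine-grained parts of a mechanism."""
    def __init__(self, detector, recovery):
        self.detector, self.recovery = detector, recovery

    def adapt(self, *, detector=None, recovery=None):
        # Fine-grained runtime change: only the differing component moves.
        if detector is not None:
            self.detector = detector
        if recovery is not None:
            self.recovery = recovery

ftm = FTM(CrashDetector(), RetryRecovery())
ftm.adapt(detector=OmissionDetector())        # fault model changed at runtime
```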
14

Excoffon, William. "Résilience des systèmes informatiques adaptatifs : modélisation, analyse et quantification." Phd thesis, Toulouse, INPT, 2018. http://oatao.univ-toulouse.fr/20791/1/Excoffon_20791.pdf.

Abstract:
A system is called resilient if it is able to preserve its dependability properties despite changes (new threats, updates, ...). The rapid evolution of systems, including embedded systems, implies modifications of applications and system configurations, particularly at the software level. Such changes can have an impact on dependability, and more precisely on the assumptions of the fault tolerance mechanisms. A system is therefore resilient if such changes do not invalidate its dependability mechanisms, that is, if the mechanisms already in place remain consistent despite the changes, or if their inconsistencies can be quickly resolved. In this thesis we first propose a model for resilient systems. With this model we can evaluate the ability of a set of fault tolerance mechanisms to ensure the dependability properties derived from the non-functional specifications. This modeling also allows us to define a set of measures in order to quantify the resilience of a system. Finally, in the last chapter, we discuss the possibility of including resilience as one of the objectives of the development process.
15

Lauret, Jimmy. "Prévention et détection des interférences inter-aspects : méthode et application à l'aspectisation de la tolérance aux fautes." Phd thesis, Institut National Polytechnique de Toulouse - INPT, 2013. http://tel.archives-ouvertes.fr/tel-01067471.

Abstract:
Aspect-oriented programming (AOP) separates the different concerns of a computing system to improve modularity. AOP offers many benefits, since it makes it possible to separate functional code from non-functional code, thereby improving the reuse and configurability of computing systems. Configurability is essential for ensuring the resilience of computing systems, since it allows the dependability mechanisms to be modified. However, the aspect-oriented programming paradigm introduces new challenges for testing. In large systems where several non-functional concerns coexist, an aspect-based implementation of these concerns can be problematic. Sharing the same data flow and the same control flow, the aspects implementing the different concerns can write into variables read by other aspects, or interrupt the control flow common to the different aspects, thereby preventing the execution of some of them. In this thesis we focus more specifically on interferences between aspects in the context of the development of fault tolerance mechanisms implemented as aspects. These interferences are due to a missing precedence declaration between the aspects, or to an erroneous one. In order to better control the composition of the different aspects making up a fault tolerance mechanism, we have developed a method combining interference avoidance with interference detection at the code level. The goal of avoidance is to prevent the introduction of interferences by requiring a precedence declaration between aspects when the aspects are integrated. Detection makes it possible to expose, during testing, the errors introduced in the precedence declaration. These two facets of our approach are realized through an extension of AspectJ called AIRIA. AIRIA's constructs enable instrumentation, and thus the detection of interferences between aspects, with compilation facilities for implementing interference avoidance. Our approach is tool-supported and aims to limit debugging time: the tester can focus directly on the points where an interference occurs. We illustrate our approach on a case study: a duplex replication protocol. In this context the protocol is implemented using fine-grained aspects, allowing better configurability of the replication policy. We show that the composition of these fine-grained aspects gives rise to data-flow and control-flow interferences that are detected by our instrumentation approach. We define a set of interfering aspects for the example, and we show how our approach enables interference detection.
16

Psiakis, Rafail. "Performance optimization mechanisms for fault-resilient VLIW processors." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S095/document.

Abstract:
Embedded processors in critical domains require a combination of reliability, performance and low energy consumption. Very Long Instruction Word (VLIW) processors provide performance improvements through Instruction Level Parallelism (ILP) exploitation, while keeping cost and power at low levels. Since the ILP is highly application dependent, the processor does not use all its resources constantly and, thus, these resources can be utilized for redundant instruction execution. This thesis presents a fault injection methodology for VLIW processors and three hardware mechanisms to deal with soft, permanent and long-term faults, leading to three contributions. The first contribution presents an Architectural Vulnerability Factor (AVF) and Instruction Vulnerability Factor (IVF) analysis schema for VLIW processors. A fault injection methodology at different memory structures is proposed to extract the architectural/instruction masking capabilities of the processor. A high-level failure classification schema is presented to categorize the output of the processor. The second contribution explores heterogeneous idle resources at run-time, both inside and across consecutive instruction bundles. To achieve this, a hardware-optimized instruction scheduling technique is applied in parallel with the pipeline to efficiently control the replication and the scheduling of the instructions. Following the trends of increasing parallelization, a cluster-based design is also proposed to tackle the issues of scalability, while maintaining a reasonable area/power overhead. The proposed technique achieves a speed-up of 43.68% in performance with a ~10% area and power overhead over existing approaches. AVF and IVF analyses evaluate the vulnerability of the processor with the proposed mechanism. The third contribution deals with persistent faults. A hardware mechanism is proposed which replicates the instructions at run-time and schedules them at the idle slots considering the resource constraints. If a resource becomes faulty, the proposed approach efficiently rebinds both the original and replicated instructions during execution. Early evaluation performance results show up to 49% performance gain over existing techniques. In order to further decrease the performance overhead and to support single and multiple Long-Duration Transient (LDT) error mitigation, a fourth contribution is presented. We propose a hardware mechanism which detects the faults that are still active during execution and re-schedules the instructions to use not only the healthy function units, but also the fault-free components of the affected function units. When the fault vanishes, the affected function unit components can be reused. The scheduling window of the proposed mechanism is two instruction bundles, making it possible to explore mitigation solutions in the current and the next instruction execution. The obtained fault injection results show that the proposed approach can mitigate a large number of faults with low performance, area, and power overhead.
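
The idle-slot replication idea of the second contribution can be pictured with a toy scheduler that copies a bundle's instructions into its unused issue slots and records which slot pairs must agree; the four-slot bundle format below is an assumption for illustration, not the thesis's hardware design:

```python
# Toy model of idle-slot replication in one VLIW bundle: duplicate
# instructions into unused issue slots so results can later be compared
# for soft-error detection.

SLOTS = 4

def replicate_bundle(bundle):
    """bundle: list of instructions (fewer than SLOTS); None marks an idle slot.
    Returns (schedule, pairs), where pairs maps an original slot to the
    slot holding its replica (results of each pair must agree)."""
    schedule = list(bundle) + [None] * (SLOTS - len(bundle))
    free = [i for i, op in enumerate(schedule) if op is None]
    pairs = {}
    for i, op in enumerate(bundle):
        if not free:
            break                      # no idle slot left for this instruction
        j = free.pop(0)
        schedule[j] = op               # replica of the instruction in slot i
        pairs[i] = j
    return schedule, pairs

print(replicate_bundle(["add r1,r2,r3", "mul r4,r1,r5"]))
# (['add r1,r2,r3', 'mul r4,r1,r5', 'add r1,r2,r3', 'mul r4,r1,r5'], {0: 2, 1: 3})
```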
17

Lacerda, Felipe Gomes. "Classical leakage-resilient circuits from quantum fault-tolerant computation." reponame:Repositório Institucional da UnB, 2015. http://repositorio.unb.br/handle/10482/19594.

Abstract:
Doctoral thesis, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2015.
Physical implementations of cryptographic algorithms leak information, which makes them vulnerable to so-called side-channel attacks. Cryptography is now used in an ever-increasing variety of scenarios, and the assumption that the execution of cryptosystems is physically insulated is often not realistic. The field of leakage resilience proposes to mitigate side-channel attacks by designing protocols that are secure even if information leaks during execution. In this work, we study leakage-resilient computation, which concerns the problem of performing secure universal computation in the presence of leakage. Fault-tolerant quantum computation is concerned with the problem of noise in quantum computers. Since it is very hard to insulate quantum systems from noise, fault tolerance proposes schemes for performing computations correctly even if some noise is present. It turns out that there exists a connection between leakage resilience and fault tolerance. In this work, we show that leakage in a classical circuit is a form of noise when the circuit is interpreted as quantum. We then prove that for an arbitrary leakage model, there exists a corresponding noise model in which a circuit that is fault-tolerant against the noise model is also resilient against the given leakage model. We also show how to use constructions for fault tolerance to implement classical circuits that are secure in specific leakage models. This is done by establishing criteria under which quantum circuits can be converted into classical circuits in such a way that the leakage-resilience property is preserved. Using these criteria, we convert an implementation of universal fault-tolerant quantum computation into a classical leakage-resilient compiler, i.e., a scheme that compiles an arbitrary circuit into a circuit of the same functionality that is leakage-resilient.
18

Butler, Bryan P. (Bryan Philip). "A fault-tolerant shared memory system architecture for a Byzantine resilient computer." Thesis, Massachusetts Institute of Technology, 1989. http://hdl.handle.net/1721.1/13360.

Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1989.
Includes bibliographical references (leaves 145-147).
by Bryan P. Butler.
M.S.
APA, Harvard, Vancouver, ISO, and other styles
19

Abbaspour, Ali Reza. "Active Fault-Tolerant Control Design for Nonlinear Systems." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3917.

Full text
Abstract:
Faults and failures in system components are the two main causes of instability and degraded control performance. In recent decades, fault-tolerant control (FTC) approaches were introduced to improve the resiliency of control systems against faults and failures. FTC techniques are generally classified into two major groups: passive and active. Passive FTC systems do not rely on fault information to control the system and are closely related to robust control techniques, whereas an active FTC system acts on information received from a fault detection and isolation (FDI) system, so the fault can be tackled more intelligently without affecting other parts of the system. This dissertation reviews the causes of faults and failures in control systems and finds solutions to compensate for their effects. Recent achievements in FDI approaches and in active and passive FTC designs are investigated, and thorough comparisons of several different aspects are conducted to understand the advantages and disadvantages of different FTC techniques and to motivate researchers to further develop FTC and FDI approaches. Then, a novel active FTC framework based on online FDI is presented, which has significant advantages over other state-of-the-art FTC strategies. To design the proposed active FTC, a new FDI approach is introduced that uses an artificial neural network (ANN) and a model-based observer to detect and isolate faults and failures in sensors and actuators. In addition, an extended Kalman filter (EKF) is used to tune the ANN weights and improve the ANN's performance. The FDI signal, combined with a nonlinear dynamic inversion (NDI) technique, is then used to compensate for faults in the actuators and sensors of a nonlinear system. The proposed scheme detects and accommodates faults in real time without the need for controller reconfiguration. The proposed active FTC approach is used to design control systems for three different applications: an unmanned aerial vehicle (UAV), a load frequency control system, and a proton exchange membrane fuel cell (PEMFC) system. The performance of the designed controllers is investigated through numerical simulations by comparison with conventional control approaches, and their advantages are demonstrated.
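The core of the residual-based FDI loop described above can be sketched in a few lines. The snippet below is an illustrative toy, not the dissertation's design: it uses a scalar linear plant, an assumed observer gain, and an invented residual threshold, and it accommodates a detected sensor bias by substituting the observer estimate for the measurement.

```python
# Illustrative sketch (not the dissertation's code): residual-based sensor
# fault detection and accommodation, the core idea behind active FTC.
# A model-based observer predicts the output; when the measurement deviates
# beyond a threshold, the sensor is flagged faulty and the estimate is used
# instead. The plant, gains, and threshold below are made-up examples.
import numpy as np

A, B, C = 0.95, 0.10, 1.0          # toy scalar plant: x+ = A x + B u, y = C x
L = 0.5                             # observer gain (assumed, not tuned)
THRESHOLD = 0.2                     # residual threshold (illustrative)

x, x_hat = 0.0, 0.0
for k in range(100):
    u = 1.0                                   # constant input for the demo
    x = A * x + B * u                         # true plant
    y = C * x + (0.5 if k >= 50 else 0.0)     # sensor bias fault at k = 50
    y_hat = C * x_hat
    residual = abs(y - y_hat)                 # FDI signal
    faulty = residual > THRESHOLD
    y_used = y_hat if faulty else y           # accommodation: trust the model
    x_hat = A * x_hat + B * u + L * (y_used - y_hat)
    if faulty and k in (50, 51):
        print(f"k={k}: sensor fault detected (residual={residual:.3f})")
```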
APA, Harvard, Vancouver, ISO, and other styles
20

Biswas, Shuchismita. "Power Grid Partitioning and Monitoring Methods for Improving Resilience." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/104684.

Full text
Abstract:
This dissertation aims to develop decision-making tools that aid power grid operators in mitigating extreme events. Two distinct areas are focused on: a) improving grid performance after a severe disturbance, and b) enhancing grid monitoring to facilitate timely preventive actions. The first part of the dissertation presents a proactive islanding strategy to split the bulk power transmission system into smaller self-adequate islands in order to arrest the propagation of cascading failures after an event. Heuristic methods are proposed to determine in what sequence the island boundary lines should be disconnected so that no operating constraints are violated. The idea of optimal partitioning is further extended to the distribution network: a planning problem is formulated for determining which parts of the existing distribution grid can be converted to microgrids. This partitioning formulation addresses safety limits, uncertainties in load and generation, availability of grid-forming units, and topology constraints such as maintaining network radiality. Microgrids help maintain energy supply to critical loads during grid outages, thereby improving resilience. The second part of the dissertation focuses on wide-area monitoring using Phasor Measurement Unit (PMU) data. Strategies for data imputation and prediction exploiting the spatio-temporal correlation in PMU measurements are outlined, and a deep-learning-based methodology for identifying the location of temporary power system faults is illustrated. As severe weather events become more frequent and the threats from coordinated cyber intrusions increase, formulating strategies to reduce the impact of such events on the power grid becomes important, and the approaches outlined in this work can find application in this context.
Doctor of Philosophy
The modern power grid faces multiple threats, including extreme-weather events, solar storms, and potential cyber-physical attacks. Towards the larger goal of enhancing power systems resilience, this dissertation develops strategies to mitigate the impact of such extreme events. The proposed schemes broadly aim to: a) improve grid performance in the immediate aftermath of a disruptive event, and b) enhance grid monitoring to identify precursors of impending failures. To improve grid performance after a disruption, we propose a proactive islanding strategy for the bulk power grid, aimed at arresting the propagation of cascading failures. For the distribution network, a mixed-integer linear program is formulated to identify optimal sub-networks with load and distributed generators that may be retrofitted to operate as self-adequate microgrids if supply from the bulk power system is lost. To address the question of enhanced monitoring, we develop model-agnostic, computationally efficient recovery algorithms for archived and streamed data from Phasor Measurement Units (PMUs) with data drops and additive noise. PMUs are highly precise sensors that provide high-resolution insight into grid dynamics. We also illustrate an application where PMU data is used to identify the location of temporary line faults.
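The low-rank structure that such recovery algorithms exploit can be illustrated with a generic truncated-SVD imputation on synthetic data. This sketch is not the dissertation's algorithm; the channel count, rank, and drop rate are invented.

```python
# Illustrative sketch: filling dropped PMU samples by exploiting the
# low-rank (spatio-temporal) structure of the measurement matrix.
# Generic iterative truncated-SVD imputation on synthetic data, not the
# recovery algorithms developed in the dissertation.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
# 10 PMU channels driven by 2 shared modes, so the data matrix has rank 2
modes = np.vstack([np.sin(2 * np.pi * 1.0 * t), np.cos(2 * np.pi * 0.5 * t)])
X = rng.normal(size=(10, 2)) @ modes
mask = rng.random(X.shape) > 0.2            # 20% random data drops
X_obs = np.where(mask, X, 0.0)

X_hat = X_obs.copy()
for _ in range(50):                          # iterative hard-thresholded SVD
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    X_low = (U[:, :2] * s[:2]) @ Vt[:2]      # keep the rank-2 approximation
    X_hat = np.where(mask, X_obs, X_low)     # re-impose observed entries

print("imputation RMSE:", np.sqrt(np.mean((X_hat - X)[~mask] ** 2)))
```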
APA, Harvard, Vancouver, ISO, and other styles
21

Souza, Gisele Pinheiro. "Tuplebiz : um espaço de tuplas distribuido e com suporte a transações resilientes a falhas bizantinas." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2012. http://hdl.handle.net/10183/70239.

Full text
Abstract:
Coordination models enable communication among the processes of a distributed system. The shared data space model, represented by tuple spaces, is both referentially and temporally decoupled, which is why tuple spaces are often used in parallel and pervasive applications. Fault tolerance is important for both types of application; in healthcare applications, a fault can cost a life. In this context, this work introduces Tuplebiz, a distributed tuple space that supports transactions in environments where byzantine faults can occur. Byzantine faults encompass a wide variety of faulty behaviors that can arise in a system. Tuplebiz is split into data partitions in order to distribute the tuple space among different servers, and each partition guarantees fault tolerance through state machine replication. Furthermore, Tuplebiz supports transactions with the ACID properties (atomicity, consistency, isolation, durability); a transaction manager is responsible for maintaining isolation. Performance and fault injection tests were carried out to evaluate Tuplebiz. Its fault-free latency is approximately 2.8 times that of a non-replicated system. The injection tests were based on a fault injection framework for byzantine fault-tolerant systems and covered lost messages, delayed messages, corrupted messages, system suspension, and crashes. Latency was higher under faults, but Tuplebiz was able to handle all of them. As a case study, the integration of Tuplebiz with Guaraná, a domain-specific language used for designing enterprise application integration solutions, is presented. The tasks of an integration solution in Guaraná are currently centralized; the proposed integration distributes them among servers.
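The tuple-space programming model that Tuplebiz distributes can be shown with a minimal, single-process sketch of the classic write/read/take primitives. Replication, byzantine fault tolerance, and ACID transactions, the thesis's actual contributions, are deliberately omitted here.

```python
# Minimal, single-process tuple-space sketch showing the coordination
# primitives (write/read/take) that Tuplebiz distributes and replicates.
# No replication, byzantine fault tolerance, or transactions: this only
# illustrates the programming model.
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def write(self, tup):
        self.tuples.append(tup)

    def _match(self, template, tup):
        # None in the template acts as a wildcard field
        return len(template) == len(tup) and all(
            p is None or p == v for p, v in zip(template, tup))

    def read(self, template):                 # non-destructive lookup
        return next((t for t in self.tuples if self._match(template, t)), None)

    def take(self, template):                 # destructive lookup
        t = self.read(template)
        if t is not None:
            self.tuples.remove(t)
        return t

ts = TupleSpace()
ts.write(("sensor", 42, "ok"))
print(ts.read(("sensor", None, None)))        # ('sensor', 42, 'ok')
print(ts.take(("sensor", None, None)))        # removes the tuple
print(ts.read(("sensor", None, None)))        # None
```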
APA, Harvard, Vancouver, ISO, and other styles
22

Decouchant, Jérémie. "Collusions and Privacy in Rational-Resilient Gossip." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM034/document.

Full text
Abstract:
Gossip-based content dissemination protocols are a scalable and cheap alternative to centralised content sharing systems. However, it is well known that these protocols suffer from rational nodes, i.e., nodes that aim to download the content without contributing their fair share to the system. While the problem of rational nodes acting individually has been well addressed in the literature, colluding rational nodes remain an open issue. In addition, previous rational-resilient gossip-based solutions require nodes to log their interactions with others and to disclose the content of their logs, which may reveal sensitive information. Nowadays, a consensus exists on the necessity of reinforcing users' control over their personal information. Nonetheless, to the best of our knowledge, no privacy-preserving rational-resilient gossip-based content dissemination system exists. The contributions of this thesis are twofold. First, we present AcTinG, a protocol that prevents rational collusions in gossip-based content dissemination protocols while guaranteeing zero false positive accusations. AcTinG makes nodes maintain secure logs and mutually check each other's correctness through verifiable but non-predictable audits; by design, it is a Nash equilibrium. A performance evaluation shows that AcTinG is able to deliver all messages despite the presence of colluders, and exhibits scalability properties similar to those of standard gossip-based dissemination protocols. Second, we describe PAG, the first accountable and privacy-preserving gossip protocol. PAG builds on a monitoring infrastructure and homomorphic cryptographic procedures to provide privacy to nodes while ensuring that nodes forward the content they receive. The theoretical evaluation of PAG shows that breaking the privacy of interactions is difficult, even in the presence of a global and active opponent. We assess this protocol both in terms of privacy and performance using a deployment on a cluster of machines, simulations involving up to a million nodes, and theoretical proofs. The bandwidth overhead is much lower than that of existing anonymous communication protocols, while remaining practical in terms of CPU usage.
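The baseline that both AcTinG and PAG harden is plain push gossip. The toy below runs round-based dissemination with an arbitrary fanout and node count, omitting the secure logs, audits, and homomorphic procedures that constitute the thesis's contributions.

```python
# Toy round-based push gossip, the baseline these protocols harden.
# Accountability (secure logs, audits) and privacy mechanisms are omitted;
# the fanout and node count are arbitrary demo values.
import random

random.seed(1)
N, FANOUT = 1000, 3
informed = {0}                                # node 0 holds the content
rounds = 0
while len(informed) < N:
    new = set()
    for node in informed:
        new.update(random.sample(range(N), FANOUT))   # push to random peers
    informed |= new
    rounds += 1
print(f"{len(informed)}/{N} nodes informed after {rounds} rounds")
```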
APA, Harvard, Vancouver, ISO, and other styles
23

Calugaru, Vladimir. "Earthquake Resilient Tall Reinforced Concrete Buildings at Near-Fault Sites Using Base Isolation and Rocking Core Walls." Thesis, University of California, Berkeley, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3616424.

Full text
Abstract:

This dissertation pursues three main objectives: (1) to investigate the seismic response of tall reinforced concrete core wall buildings, designed following current building codes, subjected to pulse type near-fault ground motion, with special focus on the relation between the characteristics of the ground motion and the higher-modes of response; (2) to determine the characteristics of a base isolation system that results in nominally elastic response of the superstructure of a tall reinforced concrete core wall building at the maximum considered earthquake level of shaking; and (3) to demonstrate that the seismic performance, cost, and constructability of a base-isolated tall reinforced concrete core wall building can be significantly improved by incorporating a rocking core-wall in the design.

First, this dissertation investigates the seismic response of tall cantilever wall buildings subjected to pulse type ground motion, with special focus on the relation between the characteristics of ground motion and the higher-modes of response. Buildings 10, 20, and 40 stories high were designed such that inelastic deformation was concentrated at a single flexural plastic hinge at their base. Using nonlinear response history analysis, the buildings were subjected to near-fault seismic ground motions as well as simple close-form pulses, which represented distinct pulses within the ground motions. Euler-Bernoulli beam models with lumped mass and lumped plasticity were used to model the buildings.

Next, this dissertation investigates numerically the seismic response of six seismically base-isolated (BI) 20-story reinforced concrete buildings and compares their response to that of a fixed-base (FB) building with a similar structural system above ground. Located in Berkeley, California, 2 km from the Hayward fault, the buildings are designed with a core wall that provides most of the lateral force resistance above ground. For the BI buildings, the following are investigated: two isolation systems (both implemented below a three-story basement), isolation periods equal to 4, 5, and 6 s, and two levels of flexural strength of the wall. The first isolation system combines tension-resistant friction pendulum bearings and nonlinear fluid viscous dampers (NFVDs); the second combines low-friction tension-resistant cross-linear bearings, lead-rubber bearings, and NFVDs.

Finally, this dissertation investigates the seismic response of four 20-story buildings hypothetically located in the San Francisco Bay Area, 0.5 km from the San Andreas fault. One of the four studied buildings is fixed-base (FB), two are base-isolated (BI), and one uses a combination of base isolation and a rocking core wall (BIRW). Above the ground level, a reinforced concrete core wall provides the majority of the lateral force resistance in all four buildings. The FB and BI buildings satisfy requirements of ASCE 7-10. The BI and BIRW buildings use the same isolation system, which combines tension-resistant friction pendulum bearings and nonlinear fluid viscous dampers. The rocking core-wall includes post-tensioning steel and buckling-restrained devices, and is encased at its base in a steel shell to maximize confinement of the concrete core. The total amount of longitudinal steel in the wall of the BIRW building is 0.71 to 0.87 times that used in the BI buildings. Two-dimensional response history analysis is performed, including the vertical components of excitation, for a set of ground motions scaled to the design earthquake and to the maximum considered earthquake (MCE). While the FB building at MCE level of shaking develops inelastic deformations and shear stresses in the wall that may correspond to irreparable damage, the BI and the BIRW buildings experience nominally elastic response of the wall, with floor accelerations and shear forces that are 0.36 to 0.55 times those experienced by the FB building. The response of the four buildings to two historical and two simulated near-fault ground motions is also studied, demonstrating that the BIRW building has the largest deformation capacity at the onset of structural damage.

(Abstract shortened by UMI.)
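As a worked illustration of the kind of response history analysis used throughout, the sketch below integrates a linear-elastic single-degree-of-freedom oscillator through an idealized near-fault sine pulse with the Newmark average-acceleration method. The period, damping, and pulse parameters are invented, and the model is far simpler than the dissertation's multi-story beam models with plastic hinges.

```python
# Illustrative sketch: linear-elastic SDOF response history to an idealized
# near-fault ground pulse, integrated with the Newmark average-acceleration
# method. All parameters (period, damping, pulse amplitude) are made up.
import numpy as np

T, zeta = 2.0, 0.05                       # period (s) and damping ratio
wn = 2 * np.pi / T
m, c, k = 1.0, 2 * zeta * wn, wn ** 2     # unit-mass normalized SDOF
dt, Tp, n = 0.01, 1.0, 1000
t = np.arange(n) * dt
ag = np.where(t < Tp, 0.5 * 9.81 * np.sin(2 * np.pi * t / Tp), 0.0)  # pulse

beta, gamma = 0.25, 0.5                   # average-acceleration Newmark
keff = k + gamma / (beta * dt) * c + m / (beta * dt ** 2)
u = v = 0.0
a = -ag[0]                                # initial acceleration from EOM
umax = 0.0
for i in range(1, n):
    # effective load from the previous step's state (standard Newmark form)
    p = (-m * ag[i]
         + m * (u / (beta * dt ** 2) + v / (beta * dt) + a * (1 / (2 * beta) - 1))
         + c * (gamma / (beta * dt) * u + (gamma / beta - 1) * v
                + dt * (gamma / (2 * beta) - 1) * a))
    u_new = p / keff
    v_new = (gamma / (beta * dt) * (u_new - u)
             + (1 - gamma / beta) * v + dt * (1 - gamma / (2 * beta)) * a)
    a_new = ((u_new - u) / (beta * dt ** 2) - v / (beta * dt)
             - (1 / (2 * beta) - 1) * a)
    u, v, a = u_new, v_new, a_new
    umax = max(umax, abs(u))
print(f"peak displacement: {umax:.3f} m")
```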

APA, Harvard, Vancouver, ISO, and other styles
24

Moriam, Sadia [Verfasser], Gerhard [Gutachter] Fettweis, and Andreas [Gutachter] Herkersdorf. "On Fault Resilient Network-on-Chip for Many Core Systems / Sadia Moriam ; Gutachter: Gerhard Fettweis, Andreas Herkersdorf." Dresden : Technische Universität Dresden, 2019. http://d-nb.info/1226899838/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Leipnitz, Marcos Tomazzoli. "Resilient regular expression matching on FPGAs with fast error repair." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/168788.

Full text
Abstract:
The Network Function Virtualization (NFV) paradigm promises to make computer networks more scalable and flexible by decoupling network functions (NFs) from dedicated, vendor-specific hardware. However, network- and compute-intensive NFs may be difficult to virtualize without performance degradation. In this context, Field-Programmable Gate Arrays (FPGAs) have been shown to be a good option for hardware acceleration of virtual NFs that require high throughput, without deviating from the concept of an NFV infrastructure that aims at high flexibility. Regular expression matching is an important and compute-intensive mechanism used to perform Deep Packet Inspection, which can be FPGA-accelerated to meet performance constraints. This solution, however, introduces new challenges regarding dependability requirements. Particularly for SRAM-based FPGAs, soft errors in the configuration memory are a significant dependability threat. In this work we present a comprehensive fault tolerance mechanism to deal with configuration faults affecting the functionality of FPGA-based regular expression matching engines. Moreover, a placement-aware scrubbing mechanism is introduced to reduce the system repair time, improving system reliability and availability. Experimental results show that the overall failure rate and the system mean time to repair can be reduced by 95% and 90%, respectively, with manageable area and performance costs.
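The intuition behind placement-aware scrubbing can be shown with a toy repair-time calculation: when the error-detection logic flags a module, the scrubber starts at the configuration frames where that module is placed instead of sweeping the device from frame zero. The frame counts and placement map below are invented.

```python
# Toy illustration of placement-aware scrubbing: start readback/repair at the
# configuration frames of the flagged module instead of scrubbing the whole
# device sequentially. Frame counts, timings, and placement are made up.
FRAMES = 1000
placement = {"regex_engine_0": range(100, 140),   # frames used per module
             "regex_engine_1": range(500, 540)}

def repair_time(start_frame, faulty_frame, frame_scrub_us=5):
    """Time until the faulty frame is rewritten, scrubbing frames in order
    (wrapping around the device) from start_frame."""
    return ((faulty_frame - start_frame) % FRAMES + 1) * frame_scrub_us

faulty = 515                                      # fault inside engine_1
blind = repair_time(0, faulty)                    # sequential scrub from 0
aware = repair_time(min(placement["regex_engine_1"]), faulty)
print(f"blind scrub: {blind} us, placement-aware: {aware} us "
      f"({100 * (1 - aware / blind):.0f}% faster)")
```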
APA, Harvard, Vancouver, ISO, and other styles
26

Ozturk, Erdinc. "Efficient and tamper-resilient architectures for pairing based cryptography." Worcester, Mass. : Worcester Polytechnic Institute, 2009. http://www.wpi.edu/Pubs/ETD/Available/etd-010409-225223/.

Full text
Abstract:
Dissertation (Ph.D.)--Worcester Polytechnic Institute.
Keywords: Pairing Based Cryptography; Identity Based Cryptography; Tate Pairing; Montgomery Multiplication; Robust Codes; Fault Detection; Tamper-Resilient Architecture. Includes bibliographical references (leaves 97-104).
APA, Harvard, Vancouver, ISO, and other styles
27

Li, Yi [Verfasser], Martin [Akademischer Betreuer] Kappas, Heiko [Akademischer Betreuer] Faust, Christoph [Akademischer Betreuer] Dittrich, Daniela [Akademischer Betreuer] Sauer, Renate [Akademischer Betreuer] Bürger-Arndt, and Hans [Akademischer Betreuer] Ruppert. "Integrated approaches of social-ecological resilience assessment and urban resilience management : Resilience thinking, transformations and implications for sustainable city development in Lianyungang, China / Yi Li. Betreuer: Martin Kappas. Gutachter: Martin Kappas ; Heiko Faust ; Christoph Dittrich ; Daniela Sauer ; Renate Bürger-Arndt ; Hans Ruppert." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2016. http://d-nb.info/1082425575/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Li, Yi [Verfasser], Martin [Akademischer Betreuer] Kappas, Heiko [Akademischer Betreuer] Faust, Christoph [Akademischer Betreuer] Dittrich, Daniela [Akademischer Betreuer] Sauer, Renate [Akademischer Betreuer] Bürger-Arndt, and Hans [Akademischer Betreuer] Ruppert. "Integrated approaches of social-ecological resilience assessment and urban resilience management : Resilience thinking, transformations and implications for sustainable city development in Lianyungang, China / Yi Li. Betreuer: Martin Kappas. Gutachter: Martin Kappas ; Heiko Faust ; Christoph Dittrich ; Daniela Sauer ; Renate Bürger-Arndt ; Hans Ruppert." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2016. http://nbn-resolving.de/urn:nbn:de:gbv:7-11858/00-1735-0000-0028-86BB-7-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Shoker, Ali. "Byzantine fault tolerance from static selection to dynamic switching." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1924/.

Full text
Abstract:
Byzantine Fault Tolerance (BFT) is becoming crucial with the revolution of online applications and the increasing pace of innovation in computer technologies. Although dozens of BFT protocols were introduced in the previous decade, their adoption by practitioners remains disappointing. To some extent, this indicates that existing protocols are perhaps not yet convincing or satisfactory enough. The problem is that researchers are still trying to establish 'the best protocol' using traditional methods, e.g., by designing new protocols. However, theoretical and experimental analyses demonstrate that it is hard to achieve a one-size-fits-all BFT protocol. Indeed, we believe that looking for smarter tactics, like 'fasten fragile sticks with a rope to achieve a solid stick', is necessary to circumvent the issue. In this thesis, we introduce the first BFT selection model and algorithm that automate and simplify the election process of the 'preferred' BFT protocol among a set of candidates. The selection mechanism operates in three modes: Static, Dynamic, and Heuristic. For the two latter modes, we present a novel BFT system, called Adapt, that reacts to potential changes in system conditions and switches dynamically between existing BFT protocols, i.e., seeking adaptation. The Static mode allows BFT users to choose a single BFT protocol only once. This is quite useful in Web Services and Clouds, where BFT can be sold as a service (and signed into the SLA contract), and it is basically designed for systems whose state does not fluctuate much. In this mode, an evaluation process is in charge of matching the user preferences against the profiles of the nominated BFT protocols, considering both reliability and performance. The elected protocol is the one that achieves the highest evaluation score. The mechanism is well automated via mathematical matrices and produces selections that are reasonable and close to reality. Some systems, however, may experience fluctuating conditions, like variable contention or message payloads. In this case, the Static mode will not be efficient, since a chosen protocol might not fit the new conditions. The Dynamic mode solves this issue: Adapt combines a collection of BFT protocols and switches between them, thus adapting to changes in the underlying system state. Consequently, the 'preferred' protocol is always selected for each system state, which yields an optimal quality of service, i.e., reliability and performance. Adapt monitors the system state through its Event System and uses a Support Vector Regression method to conduct run-time predictions of the protocols' performance (e.g., throughput, latency, etc.). Adapt also operates in a Heuristic mode: using predefined heuristics, it refines the user preferences to improve the selection process. The evaluation of our approach shows that selecting the 'preferred' protocol is automated and close to reality in the Static mode. In the Dynamic mode, Adapt always achieves the optimal performance among the available protocols, and the evaluation demonstrates that overall system performance can be improved significantly as well. Other cases show that it is not always worthwhile to switch between protocols. This is made possible by conducting predictions with high accuracy, which can exceed 98% in many cases. Finally, the thesis shows that Adapt can be made smarter through the use of heuristics.
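The Static mode's evaluation step reduces to a weighted matching of user preferences against protocol profiles. The sketch below is a toy version of that idea: the protocol names are real, but the profile values and preference weights are invented, not the thesis's matrices.

```python
# Illustrative sketch of static BFT protocol selection: score each candidate
# protocol by matching its profile against user preference weights and elect
# the highest scorer. Profile values and weights are invented for the demo.
import numpy as np

criteria = ["throughput", "latency", "scalability", "robustness"]
profiles = {                      # normalized scores in [0, 1], illustrative
    "PBFT":    [0.5, 0.6, 0.4, 0.9],
    "Zyzzyva": [0.9, 0.8, 0.6, 0.5],
    "Q/U":     [0.7, 0.9, 0.3, 0.6],
}
weights = np.array([0.2, 0.2, 0.1, 0.5])     # user favors robustness

scores = {name: float(np.dot(weights, p)) for name, p in profiles.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)                     # PBFT wins under these weights
```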
APA, Harvard, Vancouver, ISO, and other styles
30

Kuentzer, Felipe Augusto. "More than a timing resilient template : a case study on reliability-oriented improvements on blade." Pontifícia Universidade Católica do Rio Grande do Sul, 2018. http://tede2.pucrs.br/tede2/handle/tede/8093.

Full text
Abstract:
As VLSI design moves into ultra-deep-submicron technologies, the timing margins added to compensate for variability in the manufacturing process, operating temperature, and supply voltage become a significant part of the clock period in traditional synchronous circuits. Timing resilient architectures have emerged as a promising solution to alleviate these worst-case timing margins, improving system performance and/or reducing energy consumption. These architectures embed additional circuitry for detecting and recovering from the timing violations that may arise after designing the circuit with reduced margins. Asynchronous systems, on the other hand, have the potential to improve energy efficiency and performance due to the absence of a global clock, and asynchronous circuits are known to be robust to process, voltage, and temperature variations. Blade is an asynchronous timing resilient template that leverages the advantages of both asynchronous design and timing resilience. However, Blade still presents challenges regarding its testability, which hinders its commercial or large-scale application. Although design for testability with scan chains is widely applied in industry, the high silicon cost associated with its use in Blade can be prohibitive. Asynchronous circuits can also present advantages for functional testing, and the timing resilient characteristic provides continuous feedback during normal circuit operation, which can be exploited for concurrent testing. In this Thesis, Blade's testability is evaluated from a different perspective, in which circuits implemented with Blade present reliability properties that can be explored for stuck-at and delay fault testing. Initially, a fault classification method that relates behavioral patterns to structural faults inside the error detection logic is proposed, together with a new test-driven implementation of this detection module. The control part is analyzed for internal faults, and a new design is proposed in which test coverage is improved and the circuit can be further optimized by the design flow. An original method for time measurement of delay lines is also addressed. Finally, delay fault testing of critical paths in the data path is explored as a natural consequence of a Blade circuit, where the continuous monitoring for timing violations provides the feedback needed for online detection of delay faults that exceed the resilient circuit's recovery capability. The integration of all the contributions provides satisfactory fault coverage for an area overhead that, for the circuits evaluated in this Thesis, varies from 4.24% to 6.87%, whereas the scan approach for the same circuits implies an area overhead from 50.19% to 112.70%, respectively. The contributions of this Thesis demonstrate that, with a few improvements to the Blade architecture, it is possible to expand its reliability beyond tolerating delay violations in the data path, advancing fault testing (including online testing) of the entire circuit as well as addressing yield and aging concerns.
APA, Harvard, Vancouver, ISO, and other styles
31

Cunha, Hugo Assis. "An architecture to resilient and highly available identity providers based on OpenID standard." Universidade Federal do Amazonas, 2014. http://tede.ufam.edu.br/handle/handle/4431.

Full text
Abstract:
Secure authentication services and systems are typically based on one of two approaches. The first seeks to defend against every kind of attack; most current services use this approach, which is known to fail and is ultimately infeasible. Our proposal uses the second approach, which defends against some specific attacks but assumes that the system may eventually suffer an intrusion or fault. Rather than trying to avoid such problems, the system tolerates them through intelligent mechanisms that keep it executing in a trustworthy and correct state. This research presents a resilient architecture for authentication services based on OpenID, built on fault and intrusion tolerance protocols, together with a functional prototype of the architecture. The tests performed show that the system delivers better performance than a standard OpenID authentication service while adding resilience, high availability, protection of sensitive data, and fault and intrusion tolerance, all without losing compatibility with current OpenID clients.
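The masking principle underlying such intrusion-tolerant services can be illustrated with a toy quorum vote: with n = 3f + 1 replicas, at least f + 1 matching replies are required, so up to f compromised replicas cannot forge an authentication result. The replica answers below are mocked, not OpenID messages.

```python
# Toy illustration of the masking principle behind an intrusion-tolerant
# authentication service: with n = 3f + 1 replicas, accept a response only
# when at least f + 1 identical answers arrive, so up to f compromised
# replicas cannot forge a result. The replica replies here are mocked.
from collections import Counter

def vote(replies, f):
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None   # need f + 1 matching replies

f = 1                                          # tolerate one intruded replica
replies = ["auth-ok", "auth-ok", "auth-ok", "auth-FORGED"]   # n = 3f + 1 = 4
print(vote(replies, f))                        # "auth-ok" despite the intruder
```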
APA, Harvard, Vancouver, ISO, and other styles
32

Araújo, José. "Design, Implementation and Validation of Resource-Aware and Resilient Wireless Networked Control Systems." Doctoral thesis, KTH, Reglerteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-152535.

Full text
Abstract:
Networked control over wireless networks is of growing importance in many application domains such as industrial control, building automation and transportation systems. Wide deployment, however, requires systematic design tools that enable efficient resource usage while guaranteeing closed-loop control performance. The control system may be greatly affected by the inherent imperfections and limitations of the wireless medium and by malfunctions of system components. In this thesis, we make five important contributions that address these issues. In the first contribution, we consider event- and self-triggered control and investigate how to efficiently tune and execute these paradigms for appropriate control performance. Communication strategies for aperiodic control are devised, where we jointly address the selection of medium-access control and scheduling policies. Experimental results show that the best trade-off is obtained by a hybrid scheme, combining event- and self-triggered control together with contention-based and contention-free medium access control. The second contribution proposes an event-based method to select between fast and slow periodic sampling rates. The approach is based on linear quadratic control and the event condition is a quadratic function of the system state. Numerical and experimental results show that this hybrid controller is able to reduce the average sampling rate in comparison to a traditional periodic controller, while achieving the same closed-loop control performance. In the third contribution, we develop compensation methods for out-of-order communications and time-varying delays using a game-theoretic minimax control framework. We devise a linear temporal coding strategy where the sensor combines the current and previous measurements into a single packet to be transmitted. An experimental evaluation is performed in a multi-hop networked control scenario with a routing layer vulnerability exploited by a malicious application. The experimental and numerical results show the advantages of the proposed compensation schemes. The fourth contribution proposes a distributed reconfiguration method for sensor and actuator networks. We consider systems where sensors and actuators cooperate to recover from faults. Reconfiguration is performed to achieve model-matching, while minimizing the steady-state estimation error covariance and a linear quadratic control cost. The reconfiguration scheme is implemented in a room heating testbed, and experimental results demonstrate the method's ability to automatically reconfigure the faulty system in a distributed and fast manner. The final contribution is a co-simulator, which combines the control system simulator Simulink with the wireless network simulator COOJA. The co-simulator integrates physical plant dynamics with realistic wireless network models and the actual embedded software running on the networked devices. Hence, it allows for the validation of the complete wireless networked control system, including the study of the interactions between software and hardware components.
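The second contribution's rate-switching idea can be sketched with a toy scalar loop: sample slowly by default and switch to the fast rate when a quadratic function of the state exceeds a threshold. The plant, threshold, rates, and one-step control action below are all invented for the demo, not taken from the thesis.

```python
# Illustrative toy of event-based sampling-rate selection: slow sampling by
# default, fast sampling when the quadratic event condition fires. The plant,
# threshold, rates, and deadbeat control action are invented demo values.
import numpy as np

rng = np.random.default_rng(2)
a = 0.95                                  # toy scalar plant: x+ = a x + u + w
THRESHOLD, SLOW, FAST = 0.25, 10, 1       # sample every 10 steps or every step
x, u = 1.5, 0.0
next_sample, samples = 0, 0
for k in range(200):
    if k == next_sample:
        samples += 1
        fast = x * x > THRESHOLD          # quadratic event condition (x'Qx)
        u = -a * x                        # deadbeat correction at the sample
        next_sample = k + (FAST if fast else SLOW)
    else:
        u = 0.0                           # no actuation between samples
    x = a * x + u + 0.01 * rng.normal()   # plant step with process noise
    if k == 100:
        x += 1.0                          # disturbance re-triggers fast rate
print(f"{samples} samples in 200 steps (always-fast periodic control: 200)")
```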


APA, Harvard, Vancouver, ISO, and other styles
33

Quistrebert, Yohann. "Pour un statut fondateur de la victime psychologique en droit de la responsabilité civile." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1G001.

Full text
Abstract:
The psychological impact of events giving rise to liability, whatever they may be (acts of terrorism, the loss of a loved one, psychological harassment...), is specific in being both protean and invisible. It is protean because, in psychological matters, both the injuries and the resulting suffering are varied. From the point of view of injury, certain events prove more traumatizing than others, principally those during which the subject was confronted with his own death. Concerning suffering, a subject may suffer emotionally both from harm to his own integrity (for example, physical integrity, upon the diagnosis of a serious illness) and from harm affecting that of a loved one (e.g. death or disability). The impact is also invisible, since it is far easier to identify harm to physical integrity than harm to psychic integrity; moreover, certain psychological injuries are entirely imperceptible by reason of their eminently diffuse character. The object of this study is therefore to determine how civil liability law comprehends the victim of such a psychological impact. Its treatment can only be particular, given the inevitable interaction between the legal and psychological spheres. To explore this, we first propose a conceptualization of the psychological victim grounded in psychopathological reality. Two major distinctions feed this reflection: one legal in nature, the distinction between injury (dommage) and prejudice (préjudice); the other psychopathological in origin, opposing emotional shock to psychic trauma. Their intertwining allows us to elaborate different cases in which psychological suffering manifests itself and to define the contours of the status of victim. Secondly, regarding compensation of the psychological victim, both the appreciation and the evaluation of her prejudices are examined. The repercussions of psychic trauma, or even of emotional shock, can sometimes be so serious that compensation cannot be restricted to the suffering experienced alone; consequences of a different nature, for example patrimonial ones, must be taken into consideration. To this end, a typology of the prejudices of the victim under analysis should be put in place, with distinct rules of compensation established according to the prejudice endured. A prejudice that is presumed, notably from an injury, cannot logically be compensated in the same way as non-presumable prejudices, that is, those requiring forensic assessment. In short, the compensation scheme to be established must be in phase with the system of disclosure of suffering previously established. This study thus proposes to construct a true founding status for the psychological victim; once this cardinal notion is fully conceptualized, a compensation regime inferred from it can be rationally advanced.
APA, Harvard, Vancouver, ISO, and other styles
34

Shehaj, Marinela. "Robust dimensioning of wireless optical networks with multiple partial link failures." Thesis, Compiègne, 2020. http://www.theses.fr/2020COMP2540.

Full text
Abstract:
This thesis summarizes the work I carried out on the optimization of wireless optical networks. More specifically, the main objective of this work is to propose efficient network dimensioning algorithms that ensure traffic satisfaction in a network subject to partial link failures (for example, when some links and/or nodes operate with reduced capacity), caused mainly by weather conditions. The primary criterion for judging the efficiency of the proposed algorithms is the network dimensioning cost while keeping traffic satisfaction at high levels. The main application area we have in mind is networks that use Free Space Optics (FSO) - a broadband wireless optical transmission technology in which communication links are provided by a laser beam sent from the transmitter to a receiver placed in its line of sight. FSO networks offer several advantages (such as low cost, ease of installation, and high transmission capacity), but their greatest drawback is the vulnerability of FSO links to weather conditions, which causes a substantial loss of transmission power over the optical channel. This makes the network dimensioning problem both important and difficult. Consequently, a proper approach to FSO network dimensioning should take these losses into account so that the level of carried traffic is satisfactory under all observed weather conditions. In this thesis, we studied and developed such an approach. In the first part of the thesis, we introduce an initial dimensioning problem, designed to be as general as possible and to include the most important constraints. We then present a robust optimization algorithm for this dimensioning problem. To build our approach, we start by defining a reference set of link failures, which uses weather data from a given period against which the network must be protected. Next, we formulate mathematically the robust network dimensioning problem that uses this set of link failures. However, the reference failure set obtained in this way will in most cases contain an excessive number of states and, at the same time, will not contain all the states that may appear in the future. We therefore propose to approximate this set by a special type of virtual link-failure set (called an uncertainty set), named the K-set and parameterized by an integer K, where K is less than or equal to the number of links in the network. For a given K, the K-set contains all network states corresponding to all combinations of K or fewer simultaneously affected links. In some cases the weather is extremely bad, and for these situations we propose to build a hybrid network model composed of FSO links and terrestrial optical fiber links. The second part of this thesis is devoted to improving the approximation of the reference failure set by uncertainty sets (or uncertainty polytopes). In the first part we presented the idea of link K-sets; we now extend it by considering simultaneous degradations of K nodes (meaning the degradation of all adjacent links).
This thesis summarizes the work we have done on the optimization of wireless optical networks. More specifically, the main goal of this work is to propose appropriate network dimensioning algorithms for managing the demand and ensuring traffic satisfaction in a network under partial link failures (i.e. when some links and/or nodes are operational with reduced capacity) caused mostly by weather conditions. The primary criterion in deciding the efficiency of the proposed algorithms is the dimensioning cost of the network while keeping traffic satisfaction at reasonably high levels. The main application area we have in mind is networks that apply Free Space Optics (FSO) - a well-established broadband wireless optical transmission technology where the communication links are provided by means of a laser beam sent from the transmitter to the receiver placed in the line of sight. FSO networks exhibit several important advantages, but their biggest disadvantage is the vulnerability of FSO links to weather conditions, causing substantial loss of transmission power over the optical channel. This makes the problem of network dimensioning important and, as a matter of fact, difficult. Therefore, a proper approach to FSO network dimensioning should take such losses into account so that the level of carried traffic is satisfactory under all observed weather conditions. In this thesis, we first describe such an approach. In the first part of the thesis, we introduce a relevant dimensioning problem and present a robust optimization algorithm for this enhanced dimensioning. To construct our approach, we start by building a reference failure set from a set of weather data records for a given time period against which the network must be protected. Next, a mathematical formulation of the robust network dimensioning problem uses this failure set. Yet the reference set obtained this way will most likely contain an excessive number of states and, at the same time, will not contain all the states that will appear in reality. Hence, we propose to approximate the reference failure set with a special kind of virtual failure set called a K-set, parameterized by an integer value K, where K is less than or equal to the number of links in the network. For a given K, the K-set contains all states corresponding to all combinations of K or fewer simultaneously affected links. Sometimes the weather is extremely bad, and for such situations we propose to build a hybrid network model composed of FSO and fiber links. The second part of this thesis is devoted to the improvement of the so-called uncertainty sets (or uncertainty polytopes). In the first part we introduced the idea of link K-sets; now we extend this by considering simultaneous degradations of K nodes (meaning degradation of all adjacent links). Finally, inspired by the hitting set problem, a new idea was to find a large number of subsets of two or three affected links and to use all possible combinations (composed of 2 or at most 3 of these subsets) to build a new virtual failure set that covers as much as possible of the reference failure set obtained from the study of real weather data records. This new failure set then serves as input for our cut-generation algorithm, so that we can dimension the network at minimum cost for a satisfactory demand realization.
A substantial part of the work is devoted to a numerical study of different network instances that illustrates the effectiveness of the proposed approach. Dedicated space is given to the construction of a realistic network instance called the Paris Metropolitan Area Network (PMAN).
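To make the K-set idea concrete, here is a minimal Python sketch (my own illustration, not code from the thesis; the link names are invented, and the binary degraded/operational abstraction is an assumption - the thesis works with partial capacity reductions):

```python
# Enumerate a link K-set and measure how well it covers a reference
# failure set built from observed weather data (toy example).
from itertools import combinations

def k_set(links, K):
    """All failure states with at most K simultaneously degraded links.

    A state is modelled here as a frozenset of degraded links.
    """
    states = {frozenset()}  # include the no-failure state
    for k in range(1, K + 1):
        states.update(frozenset(c) for c in combinations(links, k))
    return states

def coverage(reference_states, kset_states):
    """Fraction of observed (reference) states contained in the K-set."""
    hits = sum(state in kset_states for state in reference_states)
    return hits / len(reference_states)

links = ["e1", "e2", "e3", "e4"]
reference = [frozenset({"e1"}),
             frozenset({"e2", "e3"}),
             frozenset({"e1", "e2", "e4"})]
print(coverage(reference, k_set(links, 2)))  # 2/3: the 3-link state is not covered
```

Raising K improves coverage of the reference set but enlarges the uncertainty set the dimensioning must protect against, which is exactly the cost/robustness trade-off the thesis studies.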
APA, Harvard, Vancouver, ISO, and other styles
35

Huang, Chung-Hao, and 黃重豪. "Model Checking Collaboration, Competition and Dense Fault Resilience." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/22398042253671313459.

Full text
Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 104 (2015/16)
In this thesis, I introduce BSIL (basic strategy-interaction logic) and TCL (temporal cooperation logic), which help to formally define and verify strategy-interaction properties of a game. The former, BSIL, is an extension of ATL (alternating-time logic) for the specification of strategy interaction among players in a system. BSIL can describe one system strategy that cooperates with several strategies of the environment for different requirements. Such properties are important in practice, and I show that they are not expressible in ATL*, GL (game logic), or AMC (alternating mu-calculus). Specifically, BSIL is more expressive than ATL but incomparable with ATL*, GL, and AMC in expressiveness. I show that fulfilling a BSIL specification requires a memoryful strategy. I also show that the model-checking complexity of BSIL is PSPACE-complete and thus lower than those of ATL*, GL, AMC, and the general strategy logics, which suggests that BSIL can be useful in closing the gap between large-scale real-world projects and time-consuming game-theoretical results. I then show the feasibility of our techniques by implementing and experimenting with our PSPACE model-checking algorithm for BSIL. TCL, on the other hand, allows successive definition of strategies for agents and agencies. Like BSIL, the expressiveness of TCL is incomparable with ATL*, GL, and AMC; however, it can describe deterministic Nash equilibria, while BSIL cannot. I prove that the model-checking complexity of TCL is EXPTIME-complete. TCL enjoys this relatively cheap complexity by disallowing too close an entanglement between cooperation and competition; allowing such entanglement leads to a non-elementary complexity. I have implemented a model checker for TCL and shown the feasibility of model checking in experiments on some benchmarks. Although BSIL and TCL have decent expressive power and benefit from relatively low complexity, PSPACE-complete and EXPTIME-complete are still not good enough for real problems. To apply the game concept to real-world problems, I introduce an algorithm to calculate the highest degree of fault tolerance a system can achieve with the control of safety-critical systems, which can be reduced to solving a game between a malicious environment and a controller. During the game, the environment tries to break the system by injecting failures, while the controller tries to keep the system safe by making correct decisions. I propose a new control objective that offers a better balance between complexity and precision for such systems: we seek systems that are k-resilient. A system is k-resilient if it can rapidly recover from a sequence of a small number, up to k, of local faults infinitely many times, provided the blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple abstraction from the precise distribution of local faults, but I believe it is much more refined than the traditional objective of maximizing the number of tolerated local faults. I give a detailed argument for why this is the right level of abstraction for safety-critical systems where local faults are few and far between. I have proved that, with respect to resilience, the computational complexity of constructing an optimal control is low, and a demonstration of feasibility through an implementation and experimental results appears in the following chapters.
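As a rough illustration of the game view of k-resilience, the following Python sketch (my own toy formulation, not the thesis algorithm; the state encoding and the budget-refill abstraction of recovery periods are assumptions) explores whether an environment allowed bursts of up to k faults can drive a system out of its safe region:

```python
# Conservative toy check of k-resilience on an explicit game graph.
from collections import deque

def k_resilient(control, faults, safe, init, k):
    """control: dict location -> controller successor locations
    faults:  dict location -> locations a fault can force
    safe:    set of locations the system must never leave
    A controller step refills the fault budget to k, modelling a fault-free
    recovery period between bursts; controller choice is treated as
    adversarial here, so a True answer is sound but may be pessimistic."""
    seen, queue = set(), deque([(init, k)])
    while queue:
        loc, budget = queue.popleft()
        if loc not in safe:
            return False                    # some burst of <= k faults breaks safety
        if (loc, budget) in seen:
            continue
        seen.add((loc, budget))
        for nxt in control.get(loc, ()):    # controller step: budget refills
            queue.append((nxt, k))
        if budget > 0:
            for nxt in faults.get(loc, ()): # environment spends one fault
                queue.append((nxt, budget - 1))
    return True

# "ok" recovers via "degraded"; two faults in one burst reach "crash".
control = {"degraded": ["ok"]}
faults = {"ok": ["degraded"], "degraded": ["crash"]}
print(k_resilient(control, faults, {"ok", "degraded"}, "ok", 1))  # True
print(k_resilient(control, faults, {"ok", "degraded"}, "ok", 2))  # False
```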
APA, Harvard, Vancouver, ISO, and other styles
36

Herrmann, Linda. "Formal Configuration of Fault-Tolerant Systems." 2018. https://tud.qucosa.de/id/qucosa%3A34074.

Full text
Abstract:
Bit flips are a known source of strange system behavior, failures, and crashes. They can cause dramatic financial loss, security breaches, or even harm human life. Caused by energized particles arising from, e.g., cosmic rays or heat, they are hardly avoidable. As transistor sizes become smaller and smaller, modern hardware becomes more and more prone to bit flips. This has generated strong scientific interest, and many techniques to make systems more resilient against bit flips have been developed. Fault-tolerance techniques are techniques that detect and react to bit flips or their effects. Before such techniques are used, they typically need to be configured for the particular system they shall protect, the grade of resilience that shall be achieved, and the environment. State-of-the-art configuration approaches run a high risk of being imprecise, of being affected by undesired side effects, and of yielding questionable resilience measures. In this thesis we encourage the use of formal methods for resilience configuration, point out advantages, and investigate difficulties. We investigate two example systems that are equipped with fault-tolerance techniques, and we apply parametric variants of probabilistic model checking to obtain optimal configurations for pre-defined resilience criteria. Probabilistic model checking is an automated formal method that operates on Markov models, i.e., state-based models with probabilistic transitions, where costs or rewards can be assigned to states and transitions. It can be used to compute, e.g., the probability of a failure, the conditional probability of detecting an error in case of a bit flip, or the overhead that arises due to error detection and correction. Parametric variants of probabilistic model checking allow parameters in the transition probabilities and in the costs and rewards. Instead of computing values for probabilities and overhead, parametric variants compute rational functions, which can then be analyzed for optimality. The fault-tolerant systems considered are inspired by the work of project partners. The first is an inter-process communication protocol as used in the Fiasco.OC microkernel, where the communication structures provided by the kernel are protected against bit flips by a fault-tolerance technique. The second is inspired by the redo-based fault-tolerance technique HAFT. This technique protects an application against bit flips by partitioning the application's instruction flow into transactions, adding redundancy, and redoing single transactions in case of error detection. Driven by these examples, we study the challenges of using probabilistic model checking for fault-tolerance configuration and present solutions. We show that small transition probabilities, as they arise in error models, can be a cause of previously known accuracy issues when using numeric solvers in probabilistic model checking, and we argue that the use of non-iterative methods is an acceptable alternative. We discuss the usability of the rational functions for finding optimal configurations and show that, for relatively short rational functions, the use of exact mathematical methods is appropriate. The redo-based fault-tolerance model suffers from the well-known state-explosion problem. We present a new technique, counter-based factorization, that tackles this problem for system models that fail to scale because of a counter, as is the case for this fault-tolerance model.
This technique exploits the chain-like structure that arises from the counter, splits the model into several parts, and computes local characteristics (in terms of rational functions) for these parts. The local characteristics can then be combined to retrieve global resilience and overhead measures. The rational functions retrieved for the redo-based fault-tolerance model are huge: for small model instances they already exceed one gigabyte in size. We therefore cannot apply exact mathematical methods to these functions. Instead, we use the short, matrix-based representation that arises from factorization to evaluate the functions point-wise. Using this approach, we systematically explore the design space of the redo-based fault-tolerance model and retrieve sweet-spot configurations.
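As a concrete illustration of parametric probabilistic model checking, the following Python sketch (my own toy Markov chain, not one of the thesis case studies; the 0.9 detection probability is an arbitrary assumption) derives the failure-reachability probability as a rational function of the bit-flip probability p, via symbolic linear solving with sympy:

```python
# Reachability in a tiny parametric Markov chain, solved symbolically.
import sympy as sp

p = sp.symbols("p", positive=True)
# States: 0 = computing, 1 = error detected (retry), 2 = failure, 3 = done.
P = {  # transition probabilities as functions of the bit-flip probability p
    (0, 1): p * sp.Rational(9, 10),  # flip occurs and is detected -> retry
    (0, 2): p * sp.Rational(1, 10),  # flip occurs and is missed   -> failure
    (0, 3): 1 - p,                   # no flip                     -> done
    (1, 0): sp.Integer(1),           # retry restarts the computation
}
x0, x1 = sp.symbols("x0 x1")  # probability of eventually reaching "failure"
eqs = [
    sp.Eq(x0, P[(0, 1)] * x1 + P[(0, 2)] * 1 + P[(0, 3)] * 0),
    sp.Eq(x1, P[(1, 0)] * x0),
]
sol = sp.solve(eqs, [x0, x1], dict=True)[0]
print(sp.simplify(sol[x0]))  # p/(10 - 9*p), possibly printed as -p/(9*p - 10)
```

The resulting rational function can then be analyzed for optimality over configuration parameters, which is the step that becomes hard when, as in the thesis, the functions grow to gigabyte size.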
APA, Harvard, Vancouver, ISO, and other styles
37

Hamouda, Sara S. "Resilience in high-level parallel programming languages." PhD thesis, 2019. http://hdl.handle.net/1885/164137.

Full text
Abstract:
The consistent trends of increasing core counts and decreasing mean-time-to-failure in supercomputers make supporting task parallelism and resilience a necessity in HPC programming models. Given the complexity of managing multi-threaded distributed execution in the presence of failures, there is a critical need for task-parallel abstractions that simplify writing efficient, modular, and understandable fault-tolerant applications. MPI User-Level Failure Mitigation (MPI-ULFM) is an emerging fault-tolerant specification of MPI. It supports failure detection by returning special error codes and provides new interfaces for failure mitigation. Unfortunately, the unstructured form of failure reporting provided by MPI-ULFM hinders the composability and clarity of fault-tolerant programs. The low-level programming model of MPI and the simplistic failure reporting mechanism adopted by MPI-ULFM make it more suitable as a low-level communication layer for resilient high-level languages than as a direct programming model for application development. The asynchronous partitioned global address space (APGAS) model is a high-level programming model designed to improve the productivity of developing large-scale applications. It represents a computation as a global control flow of nested parallel tasks that use global data partitioned among processes. Recent advances in the APGAS model supported control-flow recovery by adding failure awareness to the nested parallelism model --- async-finish --- and by providing structured failure reporting through exceptions. Unfortunately, the current implementation of the resilient async-finish model results in a high performance overhead that can restrict the scalability of applications. Moreover, the lack of data resilience support limits the productivity of the model, as it shifts the challenges of handling data availability and atomicity under failure to the programmer. In this thesis, we demonstrate that resilient APGAS languages can achieve scalable performance under failure by exploiting fault-tolerance features in emerging communication libraries such as MPI-ULFM. We propose multi-resolution resilience, in which high-level resilient constructs are composed from efficient lower-level resilient constructs, as an approach for bridging the gap between the efficiency of user-level fault tolerance and the productivity of system-level fault tolerance. To address the limited resilience efficiency of the async-finish model, we propose 'optimistic finish' --- a message-optimal resilient termination detection protocol for the finish construct. To improve programmer productivity, we augment the APGAS model with resilient data stores that simplify preserving critical application data in the presence of failure. In addition, we propose the 'transactional finish' construct as a productive mechanism for handling atomic updates on resilient data. Finally, we demonstrate the multi-resolution resilience approach by designing high-level resilient application frameworks based on the async-finish model. We implemented the above enhancements in the X10 language, an embodiment of the APGAS model, and performed an empirical evaluation of the performance of resilient X10 using micro-benchmarks and a suite of transactional and non-transactional resilient applications. Concepts of the APGAS model are realized in multiple programming languages, which can benefit from the conceptual and technical contributions of this thesis.
The presented empirical evaluation results will aid future comparisons with other resilient programming models.
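As a rough sketch of the counting idea behind finish-based termination detection (a shared-memory toy of my own in Python; the real resilient finish is a distributed protocol, and the class and method names here are hypothetical):

```python
# Toy finish construct: the barrier releases once every spawned task has
# terminated or its place has been declared dead.
import threading

class Finish:
    def __init__(self):
        self._pending = 0
        self._cv = threading.Condition()
        self._dead_places = set()

    def spawn(self, fn, place=0):
        with self._cv:
            self._pending += 1
        def run():
            try:
                with self._cv:
                    dead = place in self._dead_places
                if not dead:
                    fn()
            finally:
                with self._cv:          # count the task as terminated
                    self._pending -= 1
                    self._cv.notify_all()
        threading.Thread(target=run).start()

    def notify_place_death(self, place):
        # In resilient X10, tasks at a dead place count as terminated;
        # here we record the death so task wrappers skip their bodies.
        with self._cv:
            self._dead_places.add(place)
            self._cv.notify_all()

    def wait(self):                     # the "finish" barrier
        with self._cv:
            self._cv.wait_for(lambda: self._pending == 0)

f = Finish()
f.spawn(lambda: print("task at place 0"))
f.notify_place_death(1)
f.spawn(lambda: print("never runs"), place=1)  # dead place: skipped, still counted
f.wait()
print("finish released: every task terminated or was at a dead place")
```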
APA, Harvard, Vancouver, ISO, and other styles
38

Moriam, Sadia. "On Fault Resilient Network-on-Chip for Many Core Systems." Doctoral thesis, 2018. https://tud.qucosa.de/id/qucosa%3A34064.

Full text
Abstract:
Rapid scaling of transistor gate sizes has increased the density of on-chip integration and paved the way for heterogeneous many-core systems-on-chip, significantly improving the speed of on-chip processing. The design of the interconnection network of these complex systems is challenging, and the network-on-chip (NoC) is now the accepted scalable and bandwidth-efficient interconnect for multi-processor systems-on-chip (MPSoCs). However, the performance enhancements of technology scaling come at the cost of reliability, as on-chip components, particularly the network-on-chip, become increasingly prone to faults. In this thesis, we focus on approaches to deal with the errors caused by such faults. The results of these approaches are obtained not only via time-consuming cycle-accurate simulations but also by analytical approaches, allowing for faster yet accurate evaluations, especially for larger networks. Redundancy is the general approach to deal with faults, and its mode varies according to the type of fault. For the NoC, faults are classified into transient, intermittent, and permanent faults. Transient faults appear randomly for a few cycles and may be caused by particle radiation. Intermittent faults are similar to transient faults but differ in that they occur repeatedly at the same location, eventually leading to a permanent fault. Permanent faults, by definition, are caused by wires and transistors being permanently short or open. Generally, spatial redundancy, i.e. the use of redundant components, is used for dealing with permanent faults. Temporal redundancy deals with failures by re-execution or by retransmission of data, while information redundancy adds redundant information to the data packets, allowing for error detection and correction. Temporal and information redundancy methods are useful when dealing with transient and intermittent faults. In this dissertation, we begin with permanent faults in the NoC in the form of faulty links and routers. Our approach to spatial redundancy adds redundant links in the diagonal direction to the standard rectangular mesh topology, resulting in the hexagonal and octagonal NoCs. In addition to redundant links, adaptive routing must be used to bypass faulty components. We develop novel fault-tolerant deadlock-free adaptive routing algorithms for these topologies based on the turn model, without the use of virtual channels. Our results show that the hexagonal and octagonal NoCs can tolerate all 2-router and 3-router faults, respectively, while the mesh has been shown to tolerate all 1-router faults. To simplify the restricted-turn selection process for achieving deadlock freedom, we devised an approach based on the channel dependency matrix instead of Duato's state-of-the-art method of checking the channel dependency graph for cycles. The approach is general and can be used for the turn selection process of any regular topology. We further use algebraic manipulations of the channel dependency matrix to analytically assess the fault resilience of the adaptive routing algorithms when affected by permanent faults. We present and validate this method for the 2D mesh and hexagonal NoC topologies, achieving very high accuracy with a maximum error of 1%. The approach is very general and allows for faster evaluations compared to the commonly used cycle-accurate simulations. In comparison, existing works usually assume a limited number of faults to be able to analytically assess the network reliability.
We apply the approach to evaluate the fault resilience of larger NoCs, demonstrating its usefulness especially compared to cycle-accurate simulations. Finally, we concentrate on temporal and information redundancy techniques to deal with transient and intermittent faults in the router, which result in the dropping and hence loss of packets. Temporal redundancy is applied in the form of automatic repeat request (ARQ), i.e. retransmission of lost packets. Information redundancy is applied by the generation and transmission of redundant linear combinations of packets, known as random linear network coding. We develop an analytic model for flexible evaluation of these approaches to determine network performance parameters such as residual error rates and the increased network load. The analytic model allows us to evaluate larger NoCs and different topologies and to investigate the advantage of network coding compared to uncoded transmissions. We further extend the work with a brief look at the problem of secure communication over the NoC. Assuming large heterogeneous MPSoCs with components from third parties, the communication is subject to active attacks in the form of packet modification and packet drops in the NoC routers. We devise approaches to resolve these issues and again formulate analytic models for their flexible and accurate evaluation, with a maximum estimation error of 7%.
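The cycle check behind the channel-dependency-matrix idea can be illustrated in a few lines of Python (my own sketch; the 3-channel example matrices are invented, not topologies from the thesis). A routing function is deadlock-free iff its channel dependency graph is acyclic, i.e. the transitive closure of the boolean dependency matrix has an all-zero diagonal:

```python
# Deadlock-freedom as acyclicity of the channel dependency matrix.
import numpy as np

def deadlock_free(D):
    """D[i][j] = 1 if a packet holding channel i can next request channel j."""
    D = np.array(D, dtype=np.int64)
    reach = D.copy()
    for _ in range(len(D)):
        reach = np.minimum(reach + reach @ D, 1)  # grow the transitive closure
    return reach.trace() == 0                     # no channel depends on itself

# Cyclic dependency c0 -> c1 -> c2 -> c0: deadlock possible.
print(deadlock_free([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))  # False
# Forbidding the turn that closes the cycle (c2 -> c0) restores acyclicity.
print(deadlock_free([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))  # True
```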
APA, Harvard, Vancouver, ISO, and other styles
39

Amaro, Luís Alberto Pires. "Resilient Artificial Neural Networks." Master's thesis, 2020. http://hdl.handle.net/10316/92519.

Full text
Abstract:
Master's dissertation in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia
Every day we depend on Artificial Neural Networks to solve complex problems. They are present in technology that is crucial to our society today, such as credit card fraud detection and medical devices, and they will be present in technology that will be crucial to our society in the future, such as autonomous vehicles. Artificial Neural Networks are part of Safety-Critical Systems, systems on which many human lives depend. A system is considered safety-critical if its failure leads to loss of life or causes substantial damage to property or the environment. Currently, a large number of Safety-Critical Systems depend on Artificial Neural Networks. For this reason, it is essential to ensure that they have a level of robustness/resilience adequate to the fact that human lives depend on the results these systems produce. In this work, we have two main objectives. The first is to study the impact that the Dropout technique has on the resilience/robustness of Artificial Neural Networks. The second is to develop a new technique and study its impact on the resilience/robustness of Artificial Neural Networks. The new technique is called Stimulated Dropout. To achieve the proposed objectives, several neural networks were trained with different Dropout and Stimulated Dropout probabilities. The networks were trained and tested using two different databases. During the tests of the neural networks we perform bit flips in memory, and to understand the impact of both techniques on the resilience of Artificial Neural Networks, the results were analyzed using three parameters: the number of Silent Data Corruptions (SDCs), the accuracy, and the training time. The results show that the number of SDCs decreases in networks trained with Dropout and Stimulated Dropout. On the Mnist database, a network trained without either technique has an SDC percentage of 6.77%, whereas with a dropout probability of 80% the percentage is 3.76%, about 45% less. In the case of Stimulated Dropout, the lowest SDC percentage occurs at a probability of 20%, with an SDC percentage of 5.15%, about 24% less. As for the Fashion Mnist database, a network trained without either technique has an SDC percentage of 7.93%, whereas with a probability of 80% the percentage is 4.8%, about 46% less. In the case of Stimulated Dropout, the lowest SDC percentage occurs at a probability of 50%, with an SDC percentage of 4.22%, about 47% less. In the case of Dropout, we observed a tradeoff between accuracy and the number of SDCs, because the lowest number of SDCs in both databases occurs at a dropout probability of 80%, while accuracy decreases as the dropout probability increases. Regarding Stimulated Dropout, we observed a tradeoff between training time and the number of SDCs, because the training time of networks trained with different stimulated dropout probabilities is considerably longer than that of the network trained without either technique.
Artificial Neural Networks (ANNs) are used daily to help humans solve complex problems in real-life situations. They are present in technology that is key to our society in the present, such as credit card fraud detection and medical devices, and they will be present in technology that will be key to our society in the future, such as Autonomous Vehicles (AVs). A system is considered safety-critical if its failure leads to loss of life or substantial damage to property or the environment. There are currently a large number of Safety-Critical Systems (SCSs) that rely on ANNs. For this reason, it is crucial to ensure that ANNs have a level of robustness/resilience adequate to the fact that human lives depend on the results they produce. In this work we have two main objectives. The first is to study the impact that the Dropout technique has on the resilience/robustness of ANNs. The second is to develop a new technique and study its impact on the resilience/robustness of ANNs. The new technique is named Stimulated Dropout. To achieve these goals, we train multiple neural networks with different dropout and stimulated dropout probabilities. The networks were trained and tested using two different databases. During the tests of these neural networks we perform memory bit flips, and to understand the impact of both techniques on the resilience of ANNs, the results were analyzed using three parameters: the number of SDCs, the accuracy, and the training time. Our results show that adding Dropout or Stimulated Dropout to the neural networks decreases the number of SDCs, which means that both techniques have a positive impact on the resilience/robustness of ANNs. For the Mnist database, a network trained without either technique has an SDC percentage of 6.77%, whereas with a dropout probability of 80% the SDC percentage is 3.76%, about 45% less. In the case of Stimulated Dropout, the lowest SDC percentage occurs at a probability of 20%, with an SDC percentage of 5.15%, about 24% less. For the Fashion Mnist database, a network trained without either technique has an SDC percentage of 7.93%, whereas with a dropout probability of 80% the percentage is 4.8%, about 46% less. In the case of Stimulated Dropout, the lowest SDC percentage occurs at a probability of 50%, with an SDC percentage of 4.22%, about 47% less. In the case of Dropout, we observed a tradeoff between accuracy and the number of SDCs: the lowest number of SDCs in both databases occurs at a dropout probability of 80%, and comparing the accuracy across dropout probabilities shows that accuracy decreases as the dropout probability increases. Regarding Stimulated Dropout, we observed a tradeoff between training time and the number of SDCs: the training time of networks trained with different stimulated dropout probabilities is far higher than that of the network trained without either technique.
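The fault-injection step described in the abstract can be sketched as follows (my own minimal Python illustration, not the author's test harness; summing a weight vector stands in for running inference on a trained network):

```python
# Flip one random bit in a float32 weight array and compare the output
# against the fault-free ("golden") run to spot an SDC.
import numpy as np

def flip_random_bit(weights, rng):
    """Flip one random bit of a flat float32 weight array, in place."""
    raw = weights.view(np.uint32)             # reinterpret the float bits
    idx = rng.integers(raw.size)
    bit = np.uint32(1) << np.uint32(rng.integers(32))
    raw[idx] ^= bit
    return idx

rng = np.random.default_rng(seed=1)
weights = np.ones(100, dtype=np.float32)      # stand-in for a trained layer
golden = float(weights.sum())                 # fault-free "prediction"
flip_random_bit(weights, rng)
faulty = float(weights.sum())
# An SDC is a run whose output silently differs from the golden output;
# a flip in a low mantissa bit may instead be masked entirely.
print("SDC" if faulty != golden else "masked", golden, "->", faulty)
```

Repeating this loop over many injections and inputs, and counting how often the predicted class silently changes, yields the SDC percentages reported above.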
Other - This thesis is part of the AI4EU project
APA, Harvard, Vancouver, ISO, and other styles
40

Kulkarni, Sameer G. "Resource Management for Efficient, Scalable and Resilient Network Function Chains." Doctoral thesis, 2018. http://hdl.handle.net/11858/00-1735-0000-002E-E477-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
