Academic literature on the topic 'Fault resilience'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Fault resilience.'

Journal articles on the topic "Fault resilience"

1

Lee, Yen-Lin, Shinta Nuraisya Arizky, Yu-Ren Chen, Deron Liang, and Wei-Jen Wang. "High-Availability Computing Platform with Sensor Fault Resilience." Sensors 21, no. 2 (January 13, 2021): 542. http://dx.doi.org/10.3390/s21020542.

Abstract:
Modern computing platforms usually use multiple sensors to report system information. In order to achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service not live. However, a sensor may fail and disable HA protection. In this case, human intervention is needed, either to change the original fault model or to fix the sensor fault. Therefore, this study proposes an HA mechanism that can continuously provide HA to a cloud system based on dynamic fault model reconstruction. We have implemented the proposed HA mechanism on a four-layer OpenStack cloud system and tested the performance of the proposed mechanism for all possible sets of sensor faults. For each fault model, we inject possible system faults and measure the average fault detection time. The experimental result shows that the proposed mechanism can accurately detect and recover an injected system fault with disabled sensors. In addition, the system fault detection time increases as the number of sensor faults increases, until the HA mechanism is degraded to a one-system-fault model, which is the worst case as the system layer heartbeating.
2

Lu, Wei, Weidong Wang, Ergude Bao, Weiwei Xing, and Kai Zhu. "Improving Resilience of Software Systems: A Case Study in 3D-Online Game System." International Journal of Software Engineering and Knowledge Engineering 27, no. 01 (February 2017): 1–22. http://dx.doi.org/10.1142/s0218194017500012.

Abstract:
Resilience is the property that enables a system to continue operating properly when one or more faults occur. Nowadays, as software systems become more and more complex, their hardware execution platforms also become more heterogenous with larger scale. Software systems may fail due to some faults such as node breakdown, communication failure, or data processing failure. In this paper, we propose a ring-based resilience mechanism, which implements fault detection and recovery. (1) To solve the problem that the central server may have high burden of network traffic, we design a ring-based heartbeat algorithm for crash fault detection. (2) We also design a light-weight recovery mechanism to recover from crash faults as compared with the current system-specific mechanisms. To evaluate our mechanism, we use a 3D-online game system as a case study. By injecting faults, we test the effectiveness and overhead of the proposed mechanism. Compared with other mechanisms, the experimental results show that our mechanism can support resilience very well and is better at dealing with the crash fault caused by high cluster workload with acceptable overhead.
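The ring-based heartbeat idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the timestamp dictionary, and the timeout value are assumptions made for illustration, and the per-node checks are simulated in a single process rather than exchanged over a network. The point it shows is that each node watches only its successor in the ring, so no central server has to receive every heartbeat.

```python
def ring_successors(nodes):
    """Map each node to the next node in the ring (the last wraps to the first)."""
    return {node: nodes[(i + 1) % len(nodes)] for i, node in enumerate(nodes)}


def detect_crashed(nodes, last_heartbeat, now, timeout):
    """Return the set of nodes whose heartbeat is older than `timeout`.

    Hypothetical simplification: every node is assumed able to run its
    local check; in a real system a crashed monitor reports nothing.
    """
    crashed = set()
    for monitor, target in ring_successors(nodes).items():
        # `monitor` inspects only the heartbeat of its own successor.
        if now - last_heartbeat[target] > timeout:
            crashed.add(target)
    return crashed


nodes = ["a", "b", "c", "d"]
heartbeats = {"a": 10.0, "b": 10.0, "c": 4.0, "d": 10.0}
print(detect_crashed(nodes, heartbeats, now=10.0, timeout=3.0))
```

In this toy run only node "c" has a stale heartbeat, and it is flagged by its predecessor "b" alone.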
3

Wu, Weiqiang, Ning Huang, Lina Sun, and Xiaolu Zheng. "Measurement and Analysis of MANET Resilience with Fault Tolerance Strategies." Mathematical Problems in Engineering 2017 (2017): 1–10. http://dx.doi.org/10.1155/2017/9806365.

Abstract:
Resilience is usually considered as the ability of network fault tolerance. To improve the resilience of MANET, fault tolerance strategies such as routing protocols are usually employed which will impact resilience of MANET. For resilience measurement and fault tolerance strategies’ efficiency evaluation, the impact of fault tolerance strategies deserves a detailed study. However, the general MANET resilience measurement methods do not consider the fault tolerance strategies as individual resilience influence factors, let alone reflecting the interplay among strategies that deployed on different network layers. Thus, it results in a limitation on efficiency assessment of fault tolerance strategies. In this paper, it models fault tolerance strategies for MANET resilience measurement with considering strategies as individual resilience influence factors. Firstly, through analyzing the features of fault tolerance strategies that deployed on physical and logical layers of network, we built a hierarchical network model to describe the resilience impact of strategies. Then, based on this network model, we proposed fault tolerance strategies model to measure resilience of MANET. Particularly, the model can well support the interplay study among different strategies through contrasting the quantitative value defined by strategy model. At last, a case study was given for verification and analysis.
4

Ding, Dong, Lei Wang, Zhijie Yang, Kai Hu, and Hongjun He. "ACIMS: Analog CIM Simulator for DNN Resilience." Electronics 10, no. 6 (March 15, 2021): 686. http://dx.doi.org/10.3390/electronics10060686.

Abstract:
Analog Computing In Memory (ACIM) combines the advantages of both Compute In Memory (CIM) and analog computing, making it suitable for the design of energy-efficient hardware accelerators for computationally intensive DNN applications. However, their use will introduce hardware faults that decrease the accuracy of DNN. In this work, we take Sandwich-Ram as the real hardware example of ACIM and are the first to propose a fault injection and fault-aware training framework for it, named Analog Computing In Memory Simulator (ACIMS). Using this framework, we can simulate and repair the hardware faults of ACIM. The experimental results show that ACIMS can recover 91.0%, 93.7% and 89.8% of the DNN’s accuracy drop through retraining on the MNIST, SVHN and Cifar-10 datasets, respectively; moreover, their adjusted accuracy can reach 97.0%, 95.3% and 92.4%.
5

Service, Travis, and Daniel Tauritz. "Increasing Infrastructure Resilience Through Competitive Coevolution." New Mathematics and Natural Computation 05, no. 02 (July 2009): 441–57. http://dx.doi.org/10.1142/s1793005709001416.

Abstract:
The world is increasingly dependent on critical infrastructures such as the electric power grid, water, gas and oil transport systems. Due to this increasing dependence and inadequate infrastructure expansion, these systems are becoming increasingly stressed. These additional stresses leave these systems less resilient to external faults, both accidental and malicious than ever before. As a result of this increased vulnerability, many critical infrastructures are becoming susceptible to cascading failures, where an initial fault caused by an external force may induce a domino-effect of further component failures. An important implication is that traditional infrastructure risk analysis methods, often relying on Monte Carlo sampling of fault scenarios, are no longer sufficient. Instead, systematic analysis based on worst-case attacks by intelligent adversaries is essential. This paper describes a coevolutionary methodology to simultaneously discover low-effort high-impact faults and corresponding means of hardening infrastructures against them. We empirically validate our methodology through an electric power transmission system case study.
6

Xie, Fei, Jun Yan, and Jun Shen. "A Novel PageRank-Based Fault Handling Strategy for Workflow Scheduling in Cloud Data Centers." International Journal of Web Services Research 18, no. 4 (October 2021): 1–26. http://dx.doi.org/10.4018/ijwsr.2021100101.

Abstract:
Unexpected faults result in unscheduled cloud outage, which negatively affects the completion of workflow tasks in the cloud. This paper presents a novel PageRank-based fault handling strategy to rescue workflow tasks at the faulty data center. The proposed approach uses a holistic view and considers the task attributes, the timeline scenario, and the overall cloud performance. A priority assignment system is developed based on the modified PageRank algorithm to prioritise workflow tasks. A min-max normalization method is applied to select the target data center and match the timeline at this data center. Additionally, a dynamic PageRank-constrained task scheduling algorithm is proposed to generate the task scheduling solution. The simulation results show that the proposed approach can achieve better fault handling performance, measured by task resilience ratio, workflow resilience ratio, and workflow continuity ratio in both the traditional 3-replica and the image backup cloud environment.
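Min-max normalization, which this abstract names as the step used to compare candidate data centers, rescales values to the range [0, 1] via x' = (x - min) / (max - min). The Python sketch below shows only that general formula; the function name and the example scores are hypothetical, and how the paper combines the normalized values is not stated in the abstract.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] using min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw scores for three candidate data centers.
scores = [2, 4, 6]
print(min_max_normalize(scores))
```

Once every attribute is on the same scale, attributes measured in different units can be compared or combined without one of them dominating purely because of its magnitude.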
7

Luo, Mon-Yen, and Chu-Sing Yang. "Enabling fault resilience for web services." Computer Communications 25, no. 3 (February 2002): 198–209. http://dx.doi.org/10.1016/s0140-3664(01)00363-2.

8

Meilianda, Ella, Franck Lavigne, Biswajeet Pradhan, Patrick Wassmer, Darusman Darusman, and Marjolein Dohmen-Janssen. "Barrier Islands Resilience to Extreme Events: Do Earthquake and Tsunami Play a Role?" Water 13, no. 2 (January 13, 2021): 178. http://dx.doi.org/10.3390/w13020178.

Abstract:
Barrier islands are indicators of coastal resilience. Previous studies have proven that barrier islands are surprisingly resilient to extreme storm events. At present, little is known about barrier systems’ resilience to seismic events triggering tsunamis, co-seismic subsidence, and liquefaction. The objective of this study is, therefore, to investigate the morphological resilience of the barrier islands in responding to those secondary effects of seismic activity of the Sumatra–Andaman subduction zone and the Great Sumatran Fault system. Spatial analysis in Geographical Information Systems (GIS) was utilized to detect shoreline changes from the multi-source datasets of centennial time scale, including old topographic maps and satellite images from 1898 until 2017. Additionally, the earthquake and tsunami records and established conceptual models of storm effects to barrier systems, are corroborated to support possible forcing factors analysis. Two selected coastal sections possess different geomorphic settings are investigated: (1) Lambadeuk, the coast overlying the Sumatran Fault system, (2) Kuala Gigieng, located in between two segments of the Sumatran Fault System. Seven consecutive pairs of comparable old topographic maps and satellite images reveal remarkable morphological changes in the form of breaching, landward migrating, sinking, and complete disappearing in different periods of observation. While semi-protected embayed Lambadeuk is not resilient to repeated co-seismic land subsidence, the wave-dominated Kuala Gigieng coast is not resilient to the combination of tsunami and liquefaction events. The mega-tsunami triggered by the 2004 earthquake led to irreversible changes in the barrier islands on both coasts.
9

Caseiro, Luís, and André Mendes. "Fault Analysis and Non-Redundant Fault Tolerance in 3-Level Double Conversion UPS Systems Using Finite-Control-Set Model Predictive Control." Energies 14, no. 8 (April 15, 2021): 2210. http://dx.doi.org/10.3390/en14082210.

Abstract:
Fault-tolerance is critical in power electronics, especially in Uninterruptible Power Supplies, given their role in protecting critical loads. Hence, it is crucial to develop fault-tolerant techniques to improve the resilience of these systems. This paper proposes a non-redundant fault-tolerant double conversion uninterruptible power supply based on 3-level converters. The proposed solution can correct open-circuit faults in all semiconductors (IGBTs and diodes) of all converters of the system (including the DC-DC converter), ensuring full-rated post-fault operation. This technique leverages the versatility of Finite-Control-Set Model Predictive Control to implement highly specific fault correction. This type of control enables a conditional exclusion of the switching states affected by each fault, allowing the converter to avoid these states when the fault compromises their output but still use them in all other conditions. Three main types of corrective actions are used: predictive controller adaptations, hardware reconfiguration, and DC bus voltage adjustment. However, highly differentiated corrective actions are taken depending on the fault type and location, maximizing post-fault performance in each case. Faults can be corrected simultaneously in all converters, as well as some combinations of multiple faults in the same converter. Experimental results are presented demonstrating the performance of the proposed solution.
10

Rizzi, F., K. Morris, K. Sargsyan, P. Mycek, C. Safta, O. Le Maître, O. Knio, and B. Debusschere. "Partial differential equations preconditioner resilient to soft and hard faults." International Journal of High Performance Computing Applications 32, no. 5 (January 29, 2017): 658–73. http://dx.doi.org/10.1177/1094342016684975.

Abstract:
We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm reformulates the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to both soft and hard faults. This reformulation allows us to recast the problem as a set of independent tasks, and exploit data locality to reduce global communication. We discuss two different parallel implementations: (a) a single program multiple data (SPMD) version based on a one-to-one mapping between subdomain and MPI processes responsible for both state and computation; and (b) an asynchronous server–client implementation where all state information is held by the servers and clients are designed solely as computational units. We present a scalability comparison of both implementations under nominal conditions, showing efficiency within ~80% for up to 12,000 cores. We present a resilience analysis under different fault scenarios based on the server–client implementation. This framework provides resiliency to hard faults such that if a client crashes, it stops asking for work, and the servers simply distribute the work among all of the other clients alive. Erroneous subdomain solves (e.g. due to soft faults) appear as corrupted data, which is either rejected if that causes a task to fail, or is seamlessly filtered out during the regression stage through a suitable noise model. Three different types of faults are modeled: hard faults modeling nodes (or clients) crashing; soft faults occurring during the communication of the tasks between server and clients; and soft faults occurring during task execution. We demonstrate the resiliency of the approach for a 2D elliptic PDE, and explore the effect of the faults at various failure rates.

Dissertations / Theses on the topic "Fault resilience"

1

Wilkes, Charles Thomas. "Programming methodologies for resilience and availability." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/8308.

2

Nascimento, Flávia Maristela Santos. "A Simulation-Based Fault Resilience Analysis for Real-Time Systems." Escola Politécnica / Instituto de Matemática, 2009. http://repositorio.ufba.br/ri/handle/ri/21461.

Abstract:
Real-time systems have been widely used in the context of mechatronic systems since, in order to control real-world entities, both their logical and their temporal requirements must be considered. In such systems, mechanisms that provide fault tolerance must be implemented because faults can lead to considerable losses. For example, an error in a flight control system can result in the loss of human lives. Several fault-tolerant scheduling approaches for real-time systems have been derived. However, most of them restrict the system and/or fault model in a particular way, or are tightly coupled to the system's recovery model or to the scheduling policy. Moreover, there is no formal metric that allows the existing approaches to be compared from the standpoint of fault resilience. The main goal of this work is to fill this gap by providing a fault resilience metric for real-time systems that is as independent as possible of the system and/or fault models. To this end, a simulation-based analysis was developed to compute the resilience of all tasks of a system by simulating specific time intervals. Statistical inference techniques are then used to infer the resilience of the system. The results showed that the developed metric can be used to compare, for example, two scheduling policies for real-time systems from the perspective of fault resilience, which demonstrates that the developed approach is reasonably independent of the system model.
3

Pai, Raikar Siddhesh Prakash Sunita. "Network Fault Resilient MPI for Multi-Rail Infiniband Clusters." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1325270841.

4

Monge, Solano Ignacio, and Enikő Matók. "Developing for Resilience: Introducing a Chaos Engineering tool." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20808.

Abstract:
Software complexity continues to accelerate, as new tools, frameworks, and technologies become available. This, in turn, increases its fragility and liability. Despite the amount of investment to test and harden their systems, companies still pay the price of failure. To withstand this fast-paced development environment and ensure software availability, large-scale systems must be built with resilience in mind. Chaos Engineering is a new practice that aims to assess some of these challenges. In this thesis, the methodology, requirements, and iterations of the system design and architecture for a chaos engineering tool are presented. In a matter of only a couple of months and the working hours of two engineers, it was possible to build a tool that is able to shed light on the attributes that make the targeted system resilient as well as the weaknesses in its failure handling mechanisms. This tool greatly reduces the otherwise manual testing labor and allows software engineering teams to find potentially costly failures. These results prove the benefits that many companies could experience in their return of investment by adopting the practice of Chaos Engineering.
5

Souto, Laiz. "Data-driven approaches for event detection, fault location, resilience assessment, and enhancements in power systems." Doctoral thesis, Universitat de Girona, 2021. http://hdl.handle.net/10803/671402.

Abstract:
This thesis presents the study and development of distinct data-driven techniques to support event detection, fault location, and resilience assessment towards enhancements in power systems. It is divided in three main parts as follows. The first part investigates improvements in power system monitoring and event detection methods with focus on dimensionality reduction techniques in wide-area monitoring systems. The second part focuses on contributions to fault location tasks in power distribution networks, relying on information about the network topology and its electrical parameters for short-circuit simulations over a range of scenarios. The third part assesses enhancements in power system resilience to high-impact, low-probability events associated with extreme weather conditions and human-made attacks, relying on information about the system topology combined with simulations of representative scenarios for impact assessment and mitigation. Overall, the proposed data-driven algorithms contribute to event detection, fault location, and resilience assessment, relying on electrical measurements recorded by intelligent electronic devices, historical data of past events, and representative scenarios, together with information about the network topology, electrical parameters, and operating status. The validation of the algorithms, implemented in MATLAB, is based on computer simulations using network models implemented in OpenDSS and Simulink.
6

Bentria, Dounia. "Combining checkpointing and other resilience mechanisms for exascale systems." Thesis, Lyon, École normale supérieure, 2014. http://www.theses.fr/2014ENSL0971/document.

Abstract:
In this thesis, we are interested in scheduling and optimization problems in probabilistic contexts. The contributions of this thesis come in two parts. The first part is dedicated to the optimization of different fault-tolerance mechanisms for very large scale machines that are subject to a probability of failure and the second part is devoted to the optimization of the expected sensor data acquisition cost when evaluating a query expressed as a tree of disjunctive Boolean operators applied to Boolean predicates. In the first chapter, we present the related work of the first part and then we introduce some new general results that are useful for resilience on exascale systems. In the second chapter, we study a unified model for several well-known checkpoint/restart protocols. The proposed model is generic enough to encompass both extremes of the checkpoint/restart space, from coordinated approaches to a variety of uncoordinated checkpoint strategies. We propose a detailed analysis of several scenarios, including some of the most powerful currently available HPC platforms, as well as anticipated exascale designs. In the third, fourth, and fifth chapters, we study the combination of different fault tolerant mechanisms (replication, fault prediction and detection of silent errors) with the traditional checkpoint/restart mechanism. We evaluated several models using simulations. Our results show that these models are useful for a set of models of applications in the context of future exascale systems. In the second part of the thesis, we study the problem of minimizing the expected sensor data acquisition cost when evaluating a query expressed as a tree of disjunctive Boolean operators applied to Boolean predicates. The problem is to determine the order in which predicates should be evaluated so as to shortcut part of the query evaluation and minimize the expected cost. In the sixth chapter, we present the related work of the second part and in the seventh chapter, we study the problem for queries expressed as a disjunctive normal form. We consider the more general case where each data stream can appear in multiple predicates and we consider two models, the model where each predicate can access a single stream and the model where each predicate can access multiple streams.
7

Raja, Chandrasekar Raghunath. "Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417733721.

8

Teixeira, André. "Toward Cyber-Secure and Resilient Networked Control Systems." Doctoral thesis, KTH, Reglerteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-154204.

Abstract:
Resilience is the ability to maintain acceptable levels of operation in the presence of abnormal conditions. It is an essential property in industrial control systems, which are the backbone of several critical infrastructures. The trend towards using pervasive information technology systems, such as the Internet, results in control systems becoming increasingly vulnerable to cyber threats. Traditional cyber security does not consider the interdependencies between the physical components and the cyber systems. On the other hand, control-theoretic approaches typically deal with independent disturbances and faults, thus they are not tailored to handle cyber threats. Theory and tools to analyze and build control system resilience are, therefore, lacking and in need to be developed. This thesis contributes towards a framework for analyzing and building resilient control systems. First, a conceptual model for networked control systems with malicious adversaries is introduced. In this model, the adversary aims at disrupting the system behavior while remaining undetected by an anomaly detector. The adversary is constrained in terms of the available model knowledge, disclosure resources, and disruption capabilities. These resources may correspond to the anomaly detector’s algorithm, sniffers of private data, and spoofers of control commands, respectively. Second, we address security and resilience under the perspective of risk management, where the notion of risk is defined in terms of a threat’s scenario, impact, and likelihood. Quantitative tools to analyze risk are proposed. They take into account both the likelihood and impact of threats. Attack scenarios with high impact are identified using the proposed tools, e.g., zero-dynamics attacks are analyzed in detail. The problem of revealing attacks is also addressed. Their stealthiness is characterized, and how to detect them by modifying the system’s structure is also described. 
As our third contribution, we propose distributed fault detection and isolation schemes to detect physical and cyber threats on interconnected second-order linear systems. A distributed scheme based on unknown input observers is designed to jointly detect and isolate threats that may occur on the network edges or nodes. Additionally, we propose a distributed scheme based on local models and measurements that is resilient to changes outside the local subsystem. The complexity of the proposed methods is decreased by reducing the number of monitoring nodes and by characterizing the minimum amount of model information and measurements needed to achieve fault detection and isolation. Finally, we tackle the problem of distributed reconfiguration under sensor and actuator faults. In particular, we consider a control system with redundant sensors and actuators cooperating to recover from the removal of individual nodes. The proposed scheme minimizes a quadratic cost while satisfying a model-matching condition, which maintains the nominal closed-loop behavior after faults. Stability of the closed-loop system under the proposed scheme is analyzed.
A resilient system has the ability to recover after a severe and unexpected disturbance. Resilience is an important property of industrial control systems, which are a key component of many critical infrastructures, such as the process industry and electric power grids. The trend towards using large-scale IT systems, such as the Internet, within control systems results in increased vulnerability to cyber threats. Traditional IT security does not take into account the particular coupling between physical components and IT systems that exists in control systems. On the other hand, traditional control engineering tends to focus on handling natural faults rather than cyber vulnerabilities. Theory and tools for resilient and cyber-secure control systems are therefore lacking and need to be developed. This thesis contributes to a framework for analyzing and designing such control systems. First, we develop a representative abstract model of networked control systems consisting of four components: the physical process with sensors and actuators, the communication network, the digital control system, and a fault detector. Then a conceptual model for attacks against the networked control system is introduced. The model describes attacks that try to avoid raising alarms in the fault detector while still disturbing the physical process. In addition, the model assumes that the attacker has limited resources in terms of model knowledge and communication channels. The framework is then used to study resilience against the attacks through a risk analysis, where risk is defined in terms of a threat's scenario, consequences, and likelihood. Quantitative methods for estimating the consequences and likelihoods of attacks are developed, and in particular it is shown how high-risk threats can be identified and counteracted. The results of the thesis are illustrated with a number of numerical and practical examples.

QC 20141016

9

Zounon, Mawussi. "On numerical resilience in linear algebra." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0038/document.

Abstract:
As the computing power of high-performance computing systems continues to grow, through the use of large numbers of CPU cores or specialized processing units, high-performance applications aimed at solving very large-scale problems are increasingly prone to faults. Consequently, the high-performance computing community has proposed many contributions for designing fault-tolerant applications. This study addresses a new class of application-level numerical fault tolerance algorithms that require no extra resources, namely additional computational units or computing time, when no fault occurs. Assuming that a separate mechanism ensures fault detection, we propose numerical algorithms to extract relevant information from the data available after a fault. After data extraction, the missing critical data are regenerated through interpolation strategies to form meaningful inputs for numerically restarting the algorithm. We have designed these methods, called Interpolation-restart techniques, for numerical linear algebra problems such as the solution of linear systems or eigenvalue problems, which are indispensable in many scientific kernels and engineering applications. Solving these problems is often the dominant part, in terms of computing time, of scientific applications. In the framework of Krylov subspace linear solvers, the lost entries of the iterate are interpolated using the entries available on the surviving nodes to define a new initial guess before restarting the Krylov method.
In particular, we consider two interpolation policies that preserve key numerical properties of well-known linear solvers, namely the monotonic decrease of the A-norm of the error for the conjugate gradient method or the monotonic decrease of the residual norm for GMRES. We evaluated the impact of the fault rate and of the amount of lost data on the robustness of the resilience strategies designed. The experiments showed that our numerical strategies are robust even in the presence of high fault rates and large volumes of lost data. With the aim of designing resilient solvers for eigenvalue problems, we modified the interpolation strategies designed for linear systems. We revisited state-of-the-art iterative methods for solving sparse eigenvalue problems in the light of Interpolation-restart strategies. For each method considered, we adapted the Interpolation-restart strategies to regenerate as much spectral information as possible. To assess the performance of our numerical strategies, we considered a fully featured parallel hybrid (direct/iterative) solver named MaPHyS for the solution of sparse linear systems, and we propose numerical solutions for designing a fault-tolerant version of the solver. Since the solver is hybrid, we focus in this study on the iterative solution step, which is often the dominant step in practice. The proposed numerical solutions are twofold. Whenever possible, we exploit the data redundancy between the solver's processes to perform an exact regeneration of the data by making clever copies across processes. Otherwise, lost data that are no longer available on any process are regenerated through an interpolation mechanism.
As the computational power of high performance computing (HPC) systems continues to increase by using a huge number of cores or specialized processing units, HPC applications are increasingly prone to faults. This study covers a new class of numerical fault tolerance algorithms at application level that does not require extra resources, i.e., computational units or computing time, when no fault occurs. Assuming that a separate mechanism ensures fault detection, we propose numerical algorithms to extract relevant information from available data after a fault. After data extraction, a well-chosen part of the missing data is regenerated through interpolation strategies to constitute meaningful inputs to numerically restart the algorithm. We have designed these methods, called Interpolation-restart techniques, for numerical linear algebra problems such as the solution of linear systems or eigen-problems that are the innermost numerical kernels in many scientific and engineering applications and also often among the most time-consuming parts. In the framework of Krylov subspace linear solvers, the lost entries of the iterate are interpolated using the available entries on the still alive nodes to define a new initial guess before restarting the Krylov method. In particular, we consider two interpolation policies that preserve key numerical properties of well-known linear solvers, namely the monotonic decrease of the A-norm of the error of the conjugate gradient or the residual norm decrease of GMRES. We assess the impact of the fault rate and the amount of lost data on the robustness of the resulting linear solvers. For eigensolvers, we revisited state-of-the-art methods for solving large sparse eigenvalue problems, namely the Arnoldi methods, subspace iteration methods and the Jacobi-Davidson method, in the light of Interpolation-restart strategies.
For each considered eigensolver, we adapted the Interpolation-restart strategies to regenerate as much spectral information as possible. Through intensive experiments, we illustrate the qualitative numerical behavior of the resulting schemes when the number of faults and the amount of lost data are varied; and we demonstrate that they exhibit a numerical robustness close to that of fault-free calculations. In order to assess the efficiency of our numerical strategies, we have considered an actual fully-featured parallel sparse hybrid (direct/iterative) linear solver, MaPHyS, and we proposed numerical remedies to design a resilient version of the solver. The solver being hybrid, we focus in this study on the iterative solution step, which is often the dominant step in practice. The numerical remedies we propose are twofold. Whenever possible, we exploit the natural data redundancy between processes of the solver to perform an exact recovery through clever copies over processes. Otherwise, data that has been lost and is not available anymore on any process is recovered through Interpolation-restart strategies. These numerical remedies have been implemented in the MaPHyS parallel solver so that we can assess their efficiency on a large number of processing units (up to 12,288 CPU cores) for solving large-scale real-life problems.
10

Rink, Norman Alexander, and Jeronimo Castrillon. "Comprehensive Backend Support for Local Memory Fault Tolerance." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-215785.

Abstract:
Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to transient faults. Applications can be protected against faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to program representations at abstraction levels higher than machine instructions are fundamentally incapable of protecting against vulnerabilities that are introduced during compilation. In particular, a large proportion of a program’s memory accesses are introduced by the compiler backend. This report presents a backend that protects these accesses against faults in the memory system. It is demonstrated that the presented backend can detect all single bit flips in memory that would be missed by an error detection scheme that operates on the LLVM intermediate representation of programs. The presented compiler backend is obtained by modifying the LLVM backend for the x86 architecture. On a subset of SPEC CINT2006 the runtime overhead incurred by the backend modifications amounts to 1.50x for the 32-bit processor architecture i386, and 1.13x for the 64-bit architecture x86_64. To achieve comprehensive detection of memory faults, the modified backend implements an adjusted calling convention that leaves library function calls transparent and intact.

Books on the topic "Fault resilience"

1

Hollnagel, Erik, Christopher P. Nemeth, and Sidney Dekker, eds. Resilience engineering perspectives. Aldershot: Ashgate, 2008.

2

Hollnagel, Erik. Resilience engineering in practice: A guidebook. Farnham, Surrey, England: Ashgate, 2011.

3

Hollnagel, Erik. Resilience Engineering in Practice: A Guidebook. Farnham, Surrey, England: Ashgate, 2010.

4

Healy, Ann Marie, ed. Resilience: The science of why things bounce back. New York: Free Press, 2012.

5

Kavian, Yousef S., and Mark Stephen Leeson. Resilient optical network design: Advances in fault-tolerant methodologies. Hershey, PA: Information Science Reference, 2012.

6

Anderson, Tom, ed. Resilient computing systems. New York: Wiley, 1985.

7

Troubitsyna, Elena A. Software Engineering for Resilient Systems: Third International Workshop, SERENE 2011, Geneva, Switzerland, September 29-30, 2011. Proceedings. Berlin, Heidelberg: Springer-Verlag GmbH Berlin Heidelberg, 2011.

8

SpringerLink (Online service), ed. Software Engineering for Resilient Systems: 4th International Workshop, SERENE 2012, Pisa, Italy, September 27-28, 2012. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.

9

Nemeth, Christopher P., and Erik Hollnagel. Resilience Engineering in Practice - Becoming Resilient. Taylor & Francis Group, 2016.

10

Zolli, Andrew, and Ann Marie Healy. Resilience: Why Things Bounce Back. Headline Publishing Group, 2012.


Book chapters on the topic "Fault resilience"

1

Häring, Ivo. "Fault Tree Analysis." In Technical Safety, Reliability and Resilience, 71–99. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-33-4272-9_6.

2

Barbosa, Raul, Johan Karlsson, Henrique Madeira, and Marco Vieira. "Fault Injection." In Resilience Assessment and Evaluation of Computing Systems, 263–81. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29032-9_13.

3

Chatterjee, Bijoy Chand, Nityananda Sarma, Partha Pratim Sahu, and Eiji Oki. "A Reliable Fault Resilience Scheme." In Lecture Notes in Electrical Engineering, 85–100. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-46203-5_7.

4

Saeed, Luqman, and Ghazy Abdallah. "Resilience with Fault Tolerance API." In Pro Cloud Native Java EE Apps, 241–56. Berkeley, CA: Apress, 2022. http://dx.doi.org/10.1007/978-1-4842-8900-6_8.

5

Liu, Zhiyu, Guihai Chen, Chunfeng Yuan, Sanglu Lu, and Chengzhong Xu. "Fault Resilience of Structured P2P Systems." In Web Information Systems – WISE 2004, 736–41. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-30480-7_77.

6

Boin, Arjen, Allan McConnell, and Paul ‘t Hart. "Pathways to Resilience." In Governing the Pandemic, 107–20. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72680-5_6.

Abstract:
The COVID-19 crisis has tested public institutions, crisis leadership and societal solidarity to the core. Fault lines have come to the fore; unsuspected strengths have been noted. But will this be enough to initiate the necessary steps to prepare our societies for the future crises that will come? In this chapter, we offer the building blocks for an action agenda. We identify various pathways to enhanced resilience.
7

Zussblatt, Niels P., Alexander A. Ganin, Sabrina Larkin, Lance Fiondella, and Igor Linkov. "Resilience and Fault Tolerance in Electrical Engineering." In NATO Science for Peace and Security Series C: Environmental Security, 427–47. Dordrecht: Springer Netherlands, 2017. http://dx.doi.org/10.1007/978-94-024-1123-2_16.

8

Strigini, Lorenzo. "Fault Tolerance and Resilience: Meanings, Measures and Assessment." In Resilience Assessment and Evaluation of Computing Systems, 3–24. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29032-9_1.

9

Orailoğlu, Alex. "On-Line Fault Resilience Through Gracefully Degradable ASICs." In On-Line Testing for VLSI, 145–51. Boston, MA: Springer US, 1998. http://dx.doi.org/10.1007/978-1-4757-6069-9_14.

10

Brandstetter, Lukas, Marc Fischlin, Robin Leander Schröder, and Michael Yonli. "On the Memory Fault Resilience of TLS 1.3." In Security Standardisation Research, 1–22. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-64357-7_1.


Conference papers on the topic "Fault resilience"

1

Guilley, Sylvain, Laurent Sauvage, Jean-Luc Danger, and Nidhal Selmane. "Fault Injection Resilience." In 2010 Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC). IEEE, 2010. http://dx.doi.org/10.1109/fdtc.2010.15.

2

Hulse, Daniel, Christopher Hoyle, Irem Y. Tumer, Kai Goebel, and Chetan Kulkarni. "Temporal Fault Injection Considerations in Resilience Quantification." In ASME 2020 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2020. http://dx.doi.org/10.1115/detc2020-22154.

Abstract:
Resilience models assess a system’s ability to withstand disruption by quantifying the value of metrics (e.g. expected cost or loss) over time. When such a metric is the result of injecting faults in a dynamic model over an interval of time, it is important that it represent the statistical expectation of fault responses rather than a single response. Since fault responses vary over fault injection times, representing the statistical expectation of responses requires sampling a number of points. However, fault models are often built around computationally expensive dynamic simulations, and it is desirable to be able to iterate over designs as quickly as possible to improve system resilience. With this in mind, this paper explores approaches to sample fault injection times to minimize computational cost while accurately representing the expectation of fault resilience metrics over the set of possible occurrence times. Two general approaches are presented: an a priori approach that attempts to minimize error without knowing the underlying cost function, and an a posteriori approach that minimizes error when the cost function is known. Among a priori methods, numerical integration minimizes error and computational time compared to Monte Carlo sampling; however, both are prone to error when the metric’s fault response curve is discontinuous. While a posteriori approaches can locate and correct for these discontinuities, the resulting error reduction is not robust to design changes that shift the underlying location of discontinuities. The ultimate decision to use an a priori or a posteriori approach to quantify resilience is thus dependent on a number of considerations, including computational cost, the robustness of the approximation to design changes, and the underlying form of the resilience function.
3

Hulse, Daniel, Christopher Hoyle, Kai Goebel, and Irem Y. Tumer. "Optimizing Function-Based Fault Propagation Model Resilience Using Expected Cost Scoring." In ASME 2018 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2018. http://dx.doi.org/10.1115/detc2018-85318.

Abstract:
Complex engineered systems are often associated with risk due to high failure consequences, high complexity, and large investments. As a result, it is desirable for complex engineered systems to be resilient such that they can avoid or quickly recover from faults. Ideally, this should be done at the early design stage where designers are most able to explore a large space of concepts. Previous work has shown that functional models can be used to predict fault propagation behavior and motivate design work. However, little has been done to formally optimize a design based on these predictions, partially because the effects of these models have not been quantified into an objective function to optimize. This work introduces a scoring function which integrates with a fault scenario-based simulation to enable the risk-neutral optimization of functional model resilience. This scoring function accomplishes this by resolving the tradeoffs between the design costs, operating costs, and modeled fault response of a given design in a way that may be parameterized in terms of designer-specified resilient features. This scoring function is adapted and applied to the optimization of controlling functions which recover flows in a monopropellant orbiter. In this case study, an evolutionary algorithm is shown to find the optimal logic for these functions, improving on a typical a priori guess by exploring a large range of solutions, which demonstrates the value of the approach.
4

Hulse, Daniel, and Lukman Irshad. "Synthetic Fault Mode Generation for Resilience Analysis and Failure Mechanism Discovery." In ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2022. http://dx.doi.org/10.1115/detc2022-90072.

Abstract:
Traditional risk-based design processes seek to mitigate operational hazards by manually identifying possible faults and corresponding mitigation strategies, a tedious process which critically relies on the designer’s limited knowledge. Resilience-based design, on the other hand, seeks to embody generic hazard-mitigating properties in the system to mitigate unknown hazards, often by modelling the system’s response to potential randomly-generated hazardous events. This work creates a framework to adapt these scenario generation approaches to the traditional risk-based design process to synthetically generate fault modes, by representing them as a unique combination of internal component health-states which can then be injected and simulated in a model of the system failure dynamics. The design process may then reduce the risk of unknown internal hazards by iteratively mitigating the effects of these modes. The performance of this approach is evaluated in a model of an autonomous rover, where cluster analysis shows that elaborating the faulty state-space in the drive system using this approach uncovers a wider range of possible hazardous trajectories and failure consequences within each trajectory. However, this increase in hazard information gained from exhaustive mode sampling comes at a high computational expense, highlighting the need for advanced, efficient methods to search and sample the faulty state-space.
5

Ju, Xiaoen, Livio Soares, Kang G. Shin, Kyung Dong Ryu, and Dilma Da Silva. "On fault resilience of OpenStack." In SOCC '13: ACM Symposium on Cloud Computing. New York, NY, USA: ACM, 2013. http://dx.doi.org/10.1145/2523616.2523622.

6

Salman, Mustafa, Morteza Sarailoo, and N. Eva Wu. "Fault diagnosis based on partitioned power system models." In 2016 Resilience Week (RWS). IEEE, 2016. http://dx.doi.org/10.1109/rweek.2016.7573312.

7

Martins, Joao F., Ines Neves, Adriana Mar, Pedro Pereira, Vitor Pires, and Rui Amaral Lopes. "Fault Resilience in Energy Community Microgrids." In 2022 3rd International Conference on Smart Grid and Renewable Energy (SGRE). IEEE, 2022. http://dx.doi.org/10.1109/sgre53517.2022.9774093.

8

Li, Yawei, Prashasta Gujrati, Zhiling Lan, and Xian-he Sun. "Fault-Driven Re-Scheduling For Improving System-level Fault Resilience." In 2007 International Conference on Parallel Processing (ICPP 2007). IEEE, 2007. http://dx.doi.org/10.1109/icpp.2007.42.

9

Naughton, Thomas, Wesley Bland, Geoffroy Vallee, Christian Engelmann, and Stephen L. Scott. "Fault injection framework for system resilience evaluation." In the 2009 workshop. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1552526.1552530.

10

Anzt, Hartwig, Jack Dongarra, and Enrique S. Quintana-Ortí. "Tuning stationary iterative solvers for fault resilience." In the 6th Workshop. New York, New York, USA: ACM Press, 2015. http://dx.doi.org/10.1145/2832080.2832081.


Reports on the topic "Fault resilience"

1

Quinn, Ted, Richard Bockhorst, Craig Peterson, and Gregg Swindlehurst. Design to Achieve Fault Tolerance and Resilience. Office of Scientific and Technical Information (OSTI), September 2012. http://dx.doi.org/10.2172/1057690.

2

Kramer, William, Saurabh Jha, James Brandt, and Ann Gentile. Final Report - Holistic Measurement Driven Resilience: Combining Operational Fault and Failure Measurements and Fault Injection for Quantifying Fault Detection, Propagation and Impact. Office of Scientific and Technical Information (OSTI), April 2020. http://dx.doi.org/10.2172/1615150.

3

Chen, S., L. Peng, and G. Bronevetsky. A Framework For Evaluating Comprehensive Fault Resilience Mechanisms In Numerical Programs. Office of Scientific and Technical Information (OSTI), January 2015. http://dx.doi.org/10.2172/1179432.

4

Stearley, Jon R., Rolf E. Riesen, James H. Laros III, Kurt Brian Ferreira, Kevin Thomas Tauke Pedretti, Ron A. Oldfield, Todd Kordenbrock, and Ronald Brian Brightwell. Increasing fault resiliency in a message-passing environment. Office of Scientific and Technical Information (OSTI), October 2009. http://dx.doi.org/10.2172/1001015.

5

Sargsyan, Khachik, Cosmin Safta, Bert Debusschere, Habib N. Najm, et al. Fault Resilient Domain Decomposition Preconditioner for PDEs. Office of Scientific and Technical Information (OSTI), June 2015. http://dx.doi.org/10.2172/1494624.

6

Stearley, Jon R., James H. Laros III, Kurt Brian Ferreira, Kevin Thomas Tauke Pedretti, Ron A. Oldfield, Rolf Riesen, and Ronald Brian Brightwell. rMPI: increasing fault resiliency in a message-passing environment. Office of Scientific and Technical Information (OSTI), April 2011. http://dx.doi.org/10.2172/1012733.

7

Morris, Karla Vanessa, Francesco Rizzi, Khachik Sargsyan, Kathryn Dahlgren, Paul Mycek, Cosmin Safta, Olivier Le Maitre, Omar Knio, and Bert Debusschere. Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults. Office of Scientific and Technical Information (OSTI), May 2016. http://dx.doi.org/10.2172/1561477.
