Dissertations / Theses on the topic 'Fault-tolerance computing'

To see the other types of publications on this topic, follow the link: Fault-tolerance computing.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Fault-tolerance computing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Mugwar, Bader. "Fault tolerance : a new method to detect fault in computing systems." Virtual Press, 1986. http://liblink.bsu.edu/uhtbin/catkey/450654.

Full text
Abstract:
This paper discusses fault detection in fault-tolerant computing systems. It outlines the principal existing techniques, namely Anderson's and Avizienis's. The writer introduces a new method based on Anderson's detection technique; this modified version turns out to be a more robust system. The shortcomings of both of the older techniques are discussed in detail, and the writer suggests how to overcome them using the technique he proposes. To demonstrate the merits of his method, the writer applies his technique to the SIFT system, showing that it is workable and superior to previous ones. Diagrams are provided for clarification.
Ball State University, Muncie, IN 47306
APA, Harvard, Vancouver, ISO, and other styles
2

Sullivan, John F. "Network fault tolerance system." Link to electronic thesis, 2000. http://www.wpi.edu/Pubs/ETD/Available/etd-0501100-125656.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wagealla, Waleed. "Reliable mobile agents for distributed computing." Thesis, Nottingham Trent University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.272441.

Full text
Abstract:
The emergence of platform-independent, mobile code technologies has created big opportunities for Internet-based applications. Mobile agents are being utilized to perform a variety of tasks, from personalized computing to business-critical transactions. Unfortunately, these advances have not been matched by corresponding research into the reliability of these new technologies. This work investigates the fault tolerance of this new paradigm. The mobility and autonomy of agent programs have introduced a class of failures different from those of traditional distributed systems. Fault tolerance is therefore one of the main problems that must be resolved to improve the adoption of the agent paradigm. The investigation of mobile agent reliability in this thesis resulted in the development of REMA (REliable Mobile Agents), which guarantees the reliable execution, migration, and communication of mobile agents in the presence of faults that might affect the agents' hosts or their communication network. We introduce an algorithm for the transparent detection of faults that might affect agent execution, migration, and communication. A decentralized structure is used to divide the agent's dynamic distributed system into network-partitioning-proof spaces. Lightweight messaging is adopted as the basic error detection engine, which, together with loosely coupled detection managers, provides an efficient, low-overhead detection mechanism for agent-based distributed processing. Checkpointing agent execution is hampered by the inaccessibility of the underlying structure of the JVM, so an alternative solution is provided through the REMA Checkpoint and Recovery (REMA-CR) package. REMA-CR provides the developer with powerful classes and methods that allow for capturing the critical data of an agent's execution. The developed recovery protocol offers a low-cost, communication-pair-independent checkpointing strategy that covers all possible faults that might invalidate reliable agent execution, migration and communication, and maintains the exactly-once execution property. The results and the performance of REMA confirmed our objectives of providing a fault-tolerant wrapper for agents and their applications with acceptable overhead cost.
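As a flavor of the detection mechanism described above, here is a minimal heartbeat-style liveness monitor of the kind a lightweight-messaging error detection engine implies. This is a hypothetical sketch in Python; the class name, timeout value, and API are illustrative assumptions, not REMA's actual interfaces.

```python
import time

# Hypothetical sketch of a lightweight heartbeat-based detection manager;
# REMA's real protocol and interfaces are not reproduced here.
class DetectionManager:
    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}  # agent id -> timestamp of last heartbeat

    def heartbeat(self, agent_id):
        """Record a liveness message from an agent's host."""
        self.last_seen[agent_id] = time.monotonic()

    def suspected_failures(self):
        """Return agents whose heartbeats have gone silent too long."""
        now = time.monotonic()
        return [a for a, t in self.last_seen.items()
                if now - t > self.timeout]

manager = DetectionManager(timeout=2.0)
manager.heartbeat("agent-1")
time.sleep(2.5)
print(manager.suspected_failures())  # ['agent-1'] once the timeout lapses
```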
APA, Harvard, Vancouver, ISO, and other styles
4

Pierce, Evelyn Tumlin. "Self-adjusting quorum systems for Byzantine fault tolerance /." Full text (PDF) from UMI/Dissertation Abstracts International, 2000. http://wwwlib.umi.com/cr/utexas/fullcit?p3004357.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Hall, Stephen. "An integrated fault tolerance framework for service oriented computing." Thesis, Lancaster University, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.547982.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Clements, N. Scott. "Fault tolerance control of complex dynamical systems." Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/15515.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Damani, Om Prakash. "Optimistic protocols for fault-tolerance in distributed systems /." Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Snodgrass, Joshua D. "Low-power fault tolerance for spacecraft FPGA-based numerical computing." Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2006. http://library.nps.navy.mil/uhtbin/hyperion/06Sep%5FSnodgrass%5FPhD.pdf.

Full text
Abstract:
Dissertation (Ph.D. in Electrical Engineering)--Naval Postgraduate School, September 2006.
Dissertation Advisor(s): Herschel H. Loomis. "September 2006." Includes bibliographical references (p. 217-224). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
9

Hunt, Robert D. "New software-based fault tolerance methods for high performance computing." Thesis, University of Bristol, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.683389.

Full text
Abstract:
As computer systems become ever more powerful and parallel, processing larger and larger sets of data, there is an increased need to ensure that scientific software applications are tolerant to faults in both hardware and software. New algorithms that take advantage of knowledge about the structure and calculation of important mathematical problems would enable more efficient and fault-tolerant computation to be performed with minimal overhead. This thesis demonstrates how improvements to two important application areas in High Performance Computing (HPC), Monte Carlo methods and sparse linear algebra, can result in software with greater fault tolerance alongside low overheads. It proposes models that employ variations on existing techniques dealing with layout topologies in grids and a form of Error-Correcting Code (ECC) to provide an increased degree of fault tolerance in calculations. The models make efficient use of these variations to produce schemes that are both robust and based on straightforward approaches that can be implemented in a simple manner. The theory behind the models is developed and evaluated, and basic implementations are created to gauge the performance and viability of the schemes. Both models perform well in the majority of cases, with low overheads in the range of 0-10%, and both are eminently scalable. Furthermore, the methods with the highest overhead in the sparse linear algebra schemes are found to increase in performance for larger data sets that are more sparse - those that would need the extra protection afforded by software fault tolerance the most.
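The ECC-style protection mentioned above can be illustrated with the classic checksum trick from algorithm-based fault tolerance: compute an independent checksum of a linear-algebra operation and compare afterwards. This is a generic sketch under illustrative assumptions (dense NumPy matrices, a single corrupted entry), not the thesis's specific schemes.

```python
import numpy as np

# Checksum check for y = A @ x: sum(y) must equal (column checksum) @ x.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
x = rng.random(4)

col_checksum = A.sum(axis=0)   # checksum row derived from A
y = A @ x
expected = col_checksum @ x    # checksum of y, computed independently

y_faulty = y.copy()
y_faulty[2] += 1e-3            # inject a silent data corruption
print(abs(y.sum() - expected) < 1e-9)         # True: clean run passes
print(abs(y_faulty.sum() - expected) < 1e-9)  # False: fault detected
```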
APA, Harvard, Vancouver, ISO, and other styles
10

Rao, Sriram S. "Egida : a toolkit for low-overhead fault-tolerance /." Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Parameswaran, Rupa. "Investigation of precision versus fault tolerance in voting algorithms." Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/13536.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Bazzi, Rida Adnan. "Automatically increasing fault tolerance in distributed systems." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8133.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Klonowska, Kamilla. "Theoretical aspects on performance bounds and fault tolerance in parallel computing /." Karlskrona : Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology, 2007. http://www.bth.se/fou/Forskinfo.nsf/allfirst2/a46ebb190dfb7caec12573a700356d59?OpenDocument.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Yi, Byungho. "Faults and fault-tolerance in distributed computing systems : the election problem." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8312.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Stewart, Robert. "Reliable massively parallel symbolic computing : fault tolerance for a distributed Haskell." Thesis, Heriot-Watt University, 2013. http://hdl.handle.net/10399/2834.

Full text
Abstract:
As the number of cores in manycore systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover, new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. Symbolic computation has underpinned key advances in mathematics and computer science, for example in number theory, cryptography, and coding theory. Computer algebra software systems facilitate symbolic mathematics. Developing these at scale has its own distinctive set of challenges, as symbolic algorithms tend to employ complex irregular data and control structures. SymGridParII is a middleware for parallel symbolic computing on massively parallel High Performance Computing platforms. A key element of SymGridParII is a domain specific language (DSL) called Haskell Distributed Parallel Haskell (HdpH). It is explicitly designed for scalable distributed-memory parallelism, and employs work stealing to load balance dynamically generated irregular task sizes. To investigate providing scalable fault tolerant symbolic computation we design, implement and evaluate a reliable version of HdpH, HdpH-RS. Its reliable scheduler detects and handles faults, using task replication as the key recovery strategy. The scheduler supports load balancing with a fault tolerant work stealing protocol. The reliable scheduler is invoked with two fault tolerance primitives for implicit and explicit work placement, and 10 fault tolerant parallel skeletons that encapsulate common parallel programming patterns. The user is oblivious to many failures; they are instead handled by the scheduler. An operational semantics describes small-step reductions on states. A simple abstract machine for scheduling transitions and task evaluation is presented. It defines the semantics of supervised futures, and the transition rules for recovering tasks in the presence of failure. The transition rules are demonstrated with a fault-free execution and three executions that recover from faults. The fault tolerant work stealing protocol has been abstracted into a Promela model, and the SPIN model checker is used to exhaustively search the state space of this automaton to validate a key resiliency property of the protocol: an initially empty supervised future on the supervisor node will eventually be full in the presence of all possible combinations of failures. The performance of HdpH-RS is measured using five benchmarks. Supervised scheduling achieves a speedup of 757 with explicit task placement and 340 with lazy work stealing when executing Summatory Liouville on up to 1400 cores of an HPC architecture. Moreover, supervision overheads are consistently low when scaling up to 1400 cores. Low recovery overheads are observed in the presence of frequent failure when lazy on-demand work stealing is used. A Chaos Monkey mechanism has been developed for stress testing resiliency with random failure combinations. All unit tests pass in the presence of random failure, terminating with the expected results.
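The supervision-plus-replication recovery strategy can be sketched compactly: a supervisor resubmits a replica of any task whose execution fails, so the associated future is eventually filled. The Python below is a loose, hypothetical analogue of the Haskell mechanism; every name in it is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Resubmit a replica of the task until a result arrives or the replica
# budget is exhausted; exceptions stand in for node failures.
def supervised_submit(pool, task, retries=3):
    for _ in range(retries):
        future = pool.submit(task)
        try:
            return future.result()   # the "supervised future" becomes full
        except OSError:
            continue                 # node failure: replicate elsewhere
    raise RuntimeError(f"task lost after {retries} replicas")

outcomes = iter([OSError("node down"), 42])
def flaky_task():
    r = next(outcomes)
    if isinstance(r, Exception):
        raise r
    return r

with ThreadPoolExecutor(max_workers=2) as pool:
    print(supervised_submit(pool, flaky_task))  # 42, after one recovery
```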
APA, Harvard, Vancouver, ISO, and other styles
16

Bicer, Tekin. "Supporting Fault Tolerance and Dynamic Load Balancing in FREERIDE-G." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1267638588.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Roy, Amitabha. "Symmetry breaking and fault tolerance in boolean satisfiability /." view abstract or download file of text, 2001. http://wwwlib.umi.com/cr/uoregon/fullcit?p3024528.

Full text
Abstract:
Thesis (Ph. D.)--University of Oregon, 2001.
Typescript. Includes vita and abstract. Includes bibliographical references (leaves 124-127). Also available for download via the World Wide Web; free to University of Oregon users.
APA, Harvard, Vancouver, ISO, and other styles
18

Nguyen, Anthony. "Database system architecture for fault tolerance and disaster recovery." [Denver, Colo.] : Regis University, 2009. http://adr.coalliance.org/codr/fez/view/codr:152.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

何偉康 and Wai-hong Ho. "Performance and fault-tolerance studies of wormhole routers in 2D meshes." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1997. http://hub.hku.hk/bib/B31214125.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Ho, Wai-hong. "Performance and fault-tolerance studies of wormhole routers in 2D meshes /." Hong Kong : University of Hong Kong, 1997. http://sunzi.lib.hku.hk/hkuto/record.jsp?B19685737.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Tarafdar, Ashis. "Software fault tolerance in distributed systems using controlled re-execution /." Digital version accessible at:, 2000. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Arechiga, Austin Podoll. "Sensitivity of Feedforward Neural Networks to Harsh Computing Environments." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/84527.

Full text
Abstract:
Neural networks have proven themselves very adept at solving a wide variety of problems; in particular, they excel at image processing. However, it remains unknown how well they perform under memory errors. This thesis focuses on the robustness of neural networks under memory errors, specifically single-event-upset-style errors where single bits flip in a network's trained parameters. The main goal of these experiments is to determine whether different neural network architectures are more robust than others. Initial experiments show that MLPs are more robust than CNNs. Within MLPs, deeper MLPs are more robust, and for CNNs, larger kernels are more robust. Additionally, the CNNs displayed bimodal failure behavior, where memory errors would either not affect the performance of the network or would degrade its performance to be on par with random guessing. VGG16, ResNet50, and InceptionV3 were also tested for their robustness. ResNet50 and InceptionV3 were both more robust than VGG16. This could be due to their use of batch normalization or the fact that ResNet50 and InceptionV3 both use shortcut connections in their hidden layers. After determining which networks were most robust, estimated error rates from neutrons were calculated for space environments to determine whether these architectures were robust enough to survive. It was determined that large MLPs, ResNet50, and InceptionV3 could survive in Low Earth Orbit on commercial memory technology using only software error correction.
Master of Science
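The single-event-upset error model used in such experiments can be reproduced in a few lines: flip one bit in the IEEE 754 encoding of a float32 parameter. The weight value and bit positions below are arbitrary illustrations; flipping a low mantissa bit barely perturbs the weight, while flipping a high exponent bit is catastrophic, matching the bimodal behavior described above.

```python
import struct

# Flip one bit of a 32-bit float, as in a single-event-upset error model.
def flip_bit(value: float, bit: int) -> float:
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

weight = 0.5
print(flip_bit(weight, 0))   # low mantissa bit: ~0.50000006, negligible
print(flip_bit(weight, 30))  # high exponent bit: ~1.7e38, catastrophic
```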
APA, Harvard, Vancouver, ISO, and other styles
23

Soria-Rodriguez, Pedro. "Multicast-Based Interactive-Group Object-Replication For Fault Tolerance." Digital WPI, 1999. https://digitalcommons.wpi.edu/etd-theses/1069.

Full text
Abstract:
"Distributed systems are clusters of computers working together on one task. The sharing of information across different architectures, and the timely and efficient use of the network resources for communication among computers are some of the problems involved in the implementation of a distributed system. In the case of a low latency system, the network utilization and the responsiveness of the communication mechanism are even more critical. This thesis introduces a new approach for the distribution of messages to computers in the system, in which, the Common Object Request Broker Architecture (CORBA) is used in conjunction with IP multicast to implement a fault-tolerant, low latency distributed system. Fault tolerance is achieved by replication of the current state of the system across several hosts. An update of the current state is initiated by a client application that contacts one of the state object replicas. The new information needs to be distributed to all the members of the distributed system (the object replicas). This state update is accomplished by using a two-phase commit protocol, which is implemented using a binary tree structure along with IP multicast to reduce the amount of network utilization, distribute the computation load associated with state propagation, and to achieve faster communication among the members of the distributed system. The use of IP multicast enhances the speed of message distribution, while the two-phase commit protocol encapsulates IP multicast to produce a reliable multicast service that is suitable for fault tolerant, distributed low latency applications. The binary tree structure, finally, is essential for the load sharing of the state commit response collection processing. "
APA, Harvard, Vancouver, ISO, and other styles
24

Villamayor, Leguizamón Jorge Luis. "Fault tolerance configuration and management for HPC applications using RADIC architecture." Doctoral thesis, Universitat Autònoma de Barcelona, 2018. http://hdl.handle.net/10803/666057.

Full text
Abstract:
High Performance Computing (HPC) systems continue growing exponentially in terms of component quantity and density to achieve demanding computational power. At the same time, cloud computing is becoming popular, as key features such as scalability, pay-per-use and availability continue to evolve. It is also becoming a competitive platform for running parallel HPC applications due to the increasing performance of virtualized, highly-available instances. However, augmenting the number of components to create larger systems tends to increase the frequency of failures in both clusters and cloud environments. Nowadays, HPC systems have a failure rate of around 1000 per year, meaning a failure approximately every 8 hours. Most parallel distributed applications are built on top of a Message Passing Interface (MPI). MPI implementations follow a default fail-stop semantic, which aborts the execution in case of a host failure in a cluster. In this case, the application owner needs to restart the execution, which affects the wall clock time and also the cost, since it requires acquiring computing resources for longer periods of time. Fault Tolerance (FT) techniques need to be applied to MPI parallel executions in both cluster and cloud environments. With FT techniques, high availability is ensured for parallel applications. To apply some FT solutions, administrator privileges are required to install them in the cluster nodes. Moreover, when failures appear, human intervention is required to recover the application. A solution that minimizes user and administrator intervention is preferred. A contribution of this thesis is a Fault Tolerance Manager (FTM) for coordinated checkpointing, which provides the application's users with automatic recovery from failures when losing computing nodes. It takes advantage of node-local storage to save checkpoints, and it distributes copies of them along all the computation nodes, avoiding the bottleneck of a central stable storage. We also leverage the FTM to use uncoordinated and semi-coordinated rollback-recovery protocols. In this contribution, FTM is implemented in the application layer. Furthermore, a dynamic resource controller is added to the FTM, which monitors the FT protection resource usage and performs actions to maintain an acceptable level of protection. Another contribution addresses the configuration of the FT protection and recovery tasks. Two models are introduced. The First Protection Point (FPP) model determines the starting point for introducing FT protection, gaining benefits in terms of total execution time including failures by removing unnecessary checkpoints. The second model allows improving the FT resource configuration for the recovery task. Regarding cloud environments, we propose Resilience as a Service (RaaS), a fault-tolerant framework for HPC applications, which uses FTM. RaaS provides clouds with a highly available, distributed and scalable fault-tolerant service. It redesigns traditional HPC protection and recovery mechanisms to natively leverage cloud capabilities and their multiple alternatives for implementing FT tasks. To summarize, this thesis contributes a Multi-Platform Resilience Manager (MRM), suitable for traditional bare-metal clusters and clouds (public and private). The presented solution provides FT in an automatic, distributed and transparent manner at the application and user levels, according to the user, application, and runtime requirements. It gives users critical FT information, allowing them to trade off cost and protection while keeping the mean time to repair within acceptable ranges. Several experimental environments, such as bare-metal clusters and clouds (public and private) running different parallel applications, were used during the experimental validations. The experiments verify the functionality and improvement of the contributions. Moreover, they also show that the mean time to repair is bounded within acceptable ranges.
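The checkpoint-placement idea, node-local saves replicated to peers rather than a central store, can be sketched as follows. This is a minimal illustration assuming local directories stand in for node disks and a file copy stands in for the network transfer; FTM's real mechanisms are more involved.

```python
import pathlib
import pickle
import shutil

# Save a checkpoint to node-local storage, then replicate it to a
# neighbor node so no central storage becomes a bottleneck or a
# single point of failure.
def checkpoint(state, local_dir, neighbor_dir, step):
    local = pathlib.Path(local_dir)
    local.mkdir(exist_ok=True)
    remote = pathlib.Path(neighbor_dir)
    remote.mkdir(exist_ok=True)
    ckpt = local / f"ckpt_{step}.pkl"
    ckpt.write_bytes(pickle.dumps(state))
    shutil.copy(ckpt, remote / ckpt.name)  # stand-in for a network copy
    return ckpt

def recover(neighbor_dir, step):
    data = (pathlib.Path(neighbor_dir) / f"ckpt_{step}.pkl").read_bytes()
    return pickle.loads(data)

checkpoint({"iteration": 10, "residual": 0.03}, "node_a", "node_b", 10)
print(recover("node_b", 10))  # recoverable even if node_a's disk is lost
```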
APA, Harvard, Vancouver, ISO, and other styles
25

Varghese, Blesson. "Swarm-array computing : a swarm robotics inspired approach to achieve automated fault tolerance in high-performance computing systems." Thesis, University of Reading, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.559260.

Full text
Abstract:
Fault tolerance is an important area of research in high-performance computing. Traditional fault-tolerant methods that require human administrator intervention suffer from many drawbacks and hence constrain the achievement of efficient fault tolerance for high-performance computer systems. The research presented in this dissertation is motivated by the development of automated fault-tolerant methods for high-performance computing. To this end, four questions are addressed: (1) How can autonomic computing concepts be applied to parallel computing? (2) How can a bridge between multi-agent systems and parallel computing systems be built for achieving fault tolerance? (3) How can processor virtualization for process migration be extended for achieving fault tolerance in parallel computing systems? (4) How can traditional fault-tolerant methods be replaced to achieve efficient fault tolerance in high-performance computing systems? In this dissertation, Swarm-Array Computing, a novel framework inspired by the concept of multi-agents in swarm robotics and built on the foundations of parallel and autonomic computing, is proposed to address these questions. The framework comprises three approaches: firstly, intelligent agents; secondly, intelligent cores; and thirdly, a combination of these as a means of achieving automated fault tolerance in line with the goals of autonomic computing. The feasibility of the framework is evaluated using simulation and practical experimental studies. The simulation studies were performed by emulating a field programmable gate array on a multi-agent simulator. The practical studies involved the implementation of a parallel reduction algorithm using message passing interfaces on a computer cluster. The statistics gathered from the experiments confirm that the swarm-array computing approaches improve the fault tolerance of high-performance computing systems over traditional fault-tolerant mechanisms. The agent concepts within the framework are formalised by mapping a layered architecture onto both intelligent agents and intelligent cores. Elements of the work reported in this dissertation have been published as journal and conference papers (Appendix A) and presented as public lectures, conference presentations and posters (Appendix B).
APA, Harvard, Vancouver, ISO, and other styles
26

Celik, Yasin. "FEASIBILITY STUDIES OF STATISTIC MULTIPLEXED COMPUTING." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/511914.

Full text
Abstract:
Computer and Information Science
Ph.D.
In 2012, when Professor Shi introduced me to the concept of Statistic Multiplexed Computing (SMC), I was skeptical. It contradicted everything I had learned and heard about distributed and parallel computing. However, I did believe that unhandled failures in any application will negatively impact its scalability. For that reason, I agreed to take on the feasibility study of SMC for practical applications. After six-plus years of research and experimentation, it became clear to me that the most widely believed misconception is "either performance or reliability" when upscaling a distributed application. This misconception is the result of the direct use of hop-by-hop communication protocols in distributed application construction.

Terminology: A hop-by-hop data protocol is a two-sided reliable lossless data communication protocol for transmitting data between a sender and a receiver. A crash of either the sender or the receiver will cause data losses. Examples: MPI, RPC, RMI, OpenMP. An end-to-end data protocol is a single-sided reliable lossless data communication protocol for transmitting data between application programs. All runtime-available processors, networks and storage are automatically dispatched in best-effort support of the reliable communication, regardless of transient and permanent device failures. Examples: HDFS, Blockchain, Fabric and SMC. An active end-to-end data protocol is a single-sided reliable lossless data communication protocol for transmitting data and automatically synchronizing application programs. Example: SMC (AnkaCom, AnkaStore (this dissertation)).

Unlike the hop-by-hop protocols, the use of an end-to-end protocol forms an application-dependent overlay network. An overlay network for distributed and parallel computing applications, such as Blockchain, has been proven to defy the "common wisdom" on two important distributed computing challenges: a) Extreme-scale computing without single-point failures is practically feasible; thus, all transaction or data losses can be eliminated. b) Extreme-scale synchronized transaction replication is practically feasible; thus, the CAP conjecture and theorem become irrelevant. Unlike passive overlay networks, such as HDFS and Blockchain, this dissertation study proves that an active overlay network can deliver higher performance, higher reliability and security at the same time as the application scales up. Although application-level security is not part of this dissertation, it is easy to see that application-level end-to-end protocols will fundamentally eliminate "man-in-the-middle" attacks. This will nullify many well-known attacks. With the zero-single-point-failure and zero-impact synchronous replication features, SMC applications are naturally resistant to DDoS and ransomware attacks. This dissertation explores practical implementations of the SMC concept for compute-intensive (CI) and data-intensive (DI) applications. This defense will disclose the details of the CI and DI runtime implementations and the results of inductive computational experiments. The computational environments include the NSF Chameleon bare-metal HPC cloud and Temple's TCloud cluster.
Temple University--Theses
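The hop-by-hop versus end-to-end distinction drawn in the terminology above can be caricatured in a few lines: a hop-by-hop send is bound to one endpoint and fails with it, while an end-to-end send keeps trying across replicated receivers. The servers and API here are invented for illustration and do not reproduce AnkaCom or AnkaStore.

```python
# One endpoint: its crash loses the data (hop-by-hop behavior).
def hop_by_hop_send(server, msg):
    return server(msg)

# Best-effort dispatch across all replicas (end-to-end behavior).
def end_to_end_send(replicas, msg):
    for server in replicas:
        try:
            return server(msg)
        except ConnectionError:
            continue              # transient or permanent failure: try next
    raise ConnectionError("all replicas unavailable")

def dead(msg): raise ConnectionError
def alive(msg): return f"stored: {msg}"

print(end_to_end_send([dead, dead, alive], "tx-42"))  # survives two failures
```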
APA, Harvard, Vancouver, ISO, and other styles
27

Luckow, André. "A dependable middleware for enhancing the fault tolerance of distributed computations in grid environments." Aachen Shaker, 2009. http://d-nb.info/1002791081/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Morten, Andrew J. "An accurate analytical framework for computing fault-tolerance thresholds using the [[7,1,3]] quantum code." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/35052.

Full text
Abstract:
Thesis (S.B.)--Massachusetts Institute of Technology, Dept. of Physics, 2005.
Includes bibliographical references (p. 141-143).
In studies of the threshold for fault-tolerant quantum error-correction, it is generally assumed that the noise channel at all levels of error-correction is the depolarizing channel. The effects of this assumption on the threshold result are unknown. We address this problem by calculating the effective noise channel at all levels of error-correction specifically for the Steane [[7,1,3]] code, and we recalculate the threshold using the new noise channels. We present a detailed analytical framework for these calculations and run numerical simulations for comparison. We find that only X and Z failures occur with significant probability in the effective noise channel at higher levels of error-correction. We calculate that when changes in the noise channel are accounted for, the value of the threshold for the Steane [[7,1,3]] code increases by about 30 percent, from .00030 to .00039, when memory failures occur with one tenth the probability of all other failures. Furthermore, our analytical model provides a framework for calculating thresholds for systems where the initial noise channel is very different from the depolarizing channel, such as is the case for ion trap quantum computation.
by Andrew J. Morten.
S.B.
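As a quick sanity check of the quoted figures, the move from .00030 to .00039 is indeed an increase of about 30 percent:

```python
old, new = 3.0e-4, 3.9e-4
print((new - old) / old)  # ~0.3, i.e. about 30 percent
```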
APA, Harvard, Vancouver, ISO, and other styles
29

Hay, Karen June. "A proof methodology for verification of real-time and fault-tolerance properties of distributed programs." Diss., The University of Arizona, 1993. http://hdl.handle.net/10150/186261.

Full text
Abstract:
From the early days of programming, the dependability of software has been a concern. The development of distributed systems that must respond in real-time and continue to function correctly in spite of hardware failure have increased the concern while making the task of ensuring dependability more complex. This dissertation presents a technique for improving confidence in software designed to execute on a distributed system of fail-stop processors. The methodology presented is based on a temporal logic augmented with time intervals and probability distributions. A temporal logic augmented with time intervals, Bounded Time Temporal Logic (BTTL), supports the specification and verification of real-time properties such as, "The program will poll the sensor every t to T time units." Analogously, a temporal logic augmented with probability distributions, Probabilistic Bounded Time Temporal Logic (PBTTL), supports reasoning about fault-tolerant properties such as, "The program will complete with probability less than or equal to p", and a combination of these properties such as, "The program will complete within t and T time units with probability less than or equal to p." The syntax and semantics of the two logics, BTTL and PBTTL, are carefully developed. This includes development of a program state model, state transition model, message passing system model and failure system model. An axiomatic program model is then presented and used for the development of a set of inference rules. The inference rules are designed to simplify use of the logic for reasoning about typical programming language constructs and commonly occurring programming scenarios. In addition to offering a systematic approach for verifying typical behaviors, the inference rules are intended to support the derivation of formulas expressing timing and probabilistic relationships between the execution times and probabilities of individual statements, groups of statements, message passing and failure recovery. Use of the methodology is demonstrated in examples of varying complexity, including five real-time examples and four combined real-time and fault-tolerant examples.
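For flavor, the quoted combined property might be rendered in a bounded-time temporal notation roughly as follows; the syntax is an illustrative guess, not the dissertation's actual BTTL/PBTTL grammar.

```latex
% "The program will complete within t and T time units
%  with probability less than or equal to p":
\Pr\bigl[\Diamond_{[t,\,T]}\,\mathit{complete}\bigr] \le p
```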
APA, Harvard, Vancouver, ISO, and other styles
30

Alfawair, Mai. "A framework for evolving grid computing systems." Thesis, De Montfort University, 2009. http://hdl.handle.net/2086/3423.

Full text
Abstract:
Grid computing was born in the 1990s, when researchers were looking for a way to share expensive computing resources and experimental equipment. Grid computing is becoming increasingly popular because it promotes the sharing of distributed resources that may be heterogeneous in nature, and it enables scientists and engineering professionals to solve large-scale computing problems. In reality, there are already huge numbers of grid computing facilities distributed around the world, each one having been created to serve a particular group of scientists, such as weather forecasters, or a group of users, such as stock markets. However, the need to extend the functionalities of current grid systems lends itself to the consideration of grid evolution. This allows the combination of many disjoint grids into a single powerful grid that can operate as one vast computational resource, and it allows grid environments to be flexible, to change and to evolve. The rationale for grid evolution is the current rapid and increasing advance of both software and hardware. Evolution means adding or removing capabilities. This research defines grid evolution as adding new functions and/or equipment and removing unusable resources that affect the performance of some nodes. This thesis produces a new technique for grid evolution, allowing it to be seamless and to operate at run time. Within grid computing, evolution is an integration of software and hardware and can be of two distinct types, external and internal. Internal evolution occurs inside the grid boundary by migrating special resources, such as application software, from node to node inside the grid, while external evolution occurs between grids. This thesis develops a framework for grid evolution that insulates users from the complexities of grids. This framework has at its core a resource broker together with a grid monitor to cope with internal and external evolution, advance reservation, fault tolerance, the monitoring of the grid environment, increased resource utilisation and the high availability of grid resources. The starting point for the present framework of grid evolution is when the grid receives a job whose requirements do not exist on the required node, which triggers grid evolution. If the grid has all the requirements scattered across its nodes, internal evolution ensues, enabling the grid to migrate the required resources to the required node in order to satisfy the job requirements; but if the grid does not have these resources, external evolution enables the grid either to collect them from other grids (permanent evolution) or to send the job to other grids for execution (just-in-time evolution). Finally, a simulation tool called EVOSim has been designed, developed and tested. It is written in Oracle 10g and has been used for the creation of four grids, each of which has a different setup, including different nodes, application software, data and policies. Experiments were done by submitting jobs to the grid at run time and then comparing the results and analysing the performance of those grids that use the approach of evolution with those that do not. The results of these experiments have demonstrated that these features significantly improve the performance of grid environments and provide excellent scheduling results, with a decreasing number of rejected jobs.
APA, Harvard, Vancouver, ISO, and other styles
31

Kwon, Young Woo. "Effective Fusion and Separation of Distribution, Fault-Tolerance, and Energy-Efficiency Concerns." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/49386.

Full text
Abstract:
As software applications are becoming increasingly distributed and mobile, their design and implementation are characterized by distributed software architectures, the possibility of faults, and the need for energy awareness. Thus, software developers should be able to simultaneously reason about and handle the concerns of distribution, fault-tolerance, and energy-efficiency. Being closely intertwined, these concerns can introduce significant complexity into the design and implementation of modern software. In other words, to develop reliable and energy-efficient applications, software developers must understand how distribution, fault-tolerance, and energy-efficiency interplay with each other and how to implement these concerns while keeping the complexity in check. This dissertation addresses five technical issues that stand in the way of engineering reliable and energy-efficient software: (1) how can developers select and parameterize middleware to achieve the requisite levels of performance, reliability, and energy-efficiency? (2) how can one streamline the process of implementing and reusing fault tolerance functionality in distributed applications? (3) can automated techniques be developed to help transition centralized applications to using cloud-based services efficiently and reliably? (4) how can one leverage cloud-based resources to improve the energy-efficiency of mobile applications? (5) how can middleware be adapted to improve the energy-efficiency of distributed mobile applications operated over heterogeneous mobile networks? To address these issues, this research studies the concerns of distribution, fault-tolerance, and energy-efficiency as well as their interaction. It also develops novel approaches, techniques, and tools that effectively fuse and separate these concerns as required by particular software development scenarios. The specific innovations include (1) a systematic assessment of the performance, conciseness, complexity, reliability, and energy consumption of middleware mechanisms for accessing remote functionality, (2) a declarative approach to hardening distributed applications with resiliency against partial failure, (3) cloud refactoring, a set of automated program transformations for transitioning to using cloud-based services efficiently and reliably, (4) a cloud offloading approach that improves the energy-efficiency of mobile applications without compromising their reliability, and (5) a middleware mechanism that optimizes energy consumption by adapting execution patterns dynamically in response to fluctuations in network conditions.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
32

Stainer, Julien. "Computability Abstractions for Fault-tolerant Asynchronous Distributed Computing." Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S054/document.

Full text
Abstract:
This thesis studies computability in systems composed of multiple computers exchanging messages or sharing memory. The considered models take into account the possible failure of some of these computers, as well as variations in time and heterogeneity of their execution speeds. The presented results essentially consider agreement problems, systems prone to partitioning and failure detectors. The document establishes relations between known iterated models and the concept of failure detector and presents a hierarchy of agreement problems spanning from k-set agreement to s-simultaneous consensus. It also introduces a new universal construction based on s-simultaneous consensus objects and a family of iterated models allowing several processes to run in isolation
APA, Harvard, Vancouver, ISO, and other styles
33

Shoker, Ali. "Byzantine fault tolerance from static selection to dynamic switching." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1924/.

Full text
Abstract:
Byzantine Fault Tolerance (BFT) is becoming crucial with the revolution of online applications and the increasing number of innovations in computer technologies. Although dozens of BFT protocols have been introduced in the previous decade, their adoption by practitioners has been disappointing. To some extent, this indicates that existing protocols are perhaps not yet convincing or satisfactory enough. The problem is that researchers are still trying to establish 'the best protocol' using traditional methods, e.g., by designing new protocols. However, theoretical and experimental analyses demonstrate that it is hard to achieve one-size-fits-all BFT protocols. Indeed, we believe that looking for smarter tactics, like 'fasten fragile sticks with a rope to achieve a solid stick', is necessary to circumvent the issue. In this thesis, we introduce the first BFT selection model and algorithm that automate and simplify the election process of the 'preferred' BFT protocol among a set of candidates. The selection mechanism operates in three modes: Static, Dynamic, and Heuristic. For the two latter modes, we present a novel BFT system, called Adapt, that reacts to any potential changes in the system conditions and switches dynamically between existing BFT protocols, i.e., seeking adaptation. The Static mode allows BFT users to choose a single BFT protocol only once. This is quite useful in Web Services and Clouds, where BFT can be sold as a service (and signed into the SLA contract). This mode is basically designed for systems that do not have strongly fluctuating states. In this mode, an evaluation process is in charge of matching the user preferences against the profiles of the nominated BFT protocols, considering both reliability and performance. The elected protocol is the one that achieves the highest evaluation score. The mechanism is well automated via mathematical matrices, and produces selections that are reasonable and close to reality. Some systems, however, may experience fluctuating conditions, like variable contention or message payloads. In this case, the Static mode will not be efficient, since a chosen protocol might not fit the new conditions. The Dynamic mode solves this issue. Adapt combines a collection of BFT protocols and switches between them, thus adapting to the changes of the underlying system state. Consequently, the 'preferred' protocol is always polled for each system state. This yields an optimal quality of service, i.e., reliability and performance. Adapt monitors the system state through its Event System, and uses a Support Vector Regression method to conduct run-time predictions for the performance of the protocols (e.g., throughput, latency, etc.). Adapt also operates in a Heuristic mode. Using predefined heuristics, this mode optimizes user preferences to improve the selection process. The evaluation of our approach shows that selecting the 'preferred' protocol is automated and close to reality in the Static mode. In the Dynamic mode, Adapt always achieves the optimal performance among available protocols. The evaluation demonstrates that the overall system performance can be improved significantly too. Other cases show that it is not always worthwhile to switch between protocols. This is made possible by conducting predictions with high accuracy, which can reach more than 98% in many cases. Finally, the thesis shows that Adapt can be made smarter through the use of heuristics.
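The Static mode's matrix-based evaluation can be sketched as a preference-weighted score over protocol profiles. The protocol names, metrics, and numbers below are made up for illustration; the thesis's actual evaluation matrices are richer.

```python
import numpy as np

protocols = ["PBFT", "Zyzzyva", "Q/U"]
#                 throughput  latency  resilience   (normalized to [0, 1])
profiles = np.array([[0.6,       0.7,     0.9],
                     [0.9,       0.8,     0.6],
                     [0.8,       0.9,     0.5]])
user_prefs = np.array([0.5, 0.2, 0.3])  # user's weights, summing to 1

scores = profiles @ user_prefs          # one evaluation score per protocol
print(protocols[int(np.argmax(scores))])  # the elected protocol
```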
APA, Harvard, Vancouver, ISO, and other styles
34

Kurt, Mehmet Can. "Fault-tolerant Programming Models and Computing Frameworks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437390499.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Jeffery, Casey Miles. "Performance analysis of dynamic sparing and error correction techniques for fault tolerance in nanoscale memory structures." [Gainesville, Fla.] : University of Florida, 2004. http://purl.fcla.edu/fcla/etd/UFE0007163.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Tadepalli, Sriram Satish. "GEMS: A Fault Tolerant Grid Job Management System." Thesis, Virginia Tech, 2003. http://hdl.handle.net/10919/9661.

Full text
Abstract:
Grid environments are inherently unstable. Resources join and leave the environment without any prior notification. Application fault detection, checkpointing and restart are of foremost importance in Grid environments. The need for fault tolerance is especially acute for large parallel applications, since the failure rate grows with the number of processors and the duration of the computation. A Grid job management system hides the heterogeneity of the Grid and the complexity of the Grid protocols from the user. The user submits a job to the Grid job management system, which finds the appropriate resource, submits the job and transfers the output files to the user upon job completion. However, current Grid job management systems do not detect application failures. The goal of this research is to develop a Grid job management system that can efficiently detect application failures. Failed jobs are restarted either on the same resource, or the job is migrated to another resource and restarted. The research also aims to identify the role of local resource managers in the fault detection and migration of Grid applications.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
37

Schöll, Alexander [Verfasser], and Hans-Joachim [Akademischer Betreuer] Wunderlich. "Efficient fault tolerance for selected scientific computing algorithms on heterogeneous and approximate computer architectures / Alexander Schöll ; Betreuer: Hans-Joachim Wunderlich." Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2018. http://d-nb.info/1164013211/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Bakhshi, Valojerdi Zeinab. "Persistent Fault-Tolerant Storage at the Fog Layer." Licentiate thesis, Mälardalens högskola, Inbyggda system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55680.

Full text
Abstract:
Clouds are powerful computer centers that provide computing and storage facilities that can be remotely accessed. The flexibility and cost-efficiency offered by clouds have made them very popular for business and web applications. The use of clouds is now being extended to safety-critical applications such as factories. However, cloud services do not provide time predictability, which creates a problem for such time-sensitive applications. Moreover, delays in the data communication between clouds and the devices the clouds control are unpredictable. Therefore, to increase predictability, an intermediate layer between devices and the cloud is introduced. This layer, the fog layer, aims to provide computational resources closer to the edge of the network. However, the fog computing paradigm relies on resource-constrained nodes, creating new potential challenges in resource management, scalability, and reliability. Solutions such as lightweight virtualization technologies can be leveraged for solving the dichotomy between performance and reliability in fog computing. In this context, container-based virtualization is a key technology providing lightweight virtualization for cloud computing that can be applied in fog computing as well. Such container-based technologies provide fault tolerance mechanisms that improve the reliability and availability of application execution. Through the study of a robotic use case, we realized that persistent data storage for stateful applications at the fog layer is particularly important. In addition, we identified the need to enhance current container orchestration solutions to fit fog applications executing in container-based architectures. In this thesis, we identify open challenges in achieving dependable fog platforms. Among these, we focus particularly on scalable, lightweight virtualization, auto-recovery, and re-integration solutions after failures in fog applications and nodes. We implement a testbed to deploy our use case on a container-based fog platform and investigate the fulfillment of key dependability requirements. We enhance the architecture and identify the lack of persistent storage for stateful applications as an important impediment to the execution of control applications. We propose a solution for persistent fault-tolerant storage at the fog layer, which dissociates storage from applications to reduce application load and separates the concern of distributed storage. Our solution includes a replicated data structure supported by a consensus protocol that ensures distributed data consistency and fault tolerance in case of node failures. Finally, we use the UPPAAL verification tool to model and verify the fault tolerance and consistency of our solution.
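The flavor of such a replicated, consensus-backed store can be shown with a toy majority-write rule: a write is accepted only if a quorum of fog nodes acknowledges it. The consensus protocol proper (leader election, log replication) is deliberately elided, and all names here are illustrative.

```python
# Accept a write only when a majority of nodes acknowledges it.
def replicated_write(nodes, key, value):
    acks = 0
    for node in nodes:
        try:
            node[key] = value     # stand-in for an RPC to a fog node
            acks += 1
        except RuntimeError:      # failed node: no acknowledgement
            pass
    if acks <= len(nodes) // 2:
        raise RuntimeError("no quorum: write rejected")
    return acks

class FailedNode(dict):
    def __setitem__(self, key, value):
        raise RuntimeError("node down")

nodes = [dict(), dict(), FailedNode()]
print(replicated_write(nodes, "pose", (1.0, 2.0)))  # 2 of 3 acks: accepted
```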
APA, Harvard, Vancouver, ISO, and other styles
39

Raja, Chandrasekar Raghunath. "Designing Scalable and Efficient I/O Middleware for Fault-Resilient High-Performance Computing Clusters." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1417733721.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Gheorghiu, Alexandru. "Robust verification of quantum computation." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31542.

Full text
Abstract:
Quantum computers promise to offer a considerable speed-up in solving certain problems, compared to the best classical algorithms. In many instances, the gap between quantum and classical running times is conjectured to be exponential. While this is great news for those applications where quantum computers would provide such an advantage, it also raises a significant challenge: how can classical computers verify the correctness of quantum computations? In attempting to answer this question, a number of protocols have been developed in which a classical client (referred to as verifier) can interact with one or more quantum servers (referred to as provers) in order to certify the correctness of a quantum computation performed by the server(s). These protocols are of one of two types: either there are multiple non-communicating provers, sharing entanglement, and the verifier is completely classical; or, there is a single prover and the classical verifier has a device for preparing or measuring quantum states. The latter type of protocol is, arguably, more relevant to near term quantum computers, since having multiple quantum computers that share a large amount of entanglement is, from a technological standpoint, extremely challenging. Before the realisation of practical single-prover protocols, a number of challenges need to be addressed: how robust are these protocols to noise on the verifier's device? Can the protocols be made fault-tolerant without significantly increasing the requirements of the verifier? How do we know that the verifier's device is operating correctly? Could this device be eliminated completely, thus yielding a protocol with a fully classical verifier and a single quantum prover? Our work attempts to provide answers to these questions. First, we consider a single-prover verification protocol developed by Fitzsimons and Kashefi and show that this protocol is indeed robust with respect to deviations on the quantum state prepared by the verifier. We show that this is true even if those deviations are the result of a correlation with the prover's system. We then use this result to give a verification protocol which is device-independent. The protocol consists of a verifier with a measurement device and a single prover. Device-independence means that the verifier need not trust the measurement device (nor the prover), which can be assumed to be fully malicious (though not communicating with the prover). A key element in realising this protocol is a robust technique of Reichardt, Unger and Vazirani for testing, using non-local correlations, that two untrusted devices share a large number of entangled states. This technique is referred to as rigidity of non-local correlations. Our second result is to prove a rigidity result for a type of quantum correlations known as steering correlations. To do this, we first show that steering correlations can be used in order to certify maximally entangled states, in a setting in which each test is independent of the previous one. We also show that the fidelity with which we characterise the state, in this specific test, is optimal. We then improve the previous result by removing the independence assumption. This then leads to our desired rigidity result. We make use of it, in a similar fashion to the device-independent case, in order to give a verification protocol that is one-sided device-independent. The importance of this application is to show how different trust assumptions affect the efficiency of the protocol.
Next, we describe a protocol for fault-tolerantly verifying quantum computations, with minimal "quantum requirements" for the verifier. Specifically, the verifier only requires a device for measuring single-qubit states. Both this device, and the prover's operations are assumed to be prone to errors. We show that under standard assumptions about the error model, it is possible to achieve verification of quantum computation using fault-tolerant principles. As a proof of principle, and to better illustrate the inner workings of the protocol, we describe a toy implementation of the protocol in a quantum simulator, and present the results we obtained, when running it for a small computation. Finally, we explore the possibility of having a verification protocol, with a classical verifier and a single prover, such that the prover is blind with respect to the verifier's computation. We give evidence that this is not possible. In fact, our result is only concerned with blind quantum computation with a classical client, and uses complexity theoretic results to argue why it is improbable for such a protocol to exist. We then use these complexity theoretic techniques to show that a client, with the ability to prepare and send quantum states to a quantum server, would not be able to delegate arbitrary NP problems to that server. In other words, even a client with quantum capabilities cannot exploit those capabilities to delegate the computation of NP problems, while keeping the input, to that computation, private. This is again true, provided certain complexity theoretic conjectures are true.
APA, Harvard, Vancouver, ISO, and other styles
41

Silva, Jaquilino Lopes. "A distributed platform for the volunteer execution of workflows on a local area network." Master's thesis, Faculdade de Ciências e Tecnologia, 2014. http://hdl.handle.net/10362/13102.

Full text
Abstract:
Thesis submitted in fulfilment of the requirements for the Degree of Master of Science in Computer Science
Albatroz Engineering has developed a framework for overhead power line inspection data acquisition and analysis, which includes hardware and software. The framework’s software components include inspection data analysis and reporting tools, commonly known as the PLMI2 application/platform. In PLMI2, the analysis of overhead power line maintenance inspection data consists of a sequence of Automatic Tasks (ATs) interleaved with Manual Tasks (MTs). An AT consists of a set of algorithms that receives as input one or more datasets, processes them and returns new datasets. In turn, an MT enables human supervisors (also known as lines inspection operators) to correct, improve and validate the results of ATs. ATs run faster than MTs and, in the overall work cycle, ATs take less than 10% of total processing time, but still take a few minutes. There is data flow dependency among tasks, which can be modelled with a workflow and, even if MTs are omitted from this workflow, it is possible to carry out the sequence of ATs, postponing MTs. In fact, if the computing cost and waiting time are negligible, it may be advantageous to run ATs earlier in the workflow, prior to validation. To address this opportunity, Albatroz Engineering has invested in a new procedure to stream the data through all ATs fully unattended. Considering these scenarios, it could be useful to have a system capable of detecting available workstations at a given instant and subsequently distributing the ATs to them. In this way, operators could schedule the execution of future ATs for one set of inspection data while they are performing the MTs of another. The requirements of the system to implement fall within the field of Volunteer Computing Systems, and we will address some of the challenges posed by these kinds of systems, namely host volatility and failures. Volunteer Computing is a type of distributed computing which exploits idle CPU cycles from computing resources donated by volunteers and connected through the Internet/Intranet to compute large-scale simulations. This thesis proposes and designs a new distributed task scheduling system in the context of Volunteer Computing Systems, able to schedule the ATs of PLMI2 and exploit idle CPU cycles from workstations within the company’s local area network (LAN) to accelerate the data analysis, while being aware of data flow interdependencies. To evaluate the proposed system, a prototype has been implemented, and the simulation results have shown that it is scalable and supports fault tolerance of task execution by employing a rescheduling mechanism.
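A toy Python sketch of the scheduling idea follows: dispatch ATs whose input dependencies are satisfied to idle volunteer workstations, and reschedule any task whose host has vanished. The Worker class and function names are illustrative stand-ins, not the PLMI2 API, and the toy worker completes tasks instantly.

    class Worker:
        """Toy stand-in for a volunteer workstation on the LAN."""
        def __init__(self):
            self.alive = True
            self.idle = True
        def start(self, task):
            self.idle = False
        def finished(self, task):
            self.idle = True                  # the toy worker completes instantly
            return True

    def run_workflow(tasks, deps, workers):
        """Dispatch ATs whose inputs are ready; reschedule tasks whose host vanished."""
        done, running = set(), {}             # running maps task -> worker
        while len(done) < len(tasks):
            ready = [t for t in tasks
                     if t not in done and t not in running
                     and all(d in done for d in deps.get(t, []))]
            for task in ready:
                idle = [w for w in workers if w.idle and w.alive]
                if idle:
                    running[task] = idle[0]
                    idle[0].start(task)
            for task, worker in list(running.items()):
                if not worker.alive:
                    del running[task]         # host volatility: retry on the next pass
                elif worker.finished(task):
                    del running[task]
                    done.add(task)

    # example: AT "b" depends on AT "a"
    run_workflow(["a", "b"], {"b": ["a"]}, [Worker(), Worker()])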
APA, Harvard, Vancouver, ISO, and other styles
42

Guo, Yan. "Fault-tolerant resource allocation of an airborne network." Diss., Online access via UMI:, 2007.

Find full text
Abstract:
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical and Computer Engineering, 2007.
Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
43

Stoicescu, Miruna. "Architecting Resilient Computing Systems : a Component-Based Approach." Thesis, Toulouse, INPT, 2013. http://www.theses.fr/2013INPT0120/document.

Full text
Abstract:
Evolution during service life is mandatory, particularly for long-lived systems. Dependable systems, which continuously deliver trustworthy services, must evolve to accommodate changes, e.g., new fault tolerance requirements or variations in available resources. The addition of this evolutionary dimension to dependability leads to the notion of resilient computing. Among the various aspects of resilience, we focus on adaptivity. Dependability relies on fault tolerant computing at runtime, applications being augmented with fault tolerance mechanisms (FTMs). As such, on-line adaptation of FTMs is a key challenge towards resilience. In related work, on-line adaptation of FTMs is most often performed in a preprogrammed manner or consists in tuning some parameters. Besides, FTMs are replaced monolithically. All the envisaged FTMs must be known at design time and deployed from the beginning. However, dynamics occurs along multiple dimensions and developing a system for the worst-case scenario is impossible. According to runtime observations, new FTMs can be developed off-line but integrated on-line. We denote this ability as agile adaptation, as opposed to the preprogrammed one. In this thesis, we present an approach for developing flexible fault-tolerant systems in which FTMs can be adapted at runtime in an agile manner through fine-grained modifications for minimizing impact on the initial architecture. We first propose a classification of a set of existing FTMs based on criteria such as fault model, application characteristics and necessary resources. Next, we analyze these FTMs and extract a generic execution scheme which pinpoints the common parts and the variable features between them. Then, we demonstrate the use of state-of-the-art tools and concepts from the field of software engineering, such as component-based software engineering and reflective component-based middleware, for developing a library of fine-grained adaptive FTMs. We evaluate the agility of the approach and illustrate its usability through two examples of integration of the library: first, in a design-driven development process for applications in pervasive computing and, second, in a toolkit for developing applications for WSNs.
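The notion of a generic execution scheme with swappable variability points can be sketched in a few lines of Python. This is only an illustration of the component-based idea, under the assumption that detection and recovery are the variation points; the class and method names are invented, not the thesis's component model.

    class CrashDetector:
        def check(self, reply):
            if reply is None:
                raise RuntimeError("crash suspected")

    class RetryRecovery:
        def handle(self, request, service):
            return service(request)            # naive second attempt

    class GenericFTM:
        """Common execution scheme; detector and recovery are the variability points."""
        def __init__(self, detector, recovery):
            self.detector = detector
            self.recovery = recovery

        def process(self, request, service):
            try:
                reply = service(request)       # common part shared by all FTMs
                self.detector.check(reply)
                return reply
            except Exception:
                return self.recovery.handle(request, service)

        def adapt(self, detector=None, recovery=None):
            """Agile, fine-grained adaptation: swap one component, keep the rest running."""
            if detector is not None:
                self.detector = detector
            if recovery is not None:
                self.recovery = recovery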
APA, Harvard, Vancouver, ISO, and other styles
44

Zhan, Zhiyuan. "Meeting Data Sharing Needs of Heterogeneous Distributed Users." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14598.

Full text
Abstract:
The fast growth of wireless networking and mobile computing devices has enabled us to access information from anywhere at any time. However, varying user needs and system resource constraints are two major heterogeneity factors that pose a challenge to information sharing systems. For instance, when a new information item is produced, different users may have different requirements for when the new value should become visible. The resources that each device can contribute to such information sharing applications also vary. Therefore, how to enable information sharing across computing platforms with varying resources to meet different user demands is an important problem for distributed systems research. In this thesis, we address the heterogeneity challenge faced by such systems. We assume that shared information is encapsulated in distributed objects, and we use object replication to increase system scalability and robustness, which introduces the consistency problem. Many consistency models have been proposed in recent years but they are either too strong and do not scale very well, or too weak to meet many users' requirements. We propose a Mixed Consistency (MC) model as a solution. We introduce an access constraints based approach to combine both strong and weak consistency models together. We also propose a MC protocol that combines existing implementations together with minimum modifications. It is designed to tolerate crash failures and slow processes/communication links in the system. We also explore how the heterogeneity challenge can be addressed in the transport layer by developing an agile dissemination protocol. We implement our MC protocol on top of a distributed publisher-subscriber middleware, Echo. We finally measure the performance of our MC implementation. The results of the experiments are consistent with our expectations. Based on the functionality and performance of mixed consistency protocols, we believe that this model is effective in addressing the heterogeneity of user requirements and available resources in distributed systems.
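The flavor of mixing consistency levels per access can be illustrated with a small Python sketch: strong reads consult every replica and return the newest version, weak reads return the (possibly stale) local copy. This is a simplification invented for illustration, not the MC protocol itself, and the class name is hypothetical.

    class MixedConsistencyObject:
        """Each access chooses a consistency level, echoing per-access constraints."""
        def __init__(self, replicas):
            self.replicas = replicas          # each replica is a dict: key -> (version, value)

        def read(self, key, mode="weak"):
            if mode == "weak":
                return self.replicas[0].get(key)            # fast, possibly stale
            votes = [r.get(key) for r in self.replicas]     # strong: consult all replicas
            return max((v for v in votes if v is not None),
                       key=lambda v: v[0], default=None)    # newest version wins

        def write(self, key, version, value):
            for replica in self.replicas:                   # eagerly propagate updates
                replica[key] = (version, value)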
APA, Harvard, Vancouver, ISO, and other styles
45

Jeganathan, Nithyananda Siva. "A CONTROLLER AREA NETWORK LAYER FOR RECONFIGURABLE EMBEDDED SYSTEMS." UKnowledge, 2007. http://uknowledge.uky.edu/gradschool_theses/484.

Full text
Abstract:
Dependable and fault-tolerant computing has been actively pursued as a research area since the 1980s in various fields involving the development of safety-critical applications. The ability of the system to provide reliable functional service as per its design is a key paradigm in dependable computing. For providing reliable service in fault-tolerant systems, dynamic reconfiguration has to be supported to enable recovery from errors (induced by faults) or graceful degradation in case of service failures. Reconfigurable distributed applications provide a platform for developing fault-tolerant systems, and these reconfigurable architectures require an embedded network that is inherently fault-tolerant and capable of handling the movement of tasks between nodes/processors within the system during dynamic reconfiguration. The embedded network should provide mechanisms for deterministic message transfer under faulty environments and support fault detection/isolation mechanisms within the network framework. This thesis describes the design, implementation and validation of an embedded networking layer using Controller Area Network (CAN) to support reconfigurable embedded systems.
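One common ingredient of fault detection over CAN is heartbeat monitoring, sketched below using the python-can library on a Linux virtual CAN interface. The arbitration ID, payload layout and timeout are assumptions made for illustration; the thesis's actual protocol and framework are not reproduced here.

    import time
    import can                                 # the python-can package

    HEARTBEAT_ID = 0x700                       # illustrative arbitration ID for heartbeats
    TIMEOUT = 1.0                              # seconds of silence before suspecting a node

    def watch(channel="vcan0", nodes=(1, 2, 3)):
        bus = can.interface.Bus(channel=channel, bustype="socketcan")
        last_seen = {node: time.time() for node in nodes}
        while True:
            msg = bus.recv(timeout=0.2)
            if msg is not None and msg.arbitration_id == HEARTBEAT_ID:
                last_seen[msg.data[0]] = time.time()     # byte 0 carries the sender's node ID
            for node, seen in last_seen.items():
                if time.time() - seen > TIMEOUT:
                    print(f"node {node} suspected faulty")   # hook isolation/reconfiguration here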
APA, Harvard, Vancouver, ISO, and other styles
46

Viana, Antonio Eduardo Bernardes. "Uma Abordagem Autonômica para Tolerância a Falhas na Execução de Aplicações em Desktop Grids." Universidade Federal do Maranhão, 2011. http://tedebc.ufma.br:8080/jspui/handle/tede/479.

Full text
Abstract:
Computer grids are characterized by the high dynamism of their execution environment, by resource and application heterogeneity, and by the requirement for high scalability. These features make tasks such as configuration, maintenance and recovery of failed applications quite challenging, and it is becoming increasingly difficult to perform them only by human agents. The autonomic computing paradigm denotes computer systems capable of changing their behavior dynamically in response to changes in the execution environment. For achieving this, the software is generally organized following the MAPE-K (Monitoring, Analysis, Planning, Execution and Knowledge) model, in which autonomic managers perform the execution environment sensing activities, context analysis, planning and execution of dynamic reconfiguration actions, based on shared knowledge about the controlled system. In this work we present an autonomic mechanism based on the MAPE-K model to provide fault tolerance for applications running on computer grids, which is capable of monitoring the execution environment and, based on the evaluation of the collected data, deciding which reconfiguration actions must eventually be applied to the fault tolerance mechanism in order to keep the system in balance with the goals of minimizing the applications' average completion time and providing a high success rate in completing their tasks. This work also describes the performance evaluation of the proposed autonomic mechanism, accomplished through the use of simulation techniques that took into account several typical scenarios of opportunistic desktop grid environments.
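One pass of a MAPE-K style loop can be sketched in Python as below: monitor the grid, analyze against a threshold, plan a reconfiguration of the fault-tolerance mechanism, and execute it, all sharing a knowledge store. The metric, threshold and checkpoint-interval policy are placeholders invented for illustration, not the thesis's actual policies.

    class Grid:
        """Minimal stand-in for the managed desktop grid."""
        def sample(self):
            return {"failure_rate": 0.3}       # fraction of tasks lost since the last pass
        def reconfigure(self, **settings):
            print("reconfigured:", settings)

    def mape_k_step(grid, knowledge):
        metrics = grid.sample()                                        # Monitor
        knowledge["history"].append(metrics)                           # shared Knowledge
        degraded = metrics["failure_rate"] > knowledge["threshold"]    # Analyze
        interval = knowledge["interval"]
        plan = {"checkpoint_interval": interval / 2 if degraded else interval}   # Plan
        grid.reconfigure(**plan)                                       # Execute

    mape_k_step(Grid(), {"history": [], "threshold": 0.2, "interval": 600})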
APA, Harvard, Vancouver, ISO, and other styles
47

Rao, Shrisha. "Safety and hazard analysis in concurrent systems." Diss., University of Iowa, 2005. http://ir.uiowa.edu/etd/106.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Karl, Holger. "Responsive Execution of Parallel Programs in Distributed Computing Environments." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 1999. http://dx.doi.org/10.18452/14455.

Full text
Abstract:
Clusters of standard workstations have been shown to be an attractive environment for parallel computing. However, unsolved problems remain in making them suitable for some application scenarios. One of these problems is dependable and timely program execution: there are many applications in which a program should be successfully completed at a predictable point of time. Mechanisms to combine the properties of both dependable and timely execution of parallel programs in distributed computing environments are the main objective of this dissertation. Addressing these properties requires a joint metric for dependability and timeliness. Responsiveness is such a metric; it is refined for the purposes of this work. As a case study, Calypso and Charlotte, two parallel programming systems, are analyzed and their shortcomings with regard to responsiveness are identified on several abstraction levels. Solutions for them are presented and generalized, resulting in widely applicable mechanisms for (parallel) responsive services. Specifically, these solutions are: 1) a responsiveness analysis of Calypso's eager scheduling (a mechanism for load balancing and fault masking), 2) ameliorating a single point of failure by a responsiveness analysis of checkpointing and by a standard interface-based system for replication of legacy software, 3) managing resources in a way suitable for parallel programs, and 4) using semantical information about the communication pattern of a program to improve its performance. All proposed mechanisms can be combined and are suitable for use in standard environments. It is shown by analysis and experiments that these mechanisms improve the responsiveness of eligible applications.
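Eager scheduling, the mechanism analyzed in the first contribution, can be pictured with a toy Python sketch: every idle worker is handed some still-unfinished task, so slow or crashed workers are masked by other workers re-executing the same work, and the first copy to finish wins. The Task class and function are illustrative, not Calypso's implementation.

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        active_copies: int = 0                 # how many workers currently run this task

    def eager_schedule(tasks, idle_workers, results):
        """Hand every idle worker some unfinished task, replicating work if necessary."""
        unfinished = [t for t in tasks if t.name not in results]
        for worker in idle_workers:
            if not unfinished:
                break
            task = min(unfinished, key=lambda t: t.active_copies)   # least-replicated first
            task.active_copies += 1
            worker(task)                       # first copy to finish fills results[task.name]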
APA, Harvard, Vancouver, ISO, and other styles
49

Mohammed, Bashir. "A Framework for Efficient Management of Fault Tolerance in Cloud Data Centres and High-Performance Computing Systems: An Investigation and Performance analysis of a Cloud Based Virtual Machine Success and Failure Rate in a typical Cloud Computing Environment and Prediction Methods." Thesis, University of Bradford, 2019. http://hdl.handle.net/10454/17400.

Full text
Abstract:
Cloud computing is increasingly attracting huge attention both in academic research and industry initiatives and has been widely used to solve advanced computation problems. As cloud datacentres continue to grow in scale and complexity, the risk of failure of Virtual Machines (VMs) and hosts running several jobs and processing large amounts of user requests increases, and it consequently becomes even more difficult to predict potential failures within a datacentre. However, even though fault tolerance continues to be an issue of growing concern in cloud and HPC systems, mitigating the impact of failure and providing accurate predictions with enough lead time remains a difficult research problem. Traditional fault-tolerance strategies such as regular checkpoint/restart and replication are not adequate due to emerging complexities in the systems and do not scale well in the cloud due to resource sharing and distributed system networks. In this thesis, a new reliable fault tolerance scheme using an intelligent optimal strategy is presented to ensure high system availability, reduced task completion time and an efficient VM allocation process. Specifically, (i) a generic fault tolerance algorithm for cloud data centres and HPC systems in the cloud was developed. (ii) A verification process was developed for a fully dimensional VM specification during allocation in the presence of faults. In comparison to existing approaches, the results obtained show an increase in the success rate of the VMs, a reduction in the response time of VM allocation and improved overall performance. (iii) A failure prediction model was further developed, and the predictive capabilities of machine learning were explored by applying several algorithms to improve the accuracy of prediction. Experimental results indicate that the average prediction accuracy of the proposed model when predicting failure is about 90%, compared to existing algorithms, which implies that the approach can effectively predict potential system and application failures within the system.
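The failure-prediction step can be sketched with a standard scikit-learn classifier trained on VM metrics. The features and labels below are synthetic, generated only to make the sketch runnable; they are not the thesis's dataset, model choice or reported accuracy.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.random((1000, 4))                  # e.g. CPU, memory, I/O wait, temperature
    y = (X[:, 0] + X[:, 2] > 1.2).astype(int)  # synthetic "will fail" label

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("prediction accuracy:", accuracy_score(y_te, model.predict(X_te)))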
APA, Harvard, Vancouver, ISO, and other styles
50

Lemos, Fernando Tarlá Cardoso. "Uma arquitetura otimizada para a detecção de falhas em grades computacionais." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-19072013-115312/.

Full text
Abstract:
In distributed platforms, fault detection is an essential requirement for a wide range of fault tolerance techniques, such as restoring the state of distributed applications with checkpointing and message logging. However, fault detection often depends on reliable communication between the processing nodes and the fault detection modules. Direct communication between the nodes and detection modules is often impossible in hierarchical grid computing platforms. The physical distance between the institutions and resources available on the grid, and thus the requirement of long distance networks connecting them, is another factor that makes direct fault detection in computer grids a challenge. This thesis presents a fault detection architecture for distributed platforms, optimized for usage in hierarchical grids and thus taking into account their restrictions and requirements. The architecture, named GFDA (Grid Fault Detection Architecture), is structured as fault detection modules for faults that affect the computing nodes available on the grid, detection modules for faults that affect the distributed applications, and modules that perform the collection, processing and forwarding of the fault and recovery notifications generated by the detection modules. This thesis presents implementation details, an evaluation of the correctness of the designed architecture, and results obtained through the deployment of parts of the architecture in a simulated cluster that uses virtual machines to simulate computing nodes. Techniques to optimize the quality of the fault detection service are proposed. The results obtained through the usage of such techniques are compared to the results obtained through traditional approaches. Positive results were obtained even under adverse connectivity conditions by using techniques such as the processing of fault and recovery notifications and the introduction of redundant information in the messages exchanged between the detection modules. It is concluded that the GFDA architecture contributes to the establishment of a viable solution for fault detection in a hierarchical grid computing platform that presents connectivity restrictions between the nodes.
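The redundant-notification idea can be sketched in Python as below: detection modules piggyback recent fault/recovery events on later messages, and the collector discards duplicates by sequence number, so a single lost message does not lose an event. The message layout and function names are assumptions for illustration, not GFDA's wire format.

    def collect_notifications(messages, seen, forward):
        """Process fault/recovery notifications carried redundantly in several messages."""
        for message in messages:
            for seq, event in message["events"]:   # each message piggybacks recent events
                if seq not in seen:                # duplicates from redundancy are dropped
                    seen.add(seq)
                    forward(event)                 # pass each event up the hierarchy once

    collect_notifications(
        [{"events": [(1, "fault:n3"), (2, "recovery:n3")]},
         {"events": [(2, "recovery:n3"), (3, "fault:n7")]}],   # seq 2 resent redundantly
        set(), print)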
APA, Harvard, Vancouver, ISO, and other styles
