Dissertations / Theses on the topic 'Fault-tolerant computing. electronic data processing'
Subramaniyan, Rajagopal. "Gossip-based failure detection and consensus for terascale computing." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0000799.
Damani, Om Prakash. "Optimistic protocols for fault-tolerance in distributed systems /." Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.
Yi, Byungho. "Faults and fault-tolerance in distributed computing systems : the election problem." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8312.
Lin, Luke. "Localizing the effects of failure in distributed systems." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/8207.
Wilkes, Charles Thomas. "Programming methodologies for resilience and availability." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/8308.
Tarafdar, Ashis. "Software fault tolerance in distributed systems using controlled re-execution /." Digital version accessible at:, 2000. http://wwwlib.umi.com/cr/utexas/main.
Bazzi, Rida Adnan. "Automatically increasing fault tolerance in distributed systems." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8133.
Soria-Rodriguez, Pedro. "Multicast-Based Interactive-Group Object-Replication For Fault Tolerance." Digital WPI, 1999. https://digitalcommons.wpi.edu/etd-theses/1069.
岑蘭 and Lan Sham. "Performance study of a new disk shadowing scheme." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1997. http://hub.hku.hk/bib/B31214599.
Sham, Lan. "Performance study of a new disk shadowing scheme /." Hong Kong : University of Hong Kong, 1997. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18735502.
Zhou, Wanlei. "Building reliable distributed systems." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20051017.160921.
Zhan, Zhiyuan. "Meeting Data Sharing Needs of Heterogeneous Distributed Users." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14598.
Subbiah, Arun. "Design and evaluation of a distributed diagnosis algorithm for arbitrary network topologies in dynamic fault environments." Thesis, Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/13273.
Rao, Shrisha. "Safety and hazard analysis in concurrent systems." Diss., University of Iowa, 2005. http://ir.uiowa.edu/etd/106.
Paul, Arnab. "Designing Secure and Robust Distributed and Pervasive Systems with Error Correcting Codes." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6848.
Cai, Zhongtang. "Risk-based proactive availability management." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22581.
Committee Member: Ahamad, Mustaque; Committee Member: Eisenhauer, Greg; Committee Member: Milojicic, Dejan; Committee Member: Pu, Calton; Committee Member: Schwan, Karsten.
Sotoma, Irineu. "Qualidade de serviço de detectores de defeitos na presença de rajadas de perdas de mensagens." [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276287.
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: The quality of service (QoS) of a failure detector determines how quickly a failure detector q detects the crash of a process p, and how accurately q reports that crash. In wide area networks and wireless networks, process crashes, high delay variation, and bursts of packet loss are common. Under these conditions, a failure detector configurator must choose the detector's parameters appropriately to keep the detector satisfying its QoS requirements. This work therefore proposes a failure detector configurator that takes into account the probability distribution of the lengths of message-packet loss bursts, using a Markov model. Simulation results show that the parameters provided by the proposed configurator tend to lead the failure detector to satisfy the QoS requirements in networks subject to message loss bursts. Additionally, the work shows that it is possible to improve the accuracy of the failure detector by using a combination of simple message delay estimators.
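The abstract does not say which Markov model the thesis uses, so the following is only an illustrative sketch under a stated assumption: a two-state (Gilbert-style) chain in which a loss burst ends at each step with probability `r`, making burst lengths geometric. The function names, the parameter `r`, and the idea of tolerating `k` consecutive lost heartbeats before suspecting a process are all hypothetical and are not taken from the thesis.

```python
def burst_length_pmf(r: float, k: int) -> float:
    """P(burst length == k) when each lost message is followed by a
    recovery with probability r (assumed two-state Markov loss model)."""
    if k < 1:
        return 0.0
    return (1.0 - r) ** (k - 1) * r


def prob_burst_longer_than(r: float, k: int) -> float:
    """P(burst length > k): the first k steps of the burst all stay lossy."""
    return (1.0 - r) ** k


def min_tolerated_losses(r: float, max_wrong_suspicion: float) -> int:
    """Smallest k >= 1 such that P(burst length > k) <= max_wrong_suspicion.

    Hypothetical use: a heartbeat detector that suspects a process only
    after more than k consecutive heartbeats are missing.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    if not 0.0 < max_wrong_suspicion < 1.0:
        raise ValueError("max_wrong_suspicion must be in (0, 1)")
    k = 1
    while prob_burst_longer_than(r, k) > max_wrong_suspicion:
        k += 1
    return k
```

Under this assumed model, a smaller recovery probability `r` means longer bursts, so the detector must tolerate more consecutive losses (and therefore detects real crashes more slowly) to keep the same accuracy.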
Doctorate
Master in Computer Science
Oriani, André 1984. "Uma solução de alta disponibilidade para o sistema de arquivos distribuidos do Hadoop." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275641.
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: System designers generally adopt cluster-based file systems as the storage solution for high-performance computing environments, because they provide data with reliability, consistency, and high throughput. Most of those file systems, however, employ a centralized architecture, which compromises their availability. This work focuses on one such system, the Hadoop Distributed File System (HDFS). It proposes a hot standby for the master node of HDFS in order to bring high availability to the system. The hot standby was achieved by (i) extending the master's state replication performed by its checkpoint helper, the Backup Node; and by (ii) introducing an automatic failover mechanism. Step (i) took advantage of the message duplication technique developed by another high availability solution for HDFS, named AvatarNodes. Step (ii) employed ZooKeeper, a distributed coordination service. That approach resulted in small code changes, around 0.18% of the original code, which makes the solution easy to understand and to maintain. Experiments showed that the overhead imposed by replication did not increase the average resource consumption of system nodes by more than 11%, nor did it diminish the data throughput compared to the original version of HDFS. The complete transition to the hot standby can take up to 60 seconds under workloads dominated by I/O operations, but less than 0.4 seconds when metadata requests predominate. These results show that the solution developed in this work achieved its goal of producing a high availability solution for HDFS with low overhead and short reaction time to failures.
Master's
Computer Science
Master in Computer Science
Batchu, Rajanikanth Reddy. "Incorporating fault-tolerant features into message-passing middleware." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04072003-215052.
Chen, Changgui. "A Reactive system model for building fault-tolerant distributed applications." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20050915.134208.
Kurt, Mehmet Can. "Fault-tolerant Programming Models and Computing Frameworks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437390499.
Kao, Ming-lai. "A reconfigurable fault-tolerant multiprocessor system for real-time control /." The Ohio State University, 1986. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487266011223248.
Li, Yingjie. "Information dissemination and routing in communication networks." Columbus, Ohio : Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1132767756.
Jones, Clinton Christopher. "Determining Coefficients of Checking Polynomials for an Algebraic Method of Fault Tolerant Computations of Numerical Functions." Thesis, Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5242.
Holtby, Dan. "Lower bound for scalable Byzantine agreement." Thesis, 2006. http://hdl.handle.net/1828/2069.
Olander, Peter Andrew. "Built-in tests for a real-time embedded system." Thesis, 1991. http://hdl.handle.net/10413/5680.
Thesis (M.Sc.)-University of Natal, Durban, 1991.
Park, Seungjin. "Fault-tolerant communications in parallel systems." Thesis, 1993. http://hdl.handle.net/1957/37044.
AlMohammad, Bader Fahed AlBedaiwi. "On resource placements and fault-tolerant broadcasting in toroidal networks." Thesis, 1997. http://hdl.handle.net/1957/33680.
Graduation date: 1998
"Design and implementation of a fault-tolerant multimedia network and a local map based (LMB) self-healing scheme for arbitrary topology networks." 1997. http://library.cuhk.edu.hk/record=b5889296.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 101-[106]).
Chapter 1 --- Introduction
Chapter 1.1 --- Overview
Chapter 1.2 --- Service Survivability Planning
Chapter 1.3 --- Categories of Outages
Chapter 1.4 --- Goals of Restoration
Chapter 1.5 --- Technology Impacts on Network Survivability
Chapter 1.6 --- Performance Models and Measures in Quantifying Network Survivability
Chapter 1.7 --- Organization of Thesis
Chapter 2 --- Design and Implementation of A Survivable High-Speed Multimedia Network
Chapter 2.1 --- An Overview of CUM LAUDE NET
Chapter 2.2 --- The Network Architecture
Chapter 2.2.1 --- Architectural Overview
Chapter 2.2.2 --- Router-Node Design
Chapter 2.2.3 --- Buffer Allocation
Chapter 2.2.4 --- Buffer Transmission Priority
Chapter 2.2.5 --- Congestion Control
Chapter 2.3 --- Protocols
Chapter 2.3.1 --- Design Overview
Chapter 2.3.2 --- ACTA - The MAC Protocol
Chapter 2.3.3 --- Protocol Layering
Chapter 2.3.4 --- Segment, Datagram and Packet Format
Chapter 2.3.5 --- Fast Packet Routing
Chapter 2.3.6 --- Local Host NIU
Chapter 2.4 --- The Network Restoration Strategy
Chapter 2.4.1 --- The Dual-Ring Model and Assumptions
Chapter 2.4.2 --- Scenarios of Network Failure and Remedies
Chapter 2.4.3 --- Distributed Fault-Tolerant Algorithm
Chapter 2.4.4 --- Distributed Auto-Healing Algorithm
Chapter 2.4.5 --- The Network Management Signals
Chapter 2.5 --- Performance Evaluation
Chapter 2.5.1 --- Restoration Time
Chapter 2.5.2 --- Reliability Measures
Chapter 2.5.3 --- Network Availability During Restoration
Chapter 2.6 --- The Prototype
Chapter 2.7 --- Technical Problems Encountered
Chapter 2.8 --- Chapter Summary and Future Development
Chapter 3 --- A Simple Experimental Network Management Software - NETMAN
Chapter 3.1 --- Introduction to NETMAN
Chapter 3.2 --- Network Management Basics
Chapter 3.2.1 --- The Level of Management Protocols
Chapter 3.2.2 --- Architecture Model
Chapter 3.2.3 --- TCP/IP Network Management Protocol Architecture
Chapter 3.2.4 --- A Standard Network Management Protocol On Internet - SNMP
Chapter 3.2.5 --- A Standard For Managed Information
Chapter 3.3 --- The CUM LAUDE Network Management Protocol Suite (CNMPS)
Chapter 3.3.1 --- The Architecture
Chapter 3.3.2 --- Goals of the CNMPS
Chapter 3.4 --- Highlights of NETMAN
Chapter 3.5 --- Functional Descriptions of NETMAN
Chapter 3.5.1 --- Topology Menu
Chapter 3.5.2 --- Fault Manager Menu
Chapter 3.5.3 --- Performance Meter Menu
Chapter 3.5.4 --- Gateway Utility Menu
Chapter 3.5.5 --- Tools Menu
Chapter 3.5.6 --- Help Menu
Chapter 3.6 --- Chapter Summary
Chapter 4 --- A Local Map Based (LMB) Self-Healing Scheme for Arbitrary Topology Networks
Chapter 4.1 --- Introduction
Chapter 4.2 --- An Overview of Existing DCS-Based Restoration Algorithms
Chapter 4.3 --- The Network Model and Assumptions
Chapter 4.4 --- Basics of the LMB Scheme
Chapter 4.4.1 --- Restoration Concepts
Chapter 4.4.2 --- Terminology
Chapter 4.4.3 --- Algorithm Parameters
Chapter 4.5 --- Performance Assessments
Chapter 4.6 --- The LMB Network Restoration Scheme
Chapter 4.6.1 --- Initialization - Local Map Building
Chapter 4.6.2 --- The LMB Restoration Messages Set
Chapter 4.6.3 --- Phase I - Local Map Update Phase
Chapter 4.6.4 --- Phase II - Update Acknowledgment Phase
Chapter 4.6.5 --- Phase III - Restoration and Confirmation Phase
Chapter 4.6.6 --- Phase IV - Cancellation Phase
Chapter 4.6.7 --- Re-Initialization
Chapter 4.6.8 --- Path Route Monitoring
Chapter 4.7 --- Performance Evaluation
Chapter 4.7.1 --- The Testbeds
Chapter 4.7.2 --- Simulation Results
Chapter 4.7.3 --- Storage Requirements
Chapter 4.8 --- The LMB Scheme on ATM and SONET environment
Chapter 4.9 --- Future Work
Chapter 4.10 --- Chapter Summary
Chapter 5 --- Conclusion and Future Work
Chapter 5.1 --- Conclusion
Chapter 5.2 --- Future Work
Bibliography
Chapter A --- Derivation of Communicative Probability
Chapter B --- List of Publications
Yan, Jiaxiang. "Modeling, monitoring and optimization of discrete event systems using Petri nets." 2014. http://hdl.handle.net/1805/3874.
Yan, Jiaxiang. M.S.E.C.E., Purdue University, May 2013. Modeling, Monitoring and Optimization of Discrete Event Systems Using Petri Nets. Major Professor: Lingxi Li. In recent decades, research on discrete event systems (DESs) has attracted increasing attention because of the rapid development of intelligent control strategies. Such strategies combine conventional control with discrete decision-making processes that simulate human decision-making. Because of the scale and complexity of common DESs, dedicated models, monitoring methods, and optimal control strategies are necessary. Among DES models, Petri nets are well known for their advantages in handling asynchronous processes, and in recent years they have been widely applied in intelligent transportation systems (ITS) and communication technology. By encoding the Petri net state, fault detection and identification can also be enabled in DESs, mitigating potential human errors. This thesis studies several problems in the context of DESs that can be modeled by Petri nets, focusing on systematic modeling, asynchronous monitoring, and the design of optimal control strategies. The thesis first considers the systematic modeling of ITS. It proposes a microscopic model of a signalized intersection and its two-layer timed Petri net representation, in which the first layer represents the intersection and the second layer represents the traffic light system. The representation involves both deterministic and stochastic transitions. Its detailed operation is described, and its improvements over previous models are discussed. The thesis then studies the asynchronous monitoring of sensor networks.
It proposes an event sequence reconstruction algorithm for a given sensor network, based on asynchronous observations of its state changes. The sensor network is assumed to be modeled as a Petri net, and the asynchronous observations take the form of state (token) changes at different places in the Petri net. More specifically, the observed sequences of state changes are provided by local sensors and are asynchronous, i.e., they contain only partial information about the ordering of the state changes that occur. The proposed approach partitions the given net into several subnets and reconstructs the event sequence for each subnet. An algorithm is then developed to reconstruct the event sequences for the entire net that are consistent with: 1) the asynchronous observations of state changes; 2) the event sequences of each subnet; and 3) the structure of the given Petri net. The algorithmic complexity is discussed. The final problem studied in this thesis is the optimal design of fault-tolerant Petri net controllers. In particular, the thesis considers the detection and identification of multiple faults in Petri nets that have state machine structures (i.e., every transition in the net has only one input place and one output place). Approximation algorithms are developed to design a fault-tolerant Petri net controller that achieves the minimal number of connections with the original controller. A design example for an automated guided vehicle (AGV) system is also provided to illustrate the approaches.
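The state machine structure mentioned in the abstract (every transition has exactly one input place and one output place) can be illustrated with a small sketch. This is not the thesis's algorithm or its state encoding; the `Transition` class, the helper names, and the token-count check are hypothetical, and arcs are assumed to have weight 1. The check relies only on the standard property that firing a transition in a state machine net moves one token and so preserves the total token count.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: Tuple[str, ...]   # names of distinct input places
    outputs: Tuple[str, ...]  # names of distinct output places


def is_state_machine(transitions: Iterable[Transition]) -> bool:
    """True if every transition has exactly one input and one output place."""
    return all(len(t.inputs) == 1 and len(t.outputs) == 1 for t in transitions)


def fire(marking: Dict[str, int], t: Transition) -> Dict[str, int]:
    """Return the marking after firing t (unit arc weights assumed)."""
    if any(marking.get(p, 0) < 1 for p in t.inputs):
        raise ValueError(f"transition {t.name} is not enabled")
    new_marking = dict(marking)
    for p in t.inputs:
        new_marking[p] -= 1
    for p in t.outputs:
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking


def token_count_consistent(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """Simple illustrative fault check: in a state machine net the total
    number of tokens never changes, so a changed total signals a fault."""
    return sum(before.values()) == sum(after.values())
```

For example, with transitions `p1 -> p2` and `p2 -> p1` and one token in `p1`, firing the first transition moves the token to `p2`; a marking whose total token count differs from the original would fail `token_count_consistent`. A check of this kind can only detect that some fault occurred, not identify which one.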