Dissertations / Theses: 'Fault-tolerant computing. electronic data processing'

1

Subramaniyan, Rajagopal. "Gossip-based failure detection and consensus for terascale computing." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0000799.

2

Damani, Om Prakash. "Optimistic protocols for fault-tolerance in distributed systems." 1999. Digital version accessible at http://wwwlib.umi.com/cr/utexas/main.

3

Yi, Byungho. "Faults and fault-tolerance in distributed computing systems : the election problem." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8312.

4

Lin, Luke. "Localizing the effects of failure in distributed systems." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/8207.

5

Wilkes, Charles Thomas. "Programming methodologies for resilience and availability." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/8308.

6

Tarafdar, Ashis. "Software fault tolerance in distributed systems using controlled re-execution." 2000. Digital version accessible at http://wwwlib.umi.com/cr/utexas/main.

7

Bazzi, Rida Adnan. "Automatically increasing fault tolerance in distributed systems." Diss., Georgia Institute of Technology, 1994. http://hdl.handle.net/1853/8133.

8

Soria-Rodriguez, Pedro. "Multicast-Based Interactive-Group Object-Replication For Fault Tolerance." Digital WPI, 1999. https://digitalcommons.wpi.edu/etd-theses/1069.

Abstract:

Distributed systems are clusters of computers working together on one task. Sharing information across different architectures, and using network resources for communication among computers in a timely and efficient way, are among the problems involved in implementing a distributed system. In a low-latency system, network utilization and the responsiveness of the communication mechanism are even more critical. This thesis introduces a new approach for distributing messages to the computers in the system, in which the Common Object Request Broker Architecture (CORBA) is used in conjunction with IP multicast to implement a fault-tolerant, low-latency distributed system. Fault tolerance is achieved by replicating the current state of the system across several hosts. An update of the current state is initiated by a client application that contacts one of the state object replicas. The new information must then be distributed to all members of the distributed system (the object replicas). This state update is accomplished with a two-phase commit protocol, implemented using a binary tree structure along with IP multicast to reduce network utilization, distribute the computation load associated with state propagation, and achieve faster communication among the members of the distributed system. IP multicast speeds up message distribution, while the two-phase commit protocol encapsulates IP multicast to produce a reliable multicast service suitable for fault-tolerant, low-latency distributed applications. Finally, the binary tree structure is essential for sharing the processing load of collecting the commit responses.
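The two-phase commit with tree-based response collection described above can be sketched as an in-process simulation. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `Replica` class, the heap-ordered tree layout (children of node i at 2i+1 and 2i+2), and the function names are hypothetical, and CORBA, IP multicast, and failure timeouts are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical replica of the shared state object."""
    rid: int
    value: int = 0
    pending: Optional[int] = None
    can_commit: bool = True  # set to False to simulate a replica that votes NO

    def prepare(self, new_value: int) -> bool:
        # Phase 1: stage the update and return this replica's vote.
        self.pending = new_value
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the staged update on COMMIT, discard it on ABORT.
        if commit and self.pending is not None:
            self.value = self.pending
        self.pending = None


def collect_votes(replicas: List[Replica], i: int, new_value: int) -> bool:
    """Aggregate votes up a binary tree stored in heap order (children 2i+1, 2i+2).

    Each interior replica combines its own vote with its two subtrees' results,
    so no node, including the coordinator at index 0, handles more than two replies.
    """
    if i >= len(replicas):
        return True  # empty subtree: nothing to veto
    own = replicas[i].prepare(new_value)
    left = collect_votes(replicas, 2 * i + 1, new_value)
    right = collect_votes(replicas, 2 * i + 2, new_value)
    return own and left and right


def two_phase_commit(replicas: List[Replica], new_value: int) -> bool:
    """Run one state update; returns True if it committed on every replica."""
    # PREPARE would travel by IP multicast in the thesis; recursion stands in for it here.
    decision = collect_votes(replicas, 0, new_value)
    # The COMMIT/ABORT decision reaches every replica (again, multicast in practice).
    for replica in replicas:
        replica.finish(decision)
    return decision


if __name__ == "__main__":
    group = [Replica(rid=k) for k in range(7)]
    print(two_phase_commit(group, 42), [r.value for r in group])
```

A real deployment would also need timeouts so that a crashed child cannot block its parent indefinitely; this sketch omits them.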

9

Sham, Lan (岑蘭). "Performance study of a new disk shadowing scheme." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1997. http://hub.hku.hk/bib/B31214599.

10

Sham, Lan. "Performance study of a new disk shadowing scheme." Hong Kong : University of Hong Kong, 1997. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18735502.

11

Zhou, Wanlei. "Building reliable distributed systems." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20051017.160921.

12

Zhan, Zhiyuan. "Meeting Data Sharing Needs of Heterogeneous Distributed Users." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14598.

Abstract:

The fast growth of wireless networking and mobile computing devices has enabled us to access information from anywhere at any time. However, varying user needs and system resource constraints are two major heterogeneity factors that pose a challenge to information sharing systems. For instance, when a new information item is produced, different users may have different requirements for when the new value should become visible. The resources that each device can contribute to such information sharing applications also vary. Therefore, enabling information sharing across computing platforms with varying resources while meeting different user demands is an important problem for distributed systems research. In this thesis, we address the heterogeneity challenge faced by such systems. We assume that shared information is encapsulated in distributed objects, and we use object replication to increase system scalability and robustness, which introduces the consistency problem. Many consistency models have been proposed in recent years, but they are either too strong to scale well or too weak to meet many users' requirements. We propose a Mixed Consistency (MC) model as a solution, introducing an approach based on access constraints to combine strong and weak consistency models. We also propose an MC protocol that combines existing implementations with minimal modifications. It is designed to tolerate crash failures and slow processes or communication links in the system. We also explore how the heterogeneity challenge can be addressed at the transport layer by developing an agile dissemination protocol. We implement our MC protocol on top of Echo, a distributed publisher-subscriber middleware. Finally, we measure the performance of our MC implementation; the experimental results are consistent with our expectations.
Based on the functionality and performance of mixed consistency protocols, we believe that this model is effective in addressing the heterogeneity of user requirements and available resources in distributed systems.

13

Subbiah, Arun. "Design and evaluation of a distributed diagnosis algorithm for arbitrary network topologies in dynamic fault environments." Thesis, Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/13273.

14

Rao, Shrisha. "Safety and hazard analysis in concurrent systems." Diss., University of Iowa, 2005. http://ir.uiowa.edu/etd/106.

15

Paul, Arnab. "Designing Secure and Robust Distributed and Pervasive Systems with Error Correcting Codes." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6848.

Abstract:

This thesis investigates the role of error-correcting codes in distributed and pervasive computing. The main results lie at the intersection of security and fault tolerance for these environments. Two primary areas are explored. 1. We investigated protocols for large-scale, fault-tolerant, secure distributed storage, where the two main concerns are security and redundancy. In one arm of this research, we developed SAFE, a distributed storage system based on a new protocol that offers a two-in-one solution to fault tolerance and confidentiality. This protocol is based on cryptographic properties of error-correcting codes. In another arm, we developed esf, another prototype distributed persistent storage system; esf supports seamless hardware extension of storage units, is highly resilient to load, and provides high availability. The main ingredient in its design is a modern class of erasure codes known as Fountain Codes. One problem in such large storage systems is the heavy overhead of the fingerprints needed to check data integrity; esf addresses this with an integrity-check mechanism based on a data structure known as the Merkle tree. 2. We also investigated the design of a new remote authentication protocol, from which applications over long-range wireless links would benefit considerably. We designed and implemented LAWN, a lightweight remote authentication protocol for wireless networks that deploys a randomized approximation scheme based on error-correcting codes. We evaluated the performance of LAWN in detail: it adds very little computational overhead, while its savings in bandwidth and power are dramatic.
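The Merkle-tree integrity check mentioned above can be illustrated with a short sketch: a client keeps only one root hash instead of a fingerprint per block, and verifies any single block with about log2(n) sibling hashes. The abstract does not specify esf's tree layout or hash function, so SHA-256 (via Python's `hashlib`), the 0x00/0x01 leaf/node prefixes, and the function names here are assumptions for illustration.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaves(blocks: List[bytes]) -> List[bytes]:
    # Prefix 0x00 for leaves and 0x01 for interior nodes (domain separation).
    return [_h(b"\x00" + b) for b in blocks]


def _next_level(level: List[bytes]) -> List[bytes]:
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks: List[bytes]) -> bytes:
    """Root hash over the data blocks; an odd level duplicates its last node."""
    if not blocks:
        raise ValueError("need at least one block")
    level = _leaves(blocks)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = _next_level(level)
    return level[0]


def merkle_proof(blocks: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the right."""
    level = _leaves(blocks)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # the other child of the same parent
        proof.append((level[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_block(block: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root and compare it with the trusted root hash."""
    node = _h(b"\x00" + block)
    for sibling, sibling_is_right in proof:
        pair = node + sibling if sibling_is_right else sibling + node
        node = _h(b"\x01" + pair)
    return node == root
```

Padding odd levels by duplicating the last node is a simplification; it makes trees over `[a, b, c]` and `[a, b, c, c]` share a root, which a production design might avoid.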

16

Cai, Zhongtang. "Risk-based proactive availability management." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22581.

Abstract:

Thesis (Ph. D.)--Computing, Georgia Institute of Technology, 2008.
Committee Member: Ahamad, Mustaque; Committee Member: Eisenhauer, Greg; Committee Member: Milojicic, Dejan; Committee Member: Pu, Calton; Committee Member: Schwan, Karsten.

17

Sotoma, Irineu. "Qualidade de serviço de detectores de defeitos na presença de rajadas de perdas de mensagens." [Quality of service of failure detectors in the presence of message loss bursts.] [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276287.

Abstract:

Advisor: Edmundo Roberto Mauro Madeira
Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação
Abstract: The quality of service (QoS) of failure detectors determines how fast a failure detector q detects the crash of a process p, and how accurately q reports that crash. In wide area networks and wireless networks, process crashes, high delay variations, and burst losses of message packets are common. Under these conditions, a failure detector configurator must choose the failure detector parameters appropriately to keep the failure detector satisfying the QoS requirements. Therefore, this work proposes a failure detector configurator that takes into account the probability distribution of the loss burst lengths of message packets, using a Markov model. The simulation results show that the parameters provided by the proposed configurator tend to lead the failure detector to satisfy the QoS requirements in networks subject to message loss bursts. Additionally, the work shows that it is possible to improve the accuracy of the failure detector by combining simple message delay estimators.
Doctorate
Master in Computer Science
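The idea of configuring a failure detector from the distribution of loss-burst lengths can be sketched with a simple two-state Markov (Gilbert) loss model. This is not the configurator proposed in the thesis; it is a simplified illustration that assumes (a) every heartbeat sent in the "bad" state is lost and every one in the "good" state arrives, and (b) the detector suspects p after k consecutive missing heartbeats. The parameter names `p`, `r`, and `k` and all function names are hypothetical. Under these assumptions burst lengths are geometric, so a burst reaches k heartbeats with probability (1 - r)^(k - 1).

```python
from typing import List, Tuple


def estimate_gilbert(trace: List[bool]) -> Tuple[float, float]:
    """Estimate (p, r) of a two-state Gilbert loss model from a heartbeat trace.

    trace[i] is True if heartbeat i was lost.
    p = P(loss | previous heartbeat received), r = P(received | previous heartbeat lost).
    """
    good_to_bad = good_total = bad_to_good = bad_total = 0
    for prev_lost, cur_lost in zip(trace, trace[1:]):
        if prev_lost:
            bad_total += 1
            bad_to_good += int(not cur_lost)
        else:
            good_total += 1
            good_to_bad += int(cur_lost)
    if good_total == 0 or bad_total == 0:
        raise ValueError("trace must contain both received and lost heartbeats")
    return good_to_bad / good_total, bad_to_good / bad_total


def burst_at_least(k: int, r: float) -> float:
    """P(loss burst length >= k) = (1 - r)^(k - 1); burst lengths are geometric here."""
    return (1.0 - r) ** (k - 1)


def min_missed_heartbeats(r: float, max_mistake: float) -> int:
    """Smallest k such that one loss burst reaches k heartbeats with prob <= max_mistake.

    Suspecting p only after k consecutive missing heartbeats bounds the per-burst
    chance of wrongly suspecting a live p; the cost is a detection time of
    roughly k heartbeat intervals.
    """
    if not 0.0 < r <= 1.0 or not 0.0 < max_mistake < 1.0:
        raise ValueError("need 0 < r <= 1 and 0 < max_mistake < 1")
    k = 1
    while burst_at_least(k, r) > max_mistake:
        k += 1
    return k
```

The trade-off is visible directly: a smaller `r` (longer bursts) forces a larger `k`, which improves accuracy at the expense of detection speed, the two QoS dimensions named in the abstract.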

18

Oriani, André. "Uma solução de alta disponibilidade para o sistema de arquivos distribuidos do Hadoop." [A high-availability solution for the Hadoop distributed file system.] [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275641.

Abstract:

Advisor: Islene Calciolari Garcia
Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação
Abstract: System designers generally adopt cluster-based file systems as the storage solution for high-performance computing environments, because these systems provide data with reliability, consistency, and high throughput. However, most of those file systems employ a centralized architecture, which compromises their availability. This work focuses on one such system, the Hadoop Distributed File System (HDFS). A hot standby for the HDFS master node is proposed in order to bring high availability to the system. The hot standby was achieved by (i) extending the master's state replication performed by its checkpoint helper, the Backup Node, and (ii) introducing an automatic failover mechanism. Step (i) took advantage of the message duplication technique developed by another high-availability solution for HDFS, named AvatarNodes. Step (ii) employed ZooKeeper, a distributed coordination service. This approach resulted in small code changes, around 0.18% of the original code, which makes the solution easy to understand and maintain. Experiments showed that the overhead of replication did not increase the average resource consumption of system nodes by more than 11%, nor did it reduce data throughput compared to the original version of HDFS. The complete transition to the hot standby can take up to 60 seconds for workloads dominated by I/O operations, but less than 0.4 seconds when metadata requests predominate. These results show that the solution developed in this work achieved its goal of providing high availability for HDFS with low overhead and a short reaction time to failures.
Master's degree
Computer Science
Master in Computer Science

19

Batchu, Rajanikanth Reddy. "Incorporating fault-tolerant features into message-passing middleware." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04072003-215052.

20

Chen, Changgui. "A Reactive system model for building fault-tolerant distributed applications." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20050915.134208.

Abstract:

The development of fault-tolerant computing systems is a very difficult task, for two reasons. The first is that, in normal practice, fault-tolerant computing policies and mechanisms are deeply embedded in most application programs, so these programs cannot cope with changes in environments, policies, and mechanisms. Such factors may change frequently in a distributed environment, especially a heterogeneous one. Therefore, to develop better fault-tolerant systems that can cope with constant changes in environments and user requirements, it is essential to separate the fault-tolerant computing policies and mechanisms in application programs. The second is that, although a number of techniques have been proposed for constructing reliable and fault-tolerant computing systems, and many computer systems are being developed to tolerate various hardware and software failures, most of these systems are intended for specific application areas, since it is extremely difficult to develop systems for general-purpose fault-tolerant computing. The motivation of this thesis rests on these two aspects. The thesis focuses on developing a model, based on reactive system concepts, for building better fault-tolerant computing applications. Reactive system concepts are an attractive paradigm for system design, development, and maintenance because they separate policies from mechanisms. The emphasis of the model is on providing a flexible system architecture for general-purpose fault-tolerant application development, and the model can be applied in many specific applications. With this reactive system model, fault-tolerant computing policies and mechanisms can be separated in the applications, making fault-tolerant computing systems easier to develop and maintain.

21

Kurt, Mehmet Can. "Fault-tolerant Programming Models and Computing Frameworks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437390499.

22

Kao, Ming-lai. "A reconfigurable fault-tolerant multiprocessor system for real-time control." The Ohio State University, 1986. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487266011223248.

23

Li, Yingjie. "Information dissemination and routing in communication networks." Columbus, Ohio : Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1132767756.

24

Jones, Clinton Christopher. "Determining Coefficients of Checking Polynomials for an Algebraic Method of Fault Tolerant Computations of Numerical Functions." Thesis, Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5242.

Abstract:

This thesis presents a practical means for determining checking polynomials for the fault tolerant computation of numerical functions. This method is based on certain algebraic features of the numerical functions such as the transcendence degree of a field extension. Checking polynomials are given for representative simple and compound numerical functions. Some of these checking models are implemented in a simulation environment. The program developed provides the means for generating checking polynomials for a broad class of numerical functions. Considerations for designing and deploying checking models are given. This numerical technique can lower costs and conserve system resources when engineering for remote or nanoscale supercomputing environments.
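The principle of a checking polynomial can be shown with a textbook relation rather than one derived in the thesis: sin x and cos x are algebraically dependent, so a computed pair (s, c) must satisfy P(s, c) = s^2 + c^2 - 1 = 0. The sketch below assumes a tolerance `tol` for floating-point rounding and models a hardware fault as a single flipped bit in an IEEE-754 double; the function names are hypothetical.

```python
import math
import struct


def check_sin_cos(s: float, c: float, tol: float = 1e-9) -> bool:
    """Accept (s, c) as (sin x, cos x) only if the checking polynomial
    P(s, c) = s^2 + c^2 - 1 vanishes to within tol."""
    return abs(s * s + c * c - 1.0) <= tol


def flip_bit(value: float, bit: int) -> float:
    """Model a single-event upset by flipping one bit of an IEEE-754 double."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (upset,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return upset


if __name__ == "__main__":
    x = 0.7
    s, c = math.sin(x), math.cos(x)
    print(check_sin_cos(s, c))                # fault-free results satisfy P
    print(check_sin_cos(flip_bit(s, 52), c))  # an exponent-bit upset violates P
    print(check_sin_cos(flip_bit(s, 0), c))   # lowest mantissa bit: error below tol
```

The tolerance sets the detection threshold: upsets that perturb the result by less than `tol` pass the check, which is the usual trade-off between false alarms from rounding and missed small faults.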

25

Holtby, Dan. "Lower bound for scalable Byzantine agreement." Thesis, 2006. http://hdl.handle.net/1828/2069.

Abstract:

We consider the problem of computing Byzantine Agreement in a synchronous network with $n$ processors, each with a private random string, where each pair of processors is connected by a private communication line. The adversary is malicious and non-adaptive, i.e., it must choose the processors to corrupt at the start of the algorithm. Byzantine Agreement is known to be computable in this model in an expected constant number of rounds. We consider a scalable model where, in each round, each uncorrupt processor can send to any set of $\log n$ other processors and listen to any set of $\log n$ processors. We define the loss of a computation to be the number of uncorrupt processors whose output does not agree with the output of the majority of uncorrupt processors. We show that if there are $t$ corrupt processors, then any randomised protocol which has probability at least $1/2 + 1/\log n$ of loss less than $t^{2/3} / (16 f n^{1/3} \log^{5/3} n)$ requires at least $f$ rounds.

26

Olander, Peter Andrew. "Built-in tests for a real-time embedded system." Thesis, 1991. http://hdl.handle.net/10413/5680.

Abstract:

Beneath the facade of the application code of a well-designed real-time embedded system lies intrinsic firmware that provides a fast and effective means of detecting and diagnosing inevitable hardware failures. These failures can reduce the availability of a system, so the source of a malfunction must be identified. It is shown that the number of possible origins of failures of all kinds is immense. As a result, fault models are devised to encompass prevalent hardware faults. Furthermore, complexity is reduced by determining syndromes for particular circuitry and applying test vectors at the functional-block level. Testing phases and philosophies, together with standardisation policies, are defined to ensure that system designers comply with the underlying principles of evaluating system integrity. The three testing phases (power-on self-tests at system start-up, on-line health monitoring, and off-line diagnostics) are designed so that the built-in test firmware remains inconspicuous during normal operation. The code's prominence becomes apparent, however, when a hardware failure is detected or diagnosed. The validity of the theoretical models, standardisation policies, and built-in test philosophies is illustrated by applying them to an intricate real-time system. The architecture and the software design implementing these ideologies are described extensively. Standardisation policies, enhanced by the proposal of generic tests for common core components, are advocated at all hierarchical levels. The presentation of the hardware and software integration aims to portray the moderately complex nature of generating a set of built-in tests for a real-time embedded system. In spite of generic policies, the intricacies of the architecture are found to have a direct influence on software design decisions.
It is thus concluded that the diagnostic objectives of the user requirements specification should be lucidly expressed by both operational and maintenance personnel for all testing phases. The system designer and the end user may understand the requirements specification that defines the diagnostic objectives differently. Complete collaboration between the two parties is therefore essential throughout the development life cycle, and especially during the preliminary design phase. Thereafter, the designer can decide on the sophistication of the system's testing capabilities.
Thesis (M.Sc.)-University of Natal, Durban, 1991.
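A power-on self-test that applies test vectors under a stuck-at fault model, as described above, can be illustrated with a classic walking-ones memory test. This sketch is not taken from the thesis: the `FaultyRAM` class is a hypothetical software model of a memory block, and the stuck-at-0/1 single-bit fault model is an assumption chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


class FaultyRAM:
    """Hypothetical memory model; stuck_at maps address -> (bit, level) for a stuck-at-0/1 cell."""

    def __init__(self, size: int, width: int = 8,
                 stuck_at: Optional[Dict[int, Tuple[int, int]]] = None):
        self.width = width
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & ((1 << self.width) - 1)

    def read(self, addr: int) -> int:
        value = self.cells[addr]
        if addr in self.stuck_at:
            bit, level = self.stuck_at[addr]
            # Force the faulty bit to its stuck level regardless of what was written.
            value = (value | (1 << bit)) if level else (value & ~(1 << bit))
        return value


def walking_ones_test(ram: FaultyRAM) -> List[Tuple[int, int]]:
    """Write each single-bit pattern to every word and read it back.

    Returns the (address, pattern) pairs whose read-back differed.
    A stuck-at-0 bit fails one pattern; a stuck-at-1 bit fails all the others.
    """
    failures = []
    for addr in range(len(ram.cells)):
        for bit in range(ram.width):
            pattern = 1 << bit
            ram.write(addr, pattern)
            if ram.read(addr) != pattern:
                failures.append((addr, pattern))
        ram.write(addr, 0)  # leave the word cleared after testing
    return failures
```

Because the test overwrites memory contents, it fits the power-on phase rather than on-line health monitoring. Writing and immediately reading each word also means it does not exercise coupling faults between different addresses; March-style tests are the usual choice for those.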

27

Park, Seungjin. "Fault-tolerant communications in parallel systems." Thesis, 1993. http://hdl.handle.net/1957/37044.

28

AlMohammad, Bader Fahed AlBedaiwi. "On resource placements and fault-tolerant broadcasting in toroidal networks." Thesis, 1997. http://hdl.handle.net/1957/33680.

Abstract:

Parallel computers are classified into multiprocessors and multicomputers. A multiprocessor system usually has a shared memory through which its processors communicate, whereas the processors of a multicomputer system communicate by message passing through an interconnection network. Toroidal networks are a widely used class of interconnection networks. Compared to a hypercube, a torus has a larger diameter but better tradeoffs, such as higher channel bandwidth and lower node degree. Results on resource placements and fault-tolerant broadcasting in toroidal networks are presented. Given a limited number of resources, it is desirable to distribute them over the interconnection network so that the distance between a non-resource node and its closest resource is minimized. This problem is known as distance-d placement: each non-resource node must be within distance d of at least one resource, using the fewest possible resources. Solutions for distance-d placements in 2D and 3D tori are proposed and compared with placements used so far in practice. Simulation experiments show that the proposed solutions are superior to those placements in reducing average network latency. The complexity of a multicomputer increases the chance of processor failures, so fault-tolerant communication algorithms are necessary to use such a system effectively. Broadcasting (single-node one-to-all) is one of the important communication primitives in a multicomputer. A non-redundant, fault-tolerant broadcasting algorithm for faulty toroidal networks is designed; it can tolerate up to 2n-2 processor failures.
Compared to the optimal algorithm in a fault-free n-dimensional toroidal network, the proposed algorithm requires at most 3 extra communication steps using cut-through packet routing, and n + 1 extra steps using store-and-forward routing.
Graduation date: 1998
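The distance-d placement condition described above can be checked mechanically. The sketch below does not reproduce the thesis's constructions; it assumes the Lee metric (shortest wraparound Manhattan distance) on a k x k torus and, as a known-good instance for d = 1, uses the classic perfect placement at nodes with (x + 2y) mod 5 = 0, valid when k is a multiple of 5. Function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Node = Tuple[int, int]


def torus_distance(a: Sequence[int], b: Sequence[int], k: int) -> int:
    """Lee distance on a k-ary torus: per-dimension shortest way around the ring."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def is_distance_d_placement(resources: List[Node], k: int, d: int) -> bool:
    """True if every node of the k x k torus is within distance d of some resource."""
    return all(
        any(torus_distance(node, r, k) <= d for r in resources)
        for node in product(range(k), repeat=2)
    )


def perfect_distance1_placement(k: int) -> List[Node]:
    """Resources at nodes with (x + 2y) mod 5 == 0 (requires k to be a multiple of 5).

    A node and its four neighbours have x + 2y values covering all five residues
    mod 5, so each node is covered by exactly one resource. This matches the lower
    bound of k*k/5 resources, since each resource covers at most 5 nodes.
    """
    if k % 5 != 0:
        raise ValueError("this construction needs k to be a multiple of 5")
    return [(x, y) for x, y in product(range(k), repeat=2) if (x + 2 * y) % 5 == 0]
```

For a 10 x 10 torus this yields 20 resources, and because each node is covered exactly once, removing any single resource leaves some node uncovered.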

29

"Design and implementation of a fault-tolerant multimedia network and a local map based (LMB) self-healing scheme for arbitrary topology networks." 1997. http://library.cuhk.edu.hk/record=b5889296.

Abstract:

by Arion Ko Kin Wa.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 101-[106]).
Chapter 1 --- Introduction
Chapter 1.1 --- Overview
Chapter 1.2 --- Service Survivability Planning
Chapter 1.3 --- Categories of Outages
Chapter 1.4 --- Goals of Restoration
Chapter 1.5 --- Technology Impacts on Network Survivability
Chapter 1.6 --- Performance Models and Measures in Quantifying Network Survivability
Chapter 1.7 --- Organization of Thesis
Chapter 2 --- Design and Implementation of A Survivable High-Speed Multimedia Network
Chapter 2.1 --- An Overview of CUM LAUDE NET
Chapter 2.2 --- The Network Architecture
Chapter 2.2.1 --- Architectural Overview
Chapter 2.2.2 --- Router-Node Design
Chapter 2.2.3 --- Buffer Allocation
Chapter 2.2.4 --- Buffer Transmission Priority
Chapter 2.2.5 --- Congestion Control
Chapter 2.3 --- Protocols
Chapter 2.3.1 --- Design Overview
Chapter 2.3.2 --- ACTA - The MAC Protocol
Chapter 2.3.3 --- Protocol Layering
Chapter 2.3.4 --- Segment, Datagram and Packet Format
Chapter 2.3.5 --- Fast Packet Routing
Chapter 2.3.6 --- Local Host NIU
Chapter 2.4 --- The Network Restoration Strategy
Chapter 2.4.1 --- The Dual-Ring Model and Assumptions
Chapter 2.4.2 --- Scenarios of Network Failure and Remedies
Chapter 2.4.3 --- Distributed Fault-Tolerant Algorithm
Chapter 2.4.4 --- Distributed Auto-Healing Algorithm
Chapter 2.4.5 --- The Network Management Signals
Chapter 2.5 --- Performance Evaluation
Chapter 2.5.1 --- Restoration Time
Chapter 2.5.2 --- Reliability Measures
Chapter 2.5.3 --- Network Availability During Restoration
Chapter 2.6 --- The Prototype
Chapter 2.7 --- Technical Problems Encountered
Chapter 2.8 --- Chapter Summary and Future Development
Chapter 3 --- A Simple Experimental Network Management Software - NETMAN
Chapter 3.1 --- Introduction to NETMAN
Chapter 3.2 --- Network Management Basics
Chapter 3.2.1 --- The Level of Management Protocols
Chapter 3.2.2 --- Architecture Model
Chapter 3.2.3 --- TCP/IP Network Management Protocol Architecture
Chapter 3.2.4 --- A Standard Network Management Protocol On Internet - SNMP
Chapter 3.2.5 --- A Standard For Managed Information
Chapter 3.3 --- The CUM LAUDE Network Management Protocol Suite (CNMPS)
Chapter 3.3.1 --- The Architecture
Chapter 3.3.2 --- Goals of the CNMPS
Chapter 3.4 --- Highlights of NETMAN
Chapter 3.5 --- Functional Descriptions of NETMAN
Chapter 3.5.1 --- Topology Menu
Chapter 3.5.2 --- Fault Manager Menu
Chapter 3.5.3 --- Performance Meter Menu
Chapter 3.5.4 --- Gateway Utility Menu
Chapter 3.5.5 --- Tools Menu
Chapter 3.5.6 --- Help Menu
Chapter 3.6 --- Chapter Summary
Chapter 4 --- A Local Map Based (LMB) Self-Healing Scheme for Arbitrary Topology Networks
Chapter 4.1 --- Introduction
Chapter 4.2 --- An Overview of Existing DCS-Based Restoration Algorithms
Chapter 4.3 --- The Network Model and Assumptions
Chapter 4.4 --- Basics of the LMB Scheme
Chapter 4.4.1 --- Restoration Concepts
Chapter 4.4.2 --- Terminology
Chapter 4.4.3 --- Algorithm Parameters
Chapter 4.5 --- Performance Assessments
Chapter 4.6 --- The LMB Network Restoration Scheme
Chapter 4.6.1 --- Initialization - Local Map Building
Chapter 4.6.2 --- The LMB Restoration Messages Set
Chapter 4.6.3 --- Phase I - Local Map Update Phase
Chapter 4.6.4 --- Phase II - Update Acknowledgment Phase
Chapter 4.6.5 --- Phase III - Restoration and Confirmation Phase
Chapter 4.6.6 --- Phase IV - Cancellation Phase
Chapter 4.6.7 --- Re-Initialization
Chapter 4.6.8 --- Path Route Monitoring
Chapter 4.7 --- Performance Evaluation
Chapter 4.7.1 --- The Testbeds
Chapter 4.7.2 --- Simulation Results
Chapter 4.7.3 --- Storage Requirements
Chapter 4.8 --- The LMB Scheme on ATM and SONET environment
Chapter 4.9 --- Future Work
Chapter 4.10 --- Chapter Summary
Chapter 5 --- Conclusion and Future Work
Chapter 5.1 --- Conclusion
Chapter 5.2 --- Future Work
Bibliography
Chapter A --- Derivation of Communicative Probability
Chapter B --- List of Publications

30

Yan, Jiaxiang. "Modeling, monitoring and optimization of discrete event systems using Petri nets." 2014. http://hdl.handle.net/1805/3874.

Abstract:

Indiana University-Purdue University Indianapolis (IUPUI)
Yan, Jiaxiang. M.S.E.C.E., Purdue University, May 2013. Modeling, Monitoring and Optimization of Discrete Event Systems Using Petri Nets. Major Professor: Lingxi Li.
In recent decades, research on discrete event systems (DESs) has attracted increasing attention because of the rapid development of intelligent control strategies. Such control measures combine conventional control strategies with discrete decision-making processes that simulate human decision making. Because of the scale and complexity of typical DESs, dedicated models, monitoring methods, and optimal control strategies for them are necessary. Among the various DES models, Petri nets are well known for their advantages in handling asynchronous processes, and in recent years they have been widely applied in intelligent transportation systems (ITS) and communication technology. By encoding the Petri net state, we can also enable fault detection and identification in DESs and mitigate potential human errors. This thesis studies various problems for DESs that can be modeled by Petri nets. In particular, we focus on systematic modeling, asynchronous monitoring, and optimal control strategy design for Petri nets. The thesis begins with the systematic modeling of ITS. It proposes a microscopic model of a signalized intersection and its two-layer timed Petri net representation, in which the first layer represents the intersection and the second layer represents the traffic light system. This representation involves both deterministic and stochastic transitions. Its detailed operation is described, and its improvements over previous models are shown by comparison. Then we study the asynchronous monitoring of sensor networks.
This thesis proposes an event sequence reconstruction algorithm for a given sensor network based on asynchronous observations of its state changes. We assume that the sensor network is modeled as a Petri net and that the asynchronous observations take the form of state (token) changes at different places in the Petri net. More specifically, the observed sequences of state changes are provided by local sensors and are asynchronous, i.e., they contain only partial information about the ordering of the state changes that occur. We propose an approach that partitions the given net into several subnets and reconstructs the event sequence for each subnet. We then develop an algorithm that reconstructs the event sequences for the entire net that are consistent with: 1) the asynchronous observations of state changes; 2) the event sequences of each subnet; and 3) the structure of the given Petri net. We also discuss the algorithmic complexity. The final problem studied in this thesis is the optimal design of fault-tolerant Petri net controllers. In particular, we consider multiple-fault detection and identification in Petri nets that have state machine structures (i.e., every transition in the net has exactly one input place and one output place). We develop approximation algorithms to design a fault-tolerant Petri net controller that achieves the minimal number of connections with the original controller. A design example for an automated guided vehicle (AGV) system is also provided to illustrate our approaches.
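The event sequence reconstruction problem described in this abstract can be illustrated with a small brute-force sketch. It assumes a state-machine Petri net (each transition has one input place and one output place, with no self-loops) and represents each place's asynchronous observation as the locally recorded order of token changes (-1 for a removed token, +1 for an added one). The net, the place and transition names, and the function `reconstruct` are hypothetical. Unlike the approach in the abstract, this sketch does not partition the net into subnets; it searches whole-net firing sequences, so its running time can grow exponentially with the number of observed changes.

```python
from typing import Dict, List, Tuple

# Hypothetical state-machine Petri net: each transition has exactly one input
# place and one output place (and no self-loops).
TRANSITIONS: Dict[str, Tuple[str, str]] = {
    "a": ("p1", "p2"),
    "b": ("p2", "p3"),
    "c": ("p1", "p3"),
    "d": ("p3", "p1"),
}


def reconstruct(marking: Dict[str, int],
                observed: Dict[str, List[int]]) -> List[List[str]]:
    """Return every firing sequence consistent with the per-place observations.

    observed[p] lists the token changes seen locally at place p, in order
    (-1 = token removed, +1 = token added). The relative order of changes at
    different places is unknown, which is what makes the observations
    asynchronous. Every place must appear as a key in both dictionaries.
    """
    results: List[List[str]] = []
    mark = dict(marking)
    pos = {p: 0 for p in observed}  # index of the next unexplained change
    seq: List[str] = []

    def next_change(place: str) -> int:
        """Next recorded change at place, or 0 once all are explained."""
        i = pos[place]
        return observed[place][i] if i < len(observed[place]) else 0

    def search() -> None:
        if all(pos[p] == len(observed[p]) for p in observed):
            results.append(list(seq))
            return
        for name, (src, dst) in TRANSITIONS.items():
            # Enabling rule, plus: the firing must explain the next recorded
            # change at its input place (-1) and at its output place (+1).
            if mark[src] < 1 or next_change(src) != -1 or next_change(dst) != 1:
                continue
            mark[src] -= 1
            mark[dst] += 1
            pos[src] += 1
            pos[dst] += 1
            seq.append(name)
            search()
            # Undo the firing before trying the next transition (backtracking).
            seq.pop()
            pos[src] -= 1
            pos[dst] -= 1
            mark[src] += 1
            mark[dst] -= 1

    search()
    return results


if __name__ == "__main__":
    initial = {"p1": 2, "p2": 0, "p3": 0}
    observed = {"p1": [-1, -1], "p2": [1, -1], "p3": [1, 1]}
    print(reconstruct(initial, observed))
    # [['a', 'b', 'c'], ['a', 'c', 'b'], ['c', 'a', 'b']]
```

Each firing explains exactly one recorded change at its input place and one at its output place, so the search depth is bounded by the number of observations and the search always terminates.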

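The abstract's remark that encoding the Petri net state enables fault detection can likewise be sketched under simple assumptions. Below, a hypothetical three-place cyclic net is extended with one redundant check place whose marking is meant to equal a weighted sum `c . M` of the original places; every transition also updates the check place so that this relation survives each fault-free firing, and a mismatch flags a fault. This is a generic redundant-encoding illustration with made-up weights and hypothetical helper names (`encode`, `fire`, `check_ok`), not the controller design or approximation algorithm from the thesis.

```python
from typing import List, Tuple

# Hypothetical cyclic net p0 -> p1 -> p2 -> p0 with transitions t0, t1, t2.
# PRE[t][p]: tokens transition t consumes from place p.
# POST[t][p]: tokens transition t produces in place p.
PRE = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
POST = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

# Hypothetical weights: the check place should always hold sum(C[p] * M[p]).
C = [1, 2, 3]

# Net change of the check place per transition, chosen so that the relation
# check == encode(M) is preserved by every fault-free firing.
CHECK_DELTA = [sum(c * (post - pre) for c, pre, post in zip(C, PRE[t], POST[t]))
               for t in range(len(PRE))]


def encode(marking: List[int]) -> int:
    """Weighted sum c . M that the check place is expected to hold."""
    return sum(c * m for c, m in zip(C, marking))


def fire(marking: List[int], check: int, t: int) -> Tuple[List[int], int]:
    """Fire transition t (if enabled) and update the check place with it."""
    if any(m < need for m, need in zip(marking, PRE[t])):
        raise ValueError(f"t{t} is not enabled")
    new_marking = [m - pre + post for m, pre, post in zip(marking, PRE[t], POST[t])]
    return new_marking, check + CHECK_DELTA[t]


def check_ok(marking: List[int], check: int) -> bool:
    """Fault-free operation keeps the syndrome encode(M) - check at zero."""
    return encode(marking) - check == 0


if __name__ == "__main__":
    m = [1, 0, 0]
    chk = encode(m)
    m, chk = fire(m, chk, 0)  # m == [0, 1, 0], chk == 2
    m, chk = fire(m, chk, 1)  # m == [0, 0, 1], chk == 3
    print(check_ok(m, chk))   # True
    m[2] = 0                  # simulate a lost token in p2
    print(check_ok(m, chk))   # False
```

A single check place of this kind mainly signals that a fault has occurred; identifying multiple faults, as the abstract describes, generally requires more redundancy, such as additional check places with carefully chosen weights.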