Journal articles on the topic 'Fault-tolerance computing'

Author: Grafiati

Published: 4 June 2021

Last updated: 19 February 2022

Consult the top 50 journal articles for your research on the topic 'Fault-tolerance computing.'

1

Bhavsar, Sejal Atit, and Kirit J. Modi. "Design and Development of Framework for Platform Level Issues in Fog Computing." International Journal of Electronics, Communications, and Measurement Engineering 8, no. 1 (January 2019): 1–20. http://dx.doi.org/10.4018/ijecme.2019010101.

Abstract:

Fog computing is a paradigm that extends cloud computing services to the edge of the network. It provides data, storage, compute, and application services to end users. Its distinguishing characteristic is its proximity to end users: application services are hosted on network edge devices such as routers and switches. The goal of fog computing is to improve efficiency and reduce the amount of data that must be transported to the cloud for analysis, processing, and storage. Because fog computing is heterogeneous, it raises issues such as security, fault tolerance, and resource scheduling and allocation. To better understand fault tolerance, we highlight its basic concepts by examining different fault tolerance techniques: reactive, proactive, and hybrid. Beyond fault tolerance, we also discuss how to balance resource utilization and security in fog computing. Furthermore, to overcome platform-level issues of fog computing, we present a hybrid fault tolerance model that uses resource management and security.

2

Rajagopal, Aghila. "Fault Tolerance in Mobile Grid Computing." International Journal of Electronic Commerce Studies 5, no. 1 (June 2014): 115–22. http://dx.doi.org/10.7903/ijecs.1107.

3

Zhang, Junna, Ao Zhou, Qibo Sun, Shangguang Wang, and Fangchun Yang. "Overview on Fault Tolerance Strategies of Composite Service in Service Computing." Wireless Communications and Mobile Computing 2018 (June 19, 2018): 1–8. http://dx.doi.org/10.1155/2018/9787503.

Abstract:

In order to build highly reliable composite services via Service Oriented Architecture (SOA) in the Mobile Fog Computing environment, various fault tolerance strategies have been widely studied, with notable results. In this paper, we provide a comprehensive overview of key fault tolerance strategies. Firstly, fault tolerance strategies are categorized into static and dynamic fault tolerance according to the phase of their adoption. Secondly, we review various static fault tolerance strategies. Then, dynamic fault tolerance implementation mechanisms are analyzed. Finally, the main challenges confronting fault tolerance for composite services are reviewed.

4

Varghese, Blesson, Gerard McKee, and Vassil Alexandrov. "Can Agent Intelligence Be Used to Achieve Fault Tolerant Parallel Computing Systems?" Parallel Processing Letters 21, no. 04 (December 2011): 379–96. http://dx.doi.org/10.1142/s012962641100028x.

Abstract:

The work reported in this paper is motivated towards validating an alternative approach for fault tolerance over traditional methods like checkpointing that constrain efficacious fault tolerance. Can agent intelligence be used to achieve fault tolerant parallel computing systems? If so, "What agent capabilities are required for fault tolerance?", "What parallel computational tasks can benefit from such agent capabilities?" and "How can agent capabilities be implemented for fault tolerance?" need to be addressed. Cognitive capabilities essential for achieving fault tolerance through agents are considered. Parallel reduction algorithms are identified as a class of algorithms that can benefit from cognitive agent capabilities. The Message Passing Interface is utilized for implementing an intelligent agent based approach. Preliminary results obtained from the experiments validate the feasibility of an agent based approach for achieving fault tolerance in parallel computing systems.

5

Latchoumy, P., and Sheik Abdul Khader. "Survey On Fault Tolerance In Grid Computing." International Journal of Computer Science & Engineering Survey 2, no. 4 (November 30, 2011): 97–110. http://dx.doi.org/10.5121/ijcses.2011.2407.

6

Saha, Goutam Kumar. "Software-based computing security and fault tolerance." Ubiquity 2004, June (June 2004): 2. http://dx.doi.org/10.1145/1022348.1008538.

7

Hamidi, Hodjat, Abbas Vafaei, and Seyed Amir Hassan Monadjemi. "Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems." International Journal of Grid and High Performance Computing 4, no. 1 (January 2012): 37–51. http://dx.doi.org/10.4018/jghpc.2012010103.

Abstract:

In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. ABFT techniques that detect errors rely on comparing parity values computed in two ways: parallel processing of the input parity values produces output parity values that are comparable with parity values regenerated from the original processed outputs. Convolutional codes can supply the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. The paper proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
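
The idea of comparing redundancy computed in two ways can be illustrated with the classic checksum-matrix form of ABFT for matrix multiplication. This is a minimal sketch, not the convolutional-code scheme of the paper above: the function names are hypothetical, the matrices are small integer lists so that equality checks are exact, and floating-point data would need a tolerance instead.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def find_error(c_full):
    """Return (row, col) of a single corrupted data element, or None if consistent.

    c_full is a full-checksum matrix: its last row holds the column sums
    and its last column holds the row sums of the data part.
    """
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(c_full[i][:n_cols]) != c_full[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(c_full[i][j] for i in range(n_rows)) != c_full[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single data-element fault")


def correct(c_full, pos):
    """Rebuild the element at pos from its row checksum."""
    i, j = pos
    n_cols = len(c_full[0]) - 1
    others = sum(c_full[i][k] for k in range(n_cols) if k != j)
    c_full[i][j] = c_full[i][n_cols] - others


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# Multiplying the checksum-extended inputs yields a full-checksum product.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
clean = find_error(c_full)      # no fault yet

c_full[1][0] += 7               # inject a single fault into the result
located = find_error(c_full)    # row and column checks intersect at the fault
correct(c_full, located)
```

The checksums travel through the multiplication itself, so detection costs one extra row and column rather than a second full computation.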

8

Wong, Ming Ming, Dennis M. L. D. Wong, Cishen Zhang, and Ismat Hijazin. "A New Lightweight and High Fault Tolerance Sobel Edge Detection Using Stochastic Computing." International Journal of Computer and Electrical Engineering 9, no. 2 (2017): 403–20. http://dx.doi.org/10.17706/ijcee.2017.9.2.403-420.

9

Mohammadian, Vahid, Nima Jafari Navimipour, Mehdi Hosseinzadeh, and Aso Darwesh. "Comprehensive and Systematic Study on the Fault Tolerance Architectures in Cloud Computing." Journal of Circuits, Systems and Computers 29, no. 15 (June 22, 2020): 2050240. http://dx.doi.org/10.1142/s0218126620502400.

Abstract:

Providing dynamic resources is based on the virtualization features of the cloud environment. Cloud computing, as an emerging technology, offers high availability of services at any time, in any place, and independent of the hardware. However, fault tolerance is one of the main problems and challenges in cloud computing. This subject has an important effect on cloud computing, but, as far as we know, there is no comprehensive and systematic study in this field. Accordingly, in this paper, the existing methods and mechanisms are discussed in different groups, such as proactive and reactive, types of fault detection, etc. Various fault tolerance techniques are provided and discussed. The advantages and disadvantages of these techniques are shown on the basis of the technology that they use. Generally, the contributions of this research are a summary of the available challenges associated with fault tolerance, a description of several important fault tolerance methods in cloud computing, and the key areas for improving fault tolerance techniques in future work. The advantages and disadvantages of the selected articles in each category are also highlighted, and their significant challenges are discussed to provide research lines for further studies.

10

Patra, Prasenjit Kumar, Harshpreet Singh, Rajwinder Singh, Saptarshi Das, Nilanjan Dey, and Anghel Drugarin Cornelia Victoria. "Replication and Resubmission Based Adaptive Decision for Fault Tolerance in Real Time Cloud Computing." International Journal of Service Science, Management, Engineering, and Technology 7, no. 2 (April 2016): 46–60. http://dx.doi.org/10.4018/ijssmet.2016040104.

Abstract:

Cloud computing, a widely adoptable technology, evolved from on-demand services in the paradigm of large-scale distributed computing. With the rising demand for and benefits of cloud computing infrastructure, users can leverage its intensive computing capability and its scalable, virtualized environment to carry out real-time tasks executed on a remote cloud computing node. Indeterminate latency and minimal control over the computing node, however, affect reliability. There is therefore a growing need for fault tolerance to achieve reliability in real-time cloud infrastructure. In this paper, a fault tolerance model named “Replication and resubmission based adaptive decision for fault tolerance in real-time cloud computing (RRADFTRC)” is proposed for real-time cloud computing, together with results. In the proposed model, the system tolerates faults and makes adaptive decisions on the basis of proper resource allocation of tasks, using a new approach for the real-time cloud environment.

11

Sahoo, Bidush Kumar, et al. "Factors Affecting Fault Tolerance during Load Balancing in Cloud Computing." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 11 (May 28, 2021): 1523–33. http://dx.doi.org/10.17762/turcomat.v12i11.6079.

Abstract:

Cloud computing is built upon advances in virtualization and distributed computing to support cost-efficient usage of computing resources and to provide on-demand services. A methodical analysis of the factors affecting fault tolerance during load balancing concludes that, in comparatively more software firms, the influencing factors include cloud security and adaptability. In this paper, we create a model with which various IT industries can check fault tolerance during load balancing. The exploration was carried out with the help of some renowned IT firms and industries in South India. The work consists of 20 hypotheses about what may affect fault tolerance during load balancing in South India. It is verified using a statistical analysis tool, the Statistical Package for Social Science (SPSS).

12

Dhingra, Mridula, and Neha Gupta. "Comparative analysis of fault tolerance models and their challenges in cloud computing." International Journal of Engineering & Technology 6, no. 2 (May 3, 2017): 36. http://dx.doi.org/10.14419/ijet.v6i2.7565.

Abstract:

Cloud Computing is a vital platform for viable and non-viable clients. It provides reliable services to clients through data centers, which contain servers, storage, etc. One of the major challenges in a cloud computing environment is that services should run without errors or faults. In a cloud computing environment, various computations are run on real-time applications, so the chance of errors becomes high; for this reason, applications running in a cloud environment should be reliable and must be fault tolerant. In this paper, the authors discuss many fault tolerance techniques and compare various models of fault tolerance.

13

Amin, Zeeshan, Harshpreet Singh, and Nisha Sethi. "Review on Fault Tolerance Techniques in Cloud Computing." International Journal of Computer Applications 116, no. 18 (April 22, 2015): 11–17. http://dx.doi.org/10.5120/20435-2768.

14

Chaturvedi, Puja. "Fault Tolerance using Fitness Algorithm in Cloud Computing." International Journal for Research in Applied Science and Engineering Technology 7, no. 6 (June 30, 2019): 907–12. http://dx.doi.org/10.22214/ijraset.2019.6156.

15

Derbal, Youcef. "A new fault-tolerance framework for grid computing." Multiagent and Grid Systems 2, no. 2 (September 8, 2006): 115–33. http://dx.doi.org/10.3233/mgs-2006-2203.

16

Mohammed, Debakla, Khaldi Miloud, Meftah Boudjelal, and Rebbah Mohammed. "Fault Tolerance in Grid Computing by Resource Clustering." International Journal of Internet Technology and Secured Transactions 10, no. 1 (2020): 1. http://dx.doi.org/10.1504/ijitst.2020.10020944.

17

Khaldi, Miloud, Mohammed Rebbah, Boudjelal Meftah, and Mohammed Debakla. "Fault tolerance in grid computing by resource clustering." International Journal of Internet Technology and Secured Transactions 10, no. 1/2 (2020): 120. http://dx.doi.org/10.1504/ijitst.2020.104577.

18

Kim, Hyun C., and V. S. S. Nair. "Software fault tolerance for distributed object based computing." Journal of Systems and Software 39, no. 2 (November 1997): 103–17. http://dx.doi.org/10.1016/s0164-1212(96)00167-7.

19

Shum, Kam Hong. "Effective fault tolerance for agent-based cluster computing." Journal of Systems and Software 48, no. 3 (November 1999): 189–96. http://dx.doi.org/10.1016/s0164-1212(99)00057-6.

20

Xia, Lixue, Wenqin Huangfu, Tianqi Tang, Xiling Yin, Krishnendu Chakrabarty, Yuan Xie, Yu Wang, and Huazhong Yang. "Stuck-at Fault Tolerance in RRAM Computing Systems." IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8, no. 1 (March 2018): 102–15. http://dx.doi.org/10.1109/jetcas.2017.2776980.

21

Pandi, S. Veera, and K. Alagar Samy. "Fault Tolerance Avoidance in Cloud Computing Software Applications." International Journal of Computer Trends and Technology 43, no. 3 (January 25, 2017): 166–69. http://dx.doi.org/10.14445/22312803/ijctt-v43p126.

22

Tohma, Y. "Incorporating fault tolerance into an autonomic-computing environment." IEEE Distributed Systems Online 5, no. 2 (February 2004): 1–12. http://dx.doi.org/10.1109/mdso.2004.1270783.

23

Lauer, Michael, Matthieu Amy, Jean-Charles Fabre, Matthieu Roy, William Excoffon, and Miruna Stoicescu. "Resilient computing on ROS using adaptive fault tolerance." Journal of Software: Evolution and Process 30, no. 3 (November 3, 2017): e1917. http://dx.doi.org/10.1002/smr.1917.

24

Bukhari, Saufi, Ku Ruhana Ku-Mahamud, and Hiroaki Morino. "Dynamic ACO-based Fault Tolerance in Grid Computing." International Journal of Grid and Distributed Computing 10, no. 12 (December 31, 2017): 117–24. http://dx.doi.org/10.14257/ijgdc.2017.10.12.11.

25

Zhang, Yingqian, Efrat Manisterski, Sarit Kraus, V. S. Subrahmanian, and David Peleg. "Computing the fault tolerance of multi-agent deployment." Artificial Intelligence 173, no. 3-4 (March 2009): 437–65. http://dx.doi.org/10.1016/j.artint.2008.11.007.

26

Limam, Said, and Ghalem Belalem. "A Migration Approach for Fault Tolerance in Cloud Computing." International Journal of Grid and High Performance Computing 6, no. 2 (April 2014): 24–37. http://dx.doi.org/10.4018/ijghpc.2014040102.

Abstract:

Cloud computing has become a significant technology and a great solution for providing a flexible, on-demand, and dynamically scalable computing infrastructure for many applications. Cloud computing also represents a significant technology trend. With cloud computing technology, users use a variety of devices to access programs, storage, and application-development platforms over the Internet, via services offered by cloud computing providers. The probability that a failure occurs during execution grows as the number of nodes increases; since it is impossible to fully prevent failures, one solution is to implement fault tolerance mechanisms. Fault tolerance has become a major task for computer engineers and software developers because the occurrence of faults increases the cost of using resources. In this paper, the authors propose an approach that combines migration and checkpoint mechanisms. The checkpoint mechanism minimizes the time lost and reduces the effect of failures on application execution, while the migration mechanism guarantees the continuity of application execution and avoids any loss due to hardware failure in a transparent and efficient way. The simulation results show the effectiveness of the approach to fault tolerance in terms of execution time and masking the effects of failures.
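
How checkpointing bounds the work lost to a failure can be shown with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: `run_with_checkpoints`, `NodeFailure`, and the `fail_at` fault-injection parameter are hypothetical names, and "migration" is modelled simply as resuming from the last saved state.

```python
import copy


class NodeFailure(Exception):
    """Hypothetical signal that the node running the task was lost."""


def run_with_checkpoints(steps, state, checkpoint_every, fail_at=()):
    """Run the steps in order, rolling back to the last checkpoint on failure.

    fail_at lists step indices at which a failure is injected once.
    Returns the final state and the number of restarts.
    """
    pending_faults = set(fail_at)
    checkpoint = (0, copy.deepcopy(state))
    i, restarts = 0, 0
    while i < len(steps):
        try:
            if i in pending_faults:
                pending_faults.discard(i)  # each injected fault fires only once
                raise NodeFailure(f"node lost at step {i}")
            state = steps[i](state)
            i += 1
            if i % checkpoint_every == 0:
                checkpoint = (i, copy.deepcopy(state))
        except NodeFailure:
            # Stand-in for migration: resume elsewhere from the saved state.
            i, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            restarts += 1
    return state, restarts


# Ten steps that add 1..10 to a running total; faults injected at steps 4 and 8.
steps = [lambda total, k=k: total + k for k in range(1, 11)]
final_state, restarts = run_with_checkpoints(steps, 0, checkpoint_every=3, fail_at=(4, 8))
```

With a checkpoint every three steps, each failure repeats at most two completed steps instead of restarting the whole task; a shorter interval loses less work but saves state more often.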

27

David, Beaulah, and R. Santhosh. "Fault Tolerance and QoS based Pervasive Computing using Markov State Transition Model." International Journal of Engineering & Technology 7, no. 4 (September 17, 2018): 2403. http://dx.doi.org/10.14419/ijet.v7i4.12664.

Abstract:

Fault tolerance is significant in pervasive computing environments. Recently, few research works have been developed for reducing the faults that occur in pervasive computing. However, there is a need for a fault tolerance mechanism that reduces link failures and unwanted mobile node access in pervasive computing environments. In order to overcome these limitations, the Markov State Transition Based Fault Tolerance (MST-FT) Model is proposed. The main objective of the MST-FT Model is to achieve resource-efficient QoS in pervasive computing environments by avoiding link failures and unwanted mobile node usage. Initially, link failures are minimized by maintaining a Markov chain of high-energy mobile nodes on the wireless network communication path. The mobile nodes with higher energy and minimal drain rate are combined to form a chain along the corresponding communication path in order to minimize link failures in pervasive computing. Next, inappropriate mobile node usage is avoided by selecting only authorized mobile nodes for Markov chain construction, which enables effective network communication and results in an improved fault tolerant rate. Therefore, the MST-FT Model provides more resource-efficient QoS than existing works. The performance of the MST-FT Model is measured in terms of fault tolerant rate, execution time, energy consumption rate, and quality of service level. The simulation results show that the MST-FT Model improves the fault tolerant rate by 13% and reduces the energy consumption rate of resource-efficient QoS by 25% compared to previous works.

28

Goraya, Major Singh, and Lakhwinder Kaur. "Fault Tolerance Task Execution through Cooperative Computing in Grid." Parallel Processing Letters 23, no. 01 (March 2013): 1350003. http://dx.doi.org/10.1142/s0129626413500035.

Abstract:

To achieve fault tolerant task execution in grid, a cooperative computing system (CCS) is proposed in this paper. Grid resources with similar statistical characteristics constitute the computing nodes in CCS. CCS executes the allocated task, considered as its primary task, by organizing the computing nodes as active and active-standby. At any moment, one of the nodes acts as the active node to execute the task, whereas the rest of the nodes act as active-standby to provide execution backup to the task. Computing nodes in CCS may fail during task execution due to the failure or exit of their corresponding resources. To maintain the fault tolerant ability of CCS, a failed node is repaired dynamically by replacing its corresponding resource with an alternative matching resource from the grid. The number of computing nodes in CCS is decided by optimizing the service reliability with respect to the execution overhead of the primary task. Resource usage is optimized in CCS by overloading the primary task at each active-standby node with a low priority secondary task. Active-standby nodes execute their low priority secondary tasks concurrently with providing execution backup to the primary task. Service reliability, system throughput, and task delay are observed in the simulation experiments to explore the fault tolerant ability of CCS. A task set of 500 grid tasks is repeatedly executed by varying task duration and rate of resource failure. Simulation results show that CCS outperforms the existing fault tolerant approaches being used in grid. In CCS, fault tolerant task execution is achieved without compromising resource utilization.

29

Rodrigues, Gennaro, Fernanda Lima Kastensmidt, and Alberto Bosio. "Survey on Approximate Computing and Its Intrinsic Fault Tolerance." Electronics 9, no. 4 (March 26, 2020): 557. http://dx.doi.org/10.3390/electronics9040557.

Abstract:

This work is a survey on approximate computing and its impact on fault tolerance, especially for safety-critical applications. It presents a multitude of approximation methodologies, which are typically applied at software, architecture, and circuit level. Those methodologies are discussed and compared on all their possible levels of implementations (some techniques are applied at more than one level). Approximation is also presented as a means to provide fault tolerance and high reliability: Traditional error masking techniques, such as triple modular redundancy, can be approximated and thus have their implementation and execution time costs reduced compared to the state of the art.
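
For readers unfamiliar with triple modular redundancy, the error-masking baseline named above can be sketched as a majority voter over three replicas. This shows plain (exact) TMR only, not the approximated variants the survey discusses; `tmr_vote` and the replica functions are hypothetical names chosen for illustration.

```python
from collections import Counter


def tmr_vote(replicas, x):
    """Run three replicas on the same input and return the majority result."""
    if len(replicas) != 3:
        raise ValueError("triple modular redundancy needs exactly three replicas")
    results = [replica(x) for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError(f"no majority among results {results}")
    return value


def square(x):
    return x * x


def faulty_square(x):
    # Hypothetical faulty replica: its output is always off by one.
    return x * x + 1


# A single faulty replica is outvoted by the two correct ones.
masked = tmr_vote([square, faulty_square, square], 7)
```

The voter masks any single faulty replica at roughly three times the execution cost, which is the overhead that approximating some of the replicas aims to reduce.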

30

Chandra, P. S. R., T. V. Prasad, A. Pavani Kumari, S. Madhu, and I. Lova Prasad. "Security Approaches of Fault Tolerance using Cloud Computing Platform." Indian Journal of Public Health Research & Development 9, no. 12 (2018): 1514. http://dx.doi.org/10.5958/0976-5506.2018.02219.2.

31

Alshayeji, Mohammad H., Mohammad Al-Rousan, Eman Yossef, and Hanem Ellethy. "A Study on Fault Tolerance Mechanisms in Cloud Computing." International Journal of Computer and Electrical Engineering 10, no. 1 (2018): 62–71. http://dx.doi.org/10.17706/ijcee.2018.10.1.62-71.

32

Xu, Qi, Song Chen, Hao Geng, Bo Yuan, Bei Yu, Feng Wu, and Zhengfeng Huang. "Fault tolerance in memristive crossbar-based neuromorphic computing systems." Integration 70 (January 2020): 70–79. http://dx.doi.org/10.1016/j.vlsi.2019.09.008.

33

Kaur Rai, Anjandeep. "High Adaptive Fault Tolerance in Real Time Cloud Computing." IOSR Journal of Engineering 4, no. 3 (March 2014): 24–27. http://dx.doi.org/10.9790/3021-04362427.

34

Patra, Prasenjit Kumar, Harshpreet Singh, and Gurpreet Singh. "Fault Tolerance Techniques and Comparative Implementation in Cloud Computing." International Journal of Computer Applications 64, no. 14 (February 15, 2013): 37–41. http://dx.doi.org/10.5120/10705-5643.

35

Nazari Cheraghlou, Mehdi, Ahmad Khadem-Zadeh, and Majid Haghparast. "A survey of fault tolerance architecture in cloud computing." Journal of Network and Computer Applications 61 (February 2016): 81–92. http://dx.doi.org/10.1016/j.jnca.2015.10.004.

36

Bosilca, George, Rémi Delmas, Jack Dongarra, and Julien Langou. "Algorithm-based fault tolerance applied to high performance computing." Journal of Parallel and Distributed Computing 69, no. 4 (April 2009): 410–16. http://dx.doi.org/10.1016/j.jpdc.2008.12.002.

37

Redinbo, G. R., and R. Manomohan. "Fault tolerance in computing, compressing, and transmitting FFT data." IEEE Transactions on Communications 49, no. 12 (2001): 2095–105. http://dx.doi.org/10.1109/26.974256.

38

Mohammed, Bashir, Mariam Kiran, Kabiru M. Maiyama, Mumtaz M. Kamala, and Irfan-Ullah Awan. "Failover strategy for fault tolerance in cloud computing environment." Software: Practice and Experience 47, no. 9 (April 4, 2017): 1243–74. http://dx.doi.org/10.1002/spe.2491.

39

James, Mark L., Andrew A. Shapiro, Paul L. Springer, and Hans P. Zima. "Adaptive Fault Tolerance for Scalable Cluster Computing in Space." International Journal of High Performance Computing Applications 23, no. 3 (July 20, 2009): 227–41. http://dx.doi.org/10.1177/1094342009106190.

Abstract:

Future missions of deep-space exploration face the challenge of building more capable autonomous spacecraft and planetary rovers. Given the communication latencies and bandwidth limitations for such missions, the need for increased autonomy becomes mandatory, along with the requirement for enhanced on-board computational capabilities while in deep-space or time-critical situations. This will result in dramatic changes in the way missions are conducted and supported by on-board computing systems. Specifically, the traditional approach of relying exclusively on radiation-hardened hardware and modular redundancy will not be able to deliver the required computational power. As a consequence, such systems are expected to include high-capability low-power components based on emerging commercial-off-the-shelf (COTS) multi-core technology. In this paper we describe the design of a generic framework for introspection that supports runtime monitoring and analysis of program execution as well as a feedback-oriented recovery from faults. Our focus is on providing flexible software fault tolerance matched to the requirements and properties of applications by exploiting knowledge that is either contained in an application knowledge base, provided by users, or automatically derived from specifications. A prototype implementation is currently in progress at the Jet Propulsion Laboratory, California Institute of Technology, targeting a cluster of cell broadband engines.

40

Hasan, Moin, and Major Singh Goraya. "Fault tolerance in cloud computing environment: A systematic survey." Computers in Industry 99 (August 2018): 156–72. http://dx.doi.org/10.1016/j.compind.2018.03.027.

41

Qureshi, Kalim, Fiaz Gul Khan, Paul Manuel, and Babar Nazir. "A hybrid fault tolerance technique in grid computing system." Journal of Supercomputing 56, no. 1 (January 19, 2010): 106–28. http://dx.doi.org/10.1007/s11227-009-0345-y.

42

Svore, K. M., A. W. Cross, I. L. Chuang, and A. V. Aho. "A flow-map model for analyzing pseudothresholds in fault-tolerant quantum computing." Quantum Information and Computation 6, no. 3 (May 2006): 193–212. http://dx.doi.org/10.26421/qic6.3-1.

Abstract:

An arbitrarily reliable quantum computer can be efficiently constructed from noisy components using a recursive simulation procedure, provided that those components fail with probability less than the fault-tolerance threshold. Recent estimates of the threshold are near some experimentally achieved gate fidelities. However, the landscape of threshold estimates includes pseudothresholds, threshold estimates based on a subset of components and a low level of the recursion. In this paper, we observe that pseudothresholds are a generic phenomenon in fault-tolerant computation. We define pseudothresholds and present classical and quantum fault-tolerant circuits exhibiting pseudothresholds that differ by a factor of $4$ from fault-tolerance thresholds for typical relationships between component failure rates. We develop tools for visualizing how reliability is influenced by recursive simulation in order to determine the asymptotic threshold. Finally, we conjecture that refinements of these methods may establish upper bounds on the fault-tolerance threshold for particular codes and noise models.
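
The role of the threshold in recursive simulation can be made concrete with a standard textbook model, given here as a simplification and not as the flow map of the paper above. Assume every component fails with the same probability $p_0$ and that one level of recursion maps a failure probability $p$ to $c\,p^2$, where the constant $c$ (an assumed symbol) counts the pairs of faults that can defeat the code:

$$p_{k+1} = c\,p_k^{2} \quad\Longrightarrow\quad c\,p_k = (c\,p_0)^{2^{k}}, \qquad p_{\mathrm{th}} = \frac{1}{c}.$$

The failure probability $p_k$ tends to zero with the recursion level $k$ exactly when $p_0 < p_{\mathrm{th}}$. If the first level is instead described by a map $c_1 p^2$ with $c_1 \neq c$, for example because only a subset of components is counted, the level-1 crossing point $1/c_1$ differs from $1/c$; in this simplified picture that crossing point plays the role of a pseudothreshold.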

43

Goundar, Sam, and Akashdeep Bhardwaj. "Efficient Fault Tolerance on Cloud Environments." International Journal of Cloud Applications and Computing 8, no. 3 (July 2018): 20–31. http://dx.doi.org/10.4018/ijcac.2018070102.

Abstract:

With mission-critical web applications and resources being hosted on cloud environments, and cloud services growing fast, the need for a greater level of service assurance regarding fault tolerance for availability and reliability has increased. The high priority now is ensuring a fault tolerant environment that can keep the systems up and running. To minimize the impact of downtime or accessibility failure due to systems, network devices, or hardware, such failures are expected to be anticipated and handled proactively in a fast, intelligent way. This article discusses the fault tolerance system for cloud computing environments and analyzes whether it is effective for cloud environments.

44

Siddiqui, Zahid Ali, Jeong-A. Lee, and Unsang Park. "SEDC-Based Hardware-Level Fault Tolerance and Fault Secure Checker Design for Big Data and Cloud Computing." Scientific Programming 2018 (June 7, 2018): 1–16. http://dx.doi.org/10.1155/2018/7306837.

Abstract:

Fault tolerance is of great importance for big data systems. Although several software-based application-level techniques exist for fault security in big data systems, there is a potential research space at the hardware level. Big data needs to be processed inexpensively and efficiently, and traditional hardware architectures, although adequate, are not optimal for this purpose. In this paper, we propose a hardware-level fault tolerance scheme for big data and cloud computing that can be used with the existing software-level fault tolerance for improving the overall performance of the systems. The proposed scheme uses the concurrent error detection (CED) method to detect hardware-level faults, with the help of Scalable Error Detecting Codes (SEDC) and its checker. SEDC is an all unidirectional error detection (AUED) technique capable of detecting multiple unidirectional errors. The SEDC scheme exploits data segmentation and parallel encoding features for assigning code words. Consequently, the SEDC scheme can be scaled to any binary data length “n” with constant latency and less complexity, compared to other AUED schemes, hence making it a perfect candidate for use in big data processing hardware. We also present a novel area, delay, and power efficient, scalable fault secure checker design based on SEDC. In order to show the effectiveness of our scheme, we (1) compared the cost of hardware-based fault tolerance with an existing software-based fault tolerance technique used in HDFS and (2) compared the performance of the proposed checker in terms of area, speed, and power dissipation with the famous Berger code and m-out-of-2m code checkers. The experimental results show that (1) the proposed SEDC-based hardware-level fault tolerance scheme significantly reduces the average cost associated with software-based fault tolerance in a big data application, and (2) the proposed fault secure checker outperforms the state-of-the-art checkers in terms of area, delay, and power dissipation.

45

McIntosh-Smith, Simon, Rob Hunt, James Price, and Alex Warwick Vesztrocy. "Application-based fault tolerance techniques for sparse matrix solvers." International Journal of High Performance Computing Applications 32, no. 5 (May 10, 2017): 627–40. http://dx.doi.org/10.1177/1094342017694946.

Abstract:

High-performance computing systems continue to increase in size in the quest for ever higher performance. The resulting increased electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, may result in future exascale systems being more susceptible to soft errors caused by cosmic radiation than current high-performance computing systems. Through the use of techniques such as hardware-based error-correcting codes and checkpoint-restart, many of these faults can be mitigated at the cost of increased hardware overhead, run-time, and energy consumption that can be as much as 10–20%. Some predictions expect these overheads to continue to grow over time. For extreme scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware costs, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present new software-based fault tolerance techniques that can be applied to one of the most important classes of software in high-performance computing: iterative sparse matrix solvers. Our new techniques enable us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency, and fault tolerance of the overall solution.

46

Ausiello, Giorgio, Andrea Ribichini, Paolo G. Franciosa, and Giuseppe F. Italiano. "Computing Graph Spanners in Small Memory: Fault-Tolerance and Streaming." Discrete Mathematics, Algorithms and Applications 02, no. 04 (December 2010): 591–605. http://dx.doi.org/10.1142/s1793830910000905.

Abstract:

Let G be an undirected graph with m edges and n vertices. A spanner of G is a subgraph which preserves approximate distances between all pairs of vertices. An f-vertex fault-tolerant spanner is a subgraph which preserves approximate distances, under the failure of any set of at most f vertices. The contribution of this paper is twofold: we present algorithms for computing fault-tolerant spanners, and propose streaming algorithms for computing spanners in very small internal memory. In particular, we give algorithms for computing f-vertex fault-tolerant (3,2)- and (2,1)-spanners of G with the following bounds: our (3,2)-spanner contains O(f^{4/3} n^{4/3}) edges and can be computed in time Õ(f^2 m), while our (2,1)-spanner contains O(f n^{3/2}) edges and can be computed in time [Formula: see text]. Both algorithms improve significantly on previously known bounds. Assume that the graph G is presented as an input stream of edges, which may appear in any arbitrary order, and that we do not know in advance m and n. We show how to compute efficiently (3,2)- and (2,1)-spanners of G, using only very small internal memory and a slow access external memory device. Our spanners have asymptotically optimal size and the I/O complexity of our algorithms for computing such spanners is optimal up to a polylogarithmic factor. Our f-vertex fault-tolerant (3,2)- and (2,1)-spanners can also be computed efficiently in the same computational model described above.

47

Jahed-Motlagh, Mohammad R., Behnam Kia, William L. Ditto, and Sudeshna Sinha. "Fault Tolerance and Detection in Chaotic Computers." International Journal of Bifurcation and Chaos 17, no. 06 (June 2007): 1955–68. http://dx.doi.org/10.1142/s0218127407018142.

Abstract:

We introduce a structural testing method for a dynamics-based computing device. Our scheme detects different physical defects, manifesting themselves as parameter variations in the chaotic system at the core of the logic blocks. Since this testing method exploits the dynamical properties of chaotic systems to detect damaged logic blocks, the damaged elements can be detected by very few testing inputs, leading to very low testing time. Further, the method does not entail dedicated or extra hardware for testing. Specifically, we demonstrate the method on one-dimensional unimodal chaotic maps. Some ideas for testing higher-dimensional maps and flows are also presented.

48

Kumar, Atul, and Deepti Malhotra. "Study of Various Proactive Fault Tolerance Techniques in Cloud Computing." International Journal of Computer Sciences and Engineering 06, no. 03 (April 30, 2018): 81–87. http://dx.doi.org/10.26438/ijcse/v6si3.8187.

49

Pandita, Archana, and Prabhat K. Upadhyay. "Fault Tolerance Aware Scheduling for Brokers in Cloud Computing Datacenters." Recent Patents on Computer Science 10, no. 4 (June 6, 2018): 299–307. http://dx.doi.org/10.2174/2213275911666180419152348.

50

Jhawar, Ravi, Vincenzo Piuri, and Marco Santambrogio. "Fault Tolerance Management in Cloud Computing: A System-Level Perspective." IEEE Systems Journal 7, no. 2 (June 2013): 288–97. http://dx.doi.org/10.1109/jsyst.2012.2221934.
