A Model of Speculative Parallel Scheduling in Networks of Unreliable Sensors

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 264)

Abstract

As systems scale up, their mean time to failure decreases drastically. We consider parallel servers that are subject to permanent failures, where only one server needs to survive in order to execute a given task. This failure model is appropriate for at least two types of system: those in which repair cannot take place (e.g. spacecraft) and those with strict deadlines (e.g. navigation systems). To improve reliability, multiple replicas perform the same task. Each server is subject to failure while it is on, and its time to failure is memoryless, i.e. exponentially distributed. We derive expressions for the Laplace transform of the sojourn time distribution of a tagged task, jointly with the probability that the tagged task completes service, for a network of one or more parallel servers with exponential service times and times to failure.
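
The replication idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's model: it ignores queueing and sojourn times and assumes that each of n independent replicas starts the task immediately, with an exponential service time of rate mu and an exponential time to failure of rate gamma. Under those assumptions a single replica finishes before it fails with probability mu / (mu + gamma), so at least one of n replicas completes with probability 1 - (gamma / (mu + gamma))^n. The function names and parameters below are hypothetical and chosen only for illustration.

```python
import random


def completion_probability(n, mu, gamma):
    """Closed-form probability that at least one of n independent replicas
    finishes its task before failing (simplified model, no queueing)."""
    return 1.0 - (gamma / (mu + gamma)) ** n


def simulate_completion(n, mu, gamma, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    completed = 0
    for _ in range(trials):
        for _ in range(n):
            service_time = rng.expovariate(mu)      # time this replica needs
            time_to_failure = rng.expovariate(gamma)  # time until it fails
            if service_time < time_to_failure:
                # One surviving replica is enough to complete the task.
                completed += 1
                break
    return completed / trials


if __name__ == "__main__":
    for n in (1, 2, 3):
        exact = completion_probability(n, mu=1.0, gamma=0.5)
        estimate = simulate_completion(n, mu=1.0, gamma=0.5)
        print(f"n={n}: closed form {exact:.4f}, simulated {estimate:.4f}")
```

In this toy setting, each additional replica multiplies the probability of total failure by gamma / (mu + gamma), which is the basic reliability gain that speculative parallel execution relies on.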

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  1. Department of Computing, Imperial College London, London, UK
