The evolution of distributed computing systems: from fundamental to new frontiers

Abstract

Distributed systems have been an active field of research for over 60 years and have played a crucial role in computer science, enabling the invention of the Internet that underpins all facets of modern life. Through technological advancements and their changing role in society, distributed systems have undergone a perpetual evolution, with each change resulting in the formation of a new paradigm. Each new distributed system paradigm, of which prominent modern examples include cloud computing, Fog computing, and the Internet of Things (IoT), allows for new forms of commercial and artistic value, yet also ushers in new research challenges that must be addressed in order to realize and enhance its operation. However, it is necessary to identify precisely what factors drive the formation and growth of a paradigm, and how unique the research challenges within modern distributed systems are in comparison to prior generations of systems. The objective of this work is to study and evaluate the key factors that have influenced and driven the evolution of distributed system paradigms, from early mainframes and the inception of the global inter-network to contemporary systems such as edge computing, Fog computing, and IoT. Our analysis highlights that the assumptions that have driven distributed systems appear to be changing, including (1) an accelerated fragmentation of paradigms driven by commercial interests and the physical limitations imposed by the end of Moore’s law, (2) a transition away from generalized architectures and frameworks towards increasing specialization, and (3) a pivoting of each paradigm's architecture between centralized and decentralized coordination. Finally, we discuss present-day and future challenges of distributed systems research pertaining to studying complex phenomena at scale, and the role of distributed systems research in the context of climate change.

Notes

  1. The first webpage: http://info.cern.ch/hypertext/WWW/TheProject.html
  2. History of Distributed Systems: https://medium.com/microservices-learning/the-evolution-of-distributed-systems-fec4d35beffd

Acknowledgements

This work is supported by the UK Engineering and Physical Sciences Research Council (EP/P031617/1).

Author information

Corresponding author

Correspondence to Sukhpal Singh Gill.

About this article

Cite this article

Lindsay, D., Gill, S.S., Smirnova, D. et al. The evolution of distributed computing systems: from fundamental to new frontiers. Computing 103, 1859–1878 (2021). https://doi.org/10.1007/s00607-020-00900-y

Keywords

  • Distributed computing
  • Computing systems
  • Evolution
  • Green computing

Mathematics subject classification

  • 68M14
  • 68U35
  • 86A08
  • 01-02