On the Design and Implementation of an Efficient Lock-Free Scheduler

  • Florian Negele
  • Felix Friedrich
  • Suwon Oh
  • Bernhard Egger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10353)

Abstract

Schedulers for symmetric multiprocessing (SMP) machines use sophisticated algorithms to schedule processes onto the available processor cores. Hardware-dependent code and the use of locks to protect shared data structures from simultaneous access lead to poor portability, difficulty in proving correctness, and a myriad of problems associated with locking, such as limited parallelism, deadlocks, starvation, and complications in interrupt handling. In this work we explore what can be achieved in terms of portability and simplicity in an SMP scheduler that achieves performance similar to that of state-of-the-art schedulers. Because we strictly limit ourselves to lock-free data structures in the scheduler, the problems associated with locking vanish altogether. We show that by employing implicit cooperative scheduling, additional guarantees can be made that allow novel and very efficient implementations of memory-efficient unbounded lock-free queues. Cooperative multitasking has the additional benefit of providing extensive hardware independence. It even allows the scheduler to be used as a runtime library for applications running on top of standard operating systems. In a comparison against Windows Server and Linux running on up to 64 cores, we analyze the performance of the lock-free scheduler and show that it matches or even outperforms these two state-of-the-art schedulers in a variety of benchmarks.

Keywords

Lock-free scheduling, Cooperative multitasking, Run-time environments, Multicore architectures

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Florian Negele (1)
  • Felix Friedrich (1)
  • Suwon Oh (2)
  • Bernhard Egger (2)

  1. Department of Computer Science, ETH Zürich, Zürich, Switzerland
  2. Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
