Abstract
Current operating systems are complex systems that were designed before today’s computing environments emerged. As a result, they struggle to meet the scalability, heterogeneity, availability, and security challenges of current cloud and parallel computing environments. To address these problems, we propose a radically new OS design based on a data-centric architecture: all operating system state should be represented uniformly as database tables, and operations on this state should be made via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring, inspect and debug system state, upgrade components without downtime, manage decisions using machine learning, and implement sophisticated security features. We discuss how a database OS (DBOS) can improve the programmability and performance of many of today’s most important applications and propose a plan for the development of a DBOS proof of concept.
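To make the central idea concrete, the following minimal sketch shows OS state held in database tables and a scheduling step written as an otherwise stateless task that reads and updates that state only through queries. It is an illustration under stated assumptions, not the DBOS design itself: the table names (`workers`, `tasks`), their columns, the function names, and the "most free slots first" placement policy are all hypothetical, and an in-memory SQLite database stands in for whatever DBMS a real DBOS would use.

```python
import sqlite3


def make_os_state():
    """Create a toy 'OS state' database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE workers (
            worker_id  INTEGER PRIMARY KEY,
            free_slots INTEGER NOT NULL
        );
        CREATE TABLE tasks (
            task_id   INTEGER PRIMARY KEY,
            state     TEXT NOT NULL DEFAULT 'pending',
            worker_id INTEGER REFERENCES workers(worker_id)
        );
        """
    )
    return conn


def schedule_one(conn):
    """Stateless scheduling step: all state is read and written via queries.

    Returns (task_id, worker_id) for the placement made, or None if no
    pending task or no free worker exists.
    """
    with conn:  # commit on success, roll back on an exception
        task = conn.execute(
            "SELECT task_id FROM tasks WHERE state = 'pending' "
            "ORDER BY task_id LIMIT 1"
        ).fetchone()
        worker = conn.execute(
            "SELECT worker_id FROM workers WHERE free_slots > 0 "
            "ORDER BY free_slots DESC, worker_id LIMIT 1"
        ).fetchone()
        if task is None or worker is None:
            return None
        conn.execute(
            "UPDATE tasks SET state = 'running', worker_id = ? WHERE task_id = ?",
            (worker[0], task[0]),
        )
        conn.execute(
            "UPDATE workers SET free_slots = free_slots - 1 WHERE worker_id = ?",
            (worker[0],),
        )
        return task[0], worker[0]


def demo():
    """Populate the tables, run the scheduler until it stalls, inspect the state."""
    conn = make_os_state()
    conn.executemany("INSERT INTO workers VALUES (?, ?)", [(1, 1), (2, 2)])
    conn.executemany(
        "INSERT INTO tasks (task_id) VALUES (?)", [(10,), (11,), (12,), (13,)]
    )
    placements = []
    while (placement := schedule_one(conn)) is not None:
        placements.append(placement)
    # Inspecting system state is just another query.
    pending = conn.execute(
        "SELECT COUNT(*) FROM tasks WHERE state = 'pending'"
    ).fetchone()[0]
    return placements, pending


if __name__ == "__main__":
    print(demo())
```

Because `schedule_one` keeps no state of its own, it could in principle be replaced or run in several copies without migrating any in-memory data structures, which is the property the abstract relies on for upgrades without downtime. In this toy run, three tasks are placed and one remains pending once both workers are full.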
DBOS committee members in alphabetical order. The DBOS Committee, dbos-project@googlegroups.com.
Notes
1. In this paper, we will use Lambda as an exemplar of any resource allocation system that supports “pay only for what you use.”
Acknowledgments
This work was partially supported by National Science Foundation CCF-1533644 and United States Air Force Research Laboratory Cooperative Agreement Number FA8750-19-2-1000. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the United States Air Force. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. The authors would also like to thank Charles Leiserson, Peter Michaleas, Albert Reuther, Michael Jones, and the MIT Supercloud Team.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cafarella, M. et al. (2021). A Polystore Based Database Operating System (DBOS). In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH Poly 2020. Lecture Notes in Computer Science, vol 12633. Springer, Cham. https://doi.org/10.1007/978-3-030-71055-2_1
DOI: https://doi.org/10.1007/978-3-030-71055-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71054-5
Online ISBN: 978-3-030-71055-2
eBook Packages: Computer Science, Computer Science (R0)