Abstract
Deep learning is now widely used in industry and academia. Deep learning workflows involve several kinds of objects, including algorithms, models, and labeled datasets. How effectively these objects are organized, and how well the relationships among them are understood, determines the efficiency of development and production. This paper proposes OMProv, a provenance mechanism that records the lineage within each kind of object and the relationships among different kinds of objects in the same execution. A version graph abstraction based on a weighted directed acyclic graph and a version inference algorithm are proposed, both deliberately designed to fit the characteristics of deep learning scenarios. OMProv has been implemented in OMAI, an all-in-one deep learning platform for the cloud. OMProv helps users organize objects effectively and intuitively, and efficiently understand the root causes of changes in job results such as performance or accuracy. OMProv can also simplify the management of deep learning lifecycles and related data assets.
Acknowledgments
We would like to thank the OMAI development team for their contributions to the high-quality implementation of this software.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, J., Xie, D. (2020). OMProv: Provenance Mechanism for Objects in Deep Learning. In: Bellatreche, L., et al. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. TPDL ADBIS 2020. Communications in Computer and Information Science, vol 1260. Springer, Cham. https://doi.org/10.1007/978-3-030-55814-7_8
DOI: https://doi.org/10.1007/978-3-030-55814-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55813-0
Online ISBN: 978-3-030-55814-7
eBook Packages: Computer Science, Computer Science (R0)