OMProv: Provenance Mechanism for Objects in Deep Learning

  • Conference paper
ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium (TPDL 2020, ADBIS 2020)

Abstract

Deep learning is now widely used in industry and academia. Deep learning workflows involve several kinds of objects, including algorithms, models, and labeled datasets. How effectively these objects are organized, and how well the relationships among them are understood, determines the efficiency of development and production. This paper proposes OMProv, a provenance mechanism that records the lineage within each kind of object and the relationships among different kinds of objects in the same execution. It introduces a version graph abstraction based on a weighted directed acyclic graph, together with a version inference algorithm; both are deliberately designed to fit the characteristics of deep learning scenarios. OMProv has been implemented in OMAI, an all-in-one deep learning platform for the cloud. OMProv helps users organize objects effectively and intuitively, and efficiently identify the root causes of changes in job results such as performance or accuracy. It can also simplify the management of deep learning lifecycles and related data assets.
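To make the idea of a weighted, DAG-based version graph more concrete, the following is a minimal illustrative sketch and not the OMProv implementation. The abstract does not specify what the edge weights mean or how version inference works, so several things here are assumptions: the class name `VersionGraph`, the reading of an edge weight as a difference score between a parent version and a child version, and the `infer_parent` rule, which attaches a new version to the candidate with the smallest score.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionGraph:
    """Hypothetical weighted DAG of versions for one kind of object."""

    # parents[v] maps each parent of v to the weight of the edge parent -> v.
    # Assumption: the weight is a difference score (smaller = more similar).
    parents: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_version(self, version: str) -> None:
        self.parents.setdefault(version, {})

    def ancestors(self, version: str) -> set[str]:
        """Return every version reachable by following parent edges."""
        seen: set[str] = set()
        stack = list(self.parents.get(version, {}))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, {}))
        return seen

    def add_edge(self, parent: str, child: str, weight: float) -> None:
        self.add_version(parent)
        self.add_version(child)
        # Keep the graph acyclic: the child must not already be an
        # ancestor of the parent, and self-loops are rejected.
        if parent == child or child in self.ancestors(parent):
            raise ValueError(f"edge {parent} -> {child} would create a cycle")
        self.parents[child][parent] = weight

    def infer_parent(self, version: str, scores: dict[str, float]) -> str | None:
        """Attach `version` to the candidate with the smallest score.

        `scores` maps existing versions to an assumed difference score.
        With no candidates, the version is recorded as a root.
        """
        if not scores:
            self.add_version(version)
            return None
        best = min(scores, key=scores.get)
        self.add_edge(best, version, scores[best])
        return best


# Usage: three model versions in a chain, then a fourth with an inferred parent.
graph = VersionGraph()
graph.add_edge("model-v1", "model-v2", 0.2)
graph.add_edge("model-v2", "model-v3", 0.1)
inferred = graph.infer_parent("model-v4", {"model-v1": 0.9, "model-v3": 0.3})
lineage = graph.ancestors("model-v4")
```

Storing parent links rather than child links keeps the lineage query a simple upward traversal, which is the query a user would run when tracing why a job result changed.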

Acknowledgments

We would like to thank the OMAI development team for their contributions to the high-quality implementation of this software.

Author information

Corresponding author

Correspondence to Jian Lin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lin, J., Xie, D. (2020). OMProv: Provenance Mechanism for Objects in Deep Learning. In: Bellatreche, L., et al. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. TPDL 2020, ADBIS 2020. Communications in Computer and Information Science, vol 1260. Springer, Cham. https://doi.org/10.1007/978-3-030-55814-7_8

  • DOI: https://doi.org/10.1007/978-3-030-55814-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-55813-0

  • Online ISBN: 978-3-030-55814-7

  • eBook Packages: Computer Science, Computer Science (R0)
