
Shifting Left for Early Detection of Machine-Learning Bugs

  • Conference paper in Formal Methods (FM 2023)

Abstract

Computational notebooks are widely used for machine learning (ML). However, notebooks raise new correctness concerns beyond those found in traditional programming environments. ML library APIs are easy to misuse, and the notebook execution model raises entirely new problems concerning reproducibility. It is common to use static analyses to detect bugs and enforce best practices in software applications. However, when configured with new types of rules tailored to notebooks, these analyses can also detect notebook-specific problems. We present our initial efforts in understanding how static analysis for notebooks differs from analysis of traditional application software. We created six new rules for the CodeGuru Reviewer based on discussions with ML practitioners. We ran the tool on close to 10,000 experimentation notebooks, resulting in an average of approximately 1 finding per 7 notebooks. Approximately 60% of the findings that we reviewed are real notebook defects. (Due to confidentiality limitations, we cannot disclose the exact number of notebook files and findings.)
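The abstract's six CodeGuru Reviewer rules are not reproduced on this page. As an illustrative sketch only (not one of the paper's actual rules), the snippet below shows one class of notebook-specific defect such a rule could target: non-reproducible results caused by unseeded random number generation. The function names are hypothetical.

```python
import random

import numpy as np

def sample_unseeded():
    # Pitfall: depends on unseeded randomness, so re-running the
    # notebook (or a single cell) yields different numbers each time.
    return np.random.rand(3)

def sample_seeded(seed=42):
    # Remedy: seed every generator in use before drawing values,
    # making re-runs of the notebook reproducible.
    random.seed(seed)
    np.random.seed(seed)
    return np.random.rand(3)

# Two seeded runs agree exactly; unseeded runs generally do not.
assert np.array_equal(sample_seeded(), sample_seeded())
```

A rule of this kind would flag calls into a library's random API that are not preceded by an explicit seed, which is one way the out-of-order, re-execution-heavy notebook workflow undermines reproducibility.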




Author information

Correspondence to Linghui Luo.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liblit, B. et al. (2023). Shifting Left for Early Detection of Machine-Learning Bugs. In: Chechik, M., Katoen, J.-P., Leucker, M. (eds) Formal Methods. FM 2023. Lecture Notes in Computer Science, vol 14000. Springer, Cham. https://doi.org/10.1007/978-3-031-27481-7_33


  • DOI: https://doi.org/10.1007/978-3-031-27481-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27480-0

  • Online ISBN: 978-3-031-27481-7

  • eBook Packages: Computer Science (R0)
