
Shifting Left for Early Detection of Machine-Learning Bugs

  • Conference paper in Formal Methods (FM 2023)

Abstract

Computational notebooks are widely used for machine learning (ML). However, notebooks raise new correctness concerns beyond those found in traditional programming environments. ML library APIs are easy to misuse, and the notebook execution model raises entirely new problems concerning reproducibility. It is common to use static analyses to detect bugs and enforce best practices in software applications. However, when configured with new types of rules tailored to notebooks, these analyses can also detect notebook-specific problems. We present our initial efforts in understanding how static analysis for notebooks differs from analysis of traditional application software. We created six new rules for the CodeGuru Reviewer based on discussions with ML practitioners. We ran the tool on close to 10,000 experimentation notebooks, resulting in an average of approximately 1 finding per 7 notebooks. Approximately 60% of the findings that we reviewed are real notebook defects. (Due to confidentiality limitations, we cannot disclose the exact number of notebook files and findings.)
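The abstract's six CodeGuru Reviewer rules are not reproduced on this page. As an illustrative sketch only (not one of the paper's actual rules), the snippet below shows one class of notebook-specific defect such a rule could target: non-reproducible results caused by unseeded random number generation. The function names are hypothetical.

```python
import random

import numpy as np

def sample_unseeded():
    # Pitfall: depends on unseeded randomness, so re-running the
    # notebook (or a single cell) yields different numbers each time.
    return np.random.rand(3)

def sample_seeded(seed=42):
    # Remedy: seed every generator in use before drawing values,
    # making re-runs of the notebook reproducible.
    random.seed(seed)
    np.random.seed(seed)
    return np.random.rand(3)

# Two seeded runs agree exactly; unseeded runs generally do not.
assert np.array_equal(sample_seeded(), sample_seeded())
```

A rule of this kind would flag calls into a library's random API that are not preceded by an explicit seed, which is one way the out-of-order, re-execution-heavy notebook workflow undermines reproducibility.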




Author information

Correspondence to Linghui Luo.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liblit, B. et al. (2023). Shifting Left for Early Detection of Machine-Learning Bugs. In: Chechik, M., Katoen, J.-P., Leucker, M. (eds) Formal Methods. FM 2023. Lecture Notes in Computer Science, vol 14000. Springer, Cham. https://doi.org/10.1007/978-3-031-27481-7_33


  • DOI: https://doi.org/10.1007/978-3-031-27481-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27480-0

  • Online ISBN: 978-3-031-27481-7

  • eBook Packages: Computer Science (R0)
