Skip to main content

A Study of Learning Data Structure Invariants Using Off-the-shelf Tools

  • Conference paper
  • First Online:
Model Checking Software (SPIN 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11636))

Included in the following conference series:

Abstract

Data structure invariants play a key role in checking correctness of code, e.g., a model checker can use an invariant, e.g., acyclicity of a binary tree, that is written in the form of an assertion to search for program executions that violate it, e.g., erroneously introduce a cycle in the structure. Traditionally, the properties are written manually by the users. However, writing them manually can itself be error-prone, which can lead to false alarms or missed bugs. This paper presents a controlled experiment on applying a suite of off-the-shelf machine learning (ML) tools to learn properties of dynamically allocated data structures that reside on the program heap. Specifically, we use 10 data structure subjects, and systematically create training and test data for 6 ML methods, which include decision trees, support vector machines, and neural networks, for binary classification, e.g., to classify input structures as valid binary search trees. The study reveals two key findings. One, most of the ML methods studied – with off-the-shelf parameter settings and without fine tuning – achieve at least 90% accuracy on all of the subjects. Two, high accuracy is achieved even when the size of the training data is significantly smaller than the size of the test data. We believe future work can utilize the learnt invariants to automate dynamic and static analyses, thereby enabling advances in machine learning to further enhance software testing and verification techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Korat GitHub repository. https://github.com/korattest/korat

  2. Scikit-Learn Library. https://scikit-learn.org/stable/. Accessed 18 Aug 2019

  3. Bodik, R.: Program synthesis: opportunities for the next decade. In: 20th ACM SIGPLAN International Conference on Functional Programming, p. 1 (2015)

    Google Scholar 

  4. Boyapati, C., Khurshid, S., Marinov, D.: Korat: automated testing based on Java predicates. In: ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 123–133 (2002)

    Article  Google Scholar 

  5. Briand, L.C., Labiche, Y., Liu, X.: Using machine learning to support debugging with tarantula. In: 18th IEEE International Symposium on Software Reliability, pp. 137–146 (2007)

    Google Scholar 

  6. Brouwer, A.E., Haemers, W.H.: Spectra of Graphs. Springer, New York (2012). https://doi.org/10.1007/978-1-4614-1939-6

    Book  MATH  Google Scholar 

  7. Chen, Y.-F., Hong, C.-D., Lin, A.W., Rümmer, P.: Learning to prove safety over parameterised concurrent systems. In: Formal Methods in Computer Aided Design (FMCAD), pp. 76–83 (2017)

    Google Scholar 

  8. Clarke, E.M., Kroening, D., Yorav, K.: Behavioral consistency of C and verilog programs using bounded model checking. In: 40th Design Automation Conference, (DAC), pp. 368–371 (2003)

    Google Scholar 

  9. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

    MATH  Google Scholar 

  10. Csallner, C., Tillmann, N., Smaragdakis, Y.: DySy: dynamic symbolic execution for invariant inference. In: 30th International Conference on Software Engineering, pp. 281–290 (2008)

    Google Scholar 

  11. Demsky, B., Rinard, M.C.: Automatic detection and repair of errors in data structures. In: Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, OOPSLA, pp. 78–95 (2003)

    Google Scholar 

  12. Dillig, I., Dillig, T., Li, B., McMillan, K.: Inductive invariant generation via abductive inference. In: ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, pp. 443–456 (2013)

    Google Scholar 

  13. Elkarablieh, B., Garcia, I., Suen, Y.L., Khurshid, S.: Assertion-based repair of complex data structures. In: IEEE/ACM International Conference on Automated Software Engineering, pp. 64–73 (2007)

    Google Scholar 

  14. Ernst, M.D., Czeisler, A., Griswold, W.G., Notkin, D.: Quickly detecting relevant program invariants. In: International Conference on Software Engineering, pp. 449–458 (2000)

    Google Scholar 

  15. Ernst, M.D., et al.: The daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69(1–3), 35–45 (2007)

    Article  MathSciNet  Google Scholar 

  16. Molina, F., Degiovanni, R., Ponzio, P., Regis, G., Aguirre, N., Frias, M.F.: Training binary classifiers as data structure invariants. In: International Conference on Software Engineering (ICSE), May 2019

    Google Scholar 

  17. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)

    Article  MathSciNet  Google Scholar 

  18. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001)

    Article  MathSciNet  Google Scholar 

  19. Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 499–512 (2016)

    Google Scholar 

  20. Godefroid, P.: Model checking for programming languages using VeriSoft. In: 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 174–186 (1997)

    Google Scholar 

  21. Gulwani, S., Dimensions in program synthesis. In: 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, pp. 13–24 (2010)

    Google Scholar 

  22. Ho, T.K.: Random decision forests. In: Third International Conference on Document Analysis and Recognition, vol. 1 (1995)

    Google Scholar 

  23. Holzmann, G.: The SPIN Model Checker: Primer and Reference Manual, 1st edn. Addison-Wesley Professional, Boston (2011)

    Google Scholar 

  24. Jackson, D., Vaziri, M.: Finding bugs with a constraint solver. In: International Symposium on Software Testing and Analysis (ISSTA), pp. 14–25 (2000)

    Google Scholar 

  25. Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided component-based program synthesis. In: 32nd ACM/IEEE International Conference on Software Engineering, vol. 1, pp. 215–224 (2010)

    Google Scholar 

  26. Jump, M., McKinley, K.S.: Dynamic shape analysis via degree metrics. In: 8th International Symposium on Memory Management (ISMM), pp. 119–128 (2009)

    Google Scholar 

  27. Korel, B.: Automated software test data generation. IEEE Trans. Softw. Eng. 16(8), 870–879 (1990)

    Article  Google Scholar 

  28. Liskov, B., Guttag, J.V.: Program Development in Java - Abstraction, Specification, and Object-Oriented Design. Addison-Wesley, Boston (2001)

    MATH  Google Scholar 

  29. Malik, M., Pervaiz, A., Uzuncaova, E., Khurshid, S.: Deryaft: a tool for generating representation invariants of structurally complex data. In: ACM/IEEE 30th International Conference on Software Engineering (2008)

    Google Scholar 

  30. Malik, M.Z.: Dynamic shape analysis of program heap using graph spectra: NIER track. In: 33rd International Conference on Software Engineering (ICSE), pp. 952–955 (2011)

    Google Scholar 

  31. Manna, Z., Waldinger, R.: A deductive approach to program synthesis. ACM Trans. Program. Lang. Syst. 2(1), 90–121 (1980)

    Article  Google Scholar 

  32. McMillan, K.L.: Quantified invariant generation using an interpolating saturation prover. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 413–427. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_31

    Chapter  Google Scholar 

  33. Mera, E., Lopez-García, P., Hermenegildo, M.: Integrating software testing and run-time checking in an assertion verification framework. In: Hill, P.M., Warren, D.S. (eds.) ICLP 2009. LNCS, vol. 5649, pp. 281–295. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02846-5_25

    Chapter  Google Scholar 

  34. Meyer, B.: Class invariants: concepts, problems, solutions. CoRR, abs/1608.07637 (2016)

    Google Scholar 

  35. Møller, A., Schwartzbach, M.I.: The pointer assertion logic engine. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 221–231 (2001)

    Google Scholar 

  36. Murtagh, F.: Multilayer perceptrons for classification and regression. Neurocomputing 2(5), 183–197 (1991)

    Article  MathSciNet  Google Scholar 

  37. Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test generation. In: 29th International Conference on Software Engineering, pp. 75–84 (2007)

    Google Scholar 

  38. Provost, F.: Machine learning from imbalanced data sets 101. In: Proceedings of the AAAI 2000 Workshop on Imbalanced Data Sets, vol. 68, pp. 1–3. AAAI Press (2000)

    Google Scholar 

  39. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

    Google Scholar 

  40. Reynolds, J.C.: Separation logic: a logic for shared mutable data structures. In: 17th Annual IEEE Symposium on Logic in Computer Science (2002)

    Google Scholar 

  41. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

    Article  MathSciNet  Google Scholar 

  42. Sagiv, S., Reps, T.W., Wilhelm, R.: Parametric shape analysis via 3-valued logic. In: 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 105–118 (1999)

    Google Scholar 

  43. Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Non-linear loop invariant generation using gröbner bases. In: 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 318–329 (2004)

    Google Scholar 

  44. Si, X., Dai, H., Raghothaman, M., Naik, M., Song, L.: Learning loop invariants for program verification. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 7751–7762 (2018)

    Google Scholar 

  45. Singh, S., Zhang, M., Khurshid, S.: Learning guided enumerative synthesis for superoptimization (2019, under submission)

    Google Scholar 

  46. Solar-Lezama, A.: Program synthesis by sketching. Ph.D. thesis (2008)

    Google Scholar 

  47. Visser, W., Havelund, K., Brat, G.P., Park, S.: Model checking programs. In: Fifteenth IEEE International Conference on Automated Software Engineering (ASE), pp. 3–12 (2000)

    Google Scholar 

  48. Zee, K., Kuncak, V., Rinard, M.C.: Full functional verification of linked data structures. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 349–361 (2008)

    Google Scholar 

Download references

Acknowledgments

This research was partially supported by the US National Science Foundation under Grant Nos. CCF-1704790 and CCF-1718903.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Muhammad Usman , Wenxi Wang , Kaiyuan Wang , Cagdas Yelen , Nima Dini or Sarfraz Khurshid .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Usman, M., Wang, W., Wang, K., Yelen, C., Dini, N., Khurshid, S. (2019). A Study of Learning Data Structure Invariants Using Off-the-shelf Tools. In: Biondi, F., Given-Wilson, T., Legay, A. (eds) Model Checking Software. SPIN 2019. Lecture Notes in Computer Science(), vol 11636. Springer, Cham. https://doi.org/10.1007/978-3-030-30923-7_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-30923-7_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30922-0

  • Online ISBN: 978-3-030-30923-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics