A Study of Learning Data Structure Invariants Using Off-the-shelf Tools

Usman, Muhammad; Wang, Wenxi; Wang, Kaiyuan; Yelen, Cagdas; Dini, Nima; Khurshid, Sarfraz

doi:10.1007/978-3-030-30923-7_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11636)

Included in the following conference series:

International Symposium on Model Checking Software

Abstract

Data structure invariants play a key role in checking the correctness of code. For example, a model checker can use an invariant written as an assertion, such as acyclicity of a binary tree, to search for program executions that violate it, e.g., by erroneously introducing a cycle in the structure. Traditionally, users write these properties manually. However, writing them manually is itself error-prone, which can lead to false alarms or missed bugs. This paper presents a controlled experiment on applying a suite of off-the-shelf machine learning (ML) tools to learn properties of dynamically allocated data structures that reside on the program heap. Specifically, we use 10 data structure subjects and systematically create training and test data for 6 ML methods, including decision trees, support vector machines, and neural networks, for binary classification, e.g., to classify input structures as valid binary search trees. The study reveals two key findings. One, most of the ML methods studied, with off-the-shelf parameter settings and without fine tuning, achieve at least 90% accuracy on all of the subjects. Two, high accuracy is achieved even when the size of the training data is significantly smaller than the size of the test data. We believe future work can utilize the learnt invariants to automate dynamic and static analyses, thereby enabling advances in machine learning to further enhance software testing and verification techniques.
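
To make the setting concrete, the following minimal sketch shows what an acyclicity invariant and a labeled dataset for binary classification might look like. It is an illustration only, not the paper's implementation: the flat integer encoding, the function names `is_acyclic_binary_tree` and `labeled_dataset`, and the exhaustive enumeration are all assumptions introduced here.

```python
from itertools import product

# Hypothetical encoding (an assumption, not taken from the paper): a candidate
# tree with n nodes is a tuple of 2*n integers
# (left_0, right_0, left_1, right_1, ...). Each entry is the index of a child
# node, or -1 for "no child". Node 0 is the root.


def is_acyclic_binary_tree(encoding):
    """Return True if every node is reached from the root exactly once."""
    n = len(encoding) // 2
    visited = set()
    worklist = [0]
    while worklist:
        node = worklist.pop()
        if node in visited:
            return False  # reached twice: a cycle or a shared subtree
        visited.add(node)
        for child in (encoding[2 * node], encoding[2 * node + 1]):
            if child != -1:
                worklist.append(child)
    return len(visited) == n  # reject structures with unreachable nodes


def labeled_dataset(n):
    """Enumerate every encoding for n nodes and label it with the invariant."""
    values = range(-1, n)
    return [(enc, is_acyclic_binary_tree(enc))
            for enc in product(values, repeat=2 * n)]


data = labeled_dataset(3)
positives = [enc for enc, valid in data if valid]
print(len(data), len(positives))
```

Each `(encoding, label)` pair is a fixed-length feature vector with a Boolean class, which is the shape of input a binary classifier expects. Under these assumptions, valid structures are a small fraction of all candidates, so the classes are heavily imbalanced.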

Acknowledgments

This research was partially supported by the US National Science Foundation under Grant Nos. CCF-1704790 and CCF-1718903.

Author information

Authors and Affiliations

University of Texas at Austin, Austin, TX, 78712, USA
Muhammad Usman, Wenxi Wang, Kaiyuan Wang, Cagdas Yelen, Nima Dini & Sarfraz Khurshid

Corresponding authors

Correspondence to Muhammad Usman, Wenxi Wang, Kaiyuan Wang, Cagdas Yelen, Nima Dini or Sarfraz Khurshid.

Editor information

Editors and Affiliations

Avast Software, Prague, Czech Republic
Fabrizio Biondi
Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Thomas Given-Wilson
Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Axel Legay

About this paper

Cite this paper

Usman, M., Wang, W., Wang, K., Yelen, C., Dini, N., Khurshid, S. (2019). A Study of Learning Data Structure Invariants Using Off-the-shelf Tools. In: Biondi, F., Given-Wilson, T., Legay, A. (eds) Model Checking Software. SPIN 2019. Lecture Notes in Computer Science, vol 11636. Springer, Cham. https://doi.org/10.1007/978-3-030-30923-7_13

DOI: https://doi.org/10.1007/978-3-030-30923-7_13
Published: 02 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30922-0
Online ISBN: 978-3-030-30923-7
eBook Packages: Computer Science, Computer Science (R0)
