Refinement Type Contracts for Verification of Scientific Investigative Software

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12031)


Our scientific knowledge is increasingly built on software output. User code that defines data analysis pipelines and computational models is essential for research in the natural and social sciences, but little is known about how to ensure its correctness. The structure of this code and the development process used to build it limit the utility of traditional testing methodology. Formal methods for software verification have seen great success in ensuring code correctness but generally require more specialized training, development time, and funding than is available in the natural and social sciences. Here, we present a Python library that uses lightweight formal methods to provide correctness guarantees without the need for specialized knowledge or substantial time investment. Our package provides runtime verification of function entry and exit condition contracts using refinement types. It allows checking hyperproperties within contracts and offers automated test case generation to supplement online checking. We co-developed our tool with a medium-sized (\(\approx 3000\) LOC) software package that simulates decision-making in cognitive neuroscience. In addition to helping us locate trivial bugs earlier in the development cycle, our tool located four bugs that may have been difficult to find using traditional testing methods. It also found bugs in user code that did not contain contracts or refinement type annotations. This demonstrates how formal methods can be used to verify the correctness of scientific software that is difficult to test with mainstream approaches.
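To make the idea of runtime-checked entry and exit contracts built on refinement types concrete, the following is a minimal sketch and not the API of the library described above. The names `Refined`, `contract`, `ContractError`, and `hit_rate` are hypothetical and chosen only for illustration. A refinement type is modeled here as a base type narrowed by a predicate, and the decorator checks argument types and an entry condition before the call, then the return type after it.

```python
import functools
import inspect


class ContractError(AssertionError):
    """Raised when an entry or exit condition is violated (hypothetical name)."""


class Refined:
    """A refinement type: a base type narrowed by a predicate (illustrative only)."""

    def __init__(self, base, predicate, name):
        self.base = base
        self.predicate = predicate
        self.name = name

    def accepts(self, value):
        # The value must have the base type and also satisfy the predicate.
        return isinstance(value, self.base) and bool(self.predicate(value))


Probability = Refined(float, lambda p: 0.0 <= p <= 1.0, "float in [0, 1]")
NonNegativeInt = Refined(int, lambda n: n >= 0, "int >= 0")
PositiveInt = Refined(int, lambda n: n > 0, "int > 0")


def contract(returns, requires=None, **arg_types):
    """Check argument types and `requires` on entry, and `returns` on exit."""

    def decorate(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            env = dict(bound.arguments)

            # Entry conditions: each annotated argument must inhabit its type.
            for arg_name, rtype in arg_types.items():
                if not rtype.accepts(env[arg_name]):
                    raise ContractError(
                        f"{func.__name__}: {arg_name}={env[arg_name]!r} is not {rtype.name}"
                    )
            # Entry condition relating several arguments to each other.
            if requires is not None and not requires(**env):
                raise ContractError(f"{func.__name__}: entry condition failed for {env}")

            result = func(*args, **kwargs)

            # Exit condition: the return value must inhabit the declared type.
            if not returns.accepts(result):
                raise ContractError(
                    f"{func.__name__}: return value {result!r} is not {returns.name}"
                )
            return result

        return wrapper

    return decorate


@contract(
    returns=Probability,
    hits=NonNegativeInt,
    trials=PositiveInt,
    requires=lambda hits, trials: hits <= trials,
)
def hit_rate(hits, trials):
    return hits / trials


@contract(returns=Probability, hits=NonNegativeInt, trials=PositiveInt)
def buggy_hit_rate(hits, trials):
    return hits / (trials - 1)  # deliberate off-by-one bug, caught on exit
```

In this sketch, `hit_rate(5, 4)` is rejected on entry because the arguments violate `hits <= trials`, while `buggy_hit_rate(3, 2)` passes the entry checks and is rejected on exit because its result is not a probability. The exit check catches the defect without a hand-written expected value, which is the kind of situation the abstract describes for software that is hard to test conventionally.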


Formal methods · Scientific software · Contracts · Refinement types · Python · Runtime verification



Thank you to Ruzica Piskac and Anastasia Ershova for a critical review of the manuscript; Clarence Lehman, Daeyeol Lee, and John Murray for helpful discussions; Michael Scudder for PyDDM code reviews; and Norman Lam for PyDDM development and code reviews. Funding was provided by the Gruber Foundation.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Yale University, New Haven, USA
