TSTL: the template scripting testing language

  • Josie Holmes
  • Alex Groce
  • Jervis Pinto
  • Pranjal Mittal
  • Pooria Azimi
  • Kevin Kellar
  • James O’Brien
Regular Paper


In automated test generation, a test harness defines the set of valid tests for a system and the correctness properties those tests check. The difficulty of writing test harnesses is a major obstacle to the adoption of automated test generation and model checking. Languages for writing test harnesses are usually tied to a particular tool, are unfamiliar to programmers, and often limit expressiveness. Writing test harnesses directly in the language of the software under test (SUT) is tedious, repetitive, and error-prone. It offers little or no support for test case manipulation and debugging, and it produces hard-to-read, hard-to-maintain code. Using existing harness languages, or writing directly in the language of the SUT, also tends to limit users to one algorithm for test generation, with little ability to explore alternative methods. In this paper, we present TSTL, the template scripting testing language, a domain-specific language (DSL) for writing test harnesses. TSTL compiles harness definitions into an interface for testing. This makes it possible to build generic test generation and manipulation tools that work for any SUT. TSTL includes tools for generating, manipulating, and analyzing test cases, including simple model checkers. This paper motivates TSTL through a large-scale testing effort, directed by an end user, to find faults in the most widely used geographic information systems (GIS) tool. The paper emphasizes a new approach to automated testing. Rather than developing a monolithic tool to be extended, the aim is to turn a test harness into a language extension. Under this approach, testing is not a separate activity performed with a separate tool. Instead, it is as natural to users of the SUT's language as domain-specific libraries such as ArcPy, NumPy, or QIIME are to users in their respective domains. TSTL is a language and tool infrastructure. It is also a simple, natural way to bring testing activities under the control of an existing programming language.
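The abstract's central architectural claim is that a harness compiles into a generic testing interface, so test generation and manipulation tools need not be rewritten for each SUT. The Python sketch below illustrates that separation under stated assumptions. `BuggyStack`, `StackHarness`, and the interface methods `restart`, `enabled`, `step`, `check`, and `state` are hypothetical names chosen for this example. They are not TSTL's actual generated API, and the harness here is written by hand rather than compiled. The four tools touch the SUT only through that interface. `random_test` does random testing. `fails` replays a test. `reduce_test` is a greedy one-action-at-a-time reduction, a simpler stand-in for delta debugging. `bfs_check` is a breadth-first explicit-state search.

```python
import random
from collections import deque
from typing import Hashable, List, Optional


class BuggyStack:  # toy SUT (hypothetical) with one seeded fault
    def __init__(self) -> None:
        self.items: List[int] = []

    def push(self, x: int) -> None:
        self.items.append(x)

    def pop(self) -> int:
        if len(self.items) == 3:
            return self.items[-1]  # seeded fault: top item is returned but not removed
        return self.items.pop()


class StackHarness:  # hand-written stand-in for a compiled harness (names hypothetical)
    def restart(self) -> None:
        self.sut = BuggyStack()
        self.model: List[int] = []  # a plain list serves as the reference model

    def enabled(self) -> List[str]:
        acts = ["push 0", "push 1"]
        if self.model:  # guard: pop is only valid on a nonempty stack
            acts.append("pop")
        return acts

    def step(self, act: str) -> None:
        if act == "pop":
            self.sut.pop()
            self.model.pop()
        else:
            value = int(act.split()[1])
            self.sut.push(value)
            self.model.append(value)

    def check(self) -> bool:
        return self.sut.items == self.model  # differential property: SUT matches model

    def state(self) -> Hashable:
        return (tuple(self.sut.items), tuple(self.model))


# Generic tools: they use only restart/enabled/step/check/state, never the SUT itself.
def random_test(h: StackHarness, length: int, rng: random.Random) -> Optional[List[str]]:
    h.restart()
    test: List[str] = []
    for _ in range(length):
        test.append(rng.choice(h.enabled()))
        h.step(test[-1])
        if not h.check():
            return test  # the failing prefix is the test case
    return None


def fails(h: StackHarness, test: List[str]) -> bool:
    h.restart()
    for act in test:
        if act not in h.enabled():
            return False  # an invalid test does not count as a failure
        h.step(act)
        if not h.check():
            return True
    return False


def reduce_test(h: StackHarness, test: List[str]) -> List[str]:
    # Remove single actions until no single removal preserves the failure (1-minimal).
    changed = True
    while changed:
        changed = False
        for i in range(len(test)):
            candidate = test[:i] + test[i + 1:]
            if fails(h, candidate):
                test, changed = candidate, True
                break
    return test


def bfs_check(h: StackHarness, max_depth: int) -> Optional[List[str]]:
    # Breadth-first search with state matching; states are reached by replay.
    h.restart()
    seen = {h.state()}
    frontier = deque([[]])
    while frontier:
        test = frontier.popleft()
        if len(test) == max_depth:
            continue
        fails(h, test)  # replay: leaves h in the (non-failing) state reached by test
        for act in h.enabled():
            candidate = test + [act]
            if fails(h, candidate):
                return candidate
            if h.state() not in seen:  # h now holds the state reached by candidate
                seen.add(h.state())
                frontier.append(candidate)
    return None
```

Under these assumptions, `bfs_check(StackHarness(), 4)` returns `['push 0', 'push 0', 'push 0', 'pop']`. This is a shortest failing test, because breadth-first order reaches four-action tests only after exhausting shorter ones. Replaying each test from `restart()` stands in for saving and restoring SUT state, which this sketch assumes the harness cannot do directly. Because none of the tools mentions `BuggyStack`, the same functions would work against any other class that provides the same five methods.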


Keywords: Software testing, Domain-specific languages, Explicit-state model checking, End-user testing, Geographic information systems



The authors would like to thank John Regehr, David R. MacIver, Klaus Havelund, our anonymous reviewers, and students in CS362, CS562, and CS569, for discussions related to this work. A portion of this work was funded by NSF Grants CCF-1054786 and CCF-1217824.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

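  • Josie Holmes, Department of Geography, Pennsylvania State University, State College, USA
  • Alex Groce, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  • Jervis Pinto, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  • Pranjal Mittal, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  • Pooria Azimi, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  • Kevin Kellar, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  • James O’Brien, Risk Frontiers, Macquarie University, Sydney, Australia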
