Automated Software Engineering, Volume 26, Issue 4, pp. 795–837

How effective are existing Java API specifications for finding bugs during runtime verification?

  • Owolabi Legunsen
  • Nader Al Awar
  • Xinyue Xu
  • Wajih Ul Hassan
  • Grigore Roşu
  • Darko Marinov


Abstract

Runtime verification can be used to find bugs early, during software development, by monitoring test executions against formal specifications (specs). The quality of runtime verification depends on the quality of the specs. While previous research has produced many specs for the Java API, manually or through automatic mining, there has been no large-scale study of their bug-finding effectiveness. Our conference paper presented the first in-depth study of the bug-finding effectiveness of previously proposed specs. We used JavaMOP to monitor 182 manually written and 17 automatically mined specs against more than 18K manually written and 2.1M automatically generated test methods in 200 open-source projects. The average runtime overhead was under \(4.3\times\). We inspected 652 violations of manually written specs and (randomly sampled) 200 violations of automatically mined specs. We reported 95 bugs, out of which developers already fixed or accepted 76. However, most violations, 82.81% of 652 and 97.89% of 200, were false alarms. Based on our empirical results, we conclude that (1) runtime verification technology has matured enough to incur tolerable runtime overhead during testing, and (2) the existing API specifications can find many bugs that developers are willing to fix; however, (3) the false alarm rates are worrisome and suggest that substantial effort needs to be spent on engineering better specs and properly evaluating their effectiveness. We repeated our experiments on a different set of 18 projects and inspected all resulting 742 violations. The results are similar, and our conclusions are the same.
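To illustrate the kind of Java API usage protocol that such specs formalize and that a monitor like JavaMOP flags at runtime, consider the `Iterator` contract: a collection must not be structurally modified while one of its iterators is in use. The sketch below is a hypothetical example chosen for this summary, not a spec or subject program from the study; it relies only on the JDK's own best-effort fail-fast detection, which raises `ConcurrentModificationException` where a runtime monitor would report a spec violation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration: the Iterator usage protocol says a collection
// must not be structurally modified while an iterator over it is live.
public class IteratorSpecViolation {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = names.iterator();
        names.add("d"); // structural modification invalidates 'it'
        try {
            it.next(); // violates the protocol; ArrayList detects it fail-fast
            System.out.println("no violation detected");
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("spec violation detected");
        }
    }
}
```

A runtime-verification tool generalizes this idea: instead of relying on ad hoc checks inside individual JDK classes, each API protocol is written once as a formal spec and monitored uniformly across all test executions.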


Keywords: Runtime verification · Monitoring-oriented programming · Specification quality · Software testing · Empirical study



Acknowledgements

Karl Hajal, Milica Hadzi-Tanovic, and Igor Lima helped with inspecting violations in our validation study and submitting pull requests. We thank Alex Gyori, Farah Hariri, Cosmin Radoi, and August Shi for feedback on early drafts of this paper, Rahul Gopinath for discussions and help with Randoop, and He Xiao and Yi Zhang for help with JavaMOP. We also thank all authors of papers who replied to our emails concerning their mined specs. This research was partially supported by the NSF Grants CCF-1421503, CCF-1421575, CCF-1438982, CCF-1439957, CNS-1646305, CNS-1740916, and CCF-1763788. Wajih Ul Hassan was partially supported by the Sohaib and Sara Abassi Fellowship. We gratefully acknowledge support for research on testing from Microsoft and Qualcomm.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Urbana, USA
  2. American University of Beirut, Beirut, Lebanon
