Springer Nature is making SARS-CoV-2 and COVID-19 research free. View research | View latest news | Sign up for updates

How effective are existing Java API specifications for finding bugs during runtime verification?

  • 154 Accesses


Runtime verification can be used to find bugs early, during software development, by monitoring test executions against formal specifications (specs). The quality of runtime verification depends on the quality of the specs. While previous research has produced many specs for the Java API, manually or through automatic mining, there has been no large-scale study of their bug-finding effectiveness. Our conference paper presented the first in-depth study of the bug-finding effectiveness of previously proposed specs. We used JavaMOP to monitor 182 manually written and 17 automatically mined specs against more than 18K manually written and 2.1M automatically generated test methods in 200 open-source projects. The average runtime overhead was under \(4.3{\times }\). We inspected 652 violations of manually written specs and (randomly sampled) 200 violations of automatically mined specs. We reported 95 bugs, out of which developers already fixed or accepted 76. However, most violations, 82.81% of 652 and 97.89% of 200, were false alarms. Based on our empirical results, we conclude that (1) runtime verification technology has matured enough to incur tolerable runtime overhead during testing, and (2) the existing API specifications can find many bugs that developers are willing to fix; however, (3) the false alarm rates are worrisome and suggest that substantial effort needs to be spent on engineering better specs and properly evaluating their effectiveness. We repeated our experiments on a different set of 18 projects and inspected all resulting 742 violations. The results are similar, and our conclusions are the same.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9


  1. 1.

    These specs are publicly available (Pradel 2015).


  1. Allan, C., Avgustinov, P., Christensen, A.S., Hendren, L., Kuzins, S., Lhoták, O., de Moor, O., Sereni, D., Sittampalam, G., Tibble, J.: Adding trace matching with free variables to AspectJ. In: OOPSLA, pp. 345–364 (2005)

  2. Arnold, M., Vechev, M., Yahav, E.: QVM: An efficient runtime for detecting defects in deployed systems. In: OOPSLA, pp. 143–162 (2008)

  3. Beckman, N.E., Nori, A.V.: Probabilistic, modular and scalable inference of typestate specifications. In: PLDI, pp. 211–221 (2011)

  4. Blackburn, S.M., Garner, R., Hoffmann, C., Khang, A.M., McKinley, K.S., Bentzur, R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S.Z., Hirzel, M., Hosking, A., Jump, M., Lee, H., Moss, J.E.B., Phansalkar, A., Stefanović, D., VanDrunen, T., von Dincklage, D., Wiedermann, B. The DaCapo benchmarks: Java benchmarking development and analysis. In: OOPSLA, pp. 169–190 (2006)

  5. Bodden, E.: MOPBox: a library approach to runtime verification. In: RV Tool Demo, pp. 365–369 (2011)

  6. Bodden, E., Hendren, L., Lam, P., Lhoták, O., Naeem, N.A.: Collaborative runtime verification with tracematches. In: RV, pp. 22–37 (2007a)

  7. Bodden, E., Hendren, L.J., Lhoták, O.: A staged static program analysis to improve the performance of runtime monitoring. In: ECOOP, pp. 525–549 (2007b)

  8. Bodden, E., Lam, P., Hendren, L.: Finding programming errors earlier by evaluating runtime monitors ahead-of-time. In: FSE, pp. 36–47 (2008)

  9. Chen, D., Zhang, Y., Wang, R., Li, X., Peng, L., Wei, W.: Mining universal specification based on probabilistic model. In: SEKE, pp. 471–476 (2015)

  10. Chen, F., Roşu, G.: Towards monitoring-oriented programming: a paradigm combining specification and implementation. In: RV, pp. 108–127 (2003)

  11. Cochran, W.G.: Sampling Techniques. Wiley, New York (1977)

  12. Dallmeier, V., Knopp, N., Mallon, C., Hack, S., Zeller, A.: Generating test cases for specification mining. In: ISSTA, pp. 85–96 (2010)

  13. Dwyer, M.B., Purandare, R., Person, S.: Runtime verification in context: can optimizing error detection improve fault diagnosis? In: RV, pp. 36–50 (2010)

  14. Emopers: Closing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream. https://github.com/JodaOrg/joda-time/pull/339 (2015). Accessed 15 Nov 2019

  15. Emopers: Checking the validity of input ListIterators. https://github.com/imglib/imglib2/pull/259 (2019). Accessed 15 Nov 2019

  16. Forejt, V., Kwiatkowska, M., Parker, D., Qu, H., Ujma, M.: Incremental runtime verification of probabilistic systems. In: RV, pp. 314–319 (2012)

  17. Formal Systems Laboratory: JavaMOP. http://fsl.cs.illinois.edu/index.php/JavaMOP (2014). Accessed 15 Nov 2019

  18. Formal Systems Laboratory: Collections\(\_\)SynchronizedCollection. http://fsl.cs.illinois.edu/annotated-java/__properties/html/java/util/Collections_SynchronizedCollection.html (2015a). Accessed 15 Nov 2019

  19. Formal Systems Laboratory: JavaMOPAgent Documentation. https://github.com/runtimeverification/javamop/blob/master/docs/JavaMOPAgentUsage.md (2015b). Accessed 15 Nov 2019

  20. Formal Systems Laboratory: FSL Specification Database. https://runtimeverification.com/monitor/propertydb (2016). Accessed 15 Nov 2019

  21. Gabel, M., Su, Z.: Online inference and enforcement of temporal properties. In: ICSE, pp. 15–24 (2010)

  22. Gabel, M., Su, Z.: Testing mined specifications. In: FSE, pp. 1–11 (2012)

  23. Hussein, S., Meredith, P., Roşu, G.: Security-policy monitoring and enforcement with JavaMOP. In: PLAS, pp. 1–11 (2012)

  24. Jin, D., Meredith, P.O., Griffith, D., Roşu, G.: Garbage collection for monitoring parametric properties. In: PLDI, pp. 415–424 (2011)

  25. Jin, D., Meredith, P.O., Lee, C., Roşu, G.: JavaMOP: Efficient parametric runtime monitoring framework. In: ICSE Demo, pp. 1427–1430 (2012a)

  26. Jin, D., Meredith, P.O., Roşu, G.: Scalable parametric runtime monitoring. Technical report, Computer Science Department, UIUC (2012b)

  27. Joda, S.: Joda-Time. http://www.joda.org/joda-time/ (2016). Accessed 15 Nov 2019

  28. Karaorman, M., Freeman, J.: jMonitor: Java runtime event specification and monitoring library. In: RV, pp. 181–200 (2004)

  29. Krka, I., Brun, Y., Medvidovic, N.: Automatic mining of specifications from invocation traces and method invariants. In: FSE, pp. 178–189 (2014)

  30. Le Goues, C., Weimer, W.: Specification mining with few false positives. In: TACAS, pp. 292–306 (2009)

  31. Lee, C., Chen, F., Roşu, G.: Mining parametric specifications. In: ICSE, pp. 591–600 (2011)

  32. Lee, C., Jin, D., Meredith, P.O., Roşu, G.: Towards categorizing and formalizing the JDK API. Technical report, Computer Science Department, UIUC (2012)

  33. Legunsen, O., Marinov, D., Roşu, G.: Evolution-aware monitoring-oriented programming. In: ICSE NIER, pp. 615–618 (2015)

  34. Legunsen, O., Hariri, F., Shi, A., Lu, Y., Zhang, L., Marinov, D.: An extensive study of static regression test selection in modern software evolution. In: FSE, pp. 583–594 (2016a)

  35. Legunsen, O., Hassan, W.U., Xu, X., Rosu, G., Marinov, D.: How good are the specs? A study of the bug-finding effectiveness of existing Java API specifications. In: ASE, pp. 602–613 (2016b)

  36. Legunsen, O., Hassan, W.U., Xu, X., Roşu, G., Marinov, D.: Supplementary material for this paper. http://fsl.cs.illinois.edu/spec-eval (2016c). Accessed 15 Nov 2019

  37. Legunsen, O., Shi, A., Marinov, D.: STARTS: STAtic Regression Test Selection. In: ASE, pp. 949–954 (2017)

  38. Legunsen, O., Zhang, Y., Hadzi-Tanovic, M., Roşu, G., Marinov, D.: Techniques for evolution-aware runtime verification. In: ICST, pp. 300–311 (2019)

  39. Lemieux, C.: Mining temporal properties of data invariants. In: ICSE SRC, pp. 751–753 (2015)

  40. Lemieux, C., Park, D., Beschastnikh, I.: General LTL specification mining. In: ASE, pp. 81–92 (2015)

  41. Ley, M.: CompleteSearch DBLP. http://www.dblp.org/search/index.php (2015). Accessed 15 Nov 2019

  42. Luo, Q., Zhang, Y., Lee, C., Jin, D., Meredith, P.O., Şerbănuţă, T.F., Roşu, G.: RV-Monitor: efficient parametric runtime verification with simultaneous properties. In: RV, pp. 285–300 (2014)

  43. Mao, D., Chen, L., Zhang, L.: An extensive study on cross-project predictive mutation testing. In: ICST, pp. 160–171 (2019)

  44. Meredith, P., Roşu, G.: Efficient parametric runtime verification with deterministic string rewriting. In: ASE, pp. 70–80 (2013)

  45. Meredith, P., Jin, D., Chen, F., Roşu, G.: Efficient monitoring of parametric context-free patterns. In: ASE, pp. 148–157 (2008)

  46. Navabpour, S., Wu, C.W.W., Bonakdarpour, B., Fischmeister, S.: Efficient techniques for near-optimal instrumentation in time-triggered runtime verification. In: RV, pp. 208–222 (2011)

  47. Nguyen, A.C., Khoo, S.C.: Extracting significant specifications from mining through mutation testing. In: ICFEM, pp. 472–488 (2011)

  48. Nguyen, H.A., Dyer, R., Nguyen, T.N., Rajan, H.: Mining preconditions of APIs in large-scale code corpus. In: FSE, pp. 166–177 (2014)

  49. Oracle: java.lang.instrument. http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html (2015a). Accessed 15 Nov 2019

  50. Oracle: java.lang.Math. https://docs.oracle.com/javase/7/docs/api/java/lang/Math.html (2015b). Accessed 15 Nov 2019

  51. Oracle: java.net.URL. https://docs.oracle.com/javase/7/docs/api/java/net/URL.html (2015c). Accessed 15 Nov 2019

  52. Oracle: java.util.Collections. https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html (2015d). Accessed 15 Nov 2019

  53. Pacheco, C., Ernst, M.D.: Randoop: feedback-directed random testing for Java. In: OOPSLA Companion, pp. 815–816 (2007)

  54. Pacheco, C., Ernst, M.D.: Randoop. https://randoop.github.io/randoop/ (2016). Accessed 15 Nov 2019

  55. Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test generation. In: ICSE, pp. 75–84 (2007)

  56. Pacheco, C., Lahiri, S.K., Ball, T.: Finding errors in .NET with feedback-directed random testing. In: ISSTA, pp. 87–96 (2008)

  57. Pradel, M.: Dynamically inferring, refining, and checking API usage protocols. In: OOPSLA Companion, pp. 773–774 (2009)

  58. Pradel, M.: Statically checking API protocol conformance with mined multi-object specifications (supplementary material). http://mp.binaervarianz.de/icse2012-statically/ (2015). Accessed 15 Nov 2019

  59. Pradel, M., Gross, T.R.: Automatic generation of object usage specifications from large method traces. In: ASE, pp. 371–382 (2009)

  60. Pradel, M., Gross, T.R.: Leveraging test generation and specification mining for automated bug detection without false positives. In: ICSE, pp. 288–298 (2012)

  61. Pradel, M., Bichsel, P., Gross, T.R.: A framework for the evaluation of specification miners based on finite state machines. In: ICSM, pp. 1–10 (2010)

  62. Pradel, M., Jaspan, C., Aldrich, J., Gross, T.R.: Statically checking API protocol conformance with mined multi-object specifications. In: ICSE, pp. 925–935 (2012)

  63. Purandare, R., Dwyer, M.B., Elbaum, S.: Optimizing monitoring of finite state properties through monitor compaction. In: ISSTA, pp. 280–290 (2013)

  64. Reger, G., Barringer, H., Rydeheard, D.: A pattern-based approach to parametric specification mining. In: ASE, pp. 658–663 (2013)

  65. Robillard, M.P., Bodden, E., Kawrykow, D., Mezini, M., Ratchford, T.: Automated API property inference techniques. TSE 39(5), 613–637 (2013)

  66. Shamshiri, S., Just, R., Rojas, J., Fraser, G., McMinn, P., Arcuri, A.: Do automatically generated unit tests find real faults? An empirical study of effectiveness and challenges. In: ASE, pp. 201–211 (2015)

  67. Sun, J., Xiao, H., Liu, Y., Lin, S.W., Qin, S.: TLV: abstraction through testing, learning, and validation. In: ESEC/FSE, pp. 698–709 (2015)

  68. Tan, S.H., Marinov, D., Tan, L., Leavens, G.T.: @tComment: testing Javadoc comments to detect comment-code inconsistencies. In: ICST, pp. 260–269 (2012)

  69. The JaCoCo Team: JaCoCo Java Code Coverage Library. https://www.jacoco.org/jacoco (2018). Accessed 15 Nov 2019

  70. Thummalapenta, S., Xie, T.: Alattin: mining alternative patterns for detecting neglected conditions. In: ASE, pp. 283–294 (2009)

  71. Wasylkowski, A., Zeller, A.: Mining temporal specifications from object usage. In: ASE, pp. 295–306 (2009)

  72. Weimer, W., Necula, G.: Mining temporal specifications for error detection. In: TACAS, pp. 461–476 (2005)

  73. Wu, C.W.W., Kumar, D., Bonakdarpour, B., Fischmeister, S.: Reducing monitoring overhead by integrating event- and time-triggered techniques. In: RV, pp. 304–321 (2013)

  74. Wu, Q., Liang, G., Wang, Q., Xie, T., Mei, H.: Iterative mining of resource-releasing specifications. In: ASE, pp. 233–242 (2011)

  75. Zhang, J., Wang, Z., Zhang, L., Hao, D., Zang, L., Cheng, S., Zhang, L.: Predictive mutation testing. In: ISSTA, pp. 342–353 (2016)

  76. Zhang, J., Zhang, L., Harman, M., Hao, D., Jia, Y., Zhang, L.: Predictive mutation testing. In: TSE, pp. 898–918 (2018)

  77. Zhong, H., Zhang, L., Xie, T., Mei, H.: Inferring resource specifications from natural language API documentation. In: ASE, pp. 307–318 (2009)

Download references


Karl Hajal, Milica Hadzi-Tanovic and Igor Lima helped with inspecting violations in our validation study and submitting pull requests. We thank Alex Gyori, Farah Hariri, Cosmin Radoi, and August Shi for feedback on early drafts of this paper, Rahul Gopinath for discussions and help with Randoop, and He Xiao and Yi Zhang for help with JavaMOP. We also thank all authors of papers who replied to our emails concerning their mined specs. This research was partially supported by the NSF Grants CCF-1421503, CCF-1421575, CCF-1438982, CCF-1439957, CNS-1646305, CNS-1740916, and CCF-1763788. Wajih Ul Hassan was partially supported by the Sohaib and Sara Abassi Fellowship. We gratefully acknowledge support for research on testing from Microsoft and Qualcomm.

Author information

Correspondence to Owolabi Legunsen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Legunsen, O., Al Awar, N., Xu, X. et al. How effective are existing Java API specifications for finding bugs during runtime verification?. Autom Softw Eng 26, 795–837 (2019). https://doi.org/10.1007/s10515-019-00267-1

Download citation


  • Runtime verification
  • Monitoring-oriented programming
  • Specification quality
  • Software testing
  • Empirical study