On the Impact of Order Information in API Usage Patterns

  • Ervina Çergani
  • Mira Mezini
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1077)

Abstract

Many approaches have been proposed for learning Application Programming Interface (API) usage patterns from code repositories. Depending on the underlying technique, the mined patterns may (1) be strictly sequential, (2) consider a partial order between method calls, or (3) not consider order information at all. Understanding the trade-offs between these pattern types with respect to real code is important in many applications (e.g., misuse detection), given that APIs often have usage constraints, such as restrictions on call order. API misuses, i.e., violations of these constraints, may lead to software crashes, bugs, and vulnerabilities.
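
To make the three pattern types concrete, the following Python sketch (an illustration, not the paper's implementation) models a pattern as a set of method calls plus a set of "a before b" constraints: a sequential pattern constrains every pair of calls, a partial-order pattern constrains only some pairs, and a no-order pattern constrains none. The call names `open`, `read`, `write`, and `close` are hypothetical, and the sketch assumes each call occurs at most once per trace.

```python
# Minimal sketch (assumption, not the paper's implementation): a pattern is a
# pair (events, order), where each (a, b) in `order` means "a must be called
# before b". Assumes each call occurs at most once per trace.

def sequential(*calls):
    # Strict sequence: every call must precede all calls listed after it.
    return set(calls), {(a, b) for i, a in enumerate(calls) for b in calls[i + 1:]}

def partial(calls, order):
    # Partial order: only the given pairs are constrained.
    return set(calls), set(order)

def no_order(*calls):
    # No order: the calls only need to co-occur.
    return set(calls), set()

def matches(trace, pattern):
    """True if the trace contains all calls of the pattern and respects its order."""
    events, order = pattern
    if not events <= set(trace):
        return False
    pos = {call: i for i, call in enumerate(trace)}
    return all(pos[a] < pos[b] for a, b in order)

calls = ("open", "read", "write", "close")  # hypothetical API calls
seq = sequential(*calls)
par = partial(calls, {("open", "read"), ("open", "write"),
                      ("read", "close"), ("write", "close")})
nop = no_order(*calls)

for trace in (["open", "read", "write", "close"],
              ["open", "write", "read", "close"],
              ["close", "write", "read", "open"]):
    print(trace, matches(trace, seq), matches(trace, par), matches(trace, nop))
```

Swapping `read` and `write` breaks only the sequential pattern; a trace that makes every call in reverse order breaks both ordered patterns, while the no-order pattern still matches because all four calls co-occur.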

In this paper, we present work that addresses this need. We have constructed a benchmark based on an episode mining algorithm that can be configured to learn three types of patterns: sequential, partial-order, and no-order patterns. We use the benchmark in two ways. First, we use it to empirically study the different types of mined API usage patterns based on three well-defined metrics: expressiveness, consistency, and generalizability. Second, we evaluate the effect of the different pattern types in the real application context of using them as input to a misuse detector. We run the benchmark on two existing datasets consisting of (1) 360 C# code repositories and (2) four Java projects. We use the C# dataset to empirically study the resulting API usage patterns, and the Java dataset to evaluate the effect of the different pattern types on misuse detection. For this purpose, we build EMDetect, a tool for detecting API misuses in Java projects.
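
The abstract does not describe how EMDetect works internally. As a rough illustration of how a mined pattern can feed a misuse detector in general, the sketch below checks one trace against one pattern and reports which of the pattern's calls are missing and which order constraints are broken; a detector would then rank such deviations. The function name `violations` and the example calls are hypothetical, and the at-most-once-per-trace assumption is again made for simplicity.

```python
# Hypothetical sketch of a generic pattern-violation check; it is NOT the
# EMDetect algorithm. A pattern is (events, order), where (a, b) in `order`
# means "a must be called before b". Assumes each call occurs at most once.

def violations(trace, events, order):
    """Return (missing_calls, broken_orders), or None if the pattern does not apply.

    ([], []) means the trace conforms to the pattern."""
    present = events & set(trace)
    if not present:
        return None  # the trace uses none of the pattern's calls
    pos = {call: i for i, call in enumerate(trace)}
    missing = sorted(events - present)
    broken = sorted((a, b) for a, b in order
                    if a in pos and b in pos and pos[a] > pos[b])
    return missing, broken

events = {"open", "read", "close"}            # hypothetical API calls
order = {("open", "read"), ("read", "close")}

print(violations(["open", "read"], events, order))           # (['close'], [])
print(violations(["read", "open", "close"], events, order))  # ([], [('open', 'read')])
print(violations(["log"], events, order))                    # None
```

With a no-order pattern (an empty `order`), only missing calls can be reported; the more order constraints a pattern carries, the more traces it can flag.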

Our results provide practical evidence that partial-order patterns are a generalized superset of sequential-order patterns, and that partial-order mining also finds additional patterns, missed by sequence mining, that are used by a larger number of developers across code repositories. Additionally, our study empirically quantifies the importance of the order information encoded in sequential and partial-order patterns for representing correct co-occurrences of code elements in real code. In the application context of misuse detection, our results show that sequential-order patterns perform better in terms of precision, ranking true positives higher among the top findings, while partial-order patterns perform better in terms of recall, finding more misuses in the source code. Last but not least, other researchers can use our benchmark to explore additional properties of API patterns and to build other applications based on API usage patterns.
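
One way to read the superset relationship: a partial-order pattern stands for every sequential pattern that respects its constraints (its linear extensions). Under the at-most-once assumption used here, a trace matches the partial-order pattern exactly when it matches one of those sequences. The brute-force Python sketch below enumerates them; the example pattern and its call names are hypothetical.

```python
from itertools import permutations

def linear_extensions(events, order):
    """All total orders of `events` consistent with the (a, b) constraints in `order`.

    Brute force over permutations: fine for the small patterns shown here,
    but exponential in the number of events."""
    extensions = []
    for perm in permutations(sorted(events)):
        pos = {e: i for i, e in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in order):
            extensions.append(perm)
    return extensions

events = {"open", "read", "write", "close"}   # hypothetical API calls
order = {("open", "read"), ("open", "write"), ("read", "close"), ("write", "close")}
print(linear_extensions(events, order))
# [('open', 'read', 'write', 'close'), ('open', 'write', 'read', 'close')]
```

Every trace that matches the partial-order pattern matches exactly one of these sequences, so the pattern's support is the sum of theirs. This is why it can pass a frequency threshold that each sequence misses on its own.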

Keywords

API usage pattern types · API misuse detection · Events mining · Empirical study · Benchmark

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Software Technology Group, Technische Universität Darmstadt, Darmstadt, Germany
