Inferring specifications for resources from natural language API documentation

Automated Software Engineering

Abstract

Many software libraries, especially commercial ones, provide API documentation in natural language to describe correct API usage. However, developers may still write code that is inconsistent with API documentation, partly because, as existing research shows, many developers are reluctant to read API documentation carefully. Because these inconsistencies may indicate defects, researchers have proposed various approaches to detect them, and these approaches require many known specifications. As writing specifications manually for all APIs is tedious, various approaches have been proposed to mine specifications automatically. Most existing mining approaches in the literature rely on analyzing client code, so they fail to mine specifications when sufficient client code is not available. Instead of analyzing client code, we propose an approach, called Doc2Spec, that infers resource specifications from natural-language API documentation. We evaluated our approach on the Javadocs of five libraries. The results show that our approach performs well on real-scale libraries and infers various specifications with relatively high precision, recall, and F-score. We further used the inferred specifications to detect defects in open source projects. The results show that specifications inferred by Doc2Spec are useful for detecting real defects in existing projects.
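The abstract does not define what a resource specification looks like. As a rough, hypothetical illustration (not the representation Doc2Spec actually uses), a resource specification can be pictured as a small automaton over method categories such as creation, manipulation, and closure, and a defect as a client call sequence that the automaton rejects. The sketch assumes three categories, hard-codes the method-to-category labels that an inference tool would otherwise derive from documentation, and uses made-up method names (`open`, `read`, `write`, `close`); `check_resource_usage` is likewise a hypothetical helper.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical action categories for methods that touch a resource."""
    CREATE = auto()
    MANIPULATE = auto()
    CLOSE = auto()


# Hypothetical labels. An approach like Doc2Spec would infer such labels from
# API documentation; here they are hard-coded for illustration only.
LABELS = {
    "open": Action.CREATE,
    "read": Action.MANIPULATE,
    "write": Action.MANIPULATE,
    "close": Action.CLOSE,
}


def check_resource_usage(calls, labels=LABELS):
    """Check one resource's call sequence against an assumed specification:
    create first, then any number of manipulations, then close exactly once.
    Returns a list of violation messages (empty if the sequence conforms)."""
    violations = []
    state = "init"  # automaton states: init -> opened -> closed
    for i, name in enumerate(calls):
        action = labels.get(name)
        if action is None:
            continue  # unlabeled methods do not affect the resource state
        if state == "init":
            if action is Action.CREATE:
                state = "opened"
            else:
                violations.append(f"call {i} ({name}) before creation")
        elif state == "opened":
            if action is Action.CREATE:
                violations.append(f"call {i} ({name}) creates the resource twice")
            elif action is Action.CLOSE:
                state = "closed"
            # MANIPULATE keeps the resource in the "opened" state
        else:  # state == "closed": no labeled call may follow the close
            violations.append(f"call {i} ({name}) after close")
    if state == "opened":
        violations.append("resource never closed")
    return violations


if __name__ == "__main__":
    print(check_resource_usage(["open", "read", "write", "close"]))  # []
    print(check_resource_usage(["open", "read"]))  # ['resource never closed']
    print(check_resource_usage(["open", "close", "read"]))  # ['call 2 (read) after close']
```

Ignoring unlabeled methods keeps the check focused on the calls a specification actually constrains. A realistic checker would also have to track which object each call targets, which this sketch omits.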

References

  • Acharya, M., Xie, T.: Mining API error-handling specifications from source code. In: Proc. FASE, pp. 370–384 (2009)

  • Acharya, M., Xie, T., Pei, J., Xu, J.: Mining API patterns as partial orders from source code: From usage scenarios to specifications. In: Proc. 6th ESEC/FSE, pp. 25–34 (2007)

  • Alur, R., Černý, P., Madhusudan, P., Nam, W.: Synthesis of interface specifications for Java classes. In: Proc. 32nd POPL, pp. 98–109 (2005)

  • Ambriola, V., Gervasi, V.: Processing natural language requirements. In: Proc. 12th ASE, pp. 36–45. IEEE Computer Society, Los Alamitos (1997)

  • Ammons, G., Bodík, R., Larus, J.: Mining specifications. In: Proc. 29th POPL, pp. 4–16 (2002)

  • Anvik, J., Hiew, L., Murphy, G.: Who should fix this bug? In: Proc. 28th ICSE, pp. 361–370 (2006)

  • Arnout, K., Meyer, B.: Uncovering hidden contracts: The .NET example. Computer 36(11), 48–55 (2003)

  • Baum, L., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 164–171 (1970)

  • Buse, R., Weimer, W.: Automatic documentation inference for exceptions. In: Proc. ISSTA, pp. 273–282 (2008)

  • Buse, R., Weimer, W.: Automatically documenting program changes. In: Proc. 25th ASE, pp. 33–42 (2010)

  • Chinchor, N.: MUC-7 named entity task definition. In: Proc. 7th MUC (1997)

  • Cohen, W., Sarawagi, S.: Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods. In: Proc. 10th KDD, pp. 89–98 (2004)

  • Dag, J., Regnell, B., Gervasi, V., Brinkkemper, S.: A linguistic-engineering approach to large-scale requirements management. IEEE Softw. 3, 3 (2005)

  • Dagenais, B., Hendren, L.J.: Enabling static analysis for partial Java programs. In: Proc. 23rd OOPSLA, pp. 313–328 (2008)

  • Dekel, U., Herbsleb, J.D.: Reading the documentation of invoked API functions in program comprehension. In: Proc. 17th ICPC, pp. 168–177 (2009a)

  • Dekel, U., Herbsleb, J.D.: Improving API documentation usability with knowledge pushing. In: Proc. 31st ICSE, pp. 320–330 (2009b)

  • Engler, D., Chen, D., Chou, A.: Bugs as inconsistent behavior: A general approach to inferring errors in systems code. In: Proc. 18th SOSP, pp. 57–72 (2001)

  • Fantechi, A., Gnesi, S., Lami, G., Maccari, A.: Applications of linguistic techniques for use case analysis. Requir. Eng. 8(3), 161–170 (2003)

  • Fellbaum, C., et al.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

  • Fry, Z., Shepherd, D., Hill, E., Pollock, L., Vijay-Shanker, K.: Analysing source code: looking for useful verb-direct object pairs in all the right places. IET Softw. 2(1), 27–36 (2008)

  • Gabel, M., Su, Z.: Symbolic mining of temporal specifications. In: Proc. 30th ICSE, pp. 51–60 (2008)

  • Gabel, M., Su, Z.: Online inference and enforcement of temporal properties. In: Proc. 32nd ICSE, pp. 15–24 (2010)

  • Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: An industrial case study. In: Proc. 7th MSR, pp. 11–20 (2010)

  • Gervasi, V., Zowghi, D.: Reasoning about inconsistencies in natural language requirements. ACM Trans. Softw. Eng. Methodol. 14(3), 277–330 (2005)

  • Goldin, L., Berry, D.: AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Autom. Softw. Eng. 4(4), 375–412 (1997)

  • Gowri, M., Grothoff, C., Chandra, S.: Deriving object typestates in the presence of inter-object references. In: Proc. 20th OOPSLA, pp. 77–96 (2005)

  • Hayes, J., Dekhtyar, A., Sundaram, S.: Advancing candidate link generation for requirements tracing: The study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006)

  • Henkel, J., Diwan, A.: A tool for writing and debugging algebraic specifications. In: Proc. 26th ICSE, pp. 449–458 (2004)

  • Hirschman, L.: MUC-7 coreference task definition. In: Proc. 7th MUC (1997)

  • Horie, M., Chiba, S.: Tool support for crosscutting concerns of API documentation. In: Proc. 9th AOSD, pp. 97–108 (2010)

  • Høst, E.W., Østvold, B.M.: Debugging method names. In: Proc. 23rd ECOOP, pp. 294–317 (2009)

  • Igarashi, A., Kobayashi, N.: Resource usage analysis. ACM Trans. Program. Lang. Syst. 27(2), 264–313 (2005)

  • Jeong, G., Kim, S., Zimmermann, T.: Improving bug triage with bug tossing graphs. In: Proc. 7th ESEC/FSE, pp. 111–120. ACM, New York (2009)

  • Kof, L.: Scenarios: Identifying missing objects and actions by means of computational linguistics. In: Proc. 15th RE, pp. 121–130 (2007)

  • Kremenek, T., Twohey, P., Back, G., Ng, A., Engler, D.: From uncertainty to belief: Inferring the specification within. In: Proc. 7th OSDI, pp. 259–272 (2006)

  • Lee, C., Chen, F., Rosu, G.: Mining parametric specifications. In: Proc. 33rd ICSE, pp. 591–600 (2011)

  • Li, Z., Zhou, Y.: PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code. In: Proc. ESEC/FSE, pp. 306–315 (2005)

  • Livshits, V., Zimmermann, T.: DynaMine: Finding common error patterns by mining software revision histories. In: Proc. ESEC/FSE, pp. 31–40 (2005)

  • Lo, D., Khoo, S.: SMArTIC: towards building an accurate, robust and scalable specification miner. In: Proc. 14th FSE, pp. 265–275 (2006)

  • Lo, D., Maoz, S.: Scenario-based and value-based specification mining: better together. In: Proc. 25th ASE, pp. 387–396 (2010)

  • Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakes – a comprehensive study on real world concurrency bug characteristics. In: Proc. 13th ASPLOS, pp. 329–339 (2008)

  • Meziane, F., Athanasakis, N., Ananiadou, S.: Generating natural language specifications from UML class diagrams. Requir. Eng. 13(1), 1–18 (2008)

  • Mikheev, A., Moens, M., Grover, C.: Named entity recognition without gazetteers. In: Proc. 9th EACL, pp. 1–8 (1999)

  • Novick, D., Ward, K.: Why don’t people read the manual. In: Proc. 24th SIGDOC, pp. 11–18 (2006)

  • Olson, D.: Advanced Data Mining Techniques. Springer, Berlin (2008)

  • Padioleau, Y., Tan, L., Zhou, Y.: Listening to programmers—Taxonomies and characteristics of comments in operating system code. In: Proc. 31st ICSE, pp. 331–341 (2009)

  • Perry, E., Sanko, M., Wright, B., Pfaeffle, T.: Oracle 9i JDBC developer’s guide and reference. Technical report, March 2002. http://www.oracle.com

  • Raman, A., Patrick, J.: The sk-strings method for inferring PFSA. In: Proc. Machine Learning Workshop Automata Induction, Grammatical Inference, and Language Acquisition (1997)

  • Ramanathan, M., Grama, A., Jagannathan, S.: Path-sensitive inference of function precedence protocols. In: Proc. 29th ICSE, pp. 240–250 (2007)

  • Rivest, R., Schapire, R.: Inference of finite automata using homing sequences. In: Machine Learning: From Theory to Applications, pp. 51–73 (1993)

  • Robillard, M.P., DeLine, R.: A field study of API learning obstacles. Empir. Softw. Eng. (2011). doi:10.1007/s10664-010-9150-8

  • Runeson, P., Alexandersson, M., Nyholm, O.: Detection of duplicate defect reports using natural language processing. In: Proc. 29th ICSE, pp. 499–510 (2007)

  • Sawyer, P., Rayson, P., Garside, R.: REVERE: Support for requirements synthesis from documents. Inf. Syst. Front. 4(3), 343–353 (2002)

  • Shepherd, D., Fry, Z., Hill, E., Pollock, L., Vijay-Shanker, K.: Using natural language program analysis to locate and understand action-oriented concerns. In: Proc. 6th AOSD, pp. 212–224 (2007)

  • Shi, L., Zhong, H., Xie, T., Li, M.: An empirical study on evolution of API documentation. In: Proc. FASE, pp. 416–431 (2011)

  • Sridhara, G., Hill, E., Muppaneni, D., Pollock, L.L., Vijay-Shanker, K.: Towards automatically generating summary comments for Java methods. In: Proc. 25th ASE, pp. 43–52 (2010)

  • Stylos, J., Faulring, A., Yang, Z., Myers, B.: Improving API documentation using API usage information. In: Proc. VL/HCC, pp. 119–126 (2009)

  • Tan, L., Yuan, D., Krishna, G., Zhou, Y.: /* iComment: Bugs or Bad Comments?*/. In: Proc. 21st SOSP, pp. 145–158 (2007)

  • Thummalapenta, S., Xie, T.: SpotWeb: Detecting framework hotspots and coldspots via mining open source code on the web. In: Proc. 23rd ASE, pp. 327–336 (2008)

  • Thummalapenta, S., Xie, T.: Mining exception-handling rules as sequence association rules. In: Proc. 31st ICSE, pp. 496–506 (2009a)

  • Thummalapenta, S., Xie, T.: Alattin: Mining alternative patterns for detecting neglected conditions. In: Proc. 24th ASE, pp. 283–294 (2009b)

  • Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)

  • Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An approach to detecting duplicate bug reports using natural language and execution information. In: Proc. 30th ICSE, pp. 461–470 (2008)

  • Wasylkowski, A., Zeller, A., Lindig, C.: Detecting object usage anomalies. In: Proc. ESEC/FSE, pp. 35–44 (2007)

  • Weimer, W., Necula, G.: Mining temporal specifications for error detection. In: Proc. TACAS, pp. 461–476 (2005)

  • Whaley, J., Martin, M., Lam, M.: Automatic extraction of object-oriented component interfaces. In: Proc. ISSTA, pp. 218–228 (2002)

  • Williams, C., Hollingsworth, J.: Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng. 31(6), 466–480 (2005)

  • Würsch, M., Ghezzi, G., Reif, G., Gall, H.: Supporting developers with natural language queries. In: Proc. 32nd ICSE, pp. 165–174 (2010)

  • Xu, G., Rountev, A.: Precise memory leak detection for Java software using container profiling. In: Proc. 30th ICSE, pp. 151–160 (2008)

  • Yang, J., Evans, D., Bhardwaj, D., Bhat, T., Das, M.: Perracotta: mining temporal API rules from imperfect traces. In: Proc. 28th ICSE, pp. 282–291 (2006)

  • Zhong, H., Zhang, L., Mei, H.: Early filtering of polluting method calls for mining temporal specifications. In: Proc. 15th APSEC, pp. 9–16 (2008a)

  • Zhong, H., Zhang, L., Mei, H.: Inferring specifications of object oriented APIs from API source code. In: Proc. 15th APSEC, pp. 221–228 (2008b)

  • Zhong, H., Xie, T., Zhang, L., Pei, J., Mei, H.: MAPO: Mining and recommending API usage patterns. In: Proc. 23rd ECOOP, pp. 318–343 (2009a)

  • Zhong, H., Zhang, L., Xie, T., Mei, H.: Inferring resource specifications from natural language API documentation. In: Proc. 24th ASE, pp. 307–318 (2009b)

  • Zhou, G., Su, J.: Named entity recognition using an HMM-based chunk tagger. In: Proc. 40th ACL, pp. 473–480 (2002)

Author information

Correspondence to Hao Zhong.

Additional information

This paper is a revised and expanded version of a paper (Zhong et al. 2009b) presented at the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), which won the conference's best paper award and an ACM SIGSOFT Distinguished Paper Award. The work was done while Hao Zhong was a PhD student at Peking University under the supervision of Prof. Hong Mei; the revisions over the ASE 2009 paper (Zhong et al. 2009b) were made after Hao Zhong became an assistant professor at the Chinese Academy of Sciences in 2009.

About this article

Cite this article

Zhong, H., Zhang, L., Xie, T. et al. Inferring specifications for resources from natural language API documentation. Autom Softw Eng 18, 227–261 (2011). https://doi.org/10.1007/s10515-011-0082-3

  • DOI: https://doi.org/10.1007/s10515-011-0082-3
