Inferring specifications for resources from natural language API documentation

Zhong, Hao; Zhang, Lu; Xie, Tao; Mei, Hong

doi:10.1007/s10515-011-0082-3

Published: 09 April 2011

Automated Software Engineering, Volume 18, pages 227–261 (2011)

Hao Zhong¹, Lu Zhang²˒³, Tao Xie⁴ & Hong Mei²˒³


Abstract

Many software libraries, especially commercial ones, provide natural language API documentation that describes correct API usage. Developers may nevertheless write code that is inconsistent with this documentation, partly because, as existing research shows, many are reluctant to read API documentation carefully. Because such inconsistencies may indicate defects, researchers have proposed various detection approaches, but these approaches require many known specifications. Writing specifications manually for all APIs is tedious, so various approaches have been proposed to mine specifications automatically. Most existing mining approaches rely on analyzing client code and therefore fail when sufficient client code is not available. Instead of analyzing client code, we propose an approach, called Doc2Spec, that infers resource specifications from natural language API documentation. We evaluated our approach on the Javadocs of five libraries. The results show that our approach performs well on real-scale libraries and infers various specifications with relatively high precision, recall, and F-score. We further used the inferred specifications to detect defects in open source projects. The results show that specifications inferred by Doc2Spec are useful for detecting real defects in existing projects.
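
To illustrate what a resource specification lets a checker do, here is a minimal Python sketch. It is not Doc2Spec's implementation: the three action categories, the state names, and the check_trace function are hypothetical names chosen for this example, and the specification is written by hand rather than inferred from documentation.

```python
# Hypothetical action categories for methods that touch a resource.
CREATE, USE, RELEASE = "create", "use", "release"


def check_trace(spec, trace):
    """Check one sequence of method calls against a resource specification.

    spec  maps method names to an action category (an assumption of this
          sketch, not the format used by the paper).
    trace is the list of method names called on one resource, in order.
    Returns a list of human-readable violation messages.
    """
    state = "absent"  # absent -> open -> closed
    violations = []
    for method in trace:
        action = spec.get(method)
        if action is None:
            continue  # method is not covered by the specification
        if action == CREATE:
            state = "open"
        elif action == USE:
            if state != "open":
                violations.append(f"{method} called while resource is {state}")
        elif action == RELEASE:
            if state != "open":
                violations.append(f"{method} called while resource is {state}")
            else:
                state = "closed"
    if state == "open":
        violations.append("resource never released")
    return violations


# A hand-written specification for an imaginary file-like API.
file_spec = {"open": CREATE, "read": USE, "close": RELEASE}

print(check_trace(file_spec, ["open", "read", "close"]))
print(check_trace(file_spec, ["open", "read"]))
print(check_trace(file_spec, ["open", "close", "read"]))
```

The first trace follows the create, use, release order and yields no violations; the second leaves the resource open, and the third uses it after release. A real detector would work on program paths rather than flat call lists.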

References

Acharya, M., Xie, T.: Mining API error-handling specifications from source code. In: Proc. Fundamental Approaches to Software Engineering, pp. 370–384 (2009)
Acharya, M., Xie, T., Pei, J., Xu, J.: Mining API patterns as partial orders from source code: From usage scenarios to specifications. In: Proc. 6th ESEC/FSE, pp. 25–34 (2007)
Alur, R., Černý, P., Madhusudan, P., Nam, W.: Synthesis of interface specifications for Java classes. In: Proc. 32nd POPL, pp. 98–109 (2005)
Ambriola, V., Gervasi, V.: Processing natural language requirements. In: Proc. 12th ASE, pp. 36–45. IEEE Computer Society, Los Alamitos (1997)
Ammons, G., Bodík, R., Larus, J.: Mining specifications. In: Proc. 29th POPL, pp. 4–16 (2002)
Anvik, J., Hiew, L., Murphy, G.: Who should fix this bug? In: Proc. 28th ICSE, pp. 361–370 (2006)
Arnout, K., Meyer, B.: Uncovering hidden contracts: The .NET example. Computer 36(11), 48–55 (2003)
Baum, L., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 164–171 (1970)
Buse, R., Weimer, W.: Automatic documentation inference for exceptions. In: Proc. ISSTA, pp. 273–282 (2008)
Buse, R., Weimer, W.: Automatically documenting program changes. In: Proc. 26th ASE, pp. 33–42 (2010)
Chinchor, N.: MUC-7 named entity task definition. In: Proc. 7th MUC (1997)
Cohen, W., Sarawagi, S.: Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods. In: Proc. 10th KDD, pp. 89–98 (2004)
Dag, J., Regnell, B., Gervasi, V., Brinkkemper, S.: A linguistic-engineering approach to large-scale requirements management. IEEE Softw. 3, 3 (2005)
Dagenais, B., Hendren, L.J.: Enabling static analysis for partial Java programs. In: Proc. 23rd OOPSLA, pp. 313–328 (2008)
Dekel, U., Herbsleb, J.D.: Reading the documentation of invoked API functions in program comprehension. In: Proc. 17th ICPC, pp. 168–177 (2009a)
Dekel, U., Herbsleb, J.D.: Improving API documentation usability with knowledge pushing. In: Proc. 31st ICSE, pp. 320–330 (2009b)
Engler, D., Chen, D., Chou, A.: Bugs as inconsistent behavior: A general approach to inferring errors in systems code. In: Proc. 18th SOSP, pp. 57–72 (2001)
Fantechi, A., Gnesi, S., Lami, G., Maccari, A.: Applications of linguistic techniques for use case analysis. Requir. Eng. 8(3), 161–170 (2003)
Fellbaum, C., et al.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Fry, Z., Shepherd, D., Hill, E., Pollock, L., Vijay-Shanker, K.: Analysing source code: looking for useful verb-direct object pairs in all the right places. IET Softw. 2(1), 27–36 (2008)
Gabel, M., Su, Z.: Symbolic mining of temporal specifications. In: Proc. 13th ICSE, pp. 51–60 (2008)
Gabel, M., Su, Z.: Online inference and enforcement of temporal properties. In: Proc. 32nd ICSE, pp. 15–24 (2010)
Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: An industrial case study. In: Proc. 7th MSR, pp. 11–20 (2010)
Gervasi, V., Zowghi, D.: Reasoning about inconsistencies in natural language requirements. ACM Trans. Softw. Eng. Methodol. 14(3), 277–330 (2005)
Goldin, L., Berry, D.: AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Autom. Softw. Eng. 4(4), 375–412 (1997)
Gowri, M., Grothoff, C., Chandra, S.: Deriving object typestates in the presence of inter-object references. In: Proc. 20th OOPSLA, pp. 77–96 (2005)
Hayes, J., Dekhtyar, A., Sundaram, S.: Advancing candidate link generation for requirements tracing: The study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006)
Henkel, J., Diwan, A.: A tool for writing and debugging algebraic specifications. In: Proc. 26th ICSE, pp. 449–458 (2004)
Hirschman, L.: MUC-7 coreference task definition. In: Proc. 7th MUC (1997)
Horie, M., Chiba, S.: Tool support for crosscutting concerns of API documentation. In: Proc. 8th AOSD, pp. 97–108 (2010)
Høst, E.W., Østvold, B.M.: Debugging method names. In: Proc. 23rd ECOOP, pp. 294–317 (2009)
Igarashi, A., Kobayashi, N.: Resource usage analysis. ACM Trans. Program. Lang. Syst. 27(2), 264–313 (2005)
Jeong, G., Kim, S., Zimmermann, T.: Improving bug triage with bug tossing graphs. In: Proc. 7th ESEC/FSE, pp. 111–120. ACM, New York (2009)
Kof, L.: Scenarios: Identifying missing objects and actions by means of computational linguistics. In: Proc. 15th RE, pp. 121–130 (2007)
Kremenek, T., Twohey, P., Back, G., Ng, A., Engler, D.: From uncertainty to belief: Inferring the specification within. In: Proc. 7th OSDI, pp. 259–272 (2006)
Lee, C., Chen, F., Rosu, G.: Mining parametric specifications. In: Proc. 33rd ICSE, pp. 591–600 (2011)
Li, Z., Zhou, Y.: PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code. In: Proc. ESEC/FSE, pp. 306–315 (2005)
Livshits, V., Zimmermann, T.: Dynamine: Finding common error patterns by mining software revision histories. In: Proc. ESEC/FSE, pp. 31–40 (2005)
Lo, D., Khoo, S.: SMArTIC: towards building an accurate, robust and scalable specification miner. In: Proc. 14th FSE, pp. 265–275 (2006)
Lo, D., Maoz, S.: Scenario-based and value-based specification mining: better together. In: Proc. 25th ASE, pp. 387–396 (2010)
Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakes – a comprehensive study on real world concurrency bug characteristics. In: Proc. 13th ASPLOS, pp. 329–339 (2008)
Meziane, F., Athanasakis, N., Ananiadou, S.: Generating natural language specifications from UML class diagrams. Requir. Eng. 13(1), 1–18 (2008)
Mikheev, A., Moens, M., Grover, C.: Named entity recognition without gazetteers. In: Proc. 9th EACL, pp. 1–8 (1999)
Novick, D., Ward, K.: Why don’t people read the manual. In: Proc. 24th SIGDOC, pp. 11–18 (2006)
Olson, D.: Advanced Data Mining Techniques. Springer, Berlin (2008)
Padioleau, Y., Tan, L., Zhou, Y.: Listening to programmers—Taxonomies and characteristics of comments in operating system code. In: Proc. 31st ICSE, pp. 331–341 (2009)
Perry, E., Sanko, M., Wright, B., Pfaeffle, T.: Oracle 9i JDBC developer’s guide and reference. Technical report, March 2002. http://www.oracle.com
Raman, A., Patrick, J.: The sk-strings method for inferring PFSA. In: Proc. Machine Learning Workshop Automata Induction, Grammatical Inference, and Language Acquisition (1997)
Ramanathan, M., Grama, A., Jagannathan, S.: Path-sensitive inference of function precedence protocols. In: Proc. 29th ICSE, pp. 240–250 (2007)
Rivest, R., Schapire, R.: Inference of finite automata using homing sequences. In: Machine Learning: From Theory to Applications, pp. 51–73 (1993)
Robillard, M.P., DeLine, R.: A field study of API learning obstacles. Empir. Softw. Eng. (2011). doi:10.1007/s10664-010-9150-8
Runeson, P., Alexandersson, M., Nyholm, O.: Detection of duplicate defect reports using natural language processing. In: Proc. 29th ICSE, pp. 499–510 (2007)
Sawyer, P., Rayson, P., Garside, R.: REVERE: Support for requirements synthesis from documents. Inf. Syst. Front. 4(3), 343–353 (2002)
Shepherd, D., Fry, Z., Hill, E., Pollock, L., Vijay-Shanker, K.: Using natural language program analysis to locate and understand action-oriented concerns. In: Proc. 6th AOSD, pp. 212–224 (2007)
Shi, L., Zhong, H., Xie, T., Li, M.: An empirical study on evolution of API documentation. In: Proc. FASE, pp. 416–431 (2011)
Sridhara, G., Hill, E., Muppaneni, D., Pollock, L.L., Vijay-Shanker, K.: Towards automatically generating summary comments for Java methods. In: Proc. 25th ASE, pp. 43–52 (2010)
Stylos, J., Faulring, A., Yang, Z., Myers, B.: Improving API documentation using API usage information. In: Proc. IVL/HCC, pp. 119–126 (2009)
Tan, L., Yuan, D., Krishna, G., Zhou, Y.: /* iComment: Bugs or Bad Comments?*/. In: Proc. 21st SOSP, pp. 145–158 (2007)
Thummalapenta, S., Xie, T.: SpotWeb: Detecting framework hotspots and coldspots via mining open source code on the web. In: Proc. 23rd ASE, pp. 327–336 (2008)
Thummalapenta, S., Xie, T.: Mining exception-handling rules as sequence association rules. In: Proc. 31st ICSE, pp. 496–506 (2009a)
Thummalapenta, S., Xie, T.: Alattin: Mining alternative patterns for detecting neglected conditions. In: Proc. 24th ASE, pp. 283–294 (2009b)
Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An approach to detecting duplicate bug reports using natural language and execution information. In: Proc. 30th ICSE, pp. 461–470 (2008)
Wasylkowski, A., Zeller, A., Lindig, C.: Detecting object usage anomalies. In: Proc. ESEC/FSE, pp. 35–44 (2007)
Weimer, W., Necula, G.: Mining temporal specifications for error detection. In: Proc. TACAS, pp. 461–476 (2005)
Whaley, J., Martin, M., Lam, M.: Automatic extraction of object-oriented component interfaces. In: Proc. ISSTA, pp. 218–228 (2002)
Williams, C., Hollingsworth, J.: Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng. 31(6), 466–480 (2005)
Würsch, M., Ghezzi, G., Reif, G., Gall, H.: Supporting developers with natural language queries. In: Proc. 32nd ICSE, pp. 165–174 (2010)
Xu, G., Rountev, A.: Precise memory leak detection for Java software using container profiling. In: Proc. 30th ICSE, pp. 151–160 (2008)
Yang, J., Evans, D., Bhardwaj, D., Bhat, T., Das, M.: Perracotta: mining temporal API rules from imperfect traces. In: Proc. 28th ICSE, pp. 282–291 (2006)
Zhong, H., Zhang, L., Mei, H.: Early filtering of polluting method calls for mining temporal specifications. In: Proc. 15th APSEC, pp. 9–16 (2008a)
Zhong, H., Zhang, L., Mei, H.: Inferring specifications of object oriented APIs from API source code. In: Proc. 15th APSEC, pp. 221–228 (2008b)
Zhong, H., Xie, T., Zhang, L., Pei, J., Mei, H.: MAPO: Mining and recommending API usage patterns. In: Proc. 23rd ECOOP, pp. 318–343 (2009a)
Zhong, H., Zhang, L., Xie, T., Mei, H.: Inferring resource specifications from natural language API documentation. In: Proc. 24th ASE, pp. 307–318 (2009b)
Zhou, G., Su, J.: Named entity recognition using an HMM-based chunk tagger. In: Proc. 40th ACL, pp. 473–480 (2001)

Author information

Authors and Affiliations

Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing, China
Hao Zhong
School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Lu Zhang & Hong Mei
The Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing, China
Lu Zhang & Hong Mei
Department of Computer Science, North Carolina State University, Raleigh, USA
Tao Xie

Corresponding author

Correspondence to Hao Zhong.

Additional information

This paper is a revised and expanded version of a paper (Zhong et al. 2009b) presented at the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), where it won the best paper award of the conference and the ACM SIGSOFT distinguished paper award. The work in this paper was done while Hao Zhong was a PhD student at Peking University under the supervision of Prof. Hong Mei; the revisions to the ASE 2009 paper (Zhong et al. 2009b) were done after Hao Zhong became an assistant professor at the Chinese Academy of Sciences in 2009.

About this article

Cite this article

Zhong, H., Zhang, L., Xie, T. et al. Inferring specifications for resources from natural language API documentation. Autom Softw Eng 18, 227–261 (2011). https://doi.org/10.1007/s10515-011-0082-3

Received: 14 June 2010
Accepted: 23 March 2011
Published: 09 April 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s10515-011-0082-3
