Automated Web Service Specification Generation Through a Transformation-Based Learning

Part of the Lecture Notes in Computer Science book series (LNPSE, volume 12409)


A web Application Programming Interface (API) allows third-party and subscribed users to access the data and functions of a software application over a network or the Internet. Web APIs expose data and functions to public users, authorized users, or enterprise users. Web API providers publish API documentation to help users understand how to interact with web-based API services and how to use the APIs in their integration systems. The exponential rise in the number of public web service APIs makes it challenging for software engineers to choose an efficient API. The challenge becomes more complicated when web APIs are updated regularly by API providers. In this paper, we introduce a novel transformation-based approach that crawls the web to collect web API documentation (unstructured documents). It generates a web API language model from the API documentation, employs different machine learning algorithms to extract information, and produces a structured web API specification that is compliant with the OpenAPI Specification (OAS) format. The proposed approach improves information extraction patterns and learns the variety of structures and terminologies. In our experiment, we collect a large number of web API documents. Our evaluation shows that the proposed approach finds RESTful API documentation with 75% accuracy, constructs API endpoints with 84% accuracy, constructs endpoint attributes with 95% accuracy, and assigns endpoints to attributes with 98% accuracy. The proposed approach was able to produce more than 2,311 OAS web API specifications.
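To make the final step of the pipeline concrete, the following is a minimal sketch of how extracted endpoints and attributes might be assembled into an OAS-style document. It is not the authors' implementation: the function name `build_oas`, the record layout of `extracted`, the example URL, and the fixed version strings are all assumptions made for illustration. Only the top-level keys (`openapi`, `info`, `servers`, `paths`) and the parameter fields follow the OpenAPI 3.0 format.

```python
import json


def build_oas(title, base_url, endpoints):
    """Assemble a minimal OpenAPI 3.0 document from extracted endpoint records.

    `endpoints` is a hypothetical intermediate format: each record holds an
    HTTP method, a path, an optional description, and a list of attributes.
    """
    paths = {}
    for ep in endpoints:
        operation = {
            "summary": ep.get("description", ""),
            "parameters": [
                {
                    "name": attr["name"],
                    # Assumption: attributes default to query parameters.
                    "in": attr.get("in", "query"),
                    # OAS requires path parameters to be marked as required.
                    "required": attr.get("in") == "path"
                    or attr.get("required", False),
                    "schema": {"type": attr.get("type", "string")},
                }
                for attr in ep.get("attributes", [])
            ],
            # An operation needs at least one response entry.
            "responses": {"200": {"description": "Successful response"}},
        }
        # Several HTTP methods may share one path, so group them per path.
        paths.setdefault(ep["path"], {})[ep["method"].lower()] = operation
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": "1.0.0"},
        "servers": [{"url": base_url}],
        "paths": paths,
    }


# Hypothetical output of the extraction stage, for illustration only.
extracted = [
    {
        "method": "GET",
        "path": "/users/{id}",
        "description": "Fetch one user",
        "attributes": [{"name": "id", "in": "path", "type": "integer"}],
    },
    {
        "method": "GET",
        "path": "/users",
        "description": "List users",
        "attributes": [{"name": "limit", "type": "integer"}],
    },
]

spec = build_oas("Example API", "https://api.example.com", extracted)
print(json.dumps(spec, indent=2))
```

The step that assigns attributes to endpoints corresponds here to nesting each attribute under the operation it belongs to. In this sketch that assignment is simply given in the input, whereas the paper's approach learns it.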


  • Web API service
  • Natural language processing
  • Machine learning

Corresponding author

Correspondence to Mehdi Bahrami.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bahrami, M., Chen, WP. (2020). Automated Web Service Specification Generation Through a Transformation-Based Learning. In: Wang, Q., Xia, Y., Seshadri, S., Zhang, LJ. (eds) Services Computing – SCC 2020. SCC 2020. Lecture Notes in Computer Science, vol 12409. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59591-3

  • Online ISBN: 978-3-030-59592-0

  • eBook Packages: Computer Science, Computer Science (R0)