Automated Information Extraction from Web APIs Documentation

  • Conference paper
Web Information Systems Engineering - WISE 2012 (WISE 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7651))

Abstract

A fundamental characteristic of Web APIs is that, in practice, providers hardly follow any standard practices when implementing, publishing, and documenting their APIs. As a consequence, the discovery and use of these services by third parties is significantly hampered. To achieve further automation in exploiting Web APIs, we present an approach for automatically extracting relevant technical information from the Web pages that document them. In particular, we have devised two algorithms that automatically extract technical details, such as operation names, operation descriptions, and URI templates, from the documentation of Web APIs adopting either RPC or RESTful interfaces. The algorithms, which exploit advanced DOM processing as well as state-of-the-art Information Extraction and Natural Language Processing techniques, have been evaluated against a detailed dataset, exhibiting high precision and recall (around 90% for both REST and RPC APIs) and outperforming state-of-the-art information extraction algorithms.
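
To make the extraction task concrete, the following is a minimal sketch of pulling candidate operations (an optional HTTP method plus a URI template) out of an HTML documentation page. It is not one of the two algorithms described in the paper: it only combines Python's standard `html.parser` with a naive regular expression, and the names `TextCollector`, `URI_PATTERN`, and `extract_uri_candidates`, as well as the sample page, are hypothetical and chosen for illustration.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text chunks of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


# Assumed heuristic (not from the paper): an optional HTTP verb, then an
# absolute URL or a path that starts at the beginning of a token.
URI_PATTERN = re.compile(
    r"(?<!\S)(?:(GET|POST|PUT|DELETE)\s+)?((?:https?://|/)[^\s\"'<>]+)"
)


def extract_uri_candidates(html):
    """Return a list of (method or None, uri) pairs found in the page text."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    candidates = []
    for chunk in collector.chunks:
        for method, uri in URI_PATTERN.findall(chunk):
            candidates.append((method or None, uri))
    return candidates


# Hypothetical documentation fragment used only to exercise the sketch.
SAMPLE_PAGE = (
    "<html><body><h2>List photos</h2>"
    "<code>GET /photos/{user_id}</code>"
    "<p>Returns the photos of a user.</p>"
    "<script>var x = '/ignored';</script>"
    "</body></html>"
)

if __name__ == "__main__":
    print(extract_uri_candidates(SAMPLE_PAGE))
```

A pattern this simple will produce false positives and misses on real documentation pages, which is the kind of gap that DOM-aware and NLP-based processing, as described in the abstract, is meant to close.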

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ly, P.A., Pedrinaci, C., Domingue, J. (2012). Automated Information Extraction from Web APIs Documentation. In: Wang, X.S., Cruz, I., Delis, A., Huang, G. (eds) Web Information Systems Engineering - WISE 2012. WISE 2012. Lecture Notes in Computer Science, vol 7651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35063-4_36

  • DOI: https://doi.org/10.1007/978-3-642-35063-4_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35062-7

  • Online ISBN: 978-3-642-35063-4

  • eBook Packages: Computer Science, Computer Science (R0)
