Abstract
A fundamental characteristic of Web APIs is that, in practice, providers rarely follow any standard practices when implementing, publishing, and documenting their APIs. As a consequence, the discovery and use of these services by third parties is significantly hampered. To enable further automation in exploiting Web APIs, we present an approach for automatically extracting relevant technical information from the Web pages that document them. In particular, we have devised two algorithms that automatically extract technical details such as operation names, operation descriptions or URI templates from the documentation of Web APIs adopting either RPC or RESTful interfaces. The algorithms, which exploit advanced DOM processing as well as state-of-the-art Information Extraction and Natural Language Processing techniques, have been evaluated against a detailed dataset and exhibit high precision and recall (around 90% for both REST and RPC APIs), outperforming state-of-the-art information extraction algorithms.
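To make the extraction target concrete, the following is a minimal sketch of what pulling HTTP methods and URI templates out of an HTML documentation page might look like. It is not the algorithm described in the paper, whose details are not given in this abstract; it is a plain regular-expression pass over the page text. The names `TextCollector`, `OPERATION`, and `extract_operations`, the sample markup, and the `api.example.org` host are all hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects each non-empty text node of the page as a separate chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Assumed pattern (not from the paper): an upper-case HTTP verb followed by
# either a path starting with "/" or an absolute http(s) URL.
OPERATION = re.compile(
    r"\b(GET|POST|PUT|DELETE|PATCH)\s+((?:https?://|/)[^\s<>\"']+)"
)


def extract_operations(html):
    """Return (method, uri_template) pairs found in the page text."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    found = []
    for chunk in parser.chunks:
        for method, uri in OPERATION.findall(chunk):
            # Drop sentence punctuation that may trail the URI in prose.
            found.append((method, uri.rstrip(".,;")))
    return found


# Hypothetical documentation fragment used only to exercise the sketch.
sample = """
<h2>List photos</h2>
<p>Returns the photos of a user.</p>
<pre>GET /users/{user_id}/photos</pre>
<h2>Delete photo</h2>
<pre>DELETE https://api.example.org/photos/{id}.</pre>
"""

operations = extract_operations(sample)
for method, uri in operations:
    print(method, uri)
```

A pattern-only pass like this ignores page structure entirely, so it would miss operations whose method and URI sit in separate elements; that gap is presumably where DOM processing of the kind the abstract mentions becomes relevant.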
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ly, P.A., Pedrinaci, C., Domingue, J. (2012). Automated Information Extraction from Web APIs Documentation. In: Wang, X.S., Cruz, I., Delis, A., Huang, G. (eds) Web Information Systems Engineering - WISE 2012. WISE 2012. Lecture Notes in Computer Science, vol 7651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35063-4_36
DOI: https://doi.org/10.1007/978-3-642-35063-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35062-7
Online ISBN: 978-3-642-35063-4
eBook Packages: Computer Science, Computer Science (R0)