Feature LDA: A Supervised Topic Model for Automatic Detection of Web API Documentations from the Web

  • Chenghua Lin
  • Yulan He
  • Carlos Pedrinaci
  • John Domingue
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7649)


Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and their documentation on the Web remains a challenging task, even with the best resources available on the Web. In this paper we cast the detection of Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, namely naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA gives superior performance when compared against other existing supervised topic models.
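
The classification framing described above can be illustrated with a minimal sketch of multinomial naive Bayes, one of the baselines named in the abstract. This is not feaLDA and not the authors' implementation: the whitespace tokenizer, the function names, the add-one smoothing, and the toy pages are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-class document and word counts for multinomial naive Bayes."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given page text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood of the word
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy pages, invented for illustration only.
TRAIN_DOCS = [
    "rest api endpoint returns json response with authentication key",
    "api reference request parameters http get method",
    "our team shares holiday recipes and cooking tips",
    "latest football scores and match reports",
]
TRAIN_LABELS = ["api", "api", "other", "other"]

model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, "http request parameters for the json api"))
print(predict(model, "football match recipes"))
```

A baseline of this kind treats each page as a bag of words and uses only the class labels; according to the abstract, feaLDA additionally models latent topics and incorporates labelled features as side information.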



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Chenghua Lin (1)
  • Yulan He (1)
  • Carlos Pedrinaci (1)
  • John Domingue (1)
  1. Knowledge Media Institute, The Open University, Milton Keynes, UK
