Probabilistic Topic Models for Web Services Clustering and Discovery

  • Mustapha Aznag
  • Mohamed Quafafou
  • El Mehdi Rochd
  • Zahi Jarir
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8135)


Probabilistic topic models were originally developed in information retrieval for topic extraction and document modeling. In this paper, we explore three probabilistic topic models, Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA) and the Correlated Topic Model (CTM), to extract latent factors from web service descriptions. The extracted latent factors are then used to group the services into clusters. In our approach, topic models serve as efficient dimension reduction techniques that capture the word-topic and topic-service semantic relationships, both interpreted as probability distributions. To address the limitations of keyword-based queries, we represent web service descriptions in a vector space and introduce a new approach for discovering web services using latent factors. In our experiments, we compared the accuracy of the three probabilistic clustering algorithms (PLSA, LDA and CTM) with that of a classical clustering algorithm. We also evaluated our service discovery approach by calculating the precision at n (P@n) and the normalized discounted cumulative gain (NDCGn). The results show that the approaches based on CTM and LDA both perform better than the other search methods.
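
As a rough illustration of the two steps named in the abstract, the following Python sketch groups services by their most probable topic and computes P@n and NDCG for a ranked result list. It is only a sketch under stated assumptions: the abstract does not give the cluster-assignment rule or the exact NDCG variant, so the dominant-topic rule, the linear-gain log2-discount form of DCG, and all function names here are hypothetical choices, not the authors' implementation. The ideal ranking is also assumed to be a reordering of the returned list only.

```python
import math


def dominant_topic_clusters(service_topics):
    """Group services by their highest-probability topic.

    service_topics maps a service name to its topic distribution
    (a list of probabilities, one per latent topic). Assigning each
    service to its dominant topic is an assumption of this sketch.
    """
    clusters = {}
    for name, dist in service_topics.items():
        best = max(range(len(dist)), key=dist.__getitem__)
        clusters.setdefault(best, []).append(name)
    return clusters


def precision_at_n(relevances, n):
    """Fraction of the top-n results that are relevant (grade > 0)."""
    return sum(1 for r in relevances[:n] if r > 0) / n


def dcg_at_n(relevances, n):
    """Discounted cumulative gain: linear gain, log2 position discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:n]))


def ndcg_at_n(relevances, n):
    """DCG normalized by the DCG of the same grades in ideal order."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of graded relevance judgments for the returned services, in rank order. A topic model such as PLSA, LDA or CTM would supply the per-service topic distributions passed to `dominant_topic_clusters`.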


Web service, Data Representation, Clustering, Discovery, Machine Learning, Topic Models





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mustapha Aznag (1)
  • Mohamed Quafafou (1)
  • El Mehdi Rochd (1)
  • Zahi Jarir (2)
  1. LSIS UMR 7296, Aix-Marseille University, France
  2. LISI Laboratory, FSSM, University of Cadi Ayyad, Morocco
