Joint semantic similarity assessment with raw corpus and structured ontology for semantic-oriented service discovery

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

Semantic-oriented service matching is one of the challenges in automatic Web service discovery. Service users may search for Web services using keywords and receive matching services according to their functional profiles. A number of approaches to computing the semantic similarity between words have been developed to enhance the precision of matchmaking; they can be classified into ontology-based and corpus-based approaches. Ontology-based approaches commonly use the differentiated concept information provided by a large ontology to measure lexical similarity with word sense disambiguation. Nevertheless, most ontologies are domain-specific and limited in lexical coverage, which restricts their applicability. Corpus-based approaches, on the other hand, rely on the distributional statistics of context to represent each word as a vector and measure the distance between word vectors. However, polysemy may lead to low computational accuracy. In this paper, to augment the semantic information content of word vectors, we propose a multiple semantic fusion (MSF) model that generates sense-specific vectors for each word. In this model, various semantic properties of the general-purpose ontology WordNet are integrated, through vector combination strategies, to fine-tune the distributed word representations learned from a corpus. The retrofitted word vectors are modeled as semantic vectors for estimating semantic similarity. The MSF model-based similarity measure is validated against other similarity measures on multiple benchmark datasets. Experimental results of word similarity evaluation indicate that our method obtains a higher correlation coefficient with human judgment in most cases. Moreover, the proposed similarity measure is shown to improve on the performance of Web service matchmaking based on a single semantic resource. Accordingly, our findings provide a new method and perspective for understanding and representing lexical semantics.
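
The general idea of fine-tuning corpus vectors with ontology relations and then comparing words by vector similarity can be illustrated with a minimal sketch. This is not the MSF model itself, whose details are not given in the abstract. It assumes a simple weighted average as the vector combination strategy; the function names `cosine` and `fuse`, the mixing weight `alpha`, the toy three-dimensional vectors, and the hand-written neighbour lists (standing in for WordNet relations) are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def fuse(word, vectors, neighbours, alpha=0.5):
    """Blend a word's corpus vector with the mean vector of its ontology neighbours.

    alpha is an assumed mixing weight: alpha=1.0 keeps the corpus vector unchanged.
    Words without known neighbours are returned as a copy of their corpus vector.
    """
    base = vectors[word]
    related = [vectors[n] for n in neighbours.get(word, []) if n in vectors]
    if not related:
        return list(base)
    mean = [sum(column) / len(related) for column in zip(*related)]
    return [alpha * b + (1 - alpha) * m for b, m in zip(base, mean)]

# Toy "corpus" vectors with made-up values (real embeddings have hundreds of dimensions).
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.6, 0.4, 0.1],
    "vehicle": [0.7, 0.3, 0.2],
    "banana": [0.0, 0.2, 0.9],
}

# Hand-written relations standing in for ontology links such as synonyms or hypernyms.
neighbours = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "vehicle"],
}

fused = {word: fuse(word, vectors, neighbours) for word in vectors}

before = cosine(vectors["car"], vectors["automobile"])
after = cosine(fused["car"], fused["automobile"])
print("car/automobile similarity before fusion:", round(before, 3))
print("car/automobile similarity after fusion:", round(after, 3))
```

In this toy setting, words linked in the ontology are pulled toward each other, so their cosine similarity rises, while a word with no listed relations ("banana") keeps its corpus vector. A sense-specific variant would keep one fused vector per word sense rather than one per word.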

Notes

  1. http://word2vec.googlecode.com/svn/trunk/.

  2. http://dumps.wikimedia.org/enwiki/20140903/.

  3. http://www.ota.ox.ac.uk/desc/2554.

  4. http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml/.

  5. http://www.nltk.org/book/ch02.html.

  6. http://homepage.tudelft.nl/19j49/t-SNE.html.

  7. http://projects.semwebcentral.org/projects/owls-tc/.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (Nos. 61272353, 61370128, 61428201 and 61502028), the Program for New Century Excellent Talents in University (NCET-13-0659), and the Beijing Higher Education Young Elite Teacher Project (YETP0583).

Author information

Corresponding author

Correspondence to Yuanyuan Cai.

About this article

Cite this article

Lu, W., Cai, Y., Che, X. et al. Joint semantic similarity assessment with raw corpus and structured ontology for semantic-oriented service discovery. Pers Ubiquit Comput 20, 311–323 (2016). https://doi.org/10.1007/s00779-016-0921-0

  • DOI: https://doi.org/10.1007/s00779-016-0921-0
