Abstract
Semantic-oriented service matching is one of the challenges in automatic Web service discovery. Service users may search for Web services using keywords and receive matching services based on their functional profiles. A number of approaches to computing the semantic similarity between words have been developed to enhance the precision of matchmaking; they can be classified into ontology-based and corpus-based approaches. Ontology-based approaches commonly use the differentiated concept information provided by a large ontology to measure lexical similarity with word sense disambiguation. Nevertheless, most ontologies are domain-specific and limited in lexical coverage, which restricts their applicability. Corpus-based approaches, on the other hand, rely on the distributional statistics of context to represent each word as a vector and then measure the distance between word vectors. However, polysemy may lead to low computational accuracy. In this paper, to augment the semantic information content of word vectors, we propose a multiple semantic fusion (MSF) model that generates a sense-specific vector for each word. In this model, various semantic properties of the general-purpose ontology WordNet are integrated through vector combination strategies to fine-tune the distributed word representations learned from a corpus. The retrofitted word vectors are used as semantic vectors for estimating semantic similarity. The MSF model-based similarity measure is validated against other similarity measures on multiple benchmark datasets. Experimental results of word similarity evaluation indicate that our method obtains a higher correlation coefficient with human judgment in most cases. Moreover, the proposed similarity measure is shown to improve the performance of Web service matchmaking relative to matchmaking based on a single semantic resource.
Accordingly, our findings provide a new method for, and perspective on, understanding and representing lexical semantics.
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (Nos. 61272353, 61370128, 61428201 and 61502028), the Program for New Century Excellent Talents in University (NCET-13-0659), and the Beijing Higher Education Young Elite Teacher Project (YETP0583).
Cite this article
Lu, W., Cai, Y., Che, X. et al. Joint semantic similarity assessment with raw corpus and structured ontology for semantic-oriented service discovery. Pers Ubiquit Comput 20, 311–323 (2016). https://doi.org/10.1007/s00779-016-0921-0