Semantic Similarity Measures for the Development of Thai Dialog System

  • Khukrit Osathanunkul
  • James O’Shea
  • Zuhair Bandar
  • Keeley Crockett
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6682)


Semantic similarity plays an important role in a number of applications, including information extraction, information retrieval, document clustering and ontology learning. Most work has concentrated on English and other European languages; for the Thai language, however, there has been no research on word semantic similarity. This paper presents an experiment and benchmark data sets investigating the application of a WordNet-based machine measure to Thai similarity. Because there is no functioning Thai WordNet, we also investigate the use of English WordNet with machine translation of Thai words.
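The translate-then-measure idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `PARENT` taxonomy stands in for English WordNet, the `THAI_TO_ENGLISH` lookup stands in for machine translation, and the score 1 / (1 + path length) is a generic path-based measure, not necessarily the measure evaluated in the paper.

```python
# Toy is-a taxonomy standing in for English WordNet (hypothetical data).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical Thai-to-English lookup standing in for machine translation.
THAI_TO_ENGLISH = {
    "สุนัข": "dog",
    "แมว": "cat",
    "รถยนต์": "car",
}


def ancestors(word):
    """Return the chain from word up to the root, word first."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def path_length(a, b):
    """Number of is-a edges on the shortest path between a and b."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            # First shared ancestor: edges climbed from a plus edges from b.
            return steps_a + up_b.index(node)
    raise ValueError("no common ancestor")


def thai_similarity(thai_a, thai_b):
    """Translate both words, then score them as 1 / (1 + path length)."""
    eng_a = THAI_TO_ENGLISH[thai_a]
    eng_b = THAI_TO_ENGLISH[thai_b]
    return 1.0 / (1.0 + path_length(eng_a, eng_b))
```

In this toy setting, words that share a close ancestor (dog and cat) score higher than words joined only at the root (dog and car). A real system would also have to handle translation errors and words with several senses, which the sketch ignores.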


word-to-word similarity, word-to-word comparison, semantic similarity measures, Benchmark, Thai Dialog System





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Khukrit Osathanunkul (1)
  • James O’Shea (1)
  • Zuhair Bandar (1)
  • Keeley Crockett (1)
  1. Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
