EVE: explainable vector based embedding technique using Wikipedia

  • M. Atif Qureshi
  • Derek Greene
Abstract

We present EVE, an unsupervised explainable vector embedding technique built upon the structure of Wikipedia. The model defines the dimensions of a concept's semantic vector using human-readable labels, making the representation readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link structure. To test the effectiveness of the proposed model, we consider its usefulness in three fundamental tasks: 1) intruder detection, evaluating its ability to identify a non-coherent vector in a list of otherwise coherent vectors; 2) clustering, evaluating its tendency to group related vectors together while keeping unrelated vectors in separate clusters; and 3) sorting relevant items first, evaluating its ability to rank the vectors (items) most relevant to a query at the top of the result list. For each task, we also propose a strategy for generating a task-specific, human-interpretable explanation from the model. Together, these tasks demonstrate the overall effectiveness of the explainable embeddings generated by EVE. Finally, we compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across the three tasks, and report improvements over the state of the art.
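The core idea of the abstract can be illustrated with a minimal sketch: each concept is a sparse vector whose dimensions are human-readable Wikipedia category labels, so a similarity score can be explained by the shared dimensions that produced it. The concepts, category labels, and weights below are hypothetical; EVE's actual weights are derived from the Wikipedia category graph and article link structure, which this toy example does not model.

```python
from math import sqrt

# Sparse concept vectors keyed by human-readable Wikipedia category labels.
# Labels and weights here are illustrative placeholders, not EVE's output.
vectors = {
    "Dublin":  {"Capitals in Europe": 0.9, "Cities in Ireland": 0.8, "Ports": 0.3},
    "Cork":    {"Cities in Ireland": 0.9, "Ports": 0.6},
    "Physics": {"Natural sciences": 0.9, "Academic disciplines": 0.7},
}

def cosine(u, v):
    """Cosine similarity over sparse dict vectors."""
    dot = sum(w * v.get(d, 0.0) for d, w in u.items())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def explain(u, v, top=3):
    """Because dimensions are labeled, the shared dimensions with the
    largest contribution to the dot product serve as the explanation."""
    shared = set(u) & set(v)
    return sorted(shared, key=lambda d: u[d] * v[d], reverse=True)[:top]

print(cosine(vectors["Dublin"], vectors["Cork"]))     # related pair scores higher
print(cosine(vectors["Dublin"], vectors["Physics"]))  # unrelated pair scores lower
print(explain(vectors["Dublin"], vectors["Cork"]))    # labels explaining the score
```

This is what makes the embedding "explainable" in the sense used above: the same dimensions that determine the score can be printed verbatim as a justification, with no post-hoc interpretation step.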

Keywords

Distributional semantics · Unsupervised learning · Wikipedia

Acknowledgements

This publication has emanated from research conducted with the support of Science Foundation Ireland (SFI), under Grant Number SFI/12/RC/2289.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Center for Applied Data Analytics Research, University College Dublin, Dublin, Ireland
  2. Insight Center for Data Analytics, University College Dublin, Dublin, Ireland
