EVE: explainable vector based embedding technique using Wikipedia
We present EVE, an unsupervised explainable vector embedding technique built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a concept using human-readable labels, which makes each vector readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link structure. To test the effectiveness of the proposed model, we consider its usefulness in three fundamental tasks: 1) intruder detection, to evaluate its ability to identify a non-coherent vector in a list of otherwise coherent vectors; 2) ability to cluster, to evaluate its tendency to group related vectors together while keeping unrelated vectors in separate clusters; and 3) sorting relevant items first, to evaluate its ability to rank vectors (items) relevant to the query at the top of the result list. For each task, we also propose a strategy to generate a task-specific human-interpretable explanation from the model. These evaluations demonstrate the overall effectiveness of the explainable embeddings generated by EVE. Finally, we compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across the three tasks, and report improvements over the state-of-the-art.
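To make the idea of human-readable dimensions concrete, the following is a minimal sketch and not the EVE implementation. It assumes each concept is stored as a sparse mapping from a label (standing in for a Wikipedia category or article) to a weight. The concepts, labels, weights, and the helper names `cosine`, `find_intruder`, `rank`, and `explain` are all hypothetical, and cosine similarity is an illustrative choice rather than a measure confirmed by the abstract.

```python
import math

# Hypothetical toy vectors. Every dimension is a readable label; the weights
# are invented for illustration and are not derived from Wikipedia.
VECTORS = {
    "cat":   {"Category:Felines": 0.9, "Category:Pets": 0.7, "Category:Mammals": 0.5},
    "dog":   {"Category:Canines": 0.9, "Category:Pets": 0.8, "Category:Mammals": 0.5},
    "tiger": {"Category:Felines": 0.9, "Category:Mammals": 0.6, "Category:Predators": 0.7},
    "piano": {"Category:Musical instruments": 0.9, "Category:Keyboards": 0.8},
}

def cosine(u, v):
    """Cosine similarity between two sparse label -> weight vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def find_intruder(items):
    """Return the item with the lowest mean similarity to the other items."""
    def mean_similarity(x):
        others = [y for y in items if y != x]
        return sum(cosine(VECTORS[x], VECTORS[y]) for y in others) / len(others)
    return min(items, key=mean_similarity)

def rank(query, items):
    """Sort items by decreasing similarity to the query concept."""
    return sorted(items, key=lambda x: cosine(VECTORS[query], VECTORS[x]), reverse=True)

def explain(a, b, top=3):
    """List the shared labels that contribute most to the similarity of a and b."""
    shared = {k: VECTORS[a][k] * VECTORS[b][k] for k in VECTORS[a] if k in VECTORS[b]}
    return sorted(shared, key=shared.get, reverse=True)[:top]

if __name__ == "__main__":
    print(find_intruder(["cat", "dog", "tiger", "piano"]))
    print(rank("cat", ["piano", "dog", "tiger"]))
    print(explain("cat", "dog"))
```

Because every dimension carries a label, the explanation for a similarity score is simply the list of shared labels with the largest products. In this toy data, "piano" shares no label with the animal concepts, so it is the natural intruder.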
Keywords: Distributional semantics, Unsupervised learning, Wikipedia
This publication has emanated from research conducted with the support of Science Foundation Ireland (SFI), under Grant Number SFI/12/RC/2289.