ECML PKDD 2017: Machine Learning and Knowledge Discovery in Databases, pp. 753–769
Con-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
Abstract
We present a novel approach to learning distributed representations of sentences from unlabeled data by modeling both the content and the context of a sentence. The content model learns a sentence representation by predicting the sentence's own words, while the context model comprises a neighbor-prediction component and a regularizer that capture the distributional and proximity hypotheses, respectively. We propose an online algorithm to train the model components jointly. We evaluate the models in a setup where contextual information is available. Experimental results on sentence classification, clustering, and ranking tasks show that our model outperforms the best existing models by a wide margin across multiple datasets.
Code related to this chapter is available at: https://github.com/tksaha/con-s2v/tree/jointlearning
Data related to this chapter are available at: https://www.dropbox.com/sh/ruhsi3c0unn0nko/AAAgVnZpojvXx9loQ21WP_MYa?dl=0
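The three model components described in the abstract can be sketched as a single joint objective: a content term that predicts a sentence's words, a neighbor-prediction term for the distributional hypothesis, and an L2 proximity regularizer that pulls a sentence vector toward its neighbors, all updated online. The toy below is an illustrative sketch, not the authors' implementation (see the linked repository for that); all names, sizes, and the use of a single positive logistic loss per prediction are assumptions, and the negative sampling used in practice is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes hypothetical): 4 sentences, 6-word vocabulary,
# 8-dimensional embeddings.
n_sent, vocab, dim = 4, 6, 8
S = rng.normal(scale=0.1, size=(n_sent, dim))  # sentence vectors (learned)
W = rng.normal(scale=0.1, size=(vocab, dim))   # output vectors for word prediction
N = rng.normal(scale=0.1, size=(n_sent, dim))  # output vectors for neighbor prediction

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(i, words, neighbors, beta):
    """Joint objective for sentence i: content + neighbor prediction + proximity."""
    l = sum(-np.log(sigmoid(S[i] @ W[w])) for w in words)         # content
    l += sum(-np.log(sigmoid(S[i] @ N[j])) for j in neighbors)    # distributional
    l += 0.5 * beta * sum(np.sum((S[i] - S[j]) ** 2) for j in neighbors)  # proximity
    return l

def sgd_step(i, words, neighbors, lr=0.1, beta=1.0):
    """One online update for sentence i (negative sampling omitted)."""
    grad = np.zeros(dim)
    for w in words:                     # content: predict the sentence's own words
        g = sigmoid(S[i] @ W[w]) - 1.0  # gradient of -log sigmoid(x), positive example
        grad += g * W[w]
        W[w] -= lr * g * S[i]
    for j in neighbors:                 # distributional: predict neighboring sentences
        g = sigmoid(S[i] @ N[j]) - 1.0
        grad += g * N[j]
        N[j] -= lr * g * S[i]
    for j in neighbors:                 # proximity: pull S[i] toward its neighbors
        grad += beta * (S[i] - S[j])
    S[i] -= lr * grad

# A few online updates for sentence 0 (words 1, 2; neighbor sentence 1)
# should reduce the joint loss.
l0 = loss(0, words=[1, 2], neighbors=[1], beta=1.0)
for _ in range(50):
    sgd_step(0, words=[1, 2], neighbors=[1])
l1 = loss(0, words=[1, 2], neighbors=[1], beta=1.0)
```

The proximity term is what distinguishes this from a pure prediction objective: even when two neighboring sentences share no words, the regularizer keeps their vectors close, which is the paper's proximity hypothesis in miniature.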
Keywords
Sen2Vec · Extra-sentential context · Embedding of sentences
Acknowledgments
This research is partially supported by Mohammad Hasan’s NSF CAREER Award (IIS-1149851).