Learning Task Specific Distributed Paragraph Representations Using a 2-Tier Convolutional Neural Network

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)


We introduce a 2-tier convolutional neural network model for learning distributed paragraph representations for a specific task (e.g., paragraph- or short-document-level sentiment analysis and text topic categorization). We decompose paragraph semantics into three cascaded constituents: word representation, sentence composition, and document composition. Specifically, we learn distributed word representations with a continuous bag-of-words model from a large unstructured text corpus. Then, using these word representations as pre-trained vectors, the first tier of our model learns distributed task-specific sentence representations from a sentence-level corpus with task-specific labels. Using these sentence representations as input vectors, the second tier of our model learns distributed paragraph representations from a paragraph-level corpus. The model is evaluated on the DBpedia ontology classification dataset and the Amazon review dataset. Empirical results show the effectiveness of the proposed model for generating distributed paragraph representations.
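The cascade described in the abstract (word vectors, then sentence composition, then paragraph composition) can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation. The abstract does not specify filter widths, the nonlinearity, or the pooling scheme, so the width-2 filters, ReLU, and max-over-time pooling used here are assumptions, as are all function names and dimensions. The weights are random stand-ins: in the paper the word vectors come from a continuous bag-of-words model and each tier is trained with task-specific labels.

```python
import random


def conv_max_pool(vectors, filters, width, dim):
    """Slide each filter over windows of `width` consecutive vectors and keep
    the largest ReLU response per filter (assumed max-over-time pooling)."""
    # Pad short sequences with zero vectors so at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    features = []
    for weights in filters:  # each filter holds width * dim weights
        best = 0.0  # starting at 0 applies the (assumed) ReLU floor
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            response = sum(w * x for w, x in zip(weights, window))
            best = max(best, response)
        features.append(best)
    return features


def random_filters(count, size, rng):
    """Hypothetical helper: untrained filters standing in for learned ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(count)]


def embed_paragraph(paragraph, word_vectors, word_dim,
                    sent_filters, para_filters, width=2):
    """Tier 1 maps each sentence's word vectors to a sentence vector;
    tier 2 maps the sequence of sentence vectors to a paragraph vector."""
    unknown = [0.0] * word_dim  # out-of-vocabulary words map to zeros
    sentence_vectors = []
    for sentence in paragraph:
        words = [word_vectors.get(token, unknown) for token in sentence]
        sentence_vectors.append(
            conv_max_pool(words, sent_filters, width, word_dim))
    sent_dim = len(sent_filters)  # one feature per tier-1 filter
    return conv_max_pool(sentence_vectors, para_filters, width, sent_dim)


rng = random.Random(0)
WORD_DIM, SENT_DIM, PARA_DIM, WIDTH = 4, 3, 2, 2  # toy sizes, not the paper's

# Random vectors standing in for pre-trained CBOW word representations.
vocab = ["the", "film", "was", "great", "plot", "dragged"]
word_vectors = {w: [rng.uniform(-1, 1) for _ in range(WORD_DIM)] for w in vocab}

sent_filters = random_filters(SENT_DIM, WIDTH * WORD_DIM, rng)
para_filters = random_filters(PARA_DIM, WIDTH * SENT_DIM, rng)

paragraph = [["the", "film", "was", "great"], ["the", "plot", "dragged"]]
paragraph_vector = embed_paragraph(
    paragraph, word_vectors, WORD_DIM, sent_filters, para_filters, WIDTH)
print(len(paragraph_vector))  # one feature per tier-2 filter
```

In a trained system the paragraph vector would feed a classifier (for example, a softmax layer over sentiment or topic labels); the sketch stops at the representation because the abstract gives no details of the output layer.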


Natural language processing, Distributed representation, Convolutional neural network



This work is supported by the National Natural Science Foundation of China (No. 61370165, 61203378), National 863 Program of China 2015AA015405, the Natural Science Foundation of Guangdong Province (No. S2013010014475), Shenzhen Development and Reform Commission Grant No.[2014]1507, Shenzhen Peacock Plan Research Grant KQCX20140521144507925 and Baidu Collaborate Research Funding.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
  2. School of Engineering and Applied Science, Aston University, Birmingham, UK
