Short Text Understanding Based on Conceptual and Semantic Enrichment

  • Qiuyan Shi
  • Yongli Wang
  • Jianhong Sun
  • Anmin Fu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11323)


Because of its limited length and freely constructed sentence structure, short text differs from normal text, and traditional text representation algorithms do not work well on it. This paper proposes a model called Conceptual and Semantic Enrichment with Topic Model (CSET), which combines the Biterm Topic Model (BTM), a widely used probabilistic topic model designed for short text, with Probase, a large-scale probabilistic knowledge base. CSET captures semantic relations between words in order to enrich short text. The model enables a large number of applications that rely on semantic understanding of short text, including short text classification and word similarity measurement in context.
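As a rough illustration of the two ingredients named in the abstract, the sketch below shows (a) how a short text can be turned into biterms, the unordered co-occurring word pairs that BTM models, and (b) how tokens might be enriched with concepts from an is-a knowledge base. It is a minimal sketch and not the CSET model itself: `TOY_CONCEPTS`, its probabilities, and the functions `extract_biterms` and `enrich` are hypothetical names and values invented for illustration, and no real Probase data or API is used.

```python
from itertools import combinations

# Hypothetical toy stand-in for a probabilistic is-a knowledge base:
# term -> {concept: P(concept | term)}. All values are made up for illustration.
TOY_CONCEPTS = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "ipad": {"device": 0.7, "product": 0.3},
}


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def enrich(tokens, concepts=TOY_CONCEPTS, top_k=1):
    """Append the top_k most probable concepts of each known term to the tokens."""
    extra = []
    for tok in tokens:
        # Rank the candidate concepts of this term by descending probability.
        ranked = sorted(concepts.get(tok, {}).items(), key=lambda kv: -kv[1])
        extra.extend(concept for concept, _ in ranked[:top_k])
    return tokens + extra


tokens = "apple ipad price".split()
print(extract_biterms(tokens))
print(enrich(tokens))
```

In this toy lookup, "apple" is mapped to "fruit" even though it appears next to "ipad", because each term is enriched without looking at its neighbours. Choosing concepts with the help of context is the kind of problem that a combination of a topic model and a knowledge base is meant to address.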


Keywords: Short text; Text enrichment; Similarity



This work is supported in part by the National Natural Science Foundation of China under Grants 61170035, 61272420 and 81674099, the Six Talent Peaks Project in Jiangsu Province (Grant No. 2014 WLW-004), the Fundamental Research Funds for the Central Universities (Grant Nos. 30916011328 and 30918015103), and the Jiangsu Province special funds for transformation of science and technology achievement (Grant No. BA2013047).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Qiuyan Shi (1)
  • Yongli Wang (1)
  • Jianhong Sun (2)
  • Anmin Fu (1)

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology Library, Nanjing, China
  2. School of Electronic Engineering and Optical Engineering, Nanjing University of Science and Technology Library, Nanjing, China
