Stream-Based Live Probabilistic Topic Computing and Matching

  • Kun Ma
  • Ziqiang Yu
  • Ke Ji
  • Bo Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10393)


Public opinion monitoring refers to real-time first story detection (FSD) on a particular Internet news event. It plays an important part in identifying news propagation trends. Current opinion monitoring methods rely on text matching. However, they have limitations: they struggle to discover latent and hidden topics, and they rank matching results incorrectly on large-scale data. In this paper, we propose an improved solution for live public opinion monitoring: stream-based live probabilistic topic computing and matching. Our method attempts to address two disadvantages of existing approaches, namely the lack of semantic matching and the low efficiency on timely big data. To make substantial improvements, we propose real-time topic computing with the stream processing paradigm and topic matching with query-time document and field boosting. Finally, our experimental evaluation of topic computing and matching on crawled historical Netease news records shows the effectiveness and efficiency of the proposed approach.
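As an illustration of what query-time document and field boosting means in general, the following minimal Python sketch ranks documents by boosted term frequency. It is not the implementation evaluated in the paper: the field names, the boost values, and the helper functions `score` and `rank` are all hypothetical, and plain term counts stand in for whatever relevance function the actual system uses.

```python
from collections import Counter

# Hypothetical per-field boosts supplied at query time (assumed values,
# not taken from the paper): a title match counts three times a body match.
FIELD_BOOSTS = {"title": 3.0, "body": 1.0}


def score(doc, query_terms, field_boosts=FIELD_BOOSTS, doc_boost=1.0):
    """Sum boosted term frequencies over all fields, then apply a document boost."""
    total = 0.0
    for field, boost in field_boosts.items():
        # Whitespace tokenization is a simplification for this sketch.
        counts = Counter(doc.get(field, "").lower().split())
        total += boost * sum(counts[term] for term in query_terms)
    return doc_boost * total


def rank(docs, query, doc_boosts=None):
    """Return ids of matching documents, highest score first (ties broken by id)."""
    terms = query.lower().split()
    doc_boosts = doc_boosts or {}
    scored = [
        (score(d, terms, doc_boost=doc_boosts.get(d["id"], 1.0)), d["id"])
        for d in docs
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in scored if s > 0]


docs = [
    {"id": "a", "title": "flood warning", "body": "rain"},
    {"id": "b", "title": "weather", "body": "flood flood"},
]
ranking_default = rank(docs, "flood")
ranking_boosted = rank(docs, "flood", doc_boosts={"b": 2.0})
```

Because the boosts are passed in with the query rather than stored in the index, the same documents can be re-ranked for a different query without re-indexing; in the sketch, raising the document boost of `b` is enough to move it ahead of `a`.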


Public opinion; Public sentiment; Topic computing; Topic matching; Probabilistic topic model; Stream computing; Stream processing; MapReduce



This work was supported by the Science and Technology Program of University of Jinan (XKY1734), the Open Project Joint Funding of Information Science and Engineering School of Linyi University and Discipline Team of Intelligent Logistics and Information Engineering (LDXX2017KF155), the Shandong Provincial Natural Science Foundation (ZR201702170261), the Shandong Provincial Key R&D Program (2015GGX106007 & 2016ZDJS01A12), and the Project of Shandong Province Higher Educational Science and Technology Program (J16LN13).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
