Stream-Based Live Probabilistic Topic Computing and Matching
Public opinion monitoring refers to real-time first story detection (FSD) on a particular Internet news event. It plays an important role in identifying news propagation trends. Current opinion monitoring methods rely on text matching, which has limitations on large-scale data: it struggles to discover latent, hidden topics, and it can rank matching results incorrectly by relevance. In this paper, we propose an improved solution to live public opinion monitoring: stream-based live probabilistic topic computing and matching. Our method aims to overcome the weak semantic matching and low efficiency of existing approaches on timely big data. To achieve substantial improvements, we propose real-time topic computing under a stream processing paradigm, together with topic matching that uses query-time document and field boosting. Finally, our experimental evaluation of topic computing and matching on crawled historical Netease news records shows the high effectiveness and efficiency of the proposed approach.
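The abstract names query-time document and field boosting as the matching mechanism. As a hedged illustration only, the Python sketch below shows the general idea under our own assumptions: each news record has hypothetical `title`, `body`, and `topics` fields (the last standing in for labels produced by a topic model), per-field weights are supplied at query time, and a per-document `boost` (for example, a freshness factor) multiplies the final score. The plain TF-IDF scoring, the weights, and the field names are illustrative choices, not the authors' implementation; search engines such as Lucene and Elasticsearch use their own, more elaborate scoring.

```python
import math
from collections import Counter

# Hypothetical per-field weights applied at query time (not taken from the paper).
FIELD_BOOSTS = {"title": 2.0, "topics": 1.5, "body": 1.0}


def idf_table(docs):
    """Smoothed inverse document frequency, counting a term once per document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        terms = set()
        for text in doc["fields"].values():
            terms.update(text.lower().split())
        df.update(terms)
    return {t: 1.0 + math.log((n + 1) / (c + 1)) for t, c in df.items()}


def score(doc, query_terms, idf, field_boosts):
    """Sum of field-boosted TF-IDF matches, scaled by the document's own boost."""
    total = 0.0
    for field, text in doc["fields"].items():
        tf = Counter(text.lower().split())
        field_score = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        total += field_boosts.get(field, 1.0) * field_score
    return doc.get("boost", 1.0) * total


def rank(docs, query, field_boosts=FIELD_BOOSTS):
    """Return (doc_id, score) pairs for matching documents, best first."""
    query_terms = query.lower().split()
    idf = idf_table(docs)
    scored = [(score(d, query_terms, idf, field_boosts), d["id"]) for d in docs]
    # Highest score first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, s) for s, doc_id in scored if s > 0]


# Toy records; "boost" plays the role of a hypothetical document-level boost.
docs = [
    {"id": "a", "boost": 1.0, "fields": {"title": "flood warning issued",
        "body": "heavy rain expected", "topics": "weather disaster"}},
    {"id": "b", "boost": 1.0, "fields": {"title": "market update",
        "body": "flood of orders hits exchange", "topics": "finance"}},
    {"id": "c", "boost": 1.2, "fields": {"title": "river flood",
        "body": "residents evacuated after flood", "topics": "disaster"}},
]

if __name__ == "__main__":
    print(rank(docs, "flood disaster"))
```

Here record `c` ranks first because it matches the query in its boosted title, its body, and its topic field and also carries the largest document boost, while `b` matches only on the literal word "flood" in its body and ranks last. Because the field weights are a parameter of `rank` rather than being baked into the index, they can be changed per query without re-indexing, which is the practical appeal of boosting at query time.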
Keywords: Public opinion, Public sentiment, Topic computing, Topic matching, Probabilistic topic model, Stream computing, Stream processing, MapReduce
This work was supported by the Science and Technology Program of University of Jinan (XKY1734), the Open Project Joint Funding of Information Science and Engineering School of Linyi University and Discipline Team of Intelligent Logistics and Information Engineering (LDXX2017KF155), the Shandong Provincial Natural Science Foundation (ZR201702170261), the Shandong Provincial Key R&D Program (2015GGX106007 & 2016ZDJS01A12), and the Project of Shandong Province Higher Educational Science and Technology Program (J16LN13).