Cluster Computing, Volume 19, Issue 3, pp 1399–1410

Building associated semantic representation model for the ultra-short microblog text jumping in big data

  • Shunxiang Zhang
  • Yin Wang
  • Shiyao Zhang
  • Guangli Zhu

Abstract

Among massive volumes of microblog text, an ultra-short microblog text is difficult to understand on its own because of characteristics such as data sparseness and content fragmentation. To address this problem, this paper presents an associated semantic representation model for the ultra-short microblog text (ASRM-UMT) that helps users understand such texts. First, multi-layer associated semantic views of the ultra-short microblog text are built. The ICTCLAS system is used to extract feature keywords from microblog texts, and a mining algorithm for associated semantics over a dynamic time window is proposed to mine the associated semantic relations among those keywords. The mining process takes into account three aspects of microblog texts: context, comments, and transmissions. Then, the multi-layer associated semantic views are optimized: the clustering coefficients of several candidate views are compared to select the optimal associated semantic view. Experimental results show that the proposed model can represent the ultra-short microblog text accurately and effectively.
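
The view-selection step can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how relations are mined or which clustering coefficient is used, so this sketch assumes plain keyword co-occurrence as the relation, the average local clustering coefficient as the measure, and "highest coefficient wins" as the selection rule. The dynamic time window is not modeled, and the function names (`build_view`, `average_clustering`, `select_view`) and the sample keywords are hypothetical.

```python
from itertools import combinations


def build_view(keyword_sets):
    """Build an undirected co-occurrence graph (assumed stand-in for the
    paper's mined relations): two keywords are linked when they appear in
    the same text."""
    graph = {}
    for keywords in keyword_sets:
        for kw in keywords:
            graph.setdefault(kw, set())
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_clustering(graph):
    """Average local clustering coefficient; nodes with fewer than two
    neighbours contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def select_view(views):
    """Pick the view name with the highest average clustering coefficient
    (assumed selection rule)."""
    return max(views, key=lambda name: average_clustering(views[name]))


# Hypothetical keyword sets, as if already extracted by a segmenter.
context_only = [{"flood", "rescue"}, {"rescue", "army"}]
with_comments = context_only + [{"flood", "rescue", "army"}]

views = {
    "context": build_view(context_only),
    "context+comments": build_view(with_comments),
}
best = select_view(views)
```

In this toy data the comment layer closes the triangle among the three keywords, so the richer view scores higher and is selected.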

Keywords

Ultra-short microblog text; Multi-layer associated semantic view; Dynamic time window; Associated semantic representation model; Clustering coefficient

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Shunxiang Zhang (1)
  • Yin Wang (1)
  • Shiyao Zhang (1)
  • Guangli Zhu (1)
  1. Anhui University of Science and Technology, Huainan, China
