Building associated semantic representation model for the ultra-short microblog text jumping in big data

Cluster Computing, volume 19, pages 1399–1410 (2016)

Abstract

In massive collections of microblog texts, an ultra-short microblog text is difficult to understand on its own because of characteristics such as data sparseness and content fragmentation. To address this problem, this paper presents an associated semantic representation model for the ultra-short microblog text (ASRM-UMT) that helps users understand such texts better. First, multi-layer associated semantic views of the ultra-short microblog text are built. The ICTCLAS system is used to extract feature keywords from the microblog texts, and a mining algorithm that operates on a dynamic time window is proposed to mine the associated semantic relations among these keywords. The mining process takes into account three aspects of a microblog text: its context, its comments and its transmissions. Then, the multi-layer associated semantic views are optimized: the clustering coefficients of the candidate views are compared to select the optimal associated semantic view. Experimental results show that the proposed model can represent the ultra-short microblog text accurately and effectively.
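
The pipeline summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, since the abstract gives no implementation details. The sketch assumes that keywords have already been extracted (the ICTCLAS step is not reproduced), that an association is a simple co-occurrence count inside a fixed time window (a stand-in for the dynamic window), and that the view with the highest average clustering coefficient is the optimal one. The names `build_view`, `average_clustering`, `select_view` and `min_support`, and the sample data, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_view(posts, window_start, window_end, min_support=2):
    """Build an undirected keyword graph from posts inside a time window.

    posts: iterable of (timestamp, keywords) pairs with pre-extracted keywords.
    Assumption: two keywords are associated when they co-occur in at least
    `min_support` posts whose timestamps fall inside the window.
    """
    counts = defaultdict(int)
    for timestamp, keywords in posts:
        if window_start <= timestamp <= window_end:
            for a, b in combinations(sorted(set(keywords)), 2):
                counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def average_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def select_view(views):
    """Assumed criterion: pick the view with the highest average clustering."""
    return max(views, key=lambda name: average_clustering(views[name]))


# Hypothetical data: one layer uses the context only, the other adds comments.
context = [(1, ["rain", "flood", "city"]), (2, ["rain", "flood", "city"]),
           (3, ["rain", "traffic"]), (9, ["rain", "traffic"])]
comments = [(2, ["rain", "traffic"]), (3, ["rain", "traffic"])]

views = {
    "context": build_view(context, 0, 5),
    "context+comments": build_view(context + comments, 0, 5),
}
best = select_view(views)
```

In this toy data the context-only view is a closed triangle of keywords, while adding the comments attaches a loosely connected keyword and lowers the average clustering coefficient, so `select_view` returns the context-only view. Whether a higher coefficient is the right selection rule is an assumption of this sketch, not a claim from the paper.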

Author information

Corresponding author

Correspondence to Shunxiang Zhang.

About this article

Cite this article

Zhang, S., Wang, Y., Zhang, S. et al. Building associated semantic representation model for the ultra-short microblog text jumping in big data. Cluster Comput 19, 1399–1410 (2016). https://doi.org/10.1007/s10586-016-0602-9

  • DOI: https://doi.org/10.1007/s10586-016-0602-9
