Automatic extraction of persistent topics from social text streams

World Wide Web

Abstract

Microblogging services allow users to publish their thoughts, activities, and interests as text streams and to share them with others in a social network. A user’s text stream in a microblogging service is a time-ordered sequence of the posts that the user has written or republished from socially connected users. Most research on microblogging services has focused on the social graph or on topic extraction from text streams. In particular, several studies have attempted to discover a user’s topics of interest from a text stream, since these topics play a crucial role in user search, friend recommendation, and contextual advertising. However, these studies have not fully addressed the unique properties of the stream. In this paper, we study the problem of detecting the topics of long-term, steady interest to a user from a text stream, considering its dynamic and social characteristics, and propose a graph-based topic extraction model. Extensive experiments on a real-world dataset were carried out to investigate the effects of the proposed approach, and the proposed model is shown to perform better than the existing alternatives.
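
To make the notion of a persistent topic concrete, the following is a minimal sketch and not the model proposed in the paper, whose details the abstract does not give. It assumes a hypothetical input format of (day index, text) pairs, splits the stream into fixed-length windows, builds a term co-occurrence graph, ranks terms with a TextRank-style weighted PageRank, and scales each rank by the fraction of windows in which the term appears. The function name, parameters, and scoring rule are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def persistent_terms(posts, window_days=7, top_k=3, damping=0.85, iters=30):
    """Rank terms by graph centrality scaled by temporal persistence.

    posts: list of (day_index, text) pairs (hypothetical input format).
    Illustrative only; this is not the paper's model.
    """
    # 1. Bucket tokenized posts into fixed-size time windows.
    windows = defaultdict(list)
    for day, text in posts:
        windows[day // window_days].append(text.lower().split())

    # 2. Count the windows each term occurs in, and build a co-occurrence
    #    graph whose edge weight is the number of posts containing both terms.
    window_hits = Counter()
    edges = defaultdict(Counter)
    for docs in windows.values():
        seen = set()
        for tokens in docs:
            unique = sorted(set(tokens))
            seen.update(unique)
            for a, b in combinations(unique, 2):
                edges[a][b] += 1
                edges[b][a] += 1
        window_hits.update(seen)

    terms = list(window_hits)
    if not terms:
        return []
    persistence = {t: window_hits[t] / len(windows) for t in terms}

    # 3. Weighted PageRank by power iteration over the co-occurrence graph.
    rank = {t: 1.0 / len(terms) for t in terms}
    out_weight = {t: sum(edges[t].values()) for t in terms}
    for _ in range(iters):
        new_rank = {}
        for t in terms:
            incoming = sum(rank[u] * w / out_weight[u] for u, w in edges[t].items())
            new_rank[t] = (1 - damping) / len(terms) + damping * incoming
        rank = new_rank

    # 4. Favor terms that are both central and present in many windows.
    scores = {t: rank[t] * persistence[t] for t in terms}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    example_posts = [
        (0, "jazz concert tonight"),
        (3, "jazz vinyl"),
        (8, "jazz festival tickets"),
        (9, "election debate"),
        (15, "jazz vinyl sale"),
        (22, "jazz trio"),
    ]
    print(persistent_terms(example_posts, top_k=3))
```

In this toy stream, a term that recurs in every window and co-occurs with many other terms outranks a term that appears in a single burst, which is the distinction between steady and transient interest that the abstract draws. A realistic pipeline would also need stop-word removal and some treatment of republished posts, both omitted here.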



Author information


Corresponding author

Correspondence to Jonghun Park.


About this article

Cite this article

Shin, Y., Ryo, C. & Park, J. Automatic extraction of persistent topics from social text streams. World Wide Web 17, 1395–1420 (2014). https://doi.org/10.1007/s11280-013-0251-3


  • DOI: https://doi.org/10.1007/s11280-013-0251-3
