Automatic extraction of persistent topics from social text streams

Shin, Yongwook; Ryo, Chuhyeop; Park, Jonghun

doi:10.1007/s11280-013-0251-3

Published: 23 August 2013

World Wide Web, Volume 17, pages 1395–1420 (2014)

Abstract

Microblogging services allow users to publish their thoughts, activities, and interests in the form of text streams and to share them with others in a social network. A user’s text stream in a microblogging service is a temporal sequence of the posts the user has written or republished from other socially connected users. Most research on microblogging services has focused on the social graph or on topic extraction from text streams. In particular, several studies have attempted to discover a user’s topics of interest from a text stream, since these topics play a crucial role in user search, friend recommendation, and contextual advertising. However, these studies have not fully addressed the unique properties of the stream. In this paper, we study the problem of detecting the topics of long-term, steady interest to a user from a text stream, considering its dynamic and social characteristics, and propose a graph-based topic extraction model. Extensive experiments on a real-world dataset were carried out to investigate the effects of the proposed approach, and the proposed model is shown to perform better than existing alternatives.

Author information

Authors and Affiliations

Dept. of Industrial Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 151-744, Korea
Yongwook Shin, Chuhyeop Ryo & Jonghun Park

Corresponding author

Correspondence to Jonghun Park.

About this article

Cite this article

Shin, Y., Ryo, C. & Park, J. Automatic extraction of persistent topics from social text streams. World Wide Web 17, 1395–1420 (2014). https://doi.org/10.1007/s11280-013-0251-3

Received: 23 May 2012
Revised: 06 June 2013
Accepted: 30 July 2013
Published: 23 August 2013
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11280-013-0251-3
