Abstract
Microblogging services allow users to publish their thoughts, activities, and interests as text streams and to share them with others in a social network. A user's text stream in a microblogging service is a chronologically ordered sequence of the posts the user has written or republished from other socially connected users. Most research on microblogging services has focused on the social graph or on topic extraction from text streams. In particular, several studies have attempted to discover a user's topics of interest from a text stream, since these topics play a crucial role in user search, friend recommendation, and contextual advertising. However, these studies have not fully addressed the unique properties of the stream. In this paper, we study the problem of detecting the topics of long-term, steady interest to a user from a text stream, taking into account its dynamic and social characteristics, and we propose a graph-based topic extraction model. Extensive experiments on a real-world dataset investigate the effects of the proposed approach and show that the proposed model outperforms existing alternatives.
About this article
Cite this article
Shin, Y., Ryo, C. & Park, J. Automatic extraction of persistent topics from social text streams. World Wide Web 17, 1395–1420 (2014). https://doi.org/10.1007/s11280-013-0251-3
DOI: https://doi.org/10.1007/s11280-013-0251-3