
Text Analytics in Social Media

  • Chapter
Mining Text Data

Abstract

The rapid growth of online social media in the form of collaboratively created content presents new opportunities and challenges to both producers and consumers of information. With the large amount of data produced by various social media services, text analytics provides an effective way to meet users’ diverse information needs. In this chapter, we first introduce the background of traditional text analytics and the distinct aspects of textual data in social media. We next discuss the research progress of applying text analytics in social media from different perspectives, and show how to improve existing approaches to text representation in social media, using real-world examples.


Author information

Corresponding author

Correspondence to Xia Hu.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Hu, X., Liu, H. (2012). Text Analytics in Social Media. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_12

  • DOI: https://doi.org/10.1007/978-1-4614-3223-4_12

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-3222-7

  • Online ISBN: 978-1-4614-3223-4

  • eBook Packages: Computer Science, Computer Science (R0)
