Text Analytics in Social Media

  • Xia Hu
  • Huan Liu

Abstract

The rapid growth of online social media in the form of collaboratively created content presents new opportunities and challenges to both producers and consumers of information. With the large amount of data produced by various social media services, text analytics provides an effective way to meet users’ diverse information needs. In this chapter, we first introduce the background of traditional text analytics and the distinct aspects of textual data in social media. We then discuss research progress in applying text analytics to social media from different perspectives, and use real-world examples to show how existing approaches to text representation in social media can be improved.

Keywords

Text Analytics, Social Media, Text Representation, Time Sensitivity, Short Text, Event Detection, Collaborative Question Answering, Social Tagging, Semantic Knowledge

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Computer Science and Engineering, Arizona State University, Phoenix, USA