Text Analytics in Social Media

Hu, Xia; Liu, Huan

doi:10.1007/978-1-4614-3223-4_12


Abstract

The rapid growth of online social media in the form of collaboratively created content presents new opportunities and challenges to both producers and consumers of information. With the large amount of data produced by various social media services, text analytics provides an effective way to meet users’ diverse information needs. In this chapter, we first introduce the background of traditional text analytics and the distinct aspects of textual data in social media. We then discuss the research progress in applying text analytics to social media from different perspectives and show, using real-world examples, how to improve existing approaches to text representation in social media.



Author information

Authors and Affiliations

Computer Science and Engineering, Arizona State University, Phoenix, USA
Xia Hu & Huan Liu


Corresponding author

Correspondence to Xia Hu.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, New York, USA
Charu C. Aggarwal
University of Illinois at Urbana-Champaign, Urbana, 61801, Illinois, USA
ChengXiang Zhai


About this chapter

Cite this chapter

Hu, X., Liu, H. (2012). Text Analytics in Social Media. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_12


DOI: https://doi.org/10.1007/978-1-4614-3223-4_12
Published: 07 January 2012
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4
eBook Packages: Computer Science, Computer Science (R0)
