Data Mining in Social Media

  • Chapter
Social Network Data Analytics

Abstract

The rise of online social media is producing a wealth of social network data. Data mining techniques give researchers and practitioners the tools they need to analyze large, complex, and frequently changing social media data. This chapter introduces the basics of data mining, reviews social media, discusses how to mine social media data, and highlights some illustrative examples, with an emphasis on social networking sites and blogs.

Author information

Corresponding author

Correspondence to Geoffrey Barbier.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Barbier, G., Liu, H. (2011). Data Mining in Social Media. In: Aggarwal, C. (ed.) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_12

  • DOI: https://doi.org/10.1007/978-1-4419-8462-3_12

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-8461-6

  • Online ISBN: 978-1-4419-8462-3

  • eBook Packages: Computer Science (R0)
