A Comparative Study on Cluster Analysis of Microblogging Data

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 813))

Abstract

Online social media is now one of the most popular platforms for information exchange. During important events such as a World Cup, an election, a budget announcement, or a natural calamity, microblogs serve as a significant and flexible communication medium for the public. Millions of messages are exchanged every day by a large set of users across several microblogging sites, which causes information overload and opens up numerous challenges for researchers. Because the messages are noisy and short, it is challenging to cluster the data and mine meaningful information that summarizes a trending or non-trending topic. The rapidly growing volume of data is a further challenge for clustering. Researchers have proposed several clustering methods. This work presents a comparative study of proposed clustering approaches that use community detection and genetic algorithm-based techniques on microblogging data. The analysis also compares the performance of the different clustering methods on the same dataset.
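As a rough illustration of graph-based clustering of short messages, the sketch below links messages whose token sets overlap and reads clusters off the connected components of the resulting graph. It is not the method of any of the compared approaches. The Jaccard similarity measure, the `threshold` value, the helper names, and the sample messages are all assumptions made for this example, and connected components stand in for a full community detection algorithm.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase a message and split it into a set of whitespace-delimited tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_messages(messages, threshold=0.3):
    """Group messages by connected components of a thresholded similarity graph."""
    tokens = [tokenize(m) for m in messages]
    # Adjacency sets: an edge links two messages whose similarity meets the threshold.
    neighbours = {i: set() for i in range(len(messages))}
    for i, j in combinations(range(len(messages)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    clusters, seen = [], set()
    for start in range(len(messages)):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical sample messages, invented for illustration only.
messages = [
    "flood rescue teams reach the valley",
    "rescue teams reach flood victims",
    "election results announced today",
    "election results expected today evening",
]
clusters = cluster_messages(messages)
```

Each cluster is a sorted list of message indices. A real pipeline would replace the whitespace tokenizer with proper preprocessing and the component search with a community detection method suited to the data.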

Author information

Corresponding author

Correspondence to Soumi Dutta.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dutta, S., Das, A.K., Dutta, G., Gupta, M. (2019). A Comparative Study on Cluster Analysis of Microblogging Data. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_77
