Abstract
In today’s world, online social media are among the most popular platforms for information exchange. During major events such as the World Cup, elections, budget announcements, or natural calamities, microblogs serve as a significant and flexible communication medium for the public. Millions of messages are exchanged every day by a large number of users across several microblogs, causing information overload. This phenomenon opens up numerous challenges for researchers. Because microblog messages are noisy and short, clustering the data and mining meaningful information to summarize any trending or non-trending topic is a challenging task. In addition, the huge and growing volume of data poses a further challenge for clustering. Researchers have proposed various clustering methods. This work presents a comparative study of several of these approaches, including community detection and genetic algorithm-based techniques, applied to microblogging data. The analysis also compares the performance of the different clustering methods on the same dataset.
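The abstract mentions clustering tweets through community detection but gives no procedure. As an illustration only, and not the authors' method, the sketch below builds a tweet similarity graph (an edge joins two tweets whenever the Jaccard similarity of their word sets reaches an assumed threshold) and then groups tweets with a simple deterministic variant of label propagation. The function names, the default threshold of 0.3, and the sample tweets are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(tweet: str) -> set[str]:
    """Lower-case a tweet and keep only alphanumeric word tokens ('#', '@' are stripped)."""
    return set(re.findall(r"[a-z0-9]+", tweet.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(tweets: list[str], threshold: float) -> dict[int, set[int]]:
    """Adjacency sets: tweets i and j are linked when their similarity >= threshold."""
    tokens = [tokenize(t) for t in tweets]
    graph: dict[int, set[int]] = {i: set() for i in range(len(tweets))}
    for i, j in combinations(range(len(tweets)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def label_propagation(graph: dict[int, set[int]], max_iter: int = 20) -> dict[int, int]:
    """Deterministic label propagation: visiting nodes in index order, each node adopts
    its neighbours' most frequent label (ties go to the smallest label), repeating
    until no label changes or max_iter passes have been made."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # an isolated tweet stays in its own cluster
            counts = Counter(labels[n] for n in graph[node])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def cluster_tweets(tweets: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Return clusters as sorted lists of tweet indices."""
    labels = label_propagation(build_graph(tweets, threshold))
    groups: dict[int, list[int]] = {}
    for node, label in sorted(labels.items()):
        groups.setdefault(label, []).append(node)
    return sorted(groups.values())


if __name__ == "__main__":
    # Invented sample tweets about two unrelated topics.
    sample = [
        "World Cup final tonight",
        "World Cup final tonight live",
        "watching the World Cup final",
        "flood warning issued downtown",
        "flood warning issued downtown today",
        "city flood warning",
    ]
    print(cluster_tweets(sample))  # [[0, 1, 2], [3, 4, 5]]
```

Label propagation is used here only because it fits in a few lines; it is one of many community detection algorithms and is not claimed to be among those compared in the study. Raising the threshold yields sparser graphs and therefore smaller, more numerous clusters.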
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dutta, S., Das, A.K., Dutta, G., Gupta, M. (2019). A Comparative Study on Cluster Analysis of Microblogging Data. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_77
DOI: https://doi.org/10.1007/978-981-13-1498-8_77
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)