Sampling-based visual assessment computing techniques for an efficient social data clustering

Abstract

Visual methods are used for pre-cluster assessment and for deriving useful cluster partitions. Existing visual methods, such as visual assessment of tendency (VAT), spectral VAT (SpecVAT), cosine-based VAT (cVAT), and multi-viewpoints cosine-based similarity VAT (MVS-VAT), effectively assess the number of clusters, or cluster tendency, in a dataset. Partitioning tweet data is the underlying problem of social data clustering. Cosine-based visual methods have been widely successful in text data clustering; thus, cVAT and MVS-VAT are the best-suited methods for deriving social data clusters. However, MVS-VAT faces scalability problems in terms of computational time and memory allocation. Therefore, this paper presents a sampling-based MVS-VAT computing technique that selects sample inter-cluster viewpoints to overcome the scalability problem in social data clustering. In the experiments, standard health keywords and benchmarked TREC2017 and TREC2018 health keywords are used to extract health tweets, and the performance of the existing and proposed visual methods is compared.
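
As a rough illustration of the cosine-based VAT (cVAT) idea named in the abstract, the Python sketch below builds a cosine dissimilarity matrix from document vectors and reorders it with the standard VAT ordering, so that clusters show up as dark blocks along the diagonal when the matrix is displayed as an image. This is a minimal sketch and not the authors' implementation: the function names `cosine_dissimilarity` and `vat_order` and the toy matrix `X` are hypothetical, and the multi-viewpoint similarity and the sampling scheme of the proposed method are not reproduced here.

```python
import numpy as np

def cosine_dissimilarity(X):
    """Pairwise cosine dissimilarity (1 - cosine similarity) between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # guard against all-zero (empty) documents
    U = X / norms
    D = 1.0 - U @ U.T
    np.fill_diagonal(D, 0.0)          # an object has zero dissimilarity to itself
    return np.clip(D, 0.0, 2.0)       # remove tiny negative values from round-off

def vat_order(D):
    """Return the VAT ordering (a permutation of indices) for a square matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # min_dist[j]: smallest dissimilarity from j to any object already ordered.
    min_dist = D[i].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining)[0]
        j = candidates[np.argmin(min_dist[candidates])]
        order.append(int(j))
        remaining[j] = False
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)

# Toy term-weight vectors (assumed data): rows 0, 2, 4 are similar; rows 1, 3 are similar.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.9],
    [1.0, 0.1, 0.1],
])
D = cosine_dissimilarity(X)
order = vat_order(D)
reordered = D[np.ix_(order, order)]   # display this matrix as an image to inspect blocks
```

The full matrix `D` needs memory quadratic in the number of documents, which is the kind of cost that motivates computing the assessment on a sample rather than on every tweet.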

Acknowledgment

This work is supported by the Science & Engineering Research Board (SERB), Department of Science and Technology, Government of India, under research grant DST Project Number ECR/2016/001556.

Author information

Corresponding author

Correspondence to M. Suleman Basha.

Cite this article

Basha, M.S., Mouleeswaran, S.K. & Prasad, K.R. Sampling-based visual assessment computing techniques for an efficient social data clustering. J Supercomput 77, 8013–8037 (2021). https://doi.org/10.1007/s11227-021-03618-6

Keywords

  • Cluster tendency
  • Social data clustering
  • Scalability
  • Visual methods
  • Feature extraction