DICH: A framework for discovering implicit communities hidden in tweets

World Wide Web

Abstract

Microblogging (tweeting) has become a mainstream channel for people to share information with others on the Internet. Through tweets, users are linked into a huge social network. Community recognition in tweet-based social networks is important for identifying users’ interests, which helps companies improve their marketing strategies. However, because tweet messages are massive in volume, span many fields, and are short and unstructured, they are difficult to process directly with existing approaches. For this reason, we present DICH, a framework to Discover Implicit Communities Hidden in tweet data. To implement the framework, in addition to proposing techniques for preprocessing tweet data, we develop an unsupervised learning method called MbCLARANS, an optimized CLARANS algorithm, to discover the implicit communities hidden in tweet datasets. During computation, the pairwise relationships between users are employed to improve clustering quality. In addition, an adaptive k strategy is used to make the approach more applicable. The performance of the approach is demonstrated with experiments on tweet data collected from SINA Weibo.
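For orientation, the baseline CLARANS procedure (Ng and Han, 1994) that MbCLARANS is said to optimize can be sketched roughly as below. This is not the authors' MbCLARANS: the abstract does not give its details, so the sketch omits the pairwise user relationships and the adaptive k strategy. The names `clarans`, `total_cost`, `num_local` and `max_neighbor` are illustrative assumptions, and the distance function is left to the caller.

```python
import random


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, dist, num_local=5, max_neighbor=50, seed=0):
    """Plain CLARANS sketch: randomized search over sets of k medoids.

    Assumes 0 < k < len(points). Returns (sorted medoid indices, cost).
    """
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")

    for _ in range(num_local):
        # Start each local search from a random set of k medoids.
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current, dist)

        tried = 0
        while tried < max_neighbor:
            # A neighbor differs from the current set in exactly one medoid.
            slot = rng.randrange(k)
            candidate = rng.choice([i for i in range(n) if i not in current])
            neighbor = current[:slot] + [candidate] + current[slot + 1:]
            cost = total_cost(points, neighbor, dist)

            if cost < current_cost:
                # Move to the better neighbor and reset the counter.
                current, current_cost, tried = neighbor, cost, 0
            else:
                tried += 1

        # Keep the best local minimum seen across restarts.
        if current_cost < best_cost:
            best, best_cost = sorted(current), current_cost

    return best, best_cost
```

A call such as `clarans(points, 2, lambda a, b: abs(a - b))` on a list of numbers returns the chosen medoid indices and the total distance cost. For tweet data, `dist` would have to be a text or user dissimilarity measure, which the abstract does not specify.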



Corresponding author

Correspondence to Dunlu Peng.


About this article

Cite this article

Peng, D., Lei, X. & Huang, T. DICH: A framework for discovering implicit communities hidden in tweets. World Wide Web 18, 795–818 (2015). https://doi.org/10.1007/s11280-014-0279-z


  • DOI: https://doi.org/10.1007/s11280-014-0279-z
