DICH: A framework for discovering implicit communities hidden in tweets

Peng, Dunlu; Lei, Xie; Huang, Ting

doi:10.1007/s11280-014-0279-z

Published: 21 February 2014

World Wide Web, Volume 18, pages 795–818 (2015)

Abstract

Microblogging (tweeting) has become a mainstream channel for people to share information on the Internet, and through tweets users are linked into a huge social network. Recognizing communities in tweet-based social networks is important for identifying users' interests, which can help companies improve their marketing strategies. However, because tweet data are massive, span many fields, and consist of short, unstructured messages, they are difficult to process directly with existing approaches. For this reason, we present DICH, a framework to Discover Implicit Communities Hidden in tweet data. To implement the framework, we propose techniques for preprocessing tweet data and develop MbCLARANS, an unsupervised learning method based on an optimized version of the CLARANS algorithm, to discover the implicit communities hidden in tweet datasets. During computation, pairwise relationships between users are exploited to improve clustering quality. In addition, an adaptive k strategy is used to make the approach more widely applicable. Experiments on tweet data collected from SINA Weibo demonstrate the performance of the approach.
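MbCLARANS builds on CLARANS (Ng and Han, 1994), a randomized k-medoids search that repeatedly tries swapping one medoid for a non-medoid and keeps the swap only if it lowers the total distance. The abstract does not describe the MbCLARANS optimizations, the use of pairwise user relationships, or the adaptive k strategy, so the following is only a minimal sketch of plain CLARANS over a precomputed distance matrix. The function names `clarans` and `total_cost` and the default parameter values are illustrative assumptions; `numlocal` and `maxneighbor` follow the parameter names in Ng and Han's description.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def total_cost(dist: Matrix, medoids: Sequence[int]) -> float:
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def clarans(
    dist: Matrix,
    k: int,
    numlocal: int = 2,
    maxneighbor: int = 50,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int], float]:
    """Plain CLARANS over a precomputed n x n distance matrix (hypothetical helper).

    Returns (medoid indices, medoid index assigned to each point, total cost).
    This is NOT the paper's MbCLARANS; its optimizations are not reproduced.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")
    if numlocal < 1:
        raise ValueError("numlocal must be at least 1")
    rng = random.Random(seed)
    best: List[int] = []
    best_cost = float("inf")

    for _ in range(numlocal):
        # Each local search starts from a random set of k medoids.
        current = rng.sample(range(n), k)
        current_cost = total_cost(dist, current)
        tries = 0
        while tries < maxneighbor:
            non_medoids = [p for p in range(n) if p not in current]
            if not non_medoids:
                break  # k == n: every point is already a medoid
            # A neighbor differs from the current node in exactly one medoid.
            neighbor = list(current)
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(dist, neighbor)
            if neighbor_cost < current_cost:
                # Move to the better neighbor and reset the failure counter.
                current, current_cost = neighbor, neighbor_cost
                tries = 0
            else:
                tries += 1
        # Keep the best local minimum found across all restarts.
        if current_cost < best_cost:
            best, best_cost = current, current_cost

    # Assign every point to its nearest medoid.
    labels = [min(best, key=lambda m: row[m]) for row in dist]
    return best, labels, best_cost


if __name__ == "__main__":
    # Toy 1-D data with two obvious groups; distances are absolute differences.
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    medoids, labels, cost = clarans(dist, k=2, numlocal=4, seed=0)
    print(sorted(medoids), labels, cost)
```

For tweet data, `dist` might hold distances between users' preprocessed tweet representations; how DICH actually builds this matrix is an assumption here. Because each step samples a single random neighbor instead of evaluating all k(n − k) swaps as PAM does, CLARANS trades exhaustiveness for speed, which is what makes it attractive for large datasets.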

Author information

Authors and Affiliations

Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
Dunlu Peng, Xie Lei & Ting Huang

Corresponding author

Correspondence to Dunlu Peng.

About this article

Cite this article

Peng, D., Lei, X. & Huang, T. DICH: A framework for discovering implicit communities hidden in tweets. World Wide Web 18, 795–818 (2015). https://doi.org/10.1007/s11280-014-0279-z

Received: 02 February 2013
Revised: 16 January 2014
Accepted: 29 January 2014
Published: 21 February 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s11280-014-0279-z
