Abstract
Social media have become popular not only for individuals to share content, but also for organizations to engage users and spread information. Because personal and organization accounts differ in their traits, the ability to distinguish between the two account types is important for developing better search and recommendation engines, marketing strategies, and information dissemination platforms. However, this task is non-trivial and has not been well studied thus far. In this paper, we present a new generic framework for classifying personal and organization accounts, on which a comprehensive and systematic investigation of a rich variety of content, social, and temporal features can be carried out. In addition to generic feature transformation pipelines, the framework features a gradient boosting classifier that is accurate and robust and that facilitates data understanding, for example by revealing the importance of different features. We demonstrate the efficacy of our approach through extensive experiments on Twitter data from Singapore, through which we discover several discriminative content, social, and temporal features.
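To make the idea of a gradient boosting classifier that also reports feature importance more concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes one-split regression stumps as base learners, logistic loss, and importance measured as each feature's share of the total squared-error reduction. The three feature names and the toy data are hypothetical, chosen only to illustrate one temporal feature, one content feature, and one noise feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return the one-feature threshold split that best fits the residuals."""
    n = len(X)
    total = sum(residuals)
    best = None  # (gain, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            i, nxt = order[k], order[k + 1]
            left_sum += residuals[i]
            if X[i][j] == X[nxt][j]:
                continue  # cannot split between equal feature values
            n_left, n_right = k + 1, n - (k + 1)
            right_sum = total - left_sum
            # Reduction in squared error obtained by this split.
            gain = (left_sum ** 2 / n_left + right_sum ** 2 / n_right
                    - total ** 2 / n)
            if best is None or gain > best[0]:
                threshold = (X[i][j] + X[nxt][j]) / 2
                best = (gain, j, threshold,
                        left_sum / n_left, right_sum / n_right)
    return best

def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the negative gradient of the logistic loss."""
    scores = [0.0] * len(X)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For logistic loss the negative gradient is (label - probability).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no feature offers a valid split
        gain, j, thr, left, right = stump
        left, right = learning_rate * left, learning_rate * right
        stumps.append((j, thr, left, right))
        importance[j] += gain
        for i, row in enumerate(X):
            scores[i] += left if row[j] <= thr else right
    total = sum(importance) or 1.0
    return stumps, [v / total for v in importance]

def predict(stumps, row):
    score = sum(l if row[j] <= t else r for j, t, l, r in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical features: tweets per day, fraction of tweets with URLs, noise.
feature_names = ["tweets_per_day", "url_fraction", "noise"]
X = [[3.0, 0.10, 0.5], [12.0, 0.05, 0.1], [1.0, 0.20, 0.9], [8.0, 0.15, 0.3],
     [10.0, 0.80, 0.4], [2.0, 0.90, 0.8], [15.0, 0.70, 0.2], [6.0, 0.85, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = personal, 1 = organization (toy labels)

stumps, importances = fit_gbm(X, y)
predictions = [predict(stumps, row) for row in X]
for name, value in zip(feature_names, importances):
    print(f"{name}: {value:.2f}")
```

On this toy data the labels are separable by the hypothetical `url_fraction` feature alone, so it should receive most of the importance. Real account data would call for deeper trees, held-out evaluation, and a mature library rather than this hand-rolled sketch.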
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Oentaryo, R.J., Low, JW., Lim, EP. (2015). Chalk and Cheese in Twitter: Discriminating Personal and Organization Accounts. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_51
DOI: https://doi.org/10.1007/978-3-319-16354-3_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science (R0)