Chalk and Cheese in Twitter: Discriminating Personal and Organization Accounts

Oentaryo, Richard Jayadi; Low, Jia-Wei; Lim, Ee-Peng

doi:10.1007/978-3-319-16354-3_51

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9022)

Included in the following conference series: European Conference on Information Retrieval

Abstract

Social media are popular not only with individuals who share content, but also with organizations that engage users and spread information. Because personal and organization accounts differ in their traits, the ability to distinguish between the two account types is important for developing better search and recommendation engines, marketing strategies, and information dissemination platforms. However, this task is non-trivial and has not been well studied thus far. In this paper, we present a new generic framework for classifying personal and organization accounts, upon which a comprehensive and systematic investigation of a rich variety of content, social, and temporal features can be carried out. In addition to generic feature transformation pipelines, the framework features a gradient boosting classifier that is accurate and robust, and that facilitates good data understanding, such as the importance of different features. We demonstrate the efficacy of our approach through extensive experiments on Twitter data from Singapore, through which we discover several discriminative content, social, and temporal features.
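
To make the idea of a gradient boosting classifier that also reports feature importance more concrete, the following is a minimal sketch and not the authors' implementation. It assumes depth-1 regression trees (stumps) fitted to the negative gradient of the log loss, a simplified leaf update (mean residual rather than a Newton step), and importance measured as each feature's share of the total squared-error reduction. The feature names and the synthetic data are invented purely for illustration and do not reflect the paper's features or results.

```python
import math
import random

# Hypothetical feature names, invented for this sketch (not from the paper).
FEATURES = ["follower_following_ratio", "weekday_activity_share", "noise"]

def make_synthetic_accounts(n_per_class=60, seed=0):
    """Toy data: label 1 = organization, 0 = personal. Distributions are invented."""
    rng = random.Random(seed)
    X, y = [], []
    for label, ratio_mu, weekday_mu in [(1, 3.0, 0.85), (0, 1.0, 0.65)]:
        for _ in range(n_per_class):
            X.append([rng.gauss(ratio_mu, 1.0),
                      rng.gauss(weekday_mu, 0.07),
                      rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals.

    Returns (gain, feature_index, threshold, left_mean, right_mean).
    """
    total = sse(residuals)
    best = None
    for j in range(len(X[0])):
        cuts = sorted(set(row[j] for row in X))
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            gain = total - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t, sum(left) / len(left), sum(right) / len(right))
    return best

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, rounds=20, learning_rate=0.5):
    """Boost stumps on the log-loss gradient; return (stumps, normalized importance)."""
    scores = [0.0] * len(X)                 # current log-odds per account
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        # Negative gradient of the log loss with respect to the log-odds.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        gain, j, t, left_mean, right_mean = fit_stump(X, residuals)
        left_val = learning_rate * left_mean
        right_val = learning_rate * right_mean
        stumps.append((j, t, left_val, right_val))
        importance[j] += gain               # credit the feature used by this split
        scores = [s + (left_val if row[j] <= t else right_val)
                  for s, row in zip(scores, X)]
    total = sum(importance)
    return stumps, [g / total for g in importance]

def predict(stumps, row):
    """Return 1 (organization) if the summed log-odds is positive, else 0 (personal)."""
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return 1 if score > 0 else 0

X, y = make_synthetic_accounts()
stumps, importance = train(X, y)
accuracy = sum(predict(stumps, row) == label for row, label in zip(X, y)) / len(y)

for name, share in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
    print(f"{name}: {share:.2f}")
print(f"training accuracy: {accuracy:.2f}")
```

The importance shares printed at the end illustrate the kind of "data understanding" the abstract refers to: features that the boosted trees split on often, and with large error reduction, rank highest. In practice one would evaluate on held-out accounts rather than on the training set, and would likely use an established library rather than this hand-rolled version.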

Author information

Authors and Affiliations

Living Analytics Research Centre, Singapore Management University, 80 Stamford Road, Singapore, 178902, Singapore
Richard Jayadi Oentaryo, Jia-Wei Low & Ee-Peng Lim

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Allan Hanbury
Lumi, Semion Ltd., 111 Charterhouse Street, EC1M 6AW, London, UK
Gabriella Kazai
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Andreas Rauber
Universität Duisburg-Essen, Lotharstraße 65, 47057, Duisburg, Germany
Norbert Fuhr

About this paper

Cite this paper

Oentaryo, R.J., Low, JW., Lim, EP. (2015). Chalk and Cheese in Twitter: Discriminating Personal and Organization Accounts. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_51

DOI: https://doi.org/10.1007/978-3-319-16354-3_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science, Computer Science (R0)