User characterization for online social networks

  • Review Article
Social Network Analysis and Mining

Abstract

Online social network analysis has attracted great attention, driven by the vast number of users sharing information and by the availability of APIs that make it possible to crawl online social network data. In this paper, we survey research that is helpful for user characterization, since online users may not always reveal their true identity or attributes. We focus in particular on user attribute determination, such as gender and age; user behavior analysis, such as motives for deception; mental models that are indicators of user behavior; user categorization, such as bots versus humans; and entity matching across different social networks. We believe our summary of the analysis of user characterization will provide important insights to researchers and lead to better services for online users.

Author information

Corresponding author

Correspondence to Tayfun Tuna.

About this article

Cite this article

Tuna, T., Akbas, E., Aksoy, A. et al. User characterization for online social networks. Soc. Netw. Anal. Min. 6, 104 (2016). https://doi.org/10.1007/s13278-016-0412-3

  • DOI: https://doi.org/10.1007/s13278-016-0412-3
