A review of features for the discrimination of Twitter users: application to the prediction of offline influence

  • Jean-Valère Cossu (email author)
  • Vincent Labatut
  • Nicolas Dugué
Original Article
Part of the following topical collections:
  1. Diffusion of Information and Influence in Social Networks


Many works related to Twitter aim to characterize its users in some way: role on the service (spammers, bots, organizations, etc.), nature of the user (socio-professional category, age, etc.), topics of interest, and others. However, for a given user classification problem, selecting a set of appropriate features is very difficult, because the many features described in the literature are highly heterogeneous, with name overlaps and collisions, and numerous closely related variants. In this article, we review a wide range of such features. To present a clear state-of-the-art description, we unify their names, definitions and relationships, and we propose a new, neutral typology. We then illustrate the usefulness of our review by applying a selection of these features to the offline influence detection problem. This task consists of identifying users who are influential in real life, based on their Twitter account and related data. We show that most features deemed efficient for predicting online influence, such as the numbers of retweets and followers, are not relevant to this problem. Instead, we propose several content-based approaches to label Twitter users as influencers or not, and to rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset, and outperform state-of-the-art methods.


Keywords: Twitter, Influence, Natural language processing, Social network analysis



This work is a revised and extended version of the article Detecting Real-World Influence Through Twitter, presented at the 2nd European Network Intelligence Conference (ENIC 2015) by the same authors (Cossu et al. 2015). It was partly funded by the French National Research Agency (ANR), through the project ImagiWeb ANR-2012-CORD-002-01.


  1. Al Zamal F, Liu W, Ruths D (2012) Homophily and latent attribute inference: inferring latent attributes of Twitter users from neighbors. In: ICWSM
  2. Aleahmad A, Karisani P, Rahgozar M, Oroumchian F (2014) University of Tehran at RepLab 2014. In: 4th international conference of the CLEF initiative
  3. Amigó E, Carrillo-de Albornoz J, Chugur I, Corujo A, Gonzalo J, Meij E, de Rijke M, Spina D (2014) Overview of RepLab 2014: author profiling and reputation dimensions for online reputation management. In: Information access evaluation. Multilinguality, multimodality, and interaction, pp. 307–322
  4. Anger I, Kittl C (2011) Measuring influence on Twitter. In: i-KNOW, pp. 1–4
  5. Armentano MG, Godoy DL, Amandi AA (2011) A topology-based approach for followees recommendation in Twitter. In: Workshop chairs, p. 22
  6. Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone’s an influencer: quantifying influence on Twitter. In: WSDM, pp. 65–74
  7. Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 22(6):725–730
  8. Benevenuto F, Magno F, Rodrigues T, Almeida V (2010) Detecting spammers on Twitter. In: CEAS
  9. Bonacich PF (1987) Power and centrality: a family of measures. Am J Soc 92:1170–1182
  10. Bond RM, Fariss CJ, Jones JJ, Kramer ADI, Marlow C, Settle JE, Fowler JH (2012) A 61-million-person experiment in social influence and political mobilization. Nature 489(7415):295–298
  11. Boyd D, Golder S, Lotan G (2010) Tweet, tweet, retweet: conversational aspects of retweeting on Twitter. In: HICSS, pp. 1–10
  12. Buckley C, Voorhees EM (2000) Evaluating evaluation measure stability. In: ACM SIGIR, pp. 33–40
  13. Cha M, Haddadi H, Benevenuto F, Gummadi K (2010) Measuring user influence in Twitter: the million follower fallacy. In: ICWSM
  14. Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating Twitter users. In: CIKM, pp. 759–768
  15. Chu Z, Gianvecchio S, Wang H, Jajodia S (2012) Detecting automation of Twitter accounts: are you a human, bot, or cyborg? IEEE Trans Dependable Secure Comput 9(6):811–824
  16. Conover MD, Goncalves B, Ratkiewicz J, Flammini A, Menczer F (2011) Predicting the political alignment of Twitter users. In: IEEE SocialCom, pp. 192–199
  17. Cossu JV, Dugué N, Labatut V (2015) Detecting real-world influence through Twitter. In: ENIC, pp. 83–90
  18. Cossu JV, Janod K, Ferreira E, Gaillard J, El-Bèze M (2014) LIA@RepLab 2014: 10 methods for 3 tasks. In: 4th international conference of the CLEF initiative
  19. Cossu JV, Janod K, Ferreira E, Gaillard J, El-Bèze M (2015) NLP-based classifiers to generalize experts assessments in e-reputation. In: Experimental IR meets multilinguality, multimodality, and interaction
  20. da Fontoura Costa L, Rodrigues FA, Travieso G, Villas Boas PR (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56(1):167–242
  21. Danisch M, Dugué N, Perez A (2014) On the importance of considering social capitalism when measuring influence on Twitter. In: Behavioral, economic, and socio-cultural computing
  22. de-Choudhury M, Diakopoulos N, Naaman M (2012) Unfolding the event landscape on Twitter: classification and exploration of user categories. In: ACM CSCW, pp. 241–244
  23. de Silva L, Riloff E (2014) User type classification of tweets with implications for event recognition. In: Joint workshop on social dynamics and personal attributes in social media, pp. 98–108
  24. Dugué N, Labatut V, Perez A (2014) Identifying the community roles of social capitalists in the Twitter network. In: IEEE/ACM ASONAM, Beijing, pp. 371–374
  25. Dugué N, Perez A (2014) Social capitalists on Twitter: detection, evolution and behavioral analysis. Soc Netw Anal Min 4(1):1–15
  26. Dugué N, Perez A, Danisch M, Bridoux F, Daviau A, Kolubako T, Munier S, Durbano H (2015) A reliable and evolutive web application to detect social capitalists. In: IEEE/ACM ASONAM exhibits and demos
  27. Estrada E, Rodriguez-Velazquez JA (2005) Subgraph centrality in complex networks. Phys Rev E 71(5):056103
  28. Fornell C (1992) A national customer satisfaction barometer: the Swedish experience. J Mark, pp. 6–21
  29. Freeman LC, Roeder D, Mulholland RR (1979) Centrality in social networks: II. Experimental results. Soc Netw 2(2):119–141
  30. Garcia R, Amatriain X (2010) Weighted content based methods for recommending connections in online social networks. In: Workshop on recommender systems and the social web, Citeseer, pp. 68–71
  31. Gaussier E, Yvon F (2013) Opinion detection as a topic classification problem. In: Textual information access: statistical models, chap. 9, Wiley, New York, pp. 245–256
  32. Gayo-Avello D (2012) A balanced survey on election prediction using Twitter data. arXiv
  33. Ghosh S, Viswanath B, Kooti F, Sharma N, Korlam G, Benevenuto F, Ganguly N, Gummadi K (2012) Understanding and combating link farming in the Twitter social network. In: WWW, pp. 61–70
  34. Golder SA, Yardi S, Marwick A, Boyd D (2009) A structural approach to contact recommendations in online social networks. In: Workshop on search in social media, SSM
  35. Greenfield R (2014) The latest Twitter hack: talking to yourself. Accessed 5 Feb 2014
  36. Guimerà R, Amaral LN (2005) Cartography of complex networks: modules and universal roles. J Stat Mech 02:02001
  37. Harary F (1969) Graph theory. Addison-Wesley, Boston
  38. Henseler J (2010) On the convergence of the partial least squares path modeling algorithm. Comput Stat 25(1):107–120
  39. Huang W, Weber I, Vieweg S (2014) Inferring nationalities of Twitter users and studying inter-national linking. In: ACM Hypertext
  40. Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage and communities. In: WebKDD/SNA-KDD, pp. 56–65
  41. Kim YM, Velcin J, Bonnevay S, Rizoiu MA (2015) Temporal multinomial mixture for instance-oriented evolutionary clustering. In: Advances in information retrieval
  42. Kred (2015) Kred story. Accessed 12 Feb 2014
  43. Kywe SM, Lim EP, Zhu F (2012) A survey of recommender systems in Twitter. In: Social informatics, Springer, pp. 420–433
  44. Laasby G (2014) Blocking fake Twitter followers and spam accounts just got easier. Accessed Apr 2014
  45. Lancichinetti A, Kivelä M, Saramäki J, Fortunato S (2010) Characterizing the community structure of complex networks. PLoS One 5(8):e11976
  46. Landherr A, Friedl B, Heidemann J (2010) A critical review of centrality measures in social networks. Bus Inf Syst Eng 2(6):371–385
  47. Lee K, Caverlee J, Webb S (2010) Uncovering social spammers: social honeypots + machine learning. In: ACM SIGIR, pp. 435–442
  48. Lee K, Eoff BD, Caverlee J (2011) Seven months with the devils: a long-term study of content polluters on Twitter. In: ICWSM
  49. Lee K, Mahmud J, Chen J, Zhou M, Nichols J (2014) Who will retweet this? automatically identifying and engaging strangers on Twitter to spread information. In: ACM IUI, pp. 247–256
  50. Lee K, Tamilarasan P, Caverlee J (2013) Crowdturfers, campaigns, and social media: tracking and revealing crowdsourced manipulation of social media. In: ICWSM
  51. Mahmud J, Nichols J, Drews C (2012) Where is this tweet from? inferring home locations of Twitter users. In: ICWSM
  52. Makazhanov A, Rafiei D (2013) Predicting political preference of Twitter users. In: IEEE/ACM ASONAM, pp. 298–305
  53. Mena Lomeña JJ, López Ostenero F (2014) UNED at CLEF RepLab 2014: author profiling. In: 4th international conference of the CLEF initiative
  54. Messias J, Schmidt L, Oliveira R, Benevenuto F (2013) You followed my bot! transforming robots into influential users in Twitter. First Monday 18(7)
  55. Naaman M, Boase J, Lai CH (2010) Is it really about me?: message content in social awareness streams. In: ACM CSCW, pp. 189–192
  56. Orman GK, Labatut V, Cherifi H (2012) Comparative evaluation of community detection algorithms: a topological approach. J Stat Mech 8:08001
  57. Pennacchiotti M, Popescu AM (2011) A machine learning approach to Twitter user classification. In: ICWSM, pp. 281–288
  58. Pramanik S, Danisch M, Wang Q, Mitra B (2015) An empirical approach towards an efficient “whom to mention?” Twitter app. Twitter for research, 1st international interdisciplinary conference
  59. Ramage D, Dumais S, Liebling D (2010) Characterizing microblogs with topic models. In: ICWSM, pp. 130–137
  60. Rangel F, Celli F, Rosso P, Potthast M, Stein B, Daelemans W (2015) Overview of the 3rd author profiling task at PAN 2015. In: Experimental IR meets multilinguality, multimodality, and interaction
  61. Rangel F, Rosso P, Chugur I, Potthast M, Trenkmann M, Stein B, Verhoeven B, Daelemans W (2014) Overview of the 2nd author profiling task at PAN 2014. In: CLEF evaluation labs and workshop
  62. Rao A, Spasojevic N, Li Z, DSouza T (2015) Klout score: measuring influence across multiple social networks. arXiv
  63. Rao D, Yarowsky D, Shreevats A, Gupta M (2010) Classifying latent user attributes in Twitter. In: CIKM SMUC workshop, pp. 37–44
  64. Ramírez-de-la Rosa G, Villatoro-Tello E, Jiménez-Salazar H, Sánchez-Sánchez C (2014) Towards automatic detection of user influence in Twitter by means of stylistic and behavioral features. In: Human-inspired computing and its applications, Springer, pp. 245–256
  65. Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Nat Acad Sci 105(4):1118
  66. Sparck Jones K (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 28(1):11–21
  67. Sriram B, Fuhry D, Demir E, Ferhatosmanoglu H, Demirbas M (2010) Short text classification in Twitter to improve information filtering. In: ACM SIGIR, pp. 841–842
  68. Suh B, Hong L, Pirolli P, Chi EH (2010) Want to be retweeted? large scale analytics on factors impacting retweet in Twitter network. In: Social computing, pp. 177–184
  69. Tenenhaus M, Amato S, Esposito Vinzi V (2004) A global goodness-of-fit index for PLS structural equation modelling. In: XLII SIS scientific meeting, vol 1, pp. 739–742
  70. Tommasel A, Godoy D (2015) A novel metric for assessing user influence based on user behaviour. In: Soc Inf, pp. 15–21
  71. Torres-Moreno JM (2012) Artex is another text summarizer. arXiv preprint. arXiv:1210.3312
  72. Uddin MM, Imran M, Sajjad H (2014) Understanding types of users on Twitter. arXiv cs.SI, 1406.1335
  73. Vilares D, Hermo M, Alonso MA, Gómez-Rodríguez C, Vilares J (2014) LyS at CLEF RepLab 2014: creating the state of the art in author influence ranking and reputation classification on Twitter. In: 4th international conference of the CLEF initiative, pp. 1468–1478
  74. Villatoro-Tello E, Ramirez-de-la Rosa G, Sanchez-Sanchez C, Jiménez-Salazar H, Luna-Ramirez WA, Rodriguez-Lucatero C (2014) UAMCLyR at RepLab 2014: author profiling task. In: 4th international conference of the CLEF initiative
  75. Wang AH (2010) Don’t follow me: spam detection in Twitter. In: International conference on security and cryptography, pp. 1–10
  76. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
  77. Weng J, Lim EP, Jiang J, He Q (2010) TwitterRank: finding topic-sensitive influential twitterers. In: WSDM, pp. 261–270
  78. Weren ERD, Kauer AU, Mizusaki L, Moreira VP, de Oliveira JPM, Wives LK (2014) Examining multiple features for author profiling. J Inf Data Manag 5(3):266
  79. Wold H (1982) Soft modeling: the basic design and some extensions. In: Systems under indirect observations: causality, structure, prediction, pp. 36–37

Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  • Jean-Valère Cossu (1), email author
  • Vincent Labatut (1)
  • Nicolas Dugué (2)

  1. Université d’Avignon, Avignon, France
  2. Université d’Orléans, INSA Centre Val de Loire, Blois, France
