Say It with Colors: Language-Independent Gender Classification on Twitter

  • Jalal S. Alowibdi
  • Ugo A. Buy
  • Philip S. Yu
Part of the Lecture Notes in Social Networks book series (LNSN)


Online Social Networks (OSNs) have spread at stunning speed over the past decade. They are now part of the lives of tens of millions of people. The rise of OSNs has stretched the traditional notion of community to include groups of people who have never met in person but communicate with each other through OSNs to share knowledge, opinions, interests, and activities. Here we explore language-independent gender classification in depth. Our approach predicts gender using five color-based features extracted from Twitter profiles, such as the background color of a user’s profile page. This contrasts with most existing methods for gender prediction, which are language dependent. Those methods use high-dimensional spaces consisting of unique words extracted from text fields such as postings, user names, and profile descriptions. Our approach is independent of the user’s language, efficient, scalable, and computationally tractable, while attaining a good level of accuracy.
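
The abstract does not spell out how the five colors are encoded, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each profile color arrives as a six-digit hex string, that the five fields carry the hypothetical names listed in `COLOR_FIELDS` (only the background color is named in the abstract), and that quantization keeps the top few bits of each RGB channel so that near-identical shades share one bin.

```python
from typing import Dict, List

# Hypothetical field names: the abstract names only the background color,
# the other four are assumptions made for illustration.
COLOR_FIELDS = [
    "background_color",
    "text_color",
    "link_color",
    "sidebar_fill_color",
    "sidebar_border_color",
]


def quantize_hex(color: str, bits: int = 3) -> int:
    """Map a six-digit hex color to a bin index.

    Keeps the top `bits` bits of each RGB channel (an assumed scheme),
    giving 2 ** (3 * bits) possible bins.
    """
    value = int(color.lstrip("#"), 16)
    r, g, b = (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
    shift = 8 - bits
    return ((r >> shift) << (2 * bits)) | ((g >> shift) << bits) | (b >> shift)


def color_features(profile: Dict[str, str], bits: int = 3) -> List[int]:
    """Build a five-element feature vector, one quantized color per field."""
    return [quantize_hex(profile[field], bits) for field in COLOR_FIELDS]
```

With `bits=3` each color falls into one of 512 bins, so a profile is described by five small integers regardless of the language the user writes in. A vector like this could then be passed to any standard classifier; which classifier and which bin count work well is left open here.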


Color-based feature, Gender classification, Twitter profile, Color quantization, Color feature, Low dimensional space, Social network analysis, Online social network



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jalal S. Alowibdi (1, 2)
  • Ugo A. Buy (1)
  • Philip S. Yu (1)
  1. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
  2. Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
