Multi-task Learning for Gender and Age Prediction on Chinese Microblog

  • Liang Wang
  • Qi Li
  • Xuan Chen
  • Sujian Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10102)


The demographic attributes gender and age play an important role in social media applications. Previous studies on gender and age prediction mostly focus on engineering effective features, which is labor-intensive. In this paper, we propose a multi-task convolutional neural network (MTCNN) model that predicts gender and age simultaneously on Chinese microblogs. MTCNN reduces the burden of feature engineering and learns both common and task-specific representations for the two tasks. Experimental results show that our method significantly outperforms the state-of-the-art baselines.
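The idea of one network serving both tasks can be illustrated with a minimal forward-pass sketch. It is not the architecture from the paper: it assumes a fully shared convolutional encoder (embedding, one filter width, max-over-time pooling) feeding two task-specific softmax heads, and every name and size here (`MultiTaskCNN`, `joint_loss`, four age groups, `age_weight`) is hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskCNN:
    """Shared convolutional encoder with one softmax head per task (forward pass only)."""

    def __init__(self, vocab_size, embed_dim=8, num_filters=6, window=3,
                 num_gender=2, num_age=4, seed=0):
        # num_age=4 is an assumed number of age groups, not taken from the paper.
        rng = random.Random(seed)
        self.window = window
        self.embed = rand_matrix(vocab_size, embed_dim, rng)
        # Each filter spans `window` consecutive token embeddings.
        self.filters = rand_matrix(num_filters, window * embed_dim, rng)
        # Task-specific output layers on top of the shared features.
        self.gender_head = rand_matrix(num_gender, num_filters, rng)
        self.age_head = rand_matrix(num_age, num_filters, rng)

    def encode(self, token_ids):
        """Map token ids to a shared feature vector (one value per filter)."""
        # Pad short inputs with id 0 so that at least one window exists.
        ids = list(token_ids) + [0] * max(0, self.window - len(token_ids))
        vectors = [self.embed[i] for i in ids]
        features = []
        for filt in self.filters:
            best = 0.0  # starting at 0 applies ReLU before max-over-time pooling
            for start in range(len(vectors) - self.window + 1):
                patch = [x for vec in vectors[start:start + self.window] for x in vec]
                best = max(best, sum(w * x for w, x in zip(filt, patch)))
            features.append(best)
        return features

    def predict(self, token_ids):
        """Return (gender_probs, age_probs) computed from the same shared features."""
        shared = self.encode(token_ids)
        gender = softmax([sum(w * f for w, f in zip(row, shared))
                          for row in self.gender_head])
        age = softmax([sum(w * f for w, f in zip(row, shared))
                       for row in self.age_head])
        return gender, age


def joint_loss(gender_probs, age_probs, gender_label, age_label, age_weight=1.0):
    """Sum of the two cross-entropy terms; age_weight is an assumed balancing knob."""
    return (-math.log(gender_probs[gender_label])
            - age_weight * math.log(age_probs[age_label]))


model = MultiTaskCNN(vocab_size=20)
gender_probs, age_probs = model.predict([1, 5, 7, 3, 2])
print(gender_probs, age_probs, joint_loss(gender_probs, age_probs, 0, 1))
```

Because both heads read the same pooled features, minimizing the summed loss pushes the encoder toward representations useful for both tasks, while each head keeps its own task-specific weights.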


Multi-task learning; Social media; Neural network



We thank all the anonymous reviewers for their insightful comments on this paper. This work was partially supported by the National Natural Science Foundation of China (61273278 and 61572049).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Key Laboratory of Computational Linguistics, Peking University, MOE, Beijing, China
  2. School of Information, Shandong University of Political Science and Law, Jinan, China
  3. Collaborative Innovation Center for Language Ability, Xuzhou, China
