Inference of demographic attributes based on mobile phone usage patterns and social network topology

  • Carlos Sarraute
  • Jorge Brea
  • Javier Burroni
  • Pablo Blanc
Original Article


Mobile phone usage provides a wealth of information that can be used to better understand the demographic structure of a population. In this paper, we focus on the population of Mexican mobile phone users. We first present an observational study of mobile phone usage according to gender and age groups, and we detect significant differences in phone usage among subgroups of the population. We then study the performance of different machine learning (ML) methods for predicting demographic features (namely, age and gender) of unlabeled users by leveraging individual calling patterns as well as the structure of the communication graph. We show that a specific implementation of a diffusion model, which harnesses the graph structure, performs significantly better than standard node-based ML methods. We provide details of the methodology together with an analysis of the robustness of our results to changes in the model parameters. Furthermore, by carefully examining the topological relations between the training nodes (seed nodes) and the rest of the nodes in the network, we identify topological metrics that directly influence the performance of the algorithm.
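The abstract does not spell out the update rule of the diffusion model. As a hedged illustration only, the sketch below uses a generic label-propagation scheme (the harmonic-function approach): seed nodes are clamped to their known label, and every other node repeatedly takes the weighted average of its neighbours' current scores until the scores stop changing. It is not presented as the authors' implementation. The function names `build_graph` and `diffuse_labels`, the use of a tie-strength measure such as call counts as edge weights, the 0/1 encoding of a binary demographic attribute, and the 0.5 decision threshold are all assumptions made for the example.

```python
# Hypothetical sketch: generic label propagation on a call graph,
# not the paper's exact diffusion model.
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build a symmetric weighted adjacency map from (u, v, weight) triples.

    Assumption: weight is a tie-strength measure such as a call count.
    """
    graph: Dict[Hashable, Dict[Hashable, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return dict(graph)


def diffuse_labels(
    graph: Graph,
    seeds: Mapping[Hashable, float],
    max_iter: int = 1000,
    tol: float = 1e-6,
) -> Dict[Hashable, float]:
    """Spread seed scores (e.g. 1.0 / 0.0 for a binary attribute) over the graph."""
    if not seeds:
        raise ValueError("at least one seed node is required")
    prior = sum(seeds.values()) / len(seeds)  # unlabeled nodes start at the seed mean
    scores = {node: seeds.get(node, prior) for node in graph}
    for _ in range(max_iter):
        new_scores = dict(scores)
        max_delta = 0.0
        for node, neighbours in graph.items():
            if node in seeds:
                continue  # seed nodes stay clamped to their known label
            total = sum(neighbours.values())
            if total == 0:
                continue  # no usable edges: keep the prior
            value = sum(w * scores[nbr] for nbr, w in neighbours.items()) / total
            max_delta = max(max_delta, abs(value - scores[node]))
            new_scores[node] = value
        scores = new_scores
        if max_delta < tol:
            break  # scores have (approximately) stopped changing
    return scores


if __name__ == "__main__":
    g = build_graph([("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)])
    scores = diffuse_labels(g, {"a": 1.0, "d": 0.0})
    print({k: round(v, 3) for k, v in sorted(scores.items())})
    # {'a': 1.0, 'b': 0.667, 'c': 0.333, 'd': 0.0}
    predicted = {k: int(v >= 0.5) for k, v in sorted(scores.items())}  # assumed 0.5 threshold
    print(predicted)
```

On the toy path a, b, c, d with seed a labelled 1 and seed d labelled 0, the scores converge to the harmonic solution b = 2/3 and c = 1/3, so thresholding at 0.5 assigns b to class 1 and c to class 0. Clamping the seeds is what lets the graph structure, rather than per-node features, carry the label information to unlabeled users.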


Keywords: Social network analysis · Mobile phone social network · Call detail records · Graph mining · Demographics · Homophily


Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  • Carlos Sarraute, Grandata Labs, Buenos Aires, Argentina
  • Jorge Brea, Grandata Labs, Buenos Aires, Argentina
  • Javier Burroni, Grandata Labs, Buenos Aires, Argentina
  • Pablo Blanc, IMAS, UBA-CONICET, FCEN, Ciudad Universitaria, Buenos Aires, Argentina