Personal and Ubiquitous Computing, Volume 18, Issue 1, pp 223–238

A probabilistic approach to mining mobile phone data sequences

  • Katayoun Farrahi
  • Daniel Gatica-Perez
Original Article


We present a new approach to mining long sequences from large datasets. The particular problem of interest is mining long sequences from large-scale location data effectively enough to be practical for Reality Mining applications, which suffer from large amounts of noise and a lack of ground truth. To handle this complex data, we propose an unsupervised probabilistic topic model called the distant n-gram topic model (DNTM). The DNTM extends latent Dirichlet allocation (LDA) to integrate sequential information. We define the generative process for the model, derive the inference procedure, and evaluate the model on both synthetic data and real mobile phone data. We consider two mobile phone datasets containing natural human mobility patterns obtained by location sensing: the first uses GPS/Wi-Fi locations and the second uses cell tower connections. The DNTM discovers meaningful topics on the synthetic data as well as on the two mobile phone datasets. Finally, we compare the DNTM to LDA in terms of log-likelihood on unseen data, which measures the predictive power of each model. The results show that the DNTM consistently outperforms LDA as the sequence length increases.
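As a rough illustration of the LDA baseline and the evaluation idea mentioned in the abstract, the sketch below samples one document from the standard LDA generative process and scores a word sequence by its log-likelihood with topic assignments marginalized out. It is plain LDA, not the DNTM, whose generative process is not given here. The function names, the toy topic-word matrix `phi`, and the hyperparameter values are all hypothetical choices made for illustration; in a real evaluation the document-topic distribution would be estimated by inference on unseen data rather than reused from the sampler.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(phi, alpha, length, rng):
    """Standard LDA generative process for a single document.

    phi   : list of topic-word distributions (one probability list per topic)
    alpha : Dirichlet hyperparameters for the document-topic distribution
    """
    theta = sample_dirichlet(alpha, rng)  # document-topic proportions
    words = []
    for _ in range(length):
        # Pick a topic for this token, then a word from that topic.
        z = rng.choices(range(len(phi)), weights=theta)[0]
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]
        words.append(w)
    return theta, words


def log_likelihood(words, theta, phi):
    """Log-likelihood of the words, summing over topics for each token."""
    total = 0.0
    for w in words:
        p = sum(theta[k] * phi[k][w] for k in range(len(phi)))
        total += math.log(p)
    return total


# Toy example (assumed values): two topics over four location labels.
rng = random.Random(0)
phi = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
theta, doc = generate_document(phi, alpha=[0.5, 0.5], length=20, rng=rng)
print(log_likelihood(doc, theta, phi) / len(doc))  # average per-token value
```

Comparing two models on this score only makes sense when both are evaluated on the same held-out sequences, which is the setting the abstract describes.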


Keywords: Mobile Phone, Topic Model, Latent Dirichlet Allocation, Unseen Data, Mobile Phone Data



This research was funded by the SNSF HAI project and the LS-CONTEXT project funded by Nokia Research. K. Farrahi also acknowledges the Socionical project and the Pervasive Computing Group at JKU, Linz. We thank Olivier Bornet (Idiap) for help with location data processing and visualization, Gian Paolo Perrucci (Nokia Research Lausanne) for insights on routine visualization, and Trinh-Minh-Tri Do (Idiap) for discussions on sequence modeling methods.



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. JKU University Linz, Linz, Austria
  2. Idiap Research Institute, Martigny, Switzerland
  3. EPFL, Lausanne, Switzerland
