Exploring cell tower data dumps for supervised learning-based point-of-interest prediction (industrial paper)

Abstract

Exploring massive mobile data for location-based services has become one of the key challenges in mobile data mining. In this paper, we investigate the problem of finding a correlation between the collective behavior of mobile users and the distribution of points of interest (POIs) in a city. Specifically, we use large-scale data dumps collected from cell towers and POIs extracted from a popular social network service, Weibo. Our objective is to use the data from these two different types of sources to build a model for predicting the POI densities of different regions in the covered area. One application domain that may benefit from our research is business recommendation, where a prediction result can serve as a recommendation for opening a new store or branch. The crux of our contribution is a method of representing the collective behavior of mobile users as a histogram of connection counts over a period of time in each region. This representation enables us to apply supervised learning to our problem and train a POI prediction model using the POI data set as the ground truth. We studied 12 state-of-the-art classification and regression algorithms; experimental results demonstrate the feasibility and effectiveness of the proposed method.
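
The histogram representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the hourly binning, the string region identifiers, and the function names `connection_histograms` and `build_training_set` are all assumptions made for illustration, since the abstract does not specify how regions are partitioned or how time is binned.

```python
from collections import defaultdict


def connection_histograms(records, n_bins=24):
    """Count connections per region and time bin.

    records: iterable of (region_id, hour_of_day) pairs, one per
    cell tower connection (a hypothetical input format).
    Returns {region_id: [count for each of n_bins bins]}.
    """
    hist = defaultdict(lambda: [0] * n_bins)
    for region, hour in records:
        hist[region][hour % n_bins] += 1
    return dict(hist)


def build_training_set(histograms, poi_counts):
    """Pair each region's histogram (features) with its POI count (target).

    Regions with no recorded POI get a target of 0 (an assumption).
    """
    regions = sorted(histograms)
    X = [histograms[r] for r in regions]
    y = [poi_counts.get(r, 0) for r in regions]
    return regions, X, y


# Toy data: three connections in region "A", one in region "B".
records = [("A", 8), ("A", 8), ("A", 18), ("B", 23)]
poi_counts = {"A": 5}  # hypothetical ground truth taken from the POI data set

hists = connection_histograms(records)
regions, X, y = build_training_set(hists, poi_counts)
```

Each row of `X` is one region's histogram and the matching entry of `y` is its POI count, so the pair could be passed to any supervised classifier or regressor.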

Notes

  1. http://www.chinamobileltd.com

  2. http://weibo.com

Acknowledgments

R. Wang and C.-Y. Chow were partially supported by a research grant (CityU Project No. 9231131). S. Nutanong was partially supported by a CityU research grant (CityU Project No. 7200387). This work was also supported by the National Natural Science Foundation of China under the Grant 61402460.

Author information

Corresponding author

Correspondence to Chi-Yin Chow.

About this article

Cite this article

Wang, R., Chow, CY., Lyu, Y. et al. Exploring cell tower data dumps for supervised learning-based point-of-interest prediction (industrial paper). Geoinformatica 20, 327–349 (2016). https://doi.org/10.1007/s10707-015-0237-7

  • DOI: https://doi.org/10.1007/s10707-015-0237-7

Keywords

  • Spatio-temporal data analysis
  • Classification
  • Regression
  • Cell tower data dumps
  • Point-of-interest prediction