Networks and Spatial Economics, Volume 14, Issue 3–4, pp 647–667

Inferring Urban Land Use Using Large-Scale Social Media Check-in Data

  • Xianyuan Zhan
  • Satish V. Ukkusuri
  • Feng Zhu


Emerging location-based services in social media tools such as Foursquare and Twitter are providing an unprecedented amount of publicly generated data on human movements and activities. This novel data source contains valuable information (e.g., geo-location, time and date, type of place) on human activities. Besides being highly beneficial for modeling human activity patterns, the data is also useful for inferring planning-related variables such as a city’s land use characteristics. This paper provides a comprehensive investigation of the feasibility and validity of using large-scale social media check-in data to infer land use types by applying state-of-the-art data mining techniques. Two inference approaches are proposed and tested: an unsupervised clustering method and a supervised learning method. The land use inference is conducted on a uniform grid of 200 m by 200 m cells. The methods are applied to a case study of New York City. The validation results confirm that the two approaches effectively infer different land use types given sufficient check-in data. These encouraging results demonstrate the potential of social media check-in data for urban land use inference and reveal the hidden linkage between human activity patterns and the underlying urban land use pattern.
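The grid-based setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes check-ins are already projected to planar coordinates in metres, the category names and sample points are hypothetical, and a plain k-means routine stands in for whichever clustering algorithm the paper actually uses. Each check-in is binned into a 200 m cell, each cell is summarised by its share of check-ins per category, and the cells are then grouped by those shares.

```python
from collections import Counter, defaultdict
import math

CELL_SIZE_M = 200.0  # grid resolution stated in the abstract


def cell_of(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (metres, an assumption) to a (column, row) cell."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def cell_features(checkins, categories):
    """Return {cell: [share of check-ins per category]} for every non-empty cell."""
    counts = defaultdict(Counter)
    for x_m, y_m, category in checkins:
        counts[cell_of(x_m, y_m)][category] += 1
    features = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        features[cell] = [counter[c] / total for c in categories]
    return features


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means (a stand-in); returns one cluster index per point."""
    centroids = [list(c) for c in centroids]  # copy so the caller's list is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical categories and check-ins: (x in metres, y in metres, venue category).
CATEGORIES = ["food", "office", "home"]
checkins = [
    (50.0, 50.0, "office"), (120.0, 80.0, "office"), (199.0, 10.0, "food"),
    (250.0, 30.0, "home"), (390.0, 150.0, "home"),
    (10.0, 250.0, "office"), (20.0, 399.0, "office"),
]

features = cell_features(checkins, CATEGORIES)
cells = sorted(features)
labels = kmeans([features[c] for c in cells], centroids=[[0, 1, 0], [0, 0, 1]])
```

In this toy input the two office-dominated cells fall into one cluster and the home-dominated cell into the other. Real check-in data would arrive as latitude and longitude and would need projecting to a metric coordinate system first; the same per-cell feature vectors could equally be passed to a supervised classifier when labelled land use data is available.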


Keywords: Land use; Social media; Foursquare; Geo-location data; Data mining; Clustering algorithms; Supervised learning algorithms



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Lyles School of Civil Engineering, Purdue University, West Lafayette, USA
