Data Mining and Knowledge Discovery, Volume 25, Issue 3, pp 478–510

Clustering daily patterns of human activities in the city

Article

Abstract

Data mining and statistical learning techniques are powerful analysis tools that have yet to be incorporated into the domain of urban studies and transportation research. In this work, we analyze an activity-based travel survey conducted in the Chicago metropolitan area on a demographically representative sample of its population. Detailed data on activities by time of day were collected from more than 30,000 individuals (and 10,552 households) who participated in a 1-day or 2-day survey implemented from January 2007 to February 2008. We examine this large-scale data set in order to explore three critical issues: (1) the inherent daily activity structure of individuals in a metropolitan area, (2) the variation of individual daily activities, that is, how they grow and fade over time, and (3) clusters of individual behaviors and the socio-demographic characteristics associated with them. We find that the population can be clustered into 8 and 7 representative groups according to their activities during weekdays and weekends, respectively. Our results enrich the traditional division into only three groups (workers, students and non-workers) and provide clusters based on activities at different times of day. The generated clusters, combined with socio-demographic information, provide a new perspective for urban and transportation planning as well as for emergency response and spreading dynamics, by addressing when, where, and how individuals interact with places in metropolitan areas.
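As a rough illustration of what clustering individuals by their activities at different times of day can look like, the sketch below one-hot encodes toy hourly schedules and groups them with a plain k-means. It is not the authors' pipeline: the activity labels, the hourly resolution, the toy schedules, the farthest-first initialization, and every function name are assumptions made for this example, and the eigen decomposition step named in the keywords is not shown.

```python
# Minimal sketch, standard library only. All names and data are hypothetical.
ACTIVITIES = ["home", "work", "school", "other"]  # assumed activity labels
SLOTS = 24  # one slot per hour; the time resolution is an assumption


def make_day(activity=None, start=0, end=0):
    """Return 24 hourly labels: 'home' except `activity` during [start, end)."""
    return [activity if activity and start <= h < end else "home"
            for h in range(SLOTS)]


def encode_day(schedule):
    """One-hot encode hourly labels into a flat 0/1 vector (slots x activities)."""
    vec = []
    for label in schedule:
        vec.extend(1.0 if label == a else 0.0 for a in ACTIVITIES)
    return vec


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until
    the labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy population: three workers, three students, three people mostly at home.
days = (
    [make_day("work", s, s + 8) for s in (9, 8, 10)]
    + [make_day("school", 8, 15), make_day("school", 9, 15), make_day("school", 8, 14)]
    + [make_day("other", 10, 12), make_day(), make_day("other", 15, 17)]
)
points = [encode_day(d) for d in days]
labels, centroids = kmeans(points, k=3)
print(labels)
```

Each centroid here is a vector of per-hour activity shares, so it can be read as the average daily profile of its group. On real survey data the number of clusters would have to be chosen with a validation measure rather than fixed in advance.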

Keywords

Human activity; Eigen decomposition; Daily activity clustering; Metropolitan area; Statistical learning

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Shan Jiang (1)
  • Joseph Ferreira (2)
  • Marta C. González (3)
  1. Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, USA
  2. Department of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, USA
  3. Department of Civil and Environmental Engineering and Engineering Systems Division, Massachusetts Institute of Technology, Cambridge, USA
