Journal of Intelligent Information Systems

Volume 48, Issue 2, pp 287–308

Identifying urban crowds using geo-located Social media data: a Twitter experiment in New York City

  • Mohamed ben Khalifa
  • Rebeca P. Díaz Redondo
  • Ana Fernández Vilas
  • Sandra Servia Rodríguez


The massive growth of GPS-equipped smartphones, coupled with the increasing importance of social media, has led to the emergence of new services over LBSNs (Location-based Social Networks), where both opinions and location are shared. This proactive attitude allows us to consider citizens as sensors in motion whose information supports our approach: monitoring multitudes or crowds all around the city. More specifically, our proposal mines geotagged data from LBSNs in order to analyze crowds according to different parameters such as size, duration, composition, motivation, cohesion and proximity. This analysis is framed within a methodology for crowd detection in cities that combines social data mining, density-based clustering and outlier detection into a solution that can operate on the fly. The methodology enables short-term forecasting of crowds based on a prior analysis of the time and of the previous behavior of individuals in the geographical area under study. Our approach was validated using Twitter, the public social network par excellence, to analyze geotagged data in New York City on a normal day (the reference day) and on New Year's Eve (the study day), when public crowds are expected.
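The abstract names the two core techniques (density-based clustering and outlier detection against a reference day) but not their exact form or parameters. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes DBSCAN with great-circle (haversine) distance as the clustering step, and Tukey's boxplot upper fence (Q3 + 1.5 × IQR) over reference-day cluster sizes as the outlier test. The function names, `eps_m`, `min_pts` and the sample coordinates are all hypothetical.

```python
import math
from statistics import quantiles

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_pts):
    """Naive O(n^2) DBSCAN over geotagged points; one label per point, -1 = noise."""
    labels = [None] * len(points)

    def region(i):
        # The eps-neighbourhood includes the point itself, as in standard DBSCAN.
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point (not expanded)
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
    return labels


def cluster_sizes(labels):
    """Number of points per cluster, ignoring noise."""
    sizes = {}
    for label in labels:
        if label != -1:
            sizes[label] = sizes.get(label, 0) + 1
    return sizes


def upper_fence(values):
    """Tukey's upper boxplot fence, Q3 + 1.5 * IQR (needs at least two values)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)


def detect_crowds(study_points, reference_sizes, eps_m=100.0, min_pts=5):
    """Study-day clusters whose size is an upper outlier w.r.t. reference-day sizes."""
    labels = dbscan(study_points, eps_m, min_pts)
    fence = upper_fence(reference_sizes)
    return {c: n for c, n in cluster_sizes(labels).items() if n > fence}


# Hypothetical sample data (not from the paper).
STUDY_POINTS = (
    [(40.7580 + 1e-5 * k, -73.9855 + 1e-5 * k) for k in range(20)]  # dense group near Times Square
    + [(40.7812 + 1e-5 * k, -73.9665) for k in range(6)]             # small group near Central Park
    + [(40.70, -74.01), (40.80, -73.95), (40.65, -73.80)]            # isolated tweets
)
REFERENCE_SIZES = [5, 6, 7, 6, 8, 5, 7]  # hypothetical cluster sizes from the reference day

if __name__ == "__main__":
    print(detect_crowds(STUDY_POINTS, REFERENCE_SIZES))  # {0: 20}
```

With these made-up inputs the reference-day fence is 10 tweets, so only the 20-tweet group is reported as a crowd; the 6-tweet group is a normal-sized cluster and the three isolated tweets are noise. The sketch does not cover the on-the-fly operation or the short-term forecasting the abstract mentions.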


Keywords: Data mining; Location-based social network; Crowd detection; Citizen-as-a-sensor; Density-based clustering; Twitter



This work is funded by: the European Regional Development Fund (ERDF) and the Galician Regional Government under agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC); the Spanish Government and the European Regional Development Fund (ERDF) under project TACTICA; the Spanish Ministry of Economy and Competitiveness under the National Science Program (TEC2014-54335-C4-3-R); and the European Commission under the Erasmus Mundus GreenIT project (GreenIT for the benefit of civil society; 3772227-1-2012-ES-ERA MUNDUSEMA21; Grant Agreement no. 2012-2625/001-001-EMA2).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Mohamed ben Khalifa (1)
  • Rebeca P. Díaz Redondo (1)
  • Ana Fernández Vilas (1)
  • Sandra Servia Rodríguez (1)
  1. Information and Computing Laboratory, AtlantTIC Research Center, School of Telecommunications Engineering, University of Vigo, Vigo, Spain
