Identifying urban crowds using geo-located Social media data: a Twitter experiment in New York City

Journal of Intelligent Information Systems

Abstract

The massive growth of GPS-equipped smartphones, coupled with the increasing importance of social media, has led to the emergence of new services over LBSNs (Location-based Social Networks) where both opinions and location are shared. This proactive attitude allows us to consider citizens as sensors in motion whose information supports our approach: monitoring multitudes or crowds all around the city. More specifically, our proposal is to mine geotagged data from LBSNs in order to analyze crowds according to parameters such as size, duration, composition, motivation, cohesion and proximity. This analysis is gathered under a methodology for crowd detection in cities that combines social data mining, density-based clustering and outlier detection into a solution that can operate on-the-fly. This methodology enables foreseeing crowds in the short term, based on the prior analysis of time and previous behavior of individuals in the geographical area under study. Our approach was validated using Twitter, as the public social network par excellence, to analyze geotagged data in New York City on a normal day (the reference day) and on New Year’s Eve (the study day), when public crowds are expected.
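The combination of density-based clustering and outlier detection described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific algorithms or parameters, so everything below is an assumption made for illustration: DBSCAN stands in for the density-based step, a boxplot-style upper fence stands in for the outlier step, and the function names, the 100 m radius, the minimum of 4 points and the sample data are all hypothetical.

```python
import math
from statistics import quantiles


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_pts):
    """Minimal O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps_m of point i (including i itself).
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core point: border point
            if labels[j] is not None:
                continue                # already assigned
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:      # j is a core point, keep expanding
                queue.extend(nb)
    return labels


def upper_fence(values, k=1.5):
    """Boxplot-style upper fence: Q3 + k * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + k * (q3 - q1)


# Hypothetical geotagged posts: six close together, two isolated.
posts = [(40.7580 + i * 0.0001, -73.9855) for i in range(6)]
posts += [(40.70, -74.01), (40.85, -73.90)]
labels = dbscan(posts, eps_m=100, min_pts=4)

# Hypothetical cluster sizes seen in the same area on a reference day.
reference_sizes = [3, 4, 5, 4, 3, 5, 4]
study_size = labels.count(0)
is_unusual = study_size > upper_fence(reference_sizes)
```

In this sketch a cluster found on the study day is flagged only when its size exceeds what the reference-day sizes would suggest; the quadratic neighbour search is kept for readability and would need a spatial index for city-scale data.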

Notes

  1. However, the Twitter API returned many tweets outside the defined shape, a common issue for Twitter API users.
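One way to discard tweets that fall outside a defined shape is a point-in-polygon test applied after collection. The sketch below uses the standard ray-casting technique; it is not taken from the article. The `lon`/`lat` dictionary keys, the function names and the rough rectangle around New York City are hypothetical and do not reproduce the shape used in the study.

```python
def inside(point, polygon):
    """Ray-casting test: True if the (lon, lat) point lies inside the polygon.

    The polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly joined to the first.
    """
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit   # each crossing to the right toggles the state
    return hit


def keep_inside(tweets, polygon):
    """Keep only tweets whose coordinates fall inside the polygon."""
    return [t for t in tweets if inside((t["lon"], t["lat"]), polygon)]


# Illustrative rectangle only; NOT the shape used in the study.
area = [(-74.26, 40.49), (-73.70, 40.49), (-73.70, 40.92), (-74.26, 40.92)]
tweets = [
    {"id": 1, "lon": -73.9855, "lat": 40.7580},   # midtown Manhattan
    {"id": 2, "lon": -75.1652, "lat": 39.9526},   # Philadelphia, outside the area
]
kept = keep_inside(tweets, area)
```

Filtering on the client side keeps the later clustering step independent of whatever the API actually returns.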

Acknowledgments

This work is funded by the European Regional Development Fund (ERDF) and the Galician Regional Government under the agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlantTIC); the Spanish Government and the European Regional Development Fund (ERDF) under project TACTICA; the Spanish Ministry of Economy and Competitiveness under the National Science Program (TEC2014-54335-C4-3-R); and the European Commission under the Erasmus Mundus GreenIT project (GreenIT for the benefit of civil society. 3772227-1-2012-ES-ERA MUNDUSEMA21; Grant Agreement n 2012-2625/001-001-EMA2).

Author information

Corresponding author

Correspondence to Mohamed ben Khalifa.

About this article

Cite this article

Khalifa, M.b., Díaz Redondo, R.P., Vilas, A.F. et al. Identifying urban crowds using geo-located Social media data: a Twitter experiment in New York City. J Intell Inf Syst 48, 287–308 (2017). https://doi.org/10.1007/s10844-016-0411-x

  • DOI: https://doi.org/10.1007/s10844-016-0411-x
