Is There a Crowd? Experiences in Using Density-Based Clustering and Outlier Detection

  • Mohamed Ben Kalifa
  • Rebeca P. Díaz Redondo
  • Ana Fernández Vilas
  • Rafael López Serrano
  • Sandra Servia Rodríguez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8891)


The massive growth of GPS-equipped smartphones, coupled with the increasing importance of social media, has led to the emergence of new location-based services over Location-Based Social Networks (LBSNs), which allow citizens to act as social sensors reporting their locations. This proactive social reporting can benefit researchers in a wide range of scenarios, such as the one addressed in this paper: monitoring crowds in the city, where a crowd is an assembly of individuals characterized in terms of size, duration, motivation, cohesion and proximity. We introduce a methodology for crowd detection that combines social data mining, density-based clustering and outlier detection into a solution that can operate on the fly to predict public crowds, i.e. to foresee, in the short term, the formation of potential multitudes based on prior analysis of the region. We mine Twitter to analyze geo-tagged data from New York on New Year’s Eve in order to discover such predictable public crowds.
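To make the combination of density-based clustering and outlier detection concrete, the following is a minimal sketch and not the authors' implementation. It assumes geo-tagged posts arrive as (latitude, longitude) pairs, groups them with a plain DBSCAN over great-circle distances, and then flags a cluster as unusual when its size lies far above the counts previously observed in that region. The function names, the parameters `eps_m`, `min_pts` and `z_threshold`, the sample coordinates and the z-score test itself are all illustrative assumptions; the abstract does not state which parameters or outlier criterion the paper uses.

```python
import math
from statistics import mean, stdev


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def dbscan(points, eps_m, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point: 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # The neighbourhood includes the point itself, as in the usual definition.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


def is_unusual(current, history, z_threshold=3.0):
    """Assumed outlier rule: current count is far above the historical mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > z_threshold


# Hypothetical sample: five posts near Times Square and one in Brooklyn.
posts = [(40.7580, -73.9855), (40.7581, -73.9856), (40.7579, -73.9854),
         (40.7582, -73.9855), (40.7580, -73.9857), (40.6782, -73.9442)]
labels = dbscan(posts, eps_m=100.0, min_pts=4)
cluster_size = labels.count(0)
# Hypothetical counts seen in the same area during earlier time windows.
alarm = is_unusual(cluster_size, history=[3, 4, 2, 5, 3, 4])
```

The quadratic neighbour search keeps the sketch short; a real deployment over a city-scale tweet stream would more likely rely on a spatial index or an existing library implementation.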


data mining, location-based social network, crowd detection, citizen-as-a-sensor, density-based clustering, Twitter



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Mohamed Ben Kalifa
    • 1
  • Rebeca P. Díaz Redondo
    • 1
  • Ana Fernández Vilas
    • 1
  • Rafael López Serrano
    • 1
  • Sandra Servia Rodríguez
    • 1
  1. Information & Computing Lab, AtlantTIC Research Center, School of Telecommunications Engineering, University of Vigo, Spain
