Event Detection in Location-Based Social Networks

Part of the book series: Studies in Big Data (SBD, volume 24)

Abstract

With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; e.g., Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection that explicitly models the data generation process and enables reasoning about the discovered events. To set forth the differences between the two approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities. Finally, we present the algorithmic changes and data processing frameworks needed to scale up the proposed techniques to big data workloads.
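To make the data mining side of this comparison more concrete, the following is a minimal sketch of textbook DBSCAN-style density clustering, in which a MinPts threshold decides what counts as a dense region. It is not the authors' Tweet-SCAN implementation: it clusters only 2-D points with Euclidean distance, and the function names, the `eps` value, and the sample coordinates are all hypothetical choices made for illustration.

```python
from math import hypot

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one label per point, 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

# Hypothetical planar coordinates: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In an event detection setting, each resulting cluster would be a candidate event and the points labeled -1 would be treated as background noise. Raising `min_pts` makes the detector stricter about how many nearby points are needed before a region counts as dense.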

Notes

  1. https://blog.twitter.com/2013/new-tweets-per-second-record-and-how

  2. https://dev.twitter.com/rest/public

  3. http://dev.twitter.com/streaming/overview

  4. Dataset published at https://github.com/jcapde87/Twitter-DS

  5. http://lameva.barcelona.cat/merce/en/

  6. https://support.twitter.com/articles/78525

  7. Although we tested several different MinPts values, \(MinPts=10\) outperformed all others, given that labeled events had at least 10 tweets.

Acknowledgements

This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback.

Author information

Corresponding author

Correspondence to Joan Capdevila.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Capdevila, J., Cerquides, J., Torres, J. (2017). Event Detection in Location-Based Social Networks. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_8

  • DOI: https://doi.org/10.1007/978-3-319-53474-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53473-2

  • Online ISBN: 978-3-319-53474-9

  • eBook Packages: Engineering, Engineering (R0)
