Mining Social Networks to Detect Traffic Incidents

Abstract

Citizens often use social networks to report or complain about traffic incidents that affect their daily mobility. Automatically finding traffic-related reports and extracting useful information from them is not a trivial task because of the informal language used in social networks, the lack of geographic metadata, and the large volume of non-traffic-related publications. In this article, we address this problem by combining Machine Learning and Natural Language Processing techniques. Our approach (a) filters publications that report traffic incidents in social networks, (b) extracts geographic information from the textual content of the publications, and (c) provides a broadcasting service that clusters all the reports of the same incident. We compared the performance of our approach with state-of-the-art approaches and with a popular traffic-specific social network, obtaining promising results.

Notes

  1. https://www.waze.com/
  2. https://www.twitter.com/
  3. We decided to keep at most two repeated letters because some words and names legitimately contain repeated letters (e.g. ‘Saavedra’ and ‘calle’ in Spanish or ‘street’ and ‘traffic’ in English).
  4. Stopwords are irrelevant words, such as articles or prepositions, that have a high probability of occurrence in the publications regardless of the topic discussed.
  5. https://developers.google.com/maps/documentation/geocoding/
  6. https://fasttext.cc/docs/en/pretrained-vectors.html
  7. http://data.buenosaires.gob.ar/dataset/calles
  8. http://www.rae.es/diccionario-panhispanico-de-dudas/terminos-linguisticos
  9. http://twitter4j.org/en/index.html
  10. https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters
  11. https://www.cs.waikato.ac.nz/ml/weka/
  12. https://www.postgresql.org/
  13. https://docs.oracle.com/javase/7/docs/api/index.html
  14. https://developers.google.com/maps/documentation/geocoding/start
  15. https://twitter.com/Manwebsas
  16. http://si.isistan.unicen.edu.ar/Manwe/research-demo/
  17. https://eclipse-ee4j.github.io/jersey/
  18. https://tomcat.apache.org/
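
Notes 3 and 4 describe two text-normalization steps: limiting runs of a repeated letter to two, and discarding stopwords. The following is a minimal sketch of how such steps might look, not the authors' implementation. The function names, the whitespace tokenization, the lowercasing, and the tiny stopword set are all assumptions made for illustration.

```python
import re

# Matches any letter repeated three or more times in a row.
# [^\W\d_] means "a word character that is not a digit or underscore",
# i.e. a letter, so numbers such as 1000 are left untouched.
_REPEATED_LETTER = re.compile(r"([^\W\d_])\1{2,}")

# Illustrative stopword set only (hypothetical); a real system would
# load a full list for the target language.
STOPWORDS = {"the", "a", "an", "on", "in", "at", "of", "el", "la", "en", "de"}


def collapse_repeats(text: str) -> str:
    """Reduce every run of 3+ identical letters to exactly two letters."""
    return _REPEATED_LETTER.sub(r"\1\1", text)


def remove_stopwords(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop tokens found in STOPWORDS."""
    return [token for token in text.lower().split() if token not in STOPWORDS]


print(collapse_repeats("choqueeee en calle Saavedraaaa"))
# prints: choquee en calle Saavedraa
print(remove_stopwords("Accident on the corner of Saavedra"))
# prints: ['accident', 'corner', 'saavedra']
```

Keeping two letters rather than one means that legitimate doubles such as the ‘ll’ in ‘calle’ or the ‘aa’ in ‘Saavedra’ survive, at the cost of leaving some residual misspellings (‘choquee’) for a later matching step to absorb.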

Author information

Corresponding author

Correspondence to Sebastián Vallejos.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vallejos, S., Alonso, D.G., Caimmi, B. et al. Mining Social Networks to Detect Traffic Incidents. Inf Syst Front 23, 115–134 (2021). https://doi.org/10.1007/s10796-020-09994-3

Keywords

  • Social networks
  • Natural language processing
  • Machine learning
  • Traffic incident detection