Abstract
The increasing use of mobile social networks has transformed news media. Real-world events are now reported on social networks much faster than through traditional channels. As a result, the automatic detection of events from networks like Twitter has attracted a lot of interest from both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major challenges. First, detecting real-world events from the vast number of tweets can no longer be performed on a single machine. Second, tweeting activity varies considerably within these broad space-time regions, which limits the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets from the Iberian Peninsula posted during several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy.
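The idea of partitioning tweets in space and time and tailoring DBSCAN parameters to local density can be illustrated with a minimal single-machine sketch in plain Python. It is not the scheme implemented in the paper: the uniform grid partitioning, the rule in `local_min_pts`, and all function and parameter names are assumptions made for illustration, and the merging of clusters that span partition borders is omitted.

```python
from collections import defaultdict
from math import ceil, floor

NOISE = -1


def partition(points, cell_size, cell_time):
    """Map step: assign each (x, y, t) point to a space-time grid cell."""
    cells = defaultdict(list)
    for x, y, t in points:
        key = (floor(x / cell_size), floor(y / cell_size), floor(t / cell_time))
        cells[key].append((x, y, t))
    return cells


def local_min_pts(n_points, base_min_pts=3, fraction=0.1):
    """Hypothetical density rule: denser cells require more neighbours."""
    return max(base_min_pts, ceil(fraction * n_points))


def neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space and eps_time of point i (incl. i)."""
    xi, yi, ti = points[i]
    return [
        j
        for j, (x, y, t) in enumerate(points)
        if (x - xi) ** 2 + (y - yi) ** 2 <= eps_space ** 2
        and abs(t - ti) <= eps_time
    ]


def dbscan(points, eps_space, eps_time, min_pts):
    """Plain DBSCAN with separate spatial and temporal neighbourhood radii."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(points, i, eps_space, eps_time)
        if len(nbrs) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(points, j, eps_space, eps_time)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels


def detect_events(points, cell_size, cell_time, eps_space, eps_time):
    """Reduce step: cluster each cell with its own density-derived min_pts."""
    results = {}
    for key, cell_points in partition(points, cell_size, cell_time).items():
        min_pts = local_min_pts(len(cell_points))
        labels = dbscan(cell_points, eps_space, eps_time, min_pts)
        results[key] = (cell_points, labels)
    return results
```

Calling `detect_events(points, cell_size, cell_time, eps_space, eps_time)` on a list of `(x, y, t)` tuples returns, for each grid cell, the points in that cell and their cluster labels, with `-1` marking noise. In a distributed setting the per-cell loop is the part that would run in parallel, with each cell processed independently.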
J. Capdevila: Obra Social “la Caixa”.
J. Torres: Spanish Ministry of Economy and Competitivity under contract TIN2015-65316 and BSC-CNS Severo Ochoa programs (SEV2015-0493, SEV-2011-00067).
J. Cerquides: The SGR program (2014 SGR 118) of the Catalan Government and Collectiveware (TIN2015-66863-C2-1-R).
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Capdevila, J., Pericacho, G., Torres, J., Cerquides, J. (2016). Scaling DBSCAN-like Algorithms for Event Detection Systems in Twitter. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_27
DOI: https://doi.org/10.1007/978-3-319-49583-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49582-8
Online ISBN: 978-3-319-49583-5
eBook Packages: Computer Science, Computer Science (R0)