Abstract
The rapid development and adoption of the Internet of Things (IoT) and of sensors such as the Global Positioning System (GPS) in daily life allow a wealth of information to be gathered. This large volume of sensor data can be clustered to infer users' whereabouts. Researchers have proposed various algorithms for analyzing and clustering spatio-temporal data. However, discovering clusters with large density variation remains a major challenge, and most existing clustering algorithms address it by requiring the input parameters to be set manually. In the density-based spatial clustering of applications with noise (DBSCAN) algorithm, both the epsilon and the minimum-points parameters must be set manually before clustering. In this paper, we propose an improved DBSCAN algorithm that adaptively sets the minimum-points parameter for data of dynamic length, based on merge sort and silhouette analysis. The proposed algorithm is validated on the GeoLife GPS dataset. Experimental results show that it is competitive with state-of-the-art methods for identifying users' significant locations and whereabouts under density variation. The paper also highlights the privacy issues raised by applying such clustering algorithms to this kind of data.
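To make the idea of choosing the minimum-points parameter by silhouette analysis concrete, the following is a minimal sketch and not the authors' algorithm. It runs a plain DBSCAN for several candidate values and keeps the one with the highest mean silhouette. Several things here are assumptions made only for illustration: the function names (`dbscan`, `mean_silhouette`, `choose_min_pts`), the fixed `eps`, the use of Euclidean distance on planar toy coordinates rather than a geodesic distance on GPS fixes, and the decision to leave noise points out of the silhouette. The merge sort step mentioned in the abstract is not reproduced, because the abstract does not say how it is used.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, NOISE for outliers."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point, keep expanding
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette over clustered points (noise ignored, by assumption)."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for fewer than two clusters
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_min_pts(points, eps, candidates):
    """Return (min_pts, score) for the candidate with the best mean silhouette."""
    scored = [(mean_silhouette(points, dbscan(points, eps, m)), m) for m in candidates]
    best_score, best_m = max(scored, key=lambda t: t[0])
    return best_m, best_score


if __name__ == "__main__":
    # Hypothetical toy data: two dense groups and one isolated point.
    demo = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (5, 5)]
    print(choose_min_pts(demo, eps=1.5, candidates=[2, 3, 4, 5]))
```

For real GPS fixes, the distance function would need to be replaced by a geodesic one such as the haversine distance, and `eps` would have to be chosen in matching units. When several candidates tie, this sketch keeps the first one, which is an arbitrary choice.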
Acknowledgements
The authors are grateful to the Ministry of Human Resource Development (MHRD) of the Government of India for supporting this research through the Design Innovation Center (MHRD-DIC) under the subtheme “TRAFFIC SENSING AND IT”. The authors would also like to thank Akanksha Chuchra (akanksha.chuchra96@gmail.com) and Dravita Singla (dravitasingla1995@gmail.com), interns at DIC, UIET, for their contribution to the project.
Cite this article
Angmo, R., Aggarwal, N., Mangat, V. et al. An Improved Clustering Approach for Identifying Significant Locations from Spatio-temporal Data. Wireless Pers Commun 121, 985–1009 (2021). https://doi.org/10.1007/s11277-021-08668-w
DOI: https://doi.org/10.1007/s11277-021-08668-w