An Improved Clustering Approach for Identifying Significant Locations from Spatio-temporal Data

Angmo, Rigzin; Aggarwal, Naveen; Mangat, Veenu; Lal, Anurag; Kaur, Simarpreet

doi:10.1007/s11277-021-08668-w

Published: 28 June 2021

Volume 121, pages 985–1009, (2021)

Wireless Personal Communications

Rigzin Angmo ORCID: orcid.org/0000-0002-2382-8906¹,
Naveen Aggarwal¹,
Veenu Mangat²,
Anurag Lal³ &
…
Simarpreet Kaur⁴

Abstract

The rapid development and adoption of the Internet of Things (IoT) and of sensors such as the Global Positioning System (GPS) in daily life allow a wealth of information to be gathered. This large volume of sensor data can be clustered to infer the whereabouts of users. Researchers have proposed various algorithms for analyzing and clustering spatio-temporal data. However, discovering clusters with large density variation remains a major challenge, and to handle it most existing clustering algorithms rely on manually set input parameters. In the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, for example, the epsilon and minimum-points parameters must be set manually before any further computation. In this paper, we propose an improved DBSCAN algorithm that adaptively sets the minimum-points parameter for data of dynamic length, based on merge sort and silhouette analysis. The proposed algorithm is validated on the GeoLife GPS dataset. Experimental results show that it is competitive with state-of-the-art methods for identifying users’ significant locations and whereabouts in the presence of density variation. The paper also highlights the user-privacy issues raised by the potential application of the clustering algorithm to such data.
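
To make the idea of choosing the minimum-points parameter by silhouette analysis concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a fixed epsilon, planar coordinates with Euclidean distance (real GPS latitude/longitude data would need a geodesic distance instead), and a hypothetical list of candidate values. The function names `dbscan`, `silhouette`, and `pick_min_pts` are illustrative only, and the role that merge sort plays in the paper is not described in the abstract, so it is not reproduced here.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D points; returns one label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force eps-neighborhood (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels


def silhouette(points, labels):
    """Mean silhouette over clustered points; noise points are ignored."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # assumption: treat "fewer than two clusters" as worst case
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for o, other in clusters.items()
                if o != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pick_min_pts(points, eps, candidates):
    """Return (min_pts, score) with the highest silhouette; ties keep the smaller value."""
    best = None
    for m in sorted(candidates):
        score = silhouette(points, dbscan(points, eps, m))
        if best is None or score > best[1]:
            best = (m, score)
    return best


# Toy data (made up for illustration): two dense blobs and two isolated points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
    (5, 5), (20, 0),
]
best_min_pts, best_score = pick_min_pts(points, eps=1.5, candidates=[1, 3, 5, 6])
print(best_min_pts, round(best_score, 3))
```

On this toy data a very small minimum-points value turns the isolated points into singleton clusters and lowers the score, while a value larger than the blob size leaves no clusters at all. The silhouette criterion therefore favours an intermediate setting.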

Acknowledgements

The authors are grateful to the Ministry of Human Resource Development (MHRD), Government of India, for supporting this research through the Design Innovation Center (MHRD-DIC) under the subtheme “TRAFFIC SENSING AND IT”. The authors would also like to thank Akanksha Chuchra (akanksha.chuchra96@gmail.com) and Dravita Singla (dravitasingla1995@gmail.com), interns at DIC, UIET, for their contributions to the project.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, UIET Panjab University, Chandigarh, India
Rigzin Angmo & Naveen Aggarwal
Department of Information Technology, UIET Panjab University, Chandigarh, India
Veenu Mangat
Deloitte USI, Deloitte, Hyderabad, India
Anurag Lal
Infosys, Pune, India
Simarpreet Kaur

Corresponding author

Correspondence to Rigzin Angmo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Angmo, R., Aggarwal, N., Mangat, V. et al. An Improved Clustering Approach for Identifying Significant Locations from Spatio-temporal Data. Wireless Pers Commun 121, 985–1009 (2021). https://doi.org/10.1007/s11277-021-08668-w

Accepted: 14 June 2021
Published: 28 June 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s11277-021-08668-w
