
An Improved Clustering Approach for Identifying Significant Locations from Spatio-temporal Data

Wireless Personal Communications

Abstract

The rapid development and adoption of the Internet-of-Things (IoT) and of sensors such as the Global Positioning System (GPS) in daily life allow a wealth of information to be gathered. This large volume of sensor data can be clustered to infer the whereabouts of users. Researchers have proposed various algorithms for analyzing and clustering spatio-temporal data. However, discovering clusters with large density variation remains a major challenge, and most existing clustering algorithms require their input parameters to be set manually. In the density-based spatial clustering of applications with noise (DBSCAN) algorithm, the epsilon and minimum-points parameters must both be chosen by hand before any computation. In this paper, we propose an improved DBSCAN algorithm that adaptively sets the minimum-points parameter for data of varying length, based on merge sort and silhouette analysis. The proposed algorithm is validated on the GeoLife GPS dataset. Experimental results show that it is competitive with state-of-the-art methods for identifying users' significant locations and whereabouts in the presence of density variation. The paper also highlights the user-privacy issues raised by the potential application of such clustering algorithms to this kind of data.
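The idea of choosing the minimum-points parameter by silhouette analysis can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the merge sort step, whose role the abstract does not describe, and it simply scans a list of candidate values and keeps the one whose DBSCAN labelling has the highest mean silhouette. The function names (`haversine_m`, `dbscan`, `silhouette`, `pick_min_pts`), the 50 m epsilon, the candidate range, and the coordinates are all hypothetical; excluding noise points from the silhouette is likewise an assumption.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical model (assumption)

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def dbscan(points, eps, min_pts, dist=haversine_m):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    # A point counts as its own neighbour, as in the usual DBSCAN definition.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand further
                queue.extend(neighbours[j])
    return labels

def silhouette(points, labels, dist=haversine_m):
    """Mean silhouette over non-noise points; -inf with fewer than 2 clusters."""
    idx = [i for i, lab in enumerate(labels) if lab != -1]
    clusters = sorted({labels[i] for i in idx})
    if len(clusters) < 2:
        return float("-inf")
    total = 0.0
    for i in idx:
        mean_to = {}
        for c in clusters:
            members = [j for j in idx if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist(points[i], points[j]) for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:
            continue  # singleton cluster contributes a silhouette of 0
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(idx)

def pick_min_pts(points, eps, candidates):
    """Return (min_pts, score) with the best silhouette; ties go to the smaller value."""
    scored = [(silhouette(points, dbscan(points, eps, m)), m) for m in candidates]
    best_score, best_m = max(scored, key=lambda t: (t[0], -t[1]))
    return best_m, best_score

# Made-up coordinates: two dense stay areas, one loose pair, one stray fix.
points = [
    (39.9000, 116.4000), (39.9001, 116.4000), (39.9000, 116.4001),
    (39.9001, 116.4001), (39.9002, 116.4001), (39.9001, 116.4002),
    (39.9100, 116.4100), (39.9101, 116.4100), (39.9100, 116.4101),
    (39.9101, 116.4101),
    (39.9012, 116.4000), (39.9016, 116.4000),
    (39.9500, 116.4500),
]
best_m, best_score = pick_min_pts(points, eps=50.0, candidates=range(2, 8))
print(best_m, round(best_score, 3))
```

On this toy data the scan selects a minimum-points value of 3. A value of 2 turns the loose pair into a weak third cluster and lowers the mean silhouette, while values of 5 and above leave fewer than two clusters and cannot be scored. Epsilon still has to be supplied by hand here, which matches the abstract's statement that only the minimum-points parameter is set adaptively.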



Acknowledgements

The authors are grateful to the Ministry of Human Resource Development (MHRD) of the Government of India for supporting this research through the Design Innovation Center (MHRD-DIC) under the subtheme “TRAFFIC SENSING AND IT”. The authors would also like to thank Akanksha Chuchra (akanksha.chuchra96@gmail.com) and Dravita Singla (dravitasingla1995@gmail.com), interns at DIC, UIET, for their contribution to the project.

Author information

Corresponding author

Correspondence to Rigzin Angmo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Angmo, R., Aggarwal, N., Mangat, V. et al. An Improved Clustering Approach for Identifying Significant Locations from Spatio-temporal Data. Wireless Pers Commun 121, 985–1009 (2021). https://doi.org/10.1007/s11277-021-08668-w

  • DOI: https://doi.org/10.1007/s11277-021-08668-w
