An experimental study of existing tools for outlier detection and cleaning in trajectories

  • Research
  • Published in GeoInformatica

Abstract

Outlier detection and cleaning are essential steps in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly inside a single trajectory. We experiment with ten open-source libraries to comprehensively evaluate available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. This experiment considers the libraries as they are offered to end users, with real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground-truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into five types: Statistic-based methods, Sliding window algorithms, Clustering-based methods, Graph-based methods, and Heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to developing data preprocessing and outlier detection methodologies.
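To make the notion of an outlier point inside a single trajectory concrete, the following is a minimal sketch of one heuristic-based approach: a speed-threshold filter. It is not the method of any of the evaluated libraries. The function names `haversine_m` and `clean_by_speed`, the `(t_seconds, lat, lon)` tuple layout, and the 50 m/s threshold are all assumptions chosen for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def clean_by_speed(points, max_speed_mps):
    """Split a trajectory into kept points and outliers (hypothetical helper).

    points: list of (t_seconds, lat, lon) tuples sorted by time.
    A point is flagged when the speed needed to reach it from the last
    kept point exceeds max_speed_mps. A non-increasing timestamp is
    treated as infinite speed and therefore flagged as well.
    """
    kept, outliers = [], []
    for p in points:
        if not kept:
            kept.append(p)  # assumption: the first point is trusted
            continue
        t0, lat0, lon0 = kept[-1]
        t1, lat1, lon1 = p
        dt = t1 - t0
        dist = haversine_m(lat0, lon0, lat1, lon1)
        speed = dist / dt if dt > 0 else math.inf
        (kept if speed <= max_speed_mps else outliers).append(p)
    return kept, outliers


# Illustrative data only: four fixes, the third jumps far away within 10 s.
trajectory = [
    (0, 55.6761, 12.5683),
    (10, 55.6762, 12.5685),
    (20, 56.1500, 10.2000),
    (30, 55.6764, 12.5689),
]
kept, outliers = clean_by_speed(trajectory, max_speed_mps=50.0)
```

Because each point is compared with the last kept point rather than its immediate predecessor, a single spike does not cause the next valid point to be rejected. The sketch trusts the first point, so it would fail if the trajectory starts with an outlier.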

Data and code availability statement

The data that support the findings of this study are available from Grab-Posisi [15],\(^{17}\) the Denmark AIS data, and OpenSky data.\(^{18}\) The code is available on GitHub.\(^{19}\)

Notes

  1. https://github.com/anitagraser/movingpandas-examples

  2. https://github.com/scikit-mobility/scikit-mobility

  3. https://scikit-learn.org/stable/

  4. https://github.com/YakshHaranwala/PTRAIL

  5. https://github.com/InsightLab/PyMove

  6. https://github.com/movetk/movetk

  7. https://libmeos.org, https://github.com/MobilityDB/

  8. https://cran.r-project.org/web/packages/argosfilter/argosfilter.pdf

  9. https://tinyurl.com/stmove

  10. https://cran.r-project.org/web/packages/mvoutlier/index.html

  11. https://opensky-network.org

  12. https://www.grab.com/sg/

  13. https://github.com/marianaGarcez/OutlierDetectionLibraries

  14. https://www.eurocontrol.int/publication/objective-skygreen-2022-2030

  15. http://web.ais.dk/aisdata/

  16. https://opensky-network.org/data/impala

  17. https://github.com/marianaGarcez/OutlierDetectionLibraries

References

  1. Attia Sakr M, Güting RH (2009) Spatiotemporal pattern queries in SECONDO. In: Advances in Spatial and Temporal Databases: 11th International Symposium, SSTD 2009, Aalborg, Denmark, 8–10 July 2009, Proceedings. Springer, Berlin Heidelberg, pp 422–426

  2. Bakli M, Sakr M, Zimanyi E (2019) Distributed moving object data management in MobilityDB. In: Proceedings of the 8th ACM SIGSPATIAL international workshop on analytics for big geospatial data, pp 1–10

  3. Breunig M, Kriegel HP, Ng R, et al (2000) LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, ACM, pp 93–104

  4. Brinkhoff T (2002) A framework for generating network-based moving objects. GeoInformatica 6

  5. Cao K, Liu Y, Meng G et al (2020) Trajectory outlier detection on trajectory data streams. IEEE Access pp 1–1

  6. Eurocontrol (2022) The economics of aviation decarbonisation towards the 2030 green deal milestone. Eurocontrol

  7. Custers B, Kerkhof M, Meulemans W, et al (2021) Maximum physically consistent trajectories. ACM Trans Spatial Algorithms Syst 7(4)

  8. Duarte M, Sakr M (2023) Outlier detection and cleaning in trajectories: a benchmark of existing tools. In: Proceedings of the workshops of the EDBT/ICDT 2023 joint conference, Ioannina, Greece, 28 March 2023, vol 3379. CEUR-WS

  9. Eldawy E, Mokhtar H (2020) Clustering-based trajectory outlier detection. Int J Adv Comput Sci Appl 11(5)

  10. Ester M, Kriegel H, Sander J, et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining. AAAI Press, KDD’96, pp 226–231

  11. Filzmoser P, Gschwandtner M (2017) mvoutlier: multivariate outlier detection based on robust methods. R package

  12. Freitas C, Lydersen C, Fedak MA et al (2008) A simple new algorithm to filter marine mammal Argos locations. Mar Mamm Sci

  13. Graser A (2019) MovingPandas: efficient structures for movement data in Python. GI Forum 7:54–68

  14. Haidri S, Haranwala YJ, Bogorny V et al (2021) PTRAIL – a Python package for parallel trajectory data preprocessing

  15. Huang X, Yin Y, Lim S et al (2019) Grab-Posisi: an extensive real-life GPS trajectory dataset in Southeast Asia. In: SIGSPATIAL, New York, USA

  16. Jain A (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31(8):651–666

  17. Knorr E, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8:237–253

  18. Kotecha J, Djuric P (2003) Gaussian particle filtering. IEEE Trans Signal Process 51(10):2592–2601

  19. Lee SH, West M (2010) Performance comparison of the distributed extended Kalman filter and Markov chain distributed particle filter. IFAC Proceedings

  20. Magdy N, Sakr MA, El-Bahnasy K (2017) A generic trajectory similarity operator in moving object databases. Egypt Inform J 18(1):29–37

  21. McKinney W (2010) Data structures for statistical computing in Python. In: van der Walt S, Millman J (eds) Proceedings of the 9th Python in Science Conference, pp 56–61

  22. Moosavi S, Omidvar-Tehrani B, Ramnath R (2017) Trajectory annotation by discovering driving patterns. In: the 3rd ACM SIGSPATIAL workshop

  23. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems, pp 849–856

  24. Ng R, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th international conference on very large data bases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, VLDB ’94, pp 144–155

  25. Oliveira A (2019) Uma arquitetura e implementação do módulo de visualização para biblioteca PyMove. Bachelor’s thesis, UFC

  26. Pappalardo L, Simini F, Barlacchi G, et al (2019) Scikit-mobility: a Python library for the analysis, generation and risk assessment of mobility data

  27. Pearson R, Neuvo Y, Astola J et al (2016) Generalized Hampel filters. EURASIP Journal on Advances in Signal Processing 2016

  28. Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  29. Sanches A (2019) Uma arquitetura e implementação do módulo de pré-processamento para biblioteca PyMove. Bachelor’s thesis, UFC

  30. Seidel D, et al (2019) Exploratory movement analysis and report building with R package stmove

  31. Shi J, Pan Z, Fang J et al (2021) RUTOD: real-time urban traffic outlier detection on streaming trajectory. Neural Comput Appl 35:3625–3637

  32. Thomas P, Barr J, Balaji B et al (2017) An open source framework for tracking and state estimation. In: Society of photo-optical instrumentation engineers (SPIE) conference series

  33. Trofficus M (2021) Hampel filter in Python

  34. Urrea C, Agramonte R (2021) Kalman filter: historical overview and review of its use in robotics 60 years after its creation. Sensors

  35. Wang H, Bah M, Hammad M (2019) Progress in outlier detection techniques: a survey. IEEE Access 7:107964–108000

  36. Wu S, Zimanyi E, Sakr M et al (2022) Semantic segmentation of AIS trajectories for detecting complete fishing activities. In: 2022 23rd IEEE International conference on mobile data management (MDM). IEEE Comput Soc

  37. Yang S, Madsen M, Bednar J (2022) HoloViz: Visualization and interactive dashboards in Python. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining. SIGKDD

  38. Yang X, Tang L, Li Q (2018) A data cleaning method for big trace data using movement consistency. Sensors

  39. Yu Y, Cao L, Rundensteiner E et al (2017) Outlier detection over massive-scale trajectory streams. ACM Trans Database Syst 42(2)

  40. Yuan J, Zheng Y, Zhang C et al (2010) T-Drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems. Association for Computing Machinery

  41. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114

  42. Zheng X, Yu D, Xie C et al (2023) Outlier detection of crowdsourcing trajectory data based on spatial and temporal characterization. Mathematics 11(3)

  43. Zheng Y (2015) Trajectory data mining: an overview. ACM Trans Intell Syst Technol 6(3)

  44. Zimányi E, Sakr M, Lesuisse A (2020) MobilityDB: a mobility database based on PostgreSQL and PostGIS. ACM Trans Database Syst, New York, USA

Acknowledgements

This work was funded by the EU’s Horizon Europe research and innovation program under Grant No. 101070279 MobiSpaces.

Author information

Contributions

Mariana wrote the main manuscript. Mahmoud guided the research and reviewed the manuscript.

Corresponding authors

Correspondence to Mariana M Garcez Duarte or Mahmoud Sakr.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Garcez Duarte, M.M., Sakr, M. An experimental study of existing tools for outlier detection and cleaning in trajectories. Geoinformatica (2024). https://doi.org/10.1007/s10707-024-00522-y

  • DOI: https://doi.org/10.1007/s10707-024-00522-y
