Abstract
Outlier detection and cleaning are essential steps in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly within a single trajectory. We experiment with ten open-source libraries to comprehensively evaluate available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. The experiment evaluates the libraries as they are offered to end users, to reflect real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into five types: statistic-based methods, sliding-window algorithms, clustering-based methods, graph-based methods, and heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to the development of data preprocessing and outlier detection methodologies.
Data and codes availability statement
The data that support the findings of this study are available at GrabPosisi [15],\(^{17}\) Denmark AIS data, and OpenSky data.\(^{18}\) The code is available on GitHub.\(^{19}\)
Notes
References
Attia Sakr M, Güting RH (2009) Spatiotemporal pattern queries in secondo. Advances in Spatial and Temporal Databases: 11th International Symposium, SSTD 2009 Aalborg, Denmark, Proceedings 11. Springer, Berlin Heidelberg, pp 422–426. Accessed 8–10 July 2009
Bakli M, Sakr M, Zimanyi E (2019) Distributed moving object data management in mobilitydb. In: Proceedings of the 8th ACM SIGSPATIAL international workshop on analytics for big geospatial data, pp 1–10
Breunig M, Kriegel HP, Ng R, et al (2000) Lof: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, ACM, pp 93–104
Brinkhoff T (2002) A framework for generating network-based moving objects. GeoInformatica 6
Cao K, Liu Y, Meng G et al (2020) Trajectory outlier detection on trajectory data streams. IEEE Access pp 1–1
Control E (2022) The economics of aviation decarbonisation towards the 2030 green deal milestone. Euro Control
Custers B, Kerkhof M, Meulemans W, et al (2021) Maximum physically consistent trajectories. ACM Trans Spatial Algorithms Syst 7(4)
Duarte M, Sakr M (2023) Outlier detection and cleaning in trajectories: a benchmark of existing tools. In: Proceedings of the workshops of the EDBT/ICDT 2023 joint conference, Ioannina, Greece, vol 3379. CEUR-WS. Accessed 28 March 2023
Eldawy E, Mokhtar H (2020) Clustering-based trajectory outlier detection. Int J Adv Comput Sci Appl 11(5)
Ester M, Kriegel H, Sander J, et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining. AAAI Press, KDD’96, pp 226–231
Filzmoser P, Gschwandtner M (2017) mvoutlier: multivariate outlier detection based on robust methods. R package
Freitas C, Lydersen C, Fedak MA et al (2008) A simple new algorithm to filter marine mammal argos locations. Mar Mamm Sci
Graser A (2019) Movingpandas: efficient structures for movement data in Python. GI Forum 7:54–68
Haidri S, Haranwala YJ, Bogorny V et al (2021) Ptrail – a Python package for parallel trajectory data preprocessing.
Huang X, Yin Y, Lim S et al (2019) Grab-posisi: an extensive real-life gps trajectory dataset in Southeast Asia. In: SIGSPATIAL, New York, USA
Jain A (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31(8):651–666
Knorr E, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8:237–253
Kotecha J, Djuric P (2003) Gaussian particle filtering. IEEE Trans Signal Process 51(10):2592–2601
Lee SH, West M (2010) Performance comparison of the distributed extended kalman filter and markov chain distributed particle filter. IFAC Proceedings
Magdy N, Sakr MA, El-Bahnasy K (2017) A generic trajectory similarity operator in moving object databases. Egypt Inform J 18(1):29–37
McKinney W (2010) Data structures for statistical computing in Python. In: van der Walt S, Millman J (eds) Proceedings of the 9th Python in science conference, pp 56–61
Moosavi S, Omidvar-Tehrani B, Ramnath R (2017) Trajectory annotation by discovering driving patterns. In: the 3rd ACM SIGSPATIAL workshop
Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems, pp 849–856
Ng R, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th international conference on very large data bases. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, VLDB ’94, pp 144–155
Oliveira A (2019) Uma arquitetura e implementação do módulo de visualização para biblioteca pymove. Bachelor’s thesis, UFC
Pappalardo L, Simini F, Barlacchi G, et al (2019) Scikit-mobility: a Python library for the analysis, generation and risk assessment of mobility data
Pearson R, Neuvo Y, Astola J et al (2016) Generalized hampel filters. EURASIP Journal on Advances in Signal Processing 2016
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Sanches A (2019) Uma arquitetura e implementação do módulo de pré-processamento para biblioteca pymove. Bachelor’s thesis, UFC
Seidel D, et al (2019) Exploratory movement analysis and report building with r package stmove.
Shi J, Pan Z, Fang J et al (2021) Rutod: real-time urban traffic outlier detection on streaming trajectory. Neural Comput Appl 35:3625–3637
Thomas P, Barr J, Balaji B et al (2017) An open source framework for tracking and state estimation. In: Society of photo-optical instrumentation engineers (SPIE) conference series
Trofficus M (2021) Hampel filter in Python
Urrea C, Agramonte R (2021) Kalman filter: historical overview and review of its use in robotics 60 years after its creation. Sensors
Wang H, Bah M, Hammad M (2019) Progress in outlier detection techniques: a survey. IEEE Access 7:107964–108000
Wu S, Zimanyi E, Sakr M et al (2022) Semantic segmentation of ais trajectories for detecting complete fishing activities. In: 2022 23rd IEEE International conference on mobile data management (MDM). IEEE Comput Soc
Yang S, Madsen M, Bednar J (2022) HoloViz: Visualization and interactive dashboards in Python. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining. SIGKDD
Yang X, Tang L, Li Q (2018) A data cleaning method for big trace data using movement consistency. In: Sensors
Yu Y, Cao L, Rundensteiner E et al (2017) Outlier detection over massive-scale trajectory streams. ACM Trans Database Syst 42(2)
Yuan J, Zheng Y, Zhang C et al (2010) T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems. Association for computing machinery
Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114
Zheng X, Yu D, Xie C et al (2023) Outlier detection of crowdsourcing trajectory data based on spatial and temporal characterization. Mathematics 11(3)
Zheng Y (2015) Trajectory data mining: an overview. ACM Trans Intell Syst Technol 6(3)
Zimányi E, Sakr M, Lesuisse A (2020) Mobilitydb: a mobility database based on postgresql and postgis. In: ACM Trans. Database Syst., New York, USA
Acknowledgements
This work was funded by the EU’s Horizon Europe research and innovation program under Grant No. 101070279 MobiSpaces.
Author information
Contributions
Mariana wrote the main manuscript. Mahmoud guided the research and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Garcez Duarte, M.M., Sakr, M. An experimental study of existing tools for outlier detection and cleaning in trajectories. Geoinformatica (2024). https://doi.org/10.1007/s10707-024-00522-y
DOI: https://doi.org/10.1007/s10707-024-00522-y