Abstract
The idea of the similarity forest comes from Sathe and Aggarwal (Similarity forests, pp 395–403, 2017, [1]) and is derived from the random forest. Over more than 20 years of existence, the random forest has proved to be one of the most effective methods, showing top performance across a vast array of domains while remaining simple, time-efficient, and interpretable. However, its use is limited to multidimensional data. The similarity forest does not require such a representation: it only needs the similarities between observations, so it can be applied to data for which a multidimensional representation is not available. In this paper, we propose an implementation of the similarity forest for time series classification. We investigate two distance measures, Euclidean and dynamic time warping (DTW), as the underlying measure for the algorithm. We compare the performance of the similarity forest with 1-nearest neighbor and random forest on the UCR (University of California, Riverside) benchmark database. We show that, in terms of mean ranks, the similarity forest with DTW outperforms the other classifiers. The comparison is supported by statistical analysis.
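The abstract names DTW as one of the two distance measures underlying the forest. As a concrete illustration (a textbook sketch, not the authors' implementation; the function and the toy series are our own), the classic dynamic-programming DTW distance can be written in Python:

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    Builds the (n+1) x (m+1) cumulative-cost table with squared
    pointwise costs and returns the square root of the total cost
    of the cheapest monotone warping path.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # A path may come from a match, an insertion, or a deletion.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean_distance(a, b):
    """Pointwise Euclidean distance for equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Unlike the Euclidean distance, DTW handles series of different lengths and tolerates shifts along the time axis; for example, `dtw_distance([0.0, 0, 1, 1], [0.0, 1, 1])` is 0, and for the shifted pair `[1, 2, 3, 4]` vs. `[2, 3, 4, 5]` the DTW distance is smaller than the Euclidean one.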
References
Bagnall, A., Lines, J., Hills, J., Bostrom, A.: Time-series classification with COTE: The collective of transformation-based ensembles. IEEE Trans. on Knowl. and Data Eng. 27, 2522–2535 (2015)
Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. and Knowl. Discov. 31, 606–660 (2017)
Berndt, D. J., Clifford, J.: Using dynamic time warping to find patterns in time series. Proc. of the 3rd Int. Conf. on Knowl. Discov. and Data Min., pp. 359–370 (1994)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Chen, L., Ng, R.: On the marriage of Lp-norms and edit distance. Proc. of the 30th Int. Conf. on Very Large Data Bases 30, pp. 792–803 (2004)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. on Inf. Theor. 13, 21–27 (1967)
Dau, H. A., Keogh, E., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., Yanping, C., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., Hexagon-ML: The UCR time series classification archive (2019) https://www.cs.ucr.edu/~eamonn/time_series_data_2018
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. of Mach. Learn. Res. 7, 1–30 (2006)
Dua, D., Graff, C.: UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Fernandez-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems?. J. of Mach. Learn. Res. 15, 3133–3181 (2014)
Fix, E., Hodges, J. L.: Discriminatory analysis: nonparametric discrimination, consistency properties. Techn. Rep. 4 (1951)
García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental Analysis of Power. Inf. Sci. 180, 2044–2064 (2010)
Lines, J., Taylor, S., Bagnall, A.: HIVE-COTE: The hierarchical vote collective of transformation based ensembles for time series classification. IEEE Int. Conf. on Data Min., pp. 1041–1046 (2016)
Maharaj, E. A., D'Urso, P., Caiado, J.: Time Series Clustering and Classification. Chapman and Hall/CRC (2019)
Middlehurst, M., Large, J., Flynn, M., Lines, J., Bostrom, A., Bagnall, A.: HIVE-COTE 2.0: a new meta ensemble for time series classification. (2021) https://arxiv.org/abs/2104.07551
Nemenyi, P.: Distribution-free multiple comparisons. PhD thesis, Princeton University (1963)
Pavlyshenko, B. M.: Machine-learning models for sales time series forecasting. Data 4, 15 (2019)
Rastogi, V., Srivastava, S., Mishra, M., Thukral, R.: Predictive maintenance for SME in industry 4.0. 2020 Glob. Smart Ind. Conf., pp. 382–390 (2020)
Sathe, S., Aggarwal, C. C.: Similarity forests. Proc. of the 23rd ACM SIGKDD, pp. 395–403 (2017)
Tang, J., Chen, X.: Stock market prediction based on historic prices and news titles. Proc. of the 2018 Int. Conf. on Mach. Learn. Techn., pp. 29–34 (2018)
Vlachos, M., Kollios, G., Gunopulos, D.: Discovering similar multidimensional trajectories. Proc. 18th Int. Conf. on Data Eng., pp. 673–684 (2002)
Wuest, T., Irgens, C., Thoben, K. D.: An approach to quality monitoring in manufacturing using supervised machine learning on product state data. J. of Int. Man. 25, 1167–1180 (2014)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Copyright information
© 2023 The Author(s)
Cite this paper
Górecki, T., Łuczak, M., Piasecki, P. (2023). Similarity Forest for Time Series Classification. In: Brito, P., Dias, J.G., Lausen, B., Montanari, A., Nugent, R. (eds) Classification and Data Science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-09034-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09033-2
Online ISBN: 978-3-031-09034-9
eBook Packages: Mathematics and Statistics (R0)