Abstract
The idea of the similarity forest comes from Sathe and Aggarwal (Similarity forests, pp 395–403, 2017, [1]) and is derived from the random forest. Over more than 20 years of existence, the random forest has proved to be one of the most effective methods, showing top performance across a vast array of domains while remaining simple, time-efficient, and interpretable. However, its use is limited to multidimensional data. The similarity forest does not require such a representation: it only needs the similarities between observations, so it can be applied to data for which a multidimensional representation is not available. In this paper, we propose an implementation of the similarity forest for time series classification. We investigate two distance measures, Euclidean and dynamic time warping (DTW), as the underlying measure for the algorithm. We compare the performance of the similarity forest with 1-nearest neighbor and random forest on the UCR (University of California, Riverside) benchmark database. We show that, in terms of mean ranks, the similarity forest with DTW outperforms the other classifiers. The comparison is supported by statistical analysis.
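The abstract names DTW as one of the two distance measures underlying the forest. As a concrete illustration (a textbook sketch, not the authors' implementation; the function and the toy series are our own), the classic dynamic-programming DTW distance can be written in Python:

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    Builds the (n+1) x (m+1) cumulative-cost table with squared
    pointwise costs and returns the square root of the total cost
    of the cheapest monotone warping path.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # A path may come from a match, an insertion, or a deletion.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean_distance(a, b):
    """Pointwise Euclidean distance for equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Unlike the Euclidean distance, DTW handles series of different lengths and tolerates shifts along the time axis; for example, `dtw_distance([0.0, 0, 1, 1], [0.0, 1, 1])` is 0, and for the shifted pair `[1, 2, 3, 4]` vs. `[2, 3, 4, 5]` the DTW distance is smaller than the Euclidean one.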
References
Bagnall, A., Lines, J., Hills, J., Bostrom, A.: Time-series classification with COTE: The collective of transformation-based ensembles. IEEE Trans. on Knowl. and Data Eng. 27, 2522–2535 (2015)
Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. and Knowl. Discov. 31, 606–660 (2017)
Berndt, D. J., Clifford, J.: Using dynamic time warping to find patterns in time series. Proc. of the 3rd Int. Conf. on Knowl. Discov. and Data Min., pp. 359–370 (1994)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Chen, L., Ng, R.: On the marriage of Lp-norms and edit distance. Proc. of the 30th Int. Conf. on Very Large Data Bases 30, pp. 792–803 (2004)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. on Inf. Theor. 13, 21–27 (1967)
Dau, H. A., Keogh, E., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., Yanping, C., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., Hexagon-ML: The UCR time series classification archive (2019) https://www.cs.ucr.edu/~eamonn/time_series_data_2018
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. of Mach. Learn. Res. 7, 1–30 (2006)
Dua, D., Graff, C.: UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Fernandez-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems?. J. of Mach. Learn. Res. 15, 3133–3181 (2014)
Fix, E., Hodges, J. L.: Discriminatory analysis: nonparametric discrimination, consistency properties. Techn. Rep. 4 (1951)
García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental Analysis of Power. Inf. Sci. 180, 2044–2064 (2010)
Lines, J., Taylor, S., Bagnall, A.: HIVE-COTE: The hierarchical vote collective of transformation based ensembles for time series classification. IEEE Int. Conf. on Data Min., pp. 1041–1046 (2016)
Maharaj, E. A., D'Urso, P., Caiado, J.: Time Series Clustering and Classification. Chapman and Hall/CRC (2019)
Middlehurst, M., Large, J., Flynn, M., Lines, J., Bostrom, A., Bagnall, A.: HIVE-COTE 2.0: a new meta ensemble for time series classification. (2021) https://arxiv.org/abs/2104.07551
Nemenyi, P.: Distribution-free multiple comparisons. PhD thesis, Princeton University (1963)
Pavlyshenko, B. M.: Machine-learning models for sales time series forecasting. Data 4, 15 (2019)
Rastogi, V., Srivastava, S., Mishra, M., Thukral, R.: Predictive maintenance for SME in industry 4.0. 2020 Glob. Smart Ind. Conf., pp. 382–390 (2020)
Sathe, S., Aggarwal, C. C.: Similarity forests. Proc. of the 23rd ACM SIGKDD, pp. 395–403 (2017)
Tang, J., Chen, X.: Stock market prediction based on historic prices and news titles. Proc. of the 2018 Int. Conf. on Mach. Learn. Techn., pp. 29–34 (2018)
Vlachos, M., Kollios, G., Gunopulos, D.: Discovering similar multidimensional trajectories. Proc. 18th Int. Conf. on Data Eng., pp. 673–684 (2002)
Wuest, T., Irgens, C., Thoben, K. D.: An approach to quality monitoring in manufacturing using supervised machine learning on product state data. J. of Int. Man. 25, 1167–1180 (2014)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Copyright information
© 2023 The Author(s)
Cite this paper
Górecki, T., Łuczak, M., Piasecki, P. (2023). Similarity Forest for Time Series Classification. In: Brito, P., Dias, J.G., Lausen, B., Montanari, A., Nugent, R. (eds) Classification and Data Science in the Digital Age. IFCS 2022. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-031-09034-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09033-2
Online ISBN: 978-3-031-09034-9
eBook Packages: Mathematics and Statistics (R0)