# Multi-period classification: learning sequent classes from temporal domains

## Abstract

As most real-world decisions change over time, it is increasingly important to extend traditional classifiers so that they can classify an attribute of interest across different time periods. Tackling this problem, referred to as *multi-period classification*, is critical for real-world tasks such as predicting upcoming healthcare needs or supporting administrative planning. Although existing research provides principles for learning single labels from complex data domains, less attention has been given to the problem of learning sequences of classes (symbolic time series). This work motivates the need for multi-period classifiers and proposes a method, cluster-based multi-period classification (CMPC), that preserves local dependencies across the periods under classification. Evaluation on real-world datasets provides evidence of the relevance of multi-period classifiers and shows that, for multi-period tasks with a high number of periods, CMPC outperforms peer methods adapted from long-term prediction.

## Keywords

Multi-period classification · Long-term prediction · Time-sensitive supervised learning

## Notes

### Acknowledgments

The authors deeply thank the reviewers of this manuscript for the detailed, attentive and insightful feedback. This work was supported by *Fundação para a Ciência e Tecnologia* under the multi-annual funding of INESC-ID PEst-OE/EEI/LA0021/2013 and the Ph.D. Grant SFRH/BD/75924/2011.
