
Learning sequential classifiers from long and noisy discrete-event sequences efficiently

Published in: Data Mining and Knowledge Discovery

Abstract

A variety of applications, such as information extraction, intrusion detection, and protein fold recognition, can be expressed as sequences of discrete events or elements (rather than unordered sets of features); that is, there is an order dependence among the elements composing each data instance. These applications may be modeled as classification problems, in which case the classifier should exploit sequential interactions among the elements so that the ordering relationship among them is properly captured. Dominant approaches to this problem include: (i) learning Hidden Markov Models, (ii) exploiting frequent sequences extracted from the data, and (iii) computing string kernels. Such approaches, however, are computationally hard and vulnerable to noise, especially if the data shows long-range dependencies (i.e., long subsequences are necessary to model the data). In this paper we provide simple algorithms that build highly effective sequential classifiers. Our algorithms enumerate approximately contiguous subsequences from the training set on a demand-driven basis, exploiting a lightweight and flexible subsequence matching function and an innovative subsequence enumeration strategy called pattern silhouettes. This makes our learning algorithms fast and the corresponding classifiers robust to noisy data. Our empirical results on a variety of datasets indicate that the best trade-off between accuracy and learning time is usually obtained by limiting the length of the subsequences to \(\log {n}\), which leads to an \(O(n\log {n})\) learning cost (where \(n\) is the length of the sequence being classified). Finally, we show that our classifiers are usually faster than existing solutions (sometimes by orders of magnitude), while also providing significant accuracy improvements in most of the evaluated cases.
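The core idea in the abstract, scoring a sequence by the contiguous subsequences it contains, with subsequence length capped at \(\log {n}\), can be sketched roughly as follows. This is a minimal illustration only: it uses exact subsequence matching and a simple class-frequency vote, whereas the paper's actual method uses an approximate matching function and pattern-silhouette enumeration, neither of which is reproduced here. All function names below are hypothetical.

```python
import math
from collections import defaultdict

def subsequences_up_to_log(seq, base=2):
    """Yield every contiguous subsequence of length at most log_base(n).

    Capping the length at log(n) bounds the number of enumerated
    subsequences by O(n log n), as suggested in the abstract.
    """
    n = len(seq)
    max_len = max(1, int(math.log(n, base))) if n > 1 else 1
    for i in range(n):
        for length in range(1, min(max_len, n - i) + 1):
            yield tuple(seq[i:i + length])

def train(labeled_seqs):
    """Count, per subsequence, how many training sequences of each
    class contain it (document frequency, not raw occurrence count)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq, label in labeled_seqs:
        for sub in set(subsequences_up_to_log(seq)):
            counts[sub][label] += 1
    return counts

def classify(seq, counts, labels):
    """Score each class by the class-conditional frequency of the
    test sequence's subsequences, then pick the best-scoring class."""
    scores = {c: 0.0 for c in labels}
    for sub in set(subsequences_up_to_log(seq)):
        if sub in counts:
            total = sum(counts[sub].values())
            for c, k in counts[sub].items():
                scores[c] += k / total
    return max(scores, key=scores.get)
```

For example, after training on a few labeled strings (`train([("aaab", "A"), ("aaba", "A"), ("bbba", "B"), ("bbab", "B")])`), a query such as `"aaaa"` is scored against the subsequences it shares with each class. The longest subsequences considered for a length-4 sequence are of length \(\lfloor\log_2 4\rfloor = 2\).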


Notes

  1. A similar trend is observed for the SC-SC algorithm.

  2. The reason we include the LAC algorithm (which does not exploit adjacency information) as a baseline is to evaluate the possible benefits of exploiting adjacency information while producing the classifier.


Author information

Correspondence to Adriano Veloso.

Additional information

Responsible editors: Joao Gama, Indre Zliobaite and Alipio Jorge.


Cite this article

Dafé, G., Veloso, A., Zaki, M. et al. Learning sequential classifiers from long and noisy discrete-event sequences efficiently. Data Min Knowl Disc 29, 1685–1708 (2015). https://doi.org/10.1007/s10618-014-0391-9
