Abstract
We present VOGUE, a new state machine that combines two separate techniques for modeling long-range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS) to mine frequent patterns with different lengths and gaps between elements, and then uses these mined sequences to build the state machine. We applied VOGUE to protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE’s classification sensitivity is higher than that of HMMER, a state-of-the-art method for protein classification.
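To make the idea of mining patterns with gaps between elements concrete, the following is a minimal sketch, not the authors' VGS implementation. It assumes patterns of length two only, counts support as the number of sequences containing a pattern at least once, and uses hypothetical names (`mine_gapped_pairs`, `max_gap`, `min_support`) and toy data that do not come from the paper.

```python
from collections import defaultdict


def mine_gapped_pairs(sequences, max_gap, min_support):
    """Find frequent patterns (a, b, gap): symbol b occurs gap + 1 positions after a.

    Assumption: support is the number of input sequences that contain the
    pattern at least once. The paper may define frequency differently.
    """
    support = defaultdict(int)
    for seq in sequences:
        seen = set()  # patterns found in this sequence, counted once each
        for i, a in enumerate(seq):
            for gap in range(max_gap + 1):
                j = i + gap + 1
                if j >= len(seq):
                    break
                seen.add((a, seq[j], gap))
        for pattern in seen:
            support[pattern] += 1
    # Keep only patterns that meet the minimum support threshold.
    return {p: s for p, s in support.items() if s >= min_support}


# Toy sequences (illustrative only, not PROSITE data).
toy = ["ACBDAHCBD", "ACDBA", "ACXB"]
frequent = mine_gapped_pairs(toy, max_gap=2, min_support=3)
for (a, b, gap), count in sorted(frequent.items()):
    print(f"{a} -> {b} with gap {gap}: support {count}")
```

On this toy input, two patterns occur in all three sequences: A followed directly by C, and A followed by B with two symbols in between. In a model like the one the abstract describes, such mined patterns would then seed the states and transitions of the state machine; that step is not sketched here.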
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Bouqata, B., Carothers, C.D., Szymanski, B.K., Zaki, M.J. (2006). VOGUE: A Novel Variable Order-Gap State Machine for Modeling Sequences. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_9
DOI: https://doi.org/10.1007/11871637_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)