VOGUE: A Novel Variable Order-Gap State Machine for Modeling Sequences

Bouqata, Bouchra; Carothers, Christopher D.; Szymanski, Boleslaw K.; Zaki, Mohammed J.

doi:10.1007/11871637_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4213)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

We present VOGUE, a new state machine that combines two separate techniques for modeling long-range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS) to mine frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE’s classification sensitivity exceeds that of HMMER, a state-of-the-art method for protein classification.
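
As a rough illustration of the mining step described in the abstract, the sketch below counts ordered symbol pairs separated by a bounded gap and keeps the frequent ones. It is a minimal sketch under stated assumptions, not the VGS algorithm as published: the function name `mine_gapped_pairs`, the restriction to pairs, the parameters `max_gap` and `min_count`, and the rule of counting every occurrence are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def mine_gapped_pairs(sequences, max_gap, min_count):
    """Count ordered pairs (a, b) where b follows a with at most
    max_gap symbols in between, and keep pairs seen >= min_count times.

    Hypothetical helper for illustration only; the paper's VGS method
    may differ in how it counts and prunes patterns.
    """
    # (a, b) -> Counter mapping gap length -> number of occurrences
    gap_counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq)):
            # Last position still within max_gap symbols of position i
            last = min(i + max_gap + 1, len(seq) - 1)
            for j in range(i + 1, last + 1):
                gap_counts[(seq[i], seq[j])][j - i - 1] += 1
    # Keep only pairs whose total count over all gaps is frequent enough
    return {
        pair: dict(gaps)
        for pair, gaps in gap_counts.items()
        if sum(gaps.values()) >= min_count
    }


# Toy input (made up for this example, not data from the paper)
frequent = mine_gapped_pairs(["ABCAB"], max_gap=1, min_count=2)
all_pairs = mine_gapped_pairs(["ABCAB"], max_gap=1, min_count=1)
```

On this toy input only the pair (A, B) reaches the threshold, and both of its occurrences have gap 0. The per-gap counts are the kind of statistic a state machine with gap states could be built from, though how VOGUE actually does this is not specified in the abstract.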

Author information

Authors and Affiliations

CS Department, Rensselaer Polytechnic Institute, Troy, NY, USA
Bouchra Bouqata, Christopher D. Carothers, Boleslaw K. Szymanski & Mohammed J. Zaki

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Bouqata, B., Carothers, C.D., Szymanski, B.K., Zaki, M.J. (2006). VOGUE: A Novel Variable Order-Gap State Machine for Modeling Sequences. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_9

DOI: https://doi.org/10.1007/11871637_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)
