Abstract
Distinguishing sequential patterns are useful for characterizing a given sequence class and contrasting that class against other sequence classes. This paper introduces the concept of density into distinguishing sequential pattern mining, extending previous studies that considered gap and support constraints. Density concerns the number of times a given pattern occurs in an individual sequence; it is an important factor in many applications, including biology, healthcare and financial analysis. We present gd-DSPMiner, a mining method with various pruning techniques, for mining density-aware distinguishing sequential patterns that satisfy density and gap constraints as well as support constraints. In terms of computational speed, when the density-related procedures are masked, gd-DSPMiner is substantially faster than previous distinguishing sequential pattern mining methods. Experiments on real data sets confirm the effectiveness and efficiency of gd-DSPMiner in the general setting, as well as its ability to discover density-aware distinguishing sequential patterns.
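To make the interplay of gap, density and support constraints concrete, the following is a minimal Python sketch that checks one candidate pattern. It is not gd-DSPMiner and uses none of its pruning techniques. The definitions are assumptions for illustration, not the paper's formal ones: an occurrence is a tuple of positions i_1 < ... < i_m matching the pattern, with min_gap <= i_(k+1) - i_k - 1 <= max_gap between consecutive positions; a sequence counts toward support only if the pattern occurs in it at least min_density times; and a pattern is distinguishing if its support is at least alpha in the positive class and at most beta in the negative class. The function names count_occurrences, support and is_distinguishing are hypothetical.

```python
from typing import Sequence


def count_occurrences(seq: Sequence[str], pattern: Sequence[str],
                      min_gap: int, max_gap: int) -> int:
    """Count position tuples matching `pattern` in `seq` under a gap constraint.

    Assumed definition (hypothetical): an occurrence is i_1 < ... < i_m with
    seq[i_k] == pattern[k] and min_gap <= i_{k+1} - i_k - 1 <= max_gap.
    """
    n, m = len(seq), len(pattern)
    if m == 0:
        return 0
    # ways[i] = number of partial occurrences of pattern[:k+1] ending at position i
    ways = [1 if seq[i] == pattern[0] else 0 for i in range(n)]
    for k in range(1, m):
        nxt = [0] * n
        for i in range(n):
            if seq[i] != pattern[k]:
                continue
            # The previous matched position p must satisfy
            # min_gap <= i - p - 1 <= max_gap, i.e. p in [i - max_gap - 1, i - min_gap - 1].
            lo = max(0, i - max_gap - 1)
            hi = i - min_gap - 1
            for p in range(lo, hi + 1):
                nxt[i] += ways[p]
        ways = nxt
    return sum(ways)


def support(sequences: Sequence[Sequence[str]], pattern: Sequence[str],
            min_gap: int, max_gap: int, min_density: int) -> float:
    """Fraction of sequences in which `pattern` occurs at least `min_density` times."""
    if not sequences:
        return 0.0
    hits = sum(
        1 for s in sequences
        if count_occurrences(s, pattern, min_gap, max_gap) >= min_density
    )
    return hits / len(sequences)


def is_distinguishing(positive: Sequence[Sequence[str]],
                      negative: Sequence[Sequence[str]],
                      pattern: Sequence[str], min_gap: int, max_gap: int,
                      min_density: int, alpha: float, beta: float) -> bool:
    """Hypothetical test: density-aware support >= alpha in the positive class
    and <= beta in the negative class."""
    return (support(positive, pattern, min_gap, max_gap, min_density) >= alpha
            and support(negative, pattern, min_gap, max_gap, min_density) <= beta)


if __name__ == "__main__":
    pos = ["CGACGTCG", "ACGCGT"]   # toy DNA strings, not data from the paper
    neg = ["AATTCG", "TTAACC"]
    # Plain presence (min_density=1): "CG" also appears in half the negatives.
    print(is_distinguishing(pos, neg, "CG", 0, 0, 1, 0.5, 0.1))  # False
    # Requiring at least two adjacent "CG" occurrences separates the classes.
    print(is_distinguishing(pos, neg, "CG", 0, 0, 2, 0.5, 0.1))  # True
```

In the toy example, "CG" occurs 3 and 2 times in the positive strings but at most once in the negative ones, so only the density threshold makes it distinguishing. This mirrors the abstract's point that density carries information that plain support does not. The dynamic-programming count avoids enumerating every occurrence tuple explicitly. A real miner would additionally need to generate and prune candidate patterns, which this sketch does not attempt.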
This work was supported in part by NSFC 61103042, SRFDP 20100181120029, and SKLSE2012-09-32.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, X., Duan, L., Dong, G., Yu, Z., Tang, C. (2014). Efficient Mining of Density-Aware Distinguishing Sequential Patterns with Gap Constraints. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_25
DOI: https://doi.org/10.1007/978-3-319-05810-8_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer Science, Computer Science (R0)