Abstract
Sequential Pattern Mining is an efficient technique for discovering recurring structures or patterns in very large datasets. Many algorithms have been proposed for this task. Broadly, sequential pattern mining algorithms fall into two categories: pattern-growth approaches and Apriori-based (candidate-generation) approaches. Introducing constraints such as a user-defined threshold, user-specified data, or a minimum gap or time improves the algorithms' performance. In this paper, we use a dataset of protein sequences to compare PrefixSpan, a pattern-growth algorithm, with SPAM, an Apriori-based algorithm. The comparison considers the time and space consumed by each algorithm. The study shows that SPAM with constraints outperforms PrefixSpan on very large datasets, whereas PrefixSpan performs better than SPAM on smaller datasets.
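To make the contrast between the two families concrete, the following is a minimal sketch of each on protein-like sequences. It assumes each sequence is a string of one-letter amino-acid codes with a single residue per position, so SPAM's itemset-extension step (I-step) never applies and is omitted. Support is taken to be the number of sequences that contain the pattern as a (not necessarily contiguous) subsequence, and `min_support` stands in for the user-defined threshold. The function names `prefixspan` and `spam` and the toy database are hypothetical illustrations, not the implementations evaluated in the paper. PrefixSpan grows each frequent prefix by scanning its projected suffix database; SPAM instead keeps a vertical bitmap per residue and extends a pattern with a bitwise S-step transform followed by an AND.

```python
from collections import Counter


def prefixspan(db, min_support):
    """Pattern growth: extend each frequent prefix using its projected database."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset of the suffix after the prefix)
        counts = Counter()
        for sid, start in projected:
            counts.update(set(db[sid][start:]))  # count each residue once per sequence
        for item, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            new_prefix = prefix + item
            results[new_prefix] = sup
            # Project on the first occurrence of item in each suffix
            new_projected = []
            for sid, start in projected:
                pos = db[sid].find(item, start)
                if pos != -1:
                    new_projected.append((sid, pos + 1))
            mine(new_prefix, new_projected)

    mine("", [(sid, 0) for sid in range(len(db))])
    return results


def spam(db, min_support):
    """Apriori-style depth-first search over vertical bitmaps (S-step only)."""
    alphabet = sorted(set("".join(db)))
    # bitmaps[item][sid] has bit p set iff db[sid][p] == item
    bitmaps = {
        item: [sum(1 << p for p, ch in enumerate(seq) if ch == item) for seq in db]
        for item in alphabet
    }
    results = {}

    def s_step(bm):
        # For each sequence, set every bit strictly after the first set bit
        out = []
        for b, seq in zip(bm, db):
            if b == 0:
                out.append(0)
            else:
                first = (b & -b).bit_length()  # index of lowest set bit, plus one
                out.append(((1 << len(seq)) - 1) & ~((1 << first) - 1))
        return out

    def dfs(pattern, bm):
        transformed = s_step(bm)
        for item in alphabet:
            # Candidate pattern + item; support is the number of non-empty bitmaps
            new_bm = [t & i for t, i in zip(transformed, bitmaps[item])]
            sup = sum(1 for b in new_bm if b)
            if sup >= min_support:
                results[pattern + item] = sup
                dfs(pattern + item, new_bm)

    for item in alphabet:
        sup = sum(1 for b in bitmaps[item] if b)
        if sup >= min_support:
            results[item] = sup
            dfs(item, bitmaps[item])
    return results


db = ["ACDA", "CADA", "ACA"]  # toy residue strings, not data from the paper
print(prefixspan(db, 2) == spam(db, 2))  # True
print(prefixspan(db, 2)["AA"])  # 3
```

On this toy database both sketches return the same patterns and supports; they differ only in how support is counted. PrefixSpan builds lists of suffix pointers for each projected database, while SPAM keeps one bitmap per residue and per extended pattern in memory.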
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mane, R.V. (2013). A Comparative Study of Spam and PrefixSpan Sequential Pattern Mining Algorithm for Protein Sequences. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_13
DOI: https://doi.org/10.1007/978-3-642-36321-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36320-7
Online ISBN: 978-3-642-36321-4
eBook Packages: Computer Science, Computer Science (R0)