A Comparative Study of Spam and PrefixSpan Sequential Pattern Mining Algorithm for Protein Sequences

Mane, Rashmi V.

doi:10.1007/978-3-642-36321-4_13


Part of the book series: Communications in Computer and Information Science (CCIS, volume 361)

Included in the following conference series:

International Conference on Advances in Computing, Communication and Control


Abstract

Sequential pattern mining is an efficient technique for discovering recurring structures or patterns in very large datasets. Many algorithms have been proposed for this task. Broadly, they fall into two categories: the Apriori-based (candidate generation) approach and the pattern-growth approach. Introducing constraints such as a user-defined threshold, user-specified data, or a minimum gap or time improves the performance of these algorithms. In this paper we use a dataset of protein sequences to compare PrefixSpan, a pattern-growth algorithm, with SPAM, an Apriori-based algorithm. The comparison is carried out with respect to the space and time consumption of each algorithm. The study shows that SPAM with constraints outperforms PrefixSpan on very large datasets, whereas on smaller datasets PrefixSpan performs better than SPAM.
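
The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper. It assumes that each protein is a plain string with one amino-acid letter per item, that support is counted as the number of sequences containing a pattern, and that no gap or time constraints are applied. The function names `prefixspan` and `spam_style` and the toy sequences are hypothetical. The SPAM-style version uses Python integers as position bitmaps and omits the candidate pruning of the full algorithm.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Pattern growth: extend a prefix using only its projected suffixes."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, offset) pairs; the suffix starting
        # at that offset is what remains after the first match of `prefix`.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                if seq[pos] not in seen:  # count each item once per sequence
                    seen.add(seq[pos])
                    occurrences[seq[pos]].append((seq_id, pos + 1))
        for item, new_projected in sorted(occurrences.items()):
            if len(new_projected) >= min_support:
                results[prefix + item] = len(new_projected)
                mine(prefix + item, new_projected)

    mine("", [(i, 0) for i in range(len(sequences))])
    return results


def spam_style(sequences, min_support):
    """Bitmap approach: bit p of a bitmap is set if a match ends at position p."""
    alphabet = sorted(set("".join(sequences)))
    item_bits = {
        a: [sum(1 << p for p, c in enumerate(s) if c == a) for s in sequences]
        for a in alphabet
    }
    results = {}

    def s_step(bits):
        # Mask selecting every position strictly after the first set bit.
        if bits == 0:
            return 0
        lowest = bits & -bits
        return ~((lowest << 1) - 1)

    def extend(pattern, bitmaps):
        for a in alphabet:
            new = [s_step(b) & ib for b, ib in zip(bitmaps, item_bits[a])]
            support = sum(1 for b in new if b)
            if support >= min_support:
                results[pattern + a] = support
                extend(pattern + a, new)

    for a in alphabet:
        support = sum(1 for b in item_bits[a] if b)
        if support >= min_support:
            results[a] = support
            extend(a, item_bits[a])
    return results


# Toy data (hypothetical, not taken from the paper's protein dataset).
toy_proteins = ["MKVLA", "MKLA", "KVLA", "MVA"]
patterns = prefixspan(toy_proteins, min_support=3)
bitmap_patterns = spam_style(toy_proteins, min_support=3)
```

On this toy input both functions return the same set of frequent subsequences. They differ only in how support is computed: by scanning projected suffixes in one case and by bitwise operations on position bitmaps in the other, which is the kind of time and space trade-off the comparison in the abstract concerns.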



Author information

Authors and Affiliations

Department of Technology, Shivaji University, Kolhapur, Maharashtra, India
Rashmi V. Mane


Editor information

Editors and Affiliations

Fr. Conceicao Rodrigues College of Engineering, Bandstand, Bandra (W), 400 050, Mumbai, Maharashtra, India
Srija Unnikrishnan
Fr. Conceicao Rodrigues College of Engineering, Bandstand, Bandra (W), 400 050, Mumbai, India
Sunil Surve
Dept. of Electronics Engineering, Fr. Conceicao Rodrigues College of Engineering, Bandstand, Bandra (West), 400 050, Mumbai, India
Deepak Bhoir


Cite this paper

Mane, R.V. (2013). A Comparative Study of Spam and PrefixSpan Sequential Pattern Mining Algorithm for Protein Sequences. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_13


DOI: https://doi.org/10.1007/978-3-642-36321-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36320-7
Online ISBN: 978-3-642-36321-4
eBook Packages: Computer Science, Computer Science (R0)
