A Comparative Study of Spam and PrefixSpan Sequential Pattern Mining Algorithm for Protein Sequences

  • Conference paper
Advances in Computing, Communication, and Control (ICAC3 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 361)

Abstract

Sequential pattern mining is an efficient technique for discovering recurring structures or patterns in very large datasets. Many algorithms have been proposed for this task. Broadly, these algorithms fall into two categories: Apriori-based approaches, which rely on candidate generation, and pattern-growth approaches. Introducing constraints such as a user-defined threshold, user-specified data, or a minimum gap or time improves the performance of these algorithms. In this paper we use a dataset of protein sequences to compare PrefixSpan, a pattern-growth algorithm, with SPAM, an Apriori-based algorithm. The comparison is carried out with respect to the space and time consumption of each algorithm. The study shows that SPAM with constraints outperforms PrefixSpan on very large datasets, whereas PrefixSpan works better than SPAM on smaller datasets.
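
To make the pattern-growth idea concrete, the following is a minimal sketch of prefix-projected mining in the style of PrefixSpan, not the implementation evaluated in the paper. It assumes each protein is a plain string of one-letter amino-acid codes, that every sequence element is a single item, and that the only constraint is a user-defined minimum support; the function name `prefixspan` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every subsequence (gaps allowed) that
    occurs in at least `min_support` of the input strings."""
    results = {}

    def mine(prefix, projected):
        # `projected` is the projected database for `prefix`: a list of
        # (sequence index, offset) pairs marking where each suffix starts.
        counts = defaultdict(int)
        for seq_id, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_id][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + item
            results[pattern] = support
            # Project on the first occurrence of `item` in each suffix.
            next_projected = []
            for seq_id, start in projected:
                pos = sequences[seq_id].find(item, start)
                if pos != -1:
                    next_projected.append((seq_id, pos + 1))
            mine(pattern, next_projected)

    mine("", [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy input: four short amino-acid strings.
toy_sequences = ["MKVL", "MKL", "MVL", "KL"]
frequent = prefixspan(toy_sequences, min_support=3)
for pattern, support in sorted(frequent.items()):
    print(pattern, support)
```

The sketch grows patterns one item at a time and only ever scans the suffixes that follow the current prefix, so no candidate set is generated. SPAM, by contrast, is described in its original paper as using a vertical bitmap representation, which this sketch does not attempt to reproduce.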

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mane, R.V. (2013). A Comparative Study of Spam and PrefixSpan Sequential Pattern Mining Algorithm for Protein Sequences. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_13

  • DOI: https://doi.org/10.1007/978-3-642-36321-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36320-7

  • Online ISBN: 978-3-642-36321-4

  • eBook Packages: Computer Science, Computer Science (R0)
