
D-Mine: Accurate Discovery of Large Pattern Sequences from Biological Datasets

  • Conference paper
  • First Online:
Proceedings of the International Conference on Soft Computing Systems

Abstract

Exploring interesting associations among gene variables helps to assess the accuracy of pattern sequence mining. Exploring genetic structures such as DNA, RNA, and protein sequences in biological datasets can drive new advances in pathology diagnosis. This requires discovering very large genetic pattern sequences, and doubleton pattern mining (DPM) is considered very useful for analyzing such datasets. This paper presents D-Mine, a new approach for discovering very large gene pattern sequences from biological datasets. D-Mine efficiently discovers doubleton patterns, which are then extended into gene pattern sequences using a vector intersection operator and Markov probabilistic grammars. D-Mine is also designed to reduce the number of discovered patterns. It relies on a new integrated data structure, ‘D-struct,’ which combines a virtual data matrix with a 1D array pair set to discover doubleton patterns dynamically from biological datasets. A distinctive feature of D-struct is that its main-memory usage is small and accurately predictable, so it runs quickly under memory constraints. The algorithm needs only one scan over the database and discovers large gene pattern sequences by iteratively enumerating the D-struct matrix. The empirical analysis shows that D-Mine attains better mining efficiency on various biological datasets and outperforms CARPENTER in different settings. The performance of D-Mine on biological datasets is also assessed with accuracy and F-measure.
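The abstract describes D-Mine only at a high level: one database scan, a matrix-plus-1D-array structure for doubletons, and growth of doubletons into longer patterns by vector intersection. The Python sketch below illustrates those general ideas under explicit assumptions; it is not the authors' D-struct or D-Mine, and every name in it (`tri_index`, `scan_once`, `frequent_doubletons`, `support`, `grow`, `f_measure`) is hypothetical. It assumes items are integer ids `0..n-1`, models the "virtual data matrix" as a pair-count triangle packed into a flat 1D array, models "vector intersection" as a bitwise AND of per-item transaction-id bit vectors, and grows patterns with a plain Apriori-style prefix join (the Markov probabilistic grammar step is not modeled). F-measure is the standard harmonic mean of precision and recall.

```python
from itertools import combinations


def tri_index(i: int, j: int, n: int) -> int:
    """Map a pair (i, j) with 0 <= i < j < n to its slot in a flat 1D array
    that stores the strict upper triangle of an n x n matrix row by row.
    (Hypothetical stand-in for a 'virtual' pair-count matrix.)"""
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def scan_once(transactions, n_items):
    """Single pass over the data: count every item pair (doubleton) and
    build, per item, a bit vector of the transaction ids that contain it."""
    pair_counts = [0] * (n_items * (n_items - 1) // 2)
    tid_bits = [0] * n_items
    for tid, items in enumerate(transactions):
        items = sorted(set(items))
        for it in items:
            tid_bits[it] |= 1 << tid
        for i, j in combinations(items, 2):
            pair_counts[tri_index(i, j, n_items)] += 1
    return pair_counts, tid_bits


def frequent_doubletons(pair_counts, n_items, min_support):
    """Read frequent pairs straight out of the packed count array."""
    return [(i, j) for i, j in combinations(range(n_items), 2)
            if pair_counts[tri_index(i, j, n_items)] >= min_support]


def support(pattern, tid_bits):
    """Support of a non-empty itemset = number of set bits in the AND
    (intersection) of its items' transaction-id vectors."""
    acc = -1  # all bits set; AND with a non-negative int yields a non-negative int
    for it in pattern:
        acc &= tid_bits[it]
    return bin(acc).count("1")


def grow(doubletons, tid_bits, min_support):
    """Level-wise growth (Apriori-style prefix join, an assumption here):
    join two sorted k-patterns sharing their first k-1 items and keep the
    (k+1)-pattern if its intersected support is still frequent."""
    level = sorted(tuple(p) for p in doubletons)
    found = list(level)
    while level:
        nxt = []
        for a, b in combinations(level, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)  # sorted because a precedes b in `level`
                if support(cand, tid_bits) >= min_support:
                    nxt.append(cand)
        level = sorted(nxt)
        found.extend(level)
    return found


def f_measure(precision: float, recall: float) -> float:
    """Standard F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy transactions over 4 hypothetical gene/item ids.
    data = [[0, 1, 2], [0, 1, 2, 3], [0, 1], [1, 2, 3]]
    counts, bits = scan_once(data, 4)
    pairs = frequent_doubletons(counts, 4, min_support=2)
    print(pairs)
    print(grow(pairs, bits, min_support=2))
```

On the toy data, the doubletons with support of at least 2 are `(0, 1)`, `(0, 2)`, `(1, 2)`, `(1, 3)`, and `(2, 3)`, and intersecting the bit vectors extends them to `(0, 1, 2)` and `(1, 2, 3)`. Packing only the upper triangle keeps the count array at n(n-1)/2 integers, so memory is fixed once the number of items is known. This loosely mirrors the abstract's claim of a small, predictable footprint, but it is not a reproduction of the paper's structure.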


References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: VLDB’94, pp 487–499

  2. Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: KDD’94, pp 181–192

  3. Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289

  4. Brin S, Motwani R, Silverstein C (1997) Beyond market baskets: generalizing association rules to correlations. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 265–276

  5. Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT’96, pp 3–17

  6. Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE’01, pp 215–224

  7. Bayardo RJ (1998) Efficiently mining long patterns from databases. In: SIGMOD’98, pp 85–93

  8. Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of the 2000 ACM SIGMOD international workshop on research issues in data mining and knowledge discovery (DMKD’00), pp 11–20

  9. Zaki M (2000) Generating non-redundant association rules. In: KDD’00, pp 34–43

  10. Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th international conference on intelligent systems for molecular biology

  11. Cong G, Tung AKH, Xu X, Pan F, Yang J (2004) FARMER: finding interesting rule groups in microarray datasets. In: Proceedings of the 23rd ACM international conference on management of data

  12. Yang J, Wang H, Wang W, Yu PS (2003) Enhanced biclustering on gene expression data. In: Proceedings of the 3rd IEEE symposium on bioinformatics and bioengineering (BIBE), Washington DC

  13. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th international conference on database theory (ICDT)

  14. Zaki MJ, Hsiao C (2002) CHARM: an efficient algorithm for closed association rule mining. In: Proceedings of the SIAM international conference on data mining (SDM)

  15. Pan F, Cong G, Tung AKH, Yang J, Zaki MJ (2003) CARPENTER: finding closed patterns in long biological datasets. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD)

  16. Creighton C, Hanash S (2003) Mining gene expression databases for association rules. Bioinformatics 19

  17. Zhang Z, Teo A, Ooi B, Tan K-L (2004) Mining deterministic biclusters in gene expression data. In: 4th symposium on bioinformatics and bioengineering

  18. UCI machine learning data sets. http://archive.ics.uci.edu/ml/datasets/


Author information


Corresponding author

Correspondence to Prasanna Kottapalle.


Copyright information

© 2016 Springer India

About this paper

Cite this paper

Kottapalle, P., Maddala, S., Gunjan, V.K. (2016). D-Mine: Accurate Discovery of Large Pattern Sequences from Biological Datasets. In: Suresh, L., Panigrahi, B. (eds) Proceedings of the International Conference on Soft Computing Systems. Advances in Intelligent Systems and Computing, vol 397. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2671-0_62


  • DOI: https://doi.org/10.1007/978-81-322-2671-0_62

  • Published:

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2669-7

  • Online ISBN: 978-81-322-2671-0

  • eBook Packages: Engineering, Engineering (R0)
