D-Mine: Accurate Discovery of Large Pattern Sequences from Biological Datasets

Kottapalle, Prasanna; Maddala, Seetha; Gunjan, Vinit Kumar

doi:10.1007/978-81-322-2671-0_62

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 397)

Abstract

Exploring interesting associations among gene variables helps to assess the accuracy of pattern sequence mining. Exploring genetic structures such as DNA, RNA, and protein sequences in biological datasets can drive innovation in pathology diagnosis. This requires discovering very large genetic pattern sequences, and doubleton pattern mining (DPM) is considered well suited to analyzing such datasets. This paper presents D-Mine, a new approach for discovering very large gene pattern sequences from biological datasets. D-Mine discovers doubleton patterns, which are then extended into gene pattern sequences using a vector intersection operator and Markov probabilistic grammars. D-Mine is described as a way to reduce the set of discovered patterns. It uses a new integrated data structure called ‘D-struct’, a combination of a virtual data matrix and a 1D array pair set, to dynamically discover doubleton patterns from biological datasets. A distinctive feature of D-struct is that its main-memory use is extremely limited and accurately predictable, and it runs very quickly under memory constraints. The algorithm needs only one scan over the database to discover large gene pattern sequences, by iteratively enumerating the D-struct matrix. Empirical analysis shows that the proposed approach attains better mining efficiency on various biological datasets and outperforms CARPENTER in different settings. The performance of D-Mine on biological datasets is also assessed using accuracy and F-measure.
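
The abstract does not give the layout of D-struct or the exact form of the vector intersection operator, so the following is only a minimal sketch of the general idea of finding doubleton (two-item) patterns by intersecting per-item row vectors built in a single scan. It is not the authors' algorithm. The function names, the bit-vector encoding, the `min_support` threshold, and the toy gene labels are all assumptions made for illustration.

```python
from itertools import combinations


def build_item_vectors(rows):
    """Scan the rows once and map each item to a bit vector of row ids.

    Bit i of vectors[item] is set when the item occurs in row i.
    (Illustrative encoding; not the D-struct described in the paper.)
    """
    vectors = {}
    for row_id, items in enumerate(rows):
        for item in set(items):
            vectors[item] = vectors.get(item, 0) | (1 << row_id)
    return vectors


def doubleton_patterns(rows, min_support):
    """Return {(item_a, item_b): support} for pairs meeting min_support.

    The support of a pair is the number of rows containing both items,
    obtained by intersecting (bitwise AND) the two row vectors.
    """
    vectors = build_item_vectors(rows)
    patterns = {}
    for a, b in combinations(sorted(vectors), 2):
        support = bin(vectors[a] & vectors[b]).count("1")
        if support >= min_support:
            patterns[(a, b)] = support
    return patterns


# Hypothetical toy dataset: each row lists the genes observed in one sample.
rows = [
    ["g1", "g2", "g3"],
    ["g1", "g2"],
    ["g2", "g3"],
    ["g1", "g2", "g3"],
]
result = doubleton_patterns(rows, min_support=3)
print(result)
```

In this sketch the dataset is read only once, when the row vectors are built; every pair's support is then computed from the vectors alone. Enumerating all pairs is quadratic in the number of items, so it is a starting point for intuition rather than a method for very large gene sets.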

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: VLDB’94, pp 487–499
Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: KDD’94, pp 181–192
Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discovery 259–289
Brin S, Motwani R, Silverstein C (1997) Beyond market basket: generalizing association rules to correlations. In: Proceedings of the ACM-SIGMOD international conference on management of data, pp 265–276
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT’96, pp 3–17
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE’01, pp 215–224
Bayardo RJ (1998) Efficiently mining long patterns from databases. In: SIGMOD’98, pp 85–93
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of the 2000 ACM-SIGMOD international workshop data mining and knowledge discovery (DMKD’00), pp 11–20
Zaki M (2000) Generating non-redundant association rules. In: KDD’00, pp 34–43
Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th international conference on intelligent systems for molecular biology
Cong G, Tung AKH, Xu X, Pan F, Yang J (2004) FARMER: finding interesting rule groups in microarray datasets. In: Proceedings of the 23rd ACM international conference on management of data
Yang J, Wang H, Wang W, Yu PS (2003) Enhanced biclustering on gene expression data. In: Proceedings of the 3rd IEEE symposium on bioinformatics and bioengineering (BIBE), Washington DC
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th international conference on database theory (ICDT)
Zaki MJ, Hsiao C (2002) CHARM: an efficient algorithm for closed association rule mining. In: Proceedings of the SIAM international conference on data mining (SDM)
Pan F, Cong G, Tung AKH, Yang J, Zaki MJ (2003) CARPENTER: finding closed patterns in long biological datasets. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD)
Creighton C, Hanash S (2003) Mining gene expression databases for association rules. Bioinformatics 19
Zhang Z, Teo A, Ooi B, Tan K-L (2004) Mining deterministic biclusters in gene expression data. In: 4th symposium on bioinformatics and bioengineering
UCI machine learning data sets. http://archive.ics.uci.edu/ml/datasets/

Author information

Authors and Affiliations

JNIAS-JNTUH, AITS, Rajampet, Hyderabad, AP, India
Prasanna Kottapalle
JNIAS, GNITS, Hyderabad, AP, India
Seetha Maddala
Department of CSE, AITS, Rajampet, Hyderabad, AP, India
Vinit Kumar Gunjan

Corresponding author

Correspondence to Prasanna Kottapalle.

Editor information

Editors and Affiliations

Noorul Islam Centre for Higher Education, Kumaracoil, Tamil Nadu, India
L. Padma Suresh
IIT Delhi, New Delhi, Delhi, India
Bijaya Ketan Panigrahi

About this paper

Cite this paper

Kottapalle, P., Maddala, S., Gunjan, V.K. (2016). D-Mine: Accurate Discovery of Large Pattern Sequences from Biological Datasets. In: Suresh, L., Panigrahi, B. (eds) Proceedings of the International Conference on Soft Computing Systems. Advances in Intelligent Systems and Computing, vol 397. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2671-0_62

DOI: https://doi.org/10.1007/978-81-322-2671-0_62
Published: 29 December 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2669-7
Online ISBN: 978-81-322-2671-0
eBook Packages: Engineering, Engineering (R0)
