Abstract
Exploring interesting associations among gene variables helps to assess the accuracy of pattern sequence mining. Exploring genetic structures such as DNA, RNA, and protein sequences in biological datasets can drive new innovations in pathology diagnosis. This task requires discovering very large genetic pattern sequences, and doubleton pattern mining (DPM) is considered highly effective for analyzing such datasets. This paper discusses D-Mine, a new approach for discovering very large gene pattern sequences from biological datasets. D-Mine discovers doubleton patterns, which are then enriched to generate gene pattern sequences using a vector intersection operator and Markov probabilistic grammars. D-Mine is presented as a solution for reducing the set of discovered patterns. It uses a new integrated data structure called ‘D-struct,’ a combination of a virtual data matrix and a 1D array pair set, to dynamically discover doubleton patterns from biological datasets. A distinctive feature of D-struct is that its main-memory requirement is extremely limited and accurately predictable, so it runs very quickly under memory constraints. The algorithm needs only one scan over the database to discover large gene pattern sequences by iteratively enumerating the D-struct matrix. Empirical analysis shows that D-Mine attains better mining efficiency on various biological datasets and outperforms CARPENTER in different settings. The performance of D-Mine on biological datasets is also assessed in terms of accuracy and F-measure.
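The abstract does not specify the layout of D-struct, so the following is only a minimal sketch of the general idea of finding doubleton (two-item) patterns with a vector intersection after a single scan of the data. It is not the authors' implementation. The names `build_item_vectors`, `doubleton_patterns`, and `min_support`, the use of Python integers as bit vectors, and the toy gene labels are all assumptions made for illustration.

```python
from itertools import combinations


def build_item_vectors(rows):
    """Scan the rows once and map each item to a bit vector of the rows containing it.

    Assumption: a Python int is used as the bit vector; bit i is set when
    the item occurs in row i.
    """
    vectors = {}
    for row_index, items in enumerate(rows):
        for item in set(items):
            vectors[item] = vectors.get(item, 0) | (1 << row_index)
    return vectors


def doubleton_patterns(rows, min_support):
    """Return {(a, b): support} for item pairs that co-occur in at least min_support rows."""
    vectors = build_item_vectors(rows)
    result = {}
    for a, b in combinations(sorted(vectors), 2):
        # Vector intersection: rows that contain both a and b.
        shared_rows = vectors[a] & vectors[b]
        support = bin(shared_rows).count("1")
        if support >= min_support:
            result[(a, b)] = support
    return result


# Hypothetical toy dataset: each row lists the genes observed in one sample.
rows = [
    ["g1", "g2", "g3"],
    ["g1", "g2"],
    ["g2", "g3"],
    ["g1", "g2", "g3"],
]
pairs = doubleton_patterns(rows, min_support=3)
print(pairs)
```

With this toy input, only the pairs that share at least three rows are kept. Longer patterns could in principle be grown by intersecting further vectors, but how D-Mine enriches doubletons with Markov probabilistic grammars is not described in the abstract and is not modeled here.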
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Kottapalle, P., Maddala, S., Gunjan, V.K. (2016). D-Mine: Accurate Discovery of Large Pattern Sequences from Biological Datasets. In: Suresh, L., Panigrahi, B. (eds) Proceedings of the International Conference on Soft Computing Systems. Advances in Intelligent Systems and Computing, vol 397. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2671-0_62
DOI: https://doi.org/10.1007/978-81-322-2671-0_62
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2669-7
Online ISBN: 978-81-322-2671-0
eBook Packages: Engineering, Engineering (R0)