Abstract
Unlike traditional datasets, gene expression datasets typically contain a huge number of items and few transactions. Although many algorithms have been developed for mining frequent closed patterns, their running time increases exponentially with the average transaction length, so most current methods are impractical for high-dimensional gene expression datasets. In this paper, we propose a new data structure, the tidset-prefix-plus tree (TP+-tree), to store the compressed transposed table of a dataset. Based on the TP+-tree, we develop an algorithm, TP+close, for mining frequent closed patterns in gene expression datasets. TP+close adopts top-down and divide-and-conquer search strategies over the transaction space, and combines them with efficient pruning and optimization methods. Experiments on real-life gene expression datasets show that TP+close is faster than RERII and CARPENTER, two existing algorithms.
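To make the terms in the abstract concrete, the following minimal Python sketch builds a transposed table (item to tidset) and enumerates frequent closed patterns by brute force over the transaction space. It is an illustration only, not the TP+-tree or the TP+close algorithm, whose details are not given in the abstract. The toy data and the function names `transpose` and `closed_patterns` are assumptions introduced here.

```python
from itertools import combinations

# Hypothetical toy dataset: few transactions (samples), several items (genes).
transactions = {
    1: {"g1", "g2", "g3", "g5"},
    2: {"g1", "g2", "g4"},
    3: {"g1", "g3", "g5"},
}


def transpose(transactions):
    """Build the transposed table: item -> set of transaction ids (tidset)."""
    table = {}
    for tid, items in transactions.items():
        for item in items:
            table.setdefault(item, set()).add(tid)
    return table


def closed_patterns(transactions, min_support):
    """Enumerate frequent closed patterns by brute force over transaction sets.

    The intersection of any group of transactions is a closed itemset, and
    every closed itemset is the intersection of the transactions containing
    it, so intersecting all groups of at least `min_support` transactions
    yields exactly the frequent closed patterns. This is exponential in the
    number of transactions and is only meant for tiny examples.
    """
    tids = sorted(transactions)
    result = {}
    for k in range(max(1, min_support), len(tids) + 1):
        for combo in combinations(tids, k):
            common = set.intersection(*(transactions[t] for t in combo))
            if common:
                # Support counts every transaction that contains the pattern.
                support = sum(1 for items in transactions.values() if common <= items)
                result[frozenset(common)] = support
    return result


transposed = transpose(transactions)
patterns = closed_patterns(transactions, min_support=2)
```

Enumerating over transactions rather than items is what makes this direction attractive when transactions are few and items are many: the search space here has at most 2^3 transaction sets, regardless of how many genes each sample contains.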
References
Creighton, C., Hanash, S.: Mining Gene Expression Databases for Association Rules. Bioinformatics 19, 79–86 (2003)
Madeira, S., Oliveira, A.: Biclustering Algorithm for Biological Data Analysis: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 24–45 (2004)
Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. 1994 VLDB Int’l. Conf., Santiago, Chile, pp. 487–499 (1994)
Han, J.W., Pei, J., Yin, Y.: Mining Frequent Patterns Without Candidate Generation. In: Proc. ACM SIGMOD Int’l. Conf. on Management of Data, pp. 1–12. ACM Press, Dallas (2000)
Pasquier, N., Bastide, Y., Taouil, R., et al.: Discovering Frequent Closed Itemsets for Association Rules. In: Proc. Int’l. Conf. on Database Theory, pp. 398–416. Springer, Jerusalem (1999)
Zaki, M., Hsiao, C.: CHARM: An Efficient Algorithm for Closed Itemset Mining. In: Proc. SIAM Int’l. Conf. on Data Mining, pp. 12–28. SIAM, Arlington (2002)
Rioult, F., Boulicaut, J., Crémilleux, B., et al.: Using Transposition for Pattern Discovery from Microarray Data. In: DMKD 2003, pp. 73–79. ACM Press, San Diego (2003)
Pan, F., Cong, G., Tung, A., et al.: CARPENTER: Finding Closed Patterns in Long Biological Datasets. In: SIGKDD 2003, pp. 637–642. ACM Press, Washington (2003)
Cong, G., Tan, K., Tung, A., et al.: Mining Frequent Closed Patterns in Microarray Data. In: ICDM 2004, pp. 363–366. IEEE Press, Los Alamitos (2004)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miao, Y., Chen, G., Song, B., Wang, Z. (2006). TP+Close: Mining Frequent Closed Patterns in Gene Expression Datasets. In: Dalkilic, M.M., Kim, S., Yang, J. (eds) Data Mining and Bioinformatics. VDMB 2006. Lecture Notes in Computer Science, vol 4316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11960669_11
DOI: https://doi.org/10.1007/11960669_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68970-6
Online ISBN: 978-3-540-68971-3