TP+Close: Mining Frequent Closed Patterns in Gene Expression Datasets

Miao, YuQing; Chen, GuoLiang; Song, Bin; Wang, ZhiHao

doi:10.1007/11960669_11

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4316)

Included in the following conference series:

VLDB Workshop on Data Mining and Bioinformatics

Abstract

Unlike traditional datasets, gene expression datasets typically contain a huge number of items and few transactions. Although many algorithms have been developed for mining frequent closed patterns, their running time increases exponentially with the average transaction length, which makes most current methods impractical for high-dimensional gene expression datasets. In this paper, we propose a new data structure, the tidset-prefix-plus tree (TP+-tree), to store the compressed transposed table of a dataset. Based on the TP+-tree, we develop an algorithm, TP+close, for mining frequent closed patterns in gene expression datasets. TP+close adopts top-down and divide-and-conquer search strategies on the transaction space and combines efficient pruning with effective optimization methods. Experiments on real-life gene expression datasets show that TP+close is faster than RERII and CARPENTER, two existing algorithms.
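
To make the terms in the abstract concrete, the following minimal sketch builds a transposed table (item to tidset) and enumerates frequent closed patterns by searching the transaction space from the largest transaction sets downward. It is a brute-force illustration only, not the TP+close algorithm: it uses no TP+-tree, no pruning, and no divide-and-conquer, and it is practical only because the number of transactions is assumed to be small. The function names `transpose` and `closed_patterns` and the toy gene labels are hypothetical.

```python
from itertools import combinations


def transpose(transactions):
    """Build the transposed table: item -> set of ids of transactions containing it."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table.setdefault(item, set()).add(tid)
    return table


def closed_patterns(transactions, min_support):
    """Naive enumeration over transaction sets (NOT the paper's TP+close).

    The items shared by all transactions in a set always form a closed
    pattern, so visiting every transaction set of size >= min_support
    yields every frequent closed pattern. Returns {pattern: support}.
    """
    table = transpose(transactions)
    n = len(transactions)
    result = {}
    # Top-down: visit the largest transaction sets first.
    for size in range(n, min_support - 1, -1):
        for rows in combinations(range(n), size):
            rowset = set(rows)
            # Items whose tidset covers every transaction in rowset.
            pattern = frozenset(
                item for item, tids in table.items() if rowset <= tids
            )
            if not pattern:
                continue
            # Support = number of transactions containing the whole pattern.
            support = len(set.intersection(*(table[i] for i in pattern)))
            result[pattern] = support
    return result


# Toy data (hypothetical): three samples, each a set of expressed genes.
samples = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g3"}]
frequent_closed = closed_patterns(samples, min_support=2)
```

With these toy samples and a minimum support of 2, the result contains {g1} with support 3, and {g1, g2} and {g1, g3} with support 2 each. The loop visits a number of transaction sets that is exponential in the number of transactions but independent of the number of items, which is why searching the transaction space suits datasets with many items and few transactions.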

References

Creighton, C., Hanash, S.: Mining Gene Expression Databases for Association Rules. Bioinformatics 19, 79–86 (2003)
Madeira, S., Oliveira, A.: Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 24–45 (2004)
Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. 1994 VLDB Int’l. Conf., Santiago, Chile, pp. 487–499 (1994)
Han, J.W., Pei, J., Yin, Y.: Mining Frequent Patterns Without Candidate Generation. In: Proc. ACM SIGMOD Int’l. Conf. on Management of Data, pp. 1–12. ACM Press, Dallas (2000)
Pasquier, N., Bastide, Y., Taouil, R., et al.: Discovering Frequent Closed Itemsets for Association Rules. In: Proc. Int’l. Conf. on Database Theory, pp. 398–416. Springer, Jerusalem (1999)
Zaki, M., Hsiao, C.: CHARM: An Efficient Algorithm for Closed Itemset Mining. In: Proc. SIAM Int’l. Conf. on Data Mining, pp. 12–28. SIAM, Arlington (2002)
Rioult, F., Boulicaut, J., Crémilleux, B., et al.: Using Transposition for Pattern Discovery from Microarray Data. In: DMKD 2003, pp. 73–79. ACM Press, San Diego (2003)
Pan, F., Cong, G., Tung, A., et al.: CARPENTER: Finding Closed Patterns in Long Biological Datasets. In: SIGKDD 2003, pp. 637–642. ACM Press, Washington (2003)
Cong, G., Tan, K., Tung, A., et al.: Mining Frequent Closed Patterns in Microarray Data. In: ICDM 2004, pp. 363–366. IEEE Press, Los Alamitos (2004)
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
http://www.broad.mit.edu/cancer/pub/dlbcl

Author information

Authors and Affiliations

Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China
YuQing Miao, GuoLiang Chen & ZhiHao Wang
Department of Computer Science and Technology, Guilin University of Electronic Technology, Guilin, China
YuQing Miao
Department of Computer Science, Case Western Reserve University, Cleveland, USA
Bin Song

Editor information

Editors and Affiliations

School of Informatics, Indiana University, 901 E. 10th Street, 47408, Bloomington, IN, USA
Mehmet M. Dalkilic & Sun Kim
EECS Department, Case Western Reserve Univ., 10900 Euclid Ave, 44106, Cleveland, OH, USA
Jiong Yang

About this paper

Cite this paper

Miao, Y., Chen, G., Song, B., Wang, Z. (2006). TP+Close: Mining Frequent Closed Patterns in Gene Expression Datasets. In: Dalkilic, M.M., Kim, S., Yang, J. (eds) Data Mining and Bioinformatics. VDMB 2006. Lecture Notes in Computer Science, vol 4316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11960669_11

DOI: https://doi.org/10.1007/11960669_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68970-6
Online ISBN: 978-3-540-68971-3
eBook Packages: Computer Science, Computer Science (R0)
