Boolean Property Encoding for Local Set Pattern Discovery: An Application to Gene Expression Data Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3539)

Abstract

In gene expression data analysis, several researchers have recently emphasized the promise of discovering local patterns (e.g., association rules, closed sets) from boolean matrices that encode gene properties. Detecting local patterns by means of complete constraint-based mining techniques turns out to be an important complement to heuristic global model mining. To get the most from local set pattern mining approaches, a necessary step is the encoding of gene expression properties (e.g., over-expression). This preprocessing phase has a crucial impact on both the quantity and the quality of the extracted patterns. In this paper, we study the impact of discretization techniques through a sound comparison between dendrograms, i.e., the trees generated by a hierarchical clustering algorithm, built on the raw numerical expression data and on the various boolean matrices derived from it. Thanks to a new similarity measure, we can select the boolean property encoding technique that preserves the similarity structures holding in the raw data. The discussion relies on several experimental results for three gene expression data sets. We believe our framework is an interesting direction of work for the many application domains in which (a) local set patterns have proved useful, and (b) boolean properties have to be derived from raw numerical data.
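
The workflow outlined in the abstract (encode, cluster both matrices, compare the resulting trees) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract specifies neither the encoding rules nor the new similarity measure, so the sketch uses a simple "fraction of the row maximum" threshold as a hypothetical over-expression encoding, plain average-linkage clustering, and the share of identical clusters as a stand-in for the authors' measure. The function names and toy data are likewise hypothetical.

```python
from itertools import combinations


def encode_over_expression(matrix, fraction):
    """Hypothetical encoding: 1 where a value reaches `fraction` of its row maximum."""
    encoded = []
    for row in matrix:
        threshold = fraction * max(row)
        encoded.append([1 if value >= threshold else 0 for value in row])
    return encoded


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(rows, a, b):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(euclidean(rows[i], rows[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def dendrogram_clusters(rows):
    """Agglomerative clustering with average linkage.

    Returns the internal nodes of the dendrogram as a set of frozensets of
    row indices. Ties are broken by the order in which pairs are enumerated.
    """
    active = [frozenset([i]) for i in range(len(rows))]
    internal = set()
    while len(active) > 1:
        a, b = min(combinations(active, 2),
                   key=lambda pair: average_linkage(rows, *pair))
        merged = a | b
        active = [c for c in active if c not in (a, b)] + [merged]
        internal.add(merged)
    return internal


def shared_cluster_ratio(raw, boolean):
    """Stand-in similarity (not the paper's measure): fraction of the
    raw-data clusters that also appear in the boolean-data dendrogram."""
    reference = dendrogram_clusters(raw)
    candidate = dendrogram_clusters(boolean)
    return len(reference & candidate) / len(reference)


# Toy expression matrix: rows are genes, columns are biological situations.
raw = [
    [10.0, 9.0, 1.0, 1.0],
    [1.0, 2.0, 10.0, 9.0],
    [9.0, 10.0, 1.0, 2.0],
    [2.0, 1.0, 9.0, 10.0],
]

for fraction in (0.5, 0.1):
    boolean = encode_over_expression(raw, fraction)
    print(fraction, round(shared_cluster_ratio(raw, boolean), 2))
# prints "0.5 1.0" then "0.1 0.33"
```

On this toy matrix, the 0.5 threshold keeps the two gene groups found in the raw data, whereas the 0.1 threshold marks every value as over-expressed, so the genes become indistinguishable and only the root cluster is shared. Selecting the encoding with the highest score mirrors, in a very simplified form, the selection step that the abstract describes.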

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pensa, R.G., Boulicaut, JF. (2005). Boolean Property Encoding for Local Set Pattern Discovery: An Application to Gene Expression Data Analysis. In: Morik, K., Boulicaut, JF., Siebes, A. (eds) Local Pattern Detection. Lecture Notes in Computer Science, vol 3539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11504245_8

  • DOI: https://doi.org/10.1007/11504245_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26543-6

  • Online ISBN: 978-3-540-31894-1

  • eBook Packages: Computer Science, Computer Science (R0)
