
Efficient Error Setting for Subspace Miners

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8556)

Abstract

A typical mining problem is the extraction of patterns from subspaces of multidimensional data. Such patterns, known as biclusters, comprise subsets of objects that behave similarly across subsets of attributes, and they may overlap, i.e., an object or attribute may belong to several patterns, or to none. For many miners, a key input parameter is the maximum allowed error, which greatly affects the quality, quantity and coherency of the mined clusters. Because the appropriate error is dataset dependent, setting it requires either domain knowledge or some trial and error. The paper presents a new method for automatically setting the error to the value that maximizes the number of clusters mined. This error value is strongly correlated with the value at which performance scores are maximized. The correlation is evaluated extensively on six datasets, using two mining algorithms and seven prevailing performance measures, and the method is compared with five methods from prior literature, demonstrating a substantial improvement in the mining score.
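The selection rule stated in the abstract (pick the error at which the miner returns the most clusters) can be sketched as a search over candidate error values. This is a minimal illustration under stated assumptions, not the authors' implementation: the exhaustive grid search and the names `select_error` and `toy_miner` are hypothetical. The toy miner is a 1-D gap-based grouping that stands in for a real subspace miner; it is used only because its cluster count first rises and then falls as the error grows, which makes the maximum easy to see.

```python
from typing import Callable, List, Sequence, Tuple

Cluster = List[float]


def toy_miner(values: Sequence[float], error: float, min_size: int = 3) -> List[Cluster]:
    """Hypothetical stand-in miner (NOT a biclustering algorithm).

    Groups sorted 1-D values whose consecutive gaps are <= error and keeps
    only groups with at least min_size members.
    """
    clusters: List[Cluster] = []
    current: Cluster = []
    for v in sorted(values):
        # A gap larger than the allowed error closes the current group.
        if current and v - current[-1] > error:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(v)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


def select_error(miner: Callable[[Sequence[float], float], List[Cluster]],
                 data: Sequence[float],
                 candidates: Sequence[float]) -> Tuple[float, int]:
    """Return (error, count) for the candidate error that maximizes the
    number of mined clusters; ties go to the smaller error (assumption)."""
    if not candidates:
        raise ValueError("need at least one candidate error value")
    best_error, best_count = candidates[0], -1
    for err in sorted(candidates):
        count = len(miner(data, err))
        if count > best_count:  # strict '>' keeps the smallest tied error
            best_error, best_count = err, count
    return best_error, best_count


# Usage on hypothetical data: three tight groups far apart.
data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0, 9.1, 9.2]
best_err, n_clusters = select_error(toy_miner, data, [0.05, 0.2, 1.0, 5.0])
print(best_err, n_clusters)  # 0.2 3
```

Here an error of 0.05 isolates every point (no cluster reaches three members), 0.2 and 1.0 recover the three groups, and 5.0 merges everything into one cluster. A finer grid or a derivative-free optimizer could replace the exhaustive loop when each mining run is expensive; which search the paper actually uses is not stated in this abstract.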

Keywords

  • Biclustering
  • Subspace Mining
  • Error Setting




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Shaham, E., Sarne, D., Ben-Moshe, B. (2014). Efficient Error Setting for Subspace Miners. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2014. Lecture Notes in Computer Science, vol 8556. Springer, Cham. https://doi.org/10.1007/978-3-319-08979-9_1


  • DOI: https://doi.org/10.1007/978-3-319-08979-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08978-2

  • Online ISBN: 978-3-319-08979-9

  • eBook Packages: Computer Science, Computer Science (R0)