Fuzzy Knowledge-Based Subspace Clustering for Life Science Data Analysis

  • Chapter
Knowledge-Based Systems in Biomedicine and Computational Life Science

Part of the book series: Studies in Computational Intelligence ((SCI,volume 450))

Abstract

Features, or attributes, play an important role in handling multi-dimensional datasets. Traditional clustering methods use the full set of features, yet not all of them are generally needed to find groups of similar objects, because some features may be irrelevant or redundant. This motivates the idea of identifying the subsets of features that are relevant to clusters instead of using the full feature set. This chapter discusses how prior knowledge of the importance of features and of their interactions can be used to construct both fuzzy measures and signed fuzzy measures for subspace clustering. The Choquet integral, a useful aggregation operator with respect to a fuzzy measure, is used to aggregate the importance and interaction of the features. The chapter applies the concept of fuzzy knowledge-based subspace clustering in particular to the analysis of life science data.
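
As an illustration of the aggregation step mentioned in the abstract, the discrete Choquet integral can be sketched as follows. This is a minimal sketch of the standard textbook definition, not the chapter's own implementation: the function name `choquet_integral`, the dictionary representation of the measure, and the example numbers are all assumptions made for illustration. It assumes non-negative feature values and a measure defined on every subset that the sort order produces.

```python
from typing import Dict, FrozenSet, Sequence


def choquet_integral(values: Sequence[float],
                     measure: Dict[FrozenSet[int], float]) -> float:
    """Discrete Choquet integral of `values` with respect to `measure`.

    `values[i]` is the (non-negative) score of feature i.
    `measure` maps a frozenset of feature indices to its measure value;
    this representation is hypothetical, chosen for readability.
    """
    # Sort feature indices by ascending value: f(x_(1)) <= ... <= f(x_(n)).
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = 0.0
    previous = 0.0  # f(x_(0)) is taken to be 0
    for rank, idx in enumerate(order):
        # A_(i) holds the features whose value is at least the current one.
        coalition = frozenset(order[rank:])
        total += (values[idx] - previous) * measure[coalition]
        previous = values[idx]
    return total


# Illustrative measure on two features (made-up numbers).
# mu({0}) + mu({1}) < mu({0, 1}), so the two features interact positively.
mu = {
    frozenset({0}): 0.3,
    frozenset({1}): 0.4,
    frozenset({0, 1}): 1.0,
}
score = choquet_integral([0.2, 0.6], mu)
```

With an additive measure (for example 0.3 and 0.7 on the singletons and 1.0 on the pair) the same function reduces to an ordinary weighted mean, which is one way to see that a non-additive measure is what lets the integral account for interaction between features.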

Correspondence to Theam Foo Ng.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ng, T.F., Pham, T.D., Jia, X., Fraser, D. (2013). Fuzzy Knowledge-Based Subspace Clustering for Life Science Data Analysis. In: Pham, T., Jain, L. (eds) Knowledge-Based Systems in Biomedicine and Computational Life Science. Studies in Computational Intelligence, vol 450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33015-5_8

  • DOI: https://doi.org/10.1007/978-3-642-33015-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33014-8

  • Online ISBN: 978-3-642-33015-5

  • eBook Packages: Engineering, Engineering (R0)
