Abstract
Features, or attributes, play an important role in handling multi-dimensional datasets. In traditional clustering, not all features are needed to find groups of similar objects, because some features may be irrelevant or redundant. Subspace clustering therefore identifies the subsets of features that are relevant to clusters instead of using the full feature set. This chapter discusses how prior knowledge of the importance of features and of their interactions can be used to construct both fuzzy measures and signed fuzzy measures for subspace clustering. The Choquet integral, a well-known aggregation operator with respect to a fuzzy measure, is used to aggregate the importance and interaction of the features. The chapter applies fuzzy knowledge-based subspace clustering in particular to the analysis of life science data.
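As a rough illustration of how a Choquet integral aggregates feature scores with respect to a fuzzy measure, the sketch below implements the standard discrete form for non-negative scores. The function name `choquet_integral`, the feature names `a` and `b`, and all measure values are assumptions made for this example and are not taken from the chapter.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative feature scores.

    values  : dict mapping feature name -> score (assumed >= 0)
    measure : dict mapping frozenset of feature names -> measure value,
              defined for every subset reached during the computation
    """
    # Sort features by ascending score: x_(1) <= x_(2) <= ... <= x_(n)
    order = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, feature in enumerate(order):
        # Set of features whose score is at least the current one
        upper_set = frozenset(order[i:])
        total += (values[feature] - previous) * measure[upper_set]
        previous = values[feature]
    return total


# Hypothetical two-feature example (illustrative numbers only).
scores = {"a": 0.5, "b": 0.8}

# Additive measure: the integral reduces to a weighted mean.
additive = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.7,
    frozenset({"a", "b"}): 1.0,
}

# Non-additive measure: mu({a, b}) > mu({a}) + mu({b}) encodes a
# positive interaction between the two features.
interacting = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.4,
    frozenset({"a", "b"}): 1.0,
}

print(choquet_integral(scores, additive))
print(choquet_integral(scores, interacting))
```

With the additive measure the result coincides with the weighted mean 0.3 × 0.5 + 0.7 × 0.8, whereas the non-additive measure changes the aggregate because the weight of each subset is no longer the sum of the weights of its members. The same formula can be evaluated when some measure values are negative, which is one way to read the signed fuzzy measures mentioned in the abstract, although the chapter's own construction may differ.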
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ng, T.F., Pham, T.D., Jia, X., Fraser, D. (2013). Fuzzy Knowledge-Based Subspace Clustering for Life Science Data Analysis. In: Pham, T., Jain, L. (eds) Knowledge-Based Systems in Biomedicine and Computational Life Science. Studies in Computational Intelligence, vol 450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33015-5_8
DOI: https://doi.org/10.1007/978-3-642-33015-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33014-8
Online ISBN: 978-3-642-33015-5
eBook Packages: Engineering, Engineering (R0)