Abstract
In traditional co-clustering, the only basis for the clustering task is a given relationship matrix describing the strengths of the relationships between pairs of elements from the two domains. Relying on this single input matrix, co-clustering discovers relationships holding among groups of elements from the two input domains. In many real-life applications, however, additional background knowledge or metadata about one or both of the input domains may be available and, if leveraged properly, such metadata can play a significant role in the effectiveness of the co-clustering process. How additional metadata affects co-clustering, however, depends on how the process is modified to be context-aware. In this paper, we propose, compare, and evaluate three alternative strategies (metadata-driven, metadata-constrained, and metadata-injected co-clustering) for embedding available contextual knowledge into the co-clustering process. Experimental results show that it is possible to leverage the available metadata to discover contextually-relevant co-clusters, without significant overheads in terms of information-theoretic co-cluster quality or execution cost.
Notes
The matrix is re-normalized after the application of the combination function to ensure that information-theoretic co-clustering, which treats the values in the matrix as probability distributions, can be applied. Due to this re-normalization, the combination function sum() is equivalent to average(): the two functions would differ only by a scaling factor of 2, which is absorbed by the re-normalization.
Additional information
This work is partially supported by NSF Grant NSF-III1016921, "One Size Does Not Fit All: Empowering the User with User-Driven Integration."
Cite this article
Schifanella, C., Sapino, M.L. & Candan, K.S. On context-aware co-clustering with metadata support. J Intell Inf Syst 38, 209–239 (2012). https://doi.org/10.1007/s10844-011-0151-x