Manifold Learning Based Cross-media Retrieval: A Solution to Media Object Complementary Nature

  • Yueting Zhuang
  • Yi Yang
  • Fei Wu
  • Yunhe Pan


Media objects of different modalities usually occur together and naturally complement one another, both semantically and in modality. In this paper, we propose a manifold learning based cross-media retrieval approach that addresses two basic but crucial questions in media object semantics understanding and cross-media retrieval. First, considering semantic complementarity, how can we represent co-occurring media objects and fuse the complementary information they carry to understand their integrated semantics precisely? Second, considering modality complementarity, how can we bridge modalities to establish a cross-index and facilitate cross-media retrieval? To solve these two problems, we first construct a Multimedia Document (MMD) Semi-Semantic Graph (MMDSSG) and then apply Multidimensional Scaling to create an MMD Semantic Space (MMDSS). Both long-term and short-term feedback are proposed to boost system performance: the former refines the MMDSSG, and the latter introduces new items that are not in the training set into the MMDSS. Because all MMDs and their component media objects lie in the MMDSS and are indexed uniformly by their coordinates there, regardless of modality, the MMDSS acts as a bridge between media objects of different modalities, and cross-media retrieval can be easily achieved. Experimental results are encouraging and indicate that the proposed approach is effective.
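The graph-then-MDS pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how MMDSSG edge weights are computed, so the toy weight matrix below is invented for illustration. The helper names (`shortest_path_distances`, `classical_mds`, `nearest`) are hypothetical, and classical (Torgerson) MDS on shortest-path distances is an assumed stand-in for the Multidimensional Scaling step.

```python
import numpy as np

def shortest_path_distances(weights):
    """All-pairs shortest paths (Floyd-Warshall); np.inf marks a missing edge."""
    dist = weights.astype(float).copy()
    np.fill_diagonal(dist, 0.0)
    for k in range(dist.shape[0]):
        # Relax every pair (i, j) through intermediate node k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist

def classical_mds(dist, dim):
    """Classical MDS: coordinates whose Euclidean distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]            # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def nearest(coords, query_index, k):
    """Indices of the k items closest to the query in the embedded space."""
    d = np.linalg.norm(coords - coords[query_index], axis=1)
    d[query_index] = np.inf                            # exclude the query itself
    return np.argsort(d)[:k]

# Toy graph (assumed data): six hypothetical MMDs in two groups, {0, 1, 2}
# and {3, 4, 5}, joined by one weak link between nodes 2 and 3.
inf = np.inf
weights = np.array([
    [0,   1,   inf, inf, inf, inf],
    [1,   0,   1,   inf, inf, inf],
    [inf, 1,   0,   5,   inf, inf],
    [inf, inf, 5,   0,   1,   inf],
    [inf, inf, inf, 1,   0,   1],
    [inf, inf, inf, inf, 1,   0],
])

dist = shortest_path_distances(weights)  # graph distances between all MMDs
coords = classical_mds(dist, dim=2)      # coordinates in the toy semantic space
neighbors = nearest(coords, query_index=0, k=2)
```

Once every item has coordinates in the same space, retrieval reduces to a nearest-neighbor lookup on those coordinates, whatever the modality of the query or of the results. That is the role the abstract assigns to the MMDSS.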


cross-media retrieval, semantic complementary, modality complementary, manifold learning, multimedia document





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. College of Computer Science and Engineering, Zhejiang University, Hangzhou, People’s Republic of China
