Abstract
In this paper, we present a novel aspect model based on Probabilistic Latent Semantic Analysis (PLSA) and decompose cross-media retrieval into two parts: multi-modal integration and correlation propagation. We first use multivariate Gaussian distributions to model continuous quantities in PLSA, which avoids the information loss incurred in feature-instance versus real-world matching. Multi-modal correlations are learned asymmetrically, giving better control over the respective influence of each modality in the latent space. We then propose a new propagation pattern that refines multi-modal correlations by efficiently exploiting the complementary information across modalities. Experimental results demonstrate that our method is accurate and robust for cross-media information retrieval.
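To make "multivariate Gaussian distributions to model continuous quantities in PLSA" concrete, the sketch below runs generic EM for a continuous PLSA in which each aspect z emits feature vectors from a Gaussian N(x; mu_z, Sigma_z) instead of a discrete word distribution. It rests on several assumptions not stated in the abstract: a single modality, diagonal covariances, a fixed variance floor, and the hypothetical function names `log_gauss_diag`, `responsibilities` and `em_step`. It does not reproduce the paper's asymmetric multi-modal learning or its propagation pattern.

```python
import math

def log_gauss_diag(x, mean, var):
    # Log-density of a Gaussian with diagonal covariance diag(var) (assumption: diagonal).
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def responsibilities(x, p_z_d, means, variances):
    # E-step for one feature vector x of document d:
    # P(z | d, x) is proportional to P(z | d) * N(x; mu_z, diag(var_z)).
    logs = [math.log(max(p, 1e-300)) + log_gauss_diag(x, m, v)  # floor avoids log(0)
            for p, m, v in zip(p_z_d, means, variances)]
    top = max(logs)  # log-sum-exp shift for numerical stability
    w = [math.exp(s - top) for s in logs]
    total = sum(w)
    return [wi / total for wi in w]

def em_step(docs, p_z_d, means, variances, var_floor=1e-6):
    """One EM iteration of a continuous (Gaussian-emission) PLSA sketch.

    docs: list of documents, each a list of feature vectors (lists of floats).
    p_z_d: per-document list of K aspect weights P(z | d).
    means, variances: K lists of per-dimension Gaussian parameters.
    """
    K, D = len(means), len(means[0])
    new_p_z_d = []
    w_sum = [0.0] * K                      # total responsibility per aspect
    x_sum = [[0.0] * D for _ in range(K)]  # responsibility-weighted sums of x
    xx_sum = [[0.0] * D for _ in range(K)] # responsibility-weighted sums of x^2
    for d, feats in enumerate(docs):
        counts = [0.0] * K
        for x in feats:
            r = responsibilities(x, p_z_d[d], means, variances)
            for k in range(K):
                counts[k] += r[k]
                w_sum[k] += r[k]
                for j in range(D):
                    x_sum[k][j] += r[k] * x[j]
                    xx_sum[k][j] += r[k] * x[j] * x[j]
        # M-step for P(z | d): expected aspect counts normalized by document size.
        new_p_z_d.append([c / len(feats) for c in counts])
    new_means, new_vars = [], []
    for k in range(K):
        if w_sum[k] <= 0.0:  # aspect received no mass: keep its old parameters
            new_means.append(list(means[k]))
            new_vars.append(list(variances[k]))
            continue
        mu = [x_sum[k][j] / w_sum[k] for j in range(D)]
        # Weighted variance E[x^2] - E[x]^2, floored to keep densities finite.
        var = [max(xx_sum[k][j] / w_sum[k] - mu[j] ** 2, var_floor) for j in range(D)]
        new_means.append(mu)
        new_vars.append(var)
    return new_p_z_d, new_means, new_vars
```

For example, with two toy "documents" whose features cluster near (0, 0) and (5, 5), a few calls to `em_step` drive each document's P(z | d) toward the aspect whose Gaussian covers its features. Working directly on continuous vectors, rather than first quantizing them into a discrete vocabulary as standard PLSA requires, is the kind of information loss the abstract refers to avoiding.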
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, W., Lu, T., Su, F. (2012). A Novel Multi-modal Integration and Propagation Model for Cross-Media Information Retrieval. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, CW., Andreopoulos, Y., Breiteneder, C. (eds) Advances in Multimedia Modeling. MMM 2012. Lecture Notes in Computer Science, vol 7131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27355-1_78
DOI: https://doi.org/10.1007/978-3-642-27355-1_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27354-4
Online ISBN: 978-3-642-27355-1
eBook Packages: Computer Science, Computer Science (R0)