Abstract
We present several strategies for learning users' semantic queries from dissimilarity representations of audio-visual content. For large corpora of video documents, a feature representation requires on-line computation of the distances between every document and the query. A dissimilarity representation may therefore be preferred, because it can be computed offline, which speeds up retrieval. We show how distances derived from the visual and audio features of videos can be used directly to learn complex concepts from a set of positive and negative examples provided by the user. Based on the idea of dissimilarity spaces, we derive three algorithms that fuse modalities and thereby improve the precision of retrieval results. We evaluate the approach on artificial data and on the annotated TRECVID corpus.
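To make the idea of a dissimilarity space concrete, the following is a minimal sketch and not one of the three algorithms derived in the chapter. It assumes one distance matrix per modality has been precomputed offline, represents each document by its distances to the user's positive and negative examples, and ranks documents with a deliberately naive rule (mean distance to the negatives minus mean distance to the positives). The function names, the toy data, and the concatenation-based fusion are all illustrative assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X (the offline step)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rank_documents(matrices, positives, negatives):
    """Rank documents using only precomputed distance matrices.

    Hypothetical scoring rule for illustration: each document is described
    by its distances to the example documents in every modality, and its
    score is the mean distance to negatives minus the mean distance to
    positives. Concatenating modalities is a naive form of fusion.
    """
    to_pos = np.hstack([D[:, positives] for D in matrices])
    to_neg = np.hstack([D[:, negatives] for D in matrices])
    score = to_neg.mean(axis=1) - to_pos.mean(axis=1)
    # Highest score first: close to positives, far from negatives.
    return np.argsort(-score), score

# Toy corpus (assumed data): documents 0-4 and 5-9 form two clusters
# in both a "visual" and an "audio" feature space.
rng = np.random.default_rng(0)
visual = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
                    rng.normal(5.0, 0.1, size=(5, 2))])
audio = np.vstack([rng.normal(0.0, 0.1, size=(5, 3)),
                   rng.normal(5.0, 0.1, size=(5, 3))])

# Offline: one distance matrix per modality.
D_visual = pairwise_distances(visual)
D_audio = pairwise_distances(audio)

# Online: the user marks two positive and two negative examples.
order, score = rank_documents([D_visual, D_audio],
                              positives=[0, 1], negatives=[5, 6])
print(order[:5])
```

At query time only column slices of the stored matrices are read, so no feature vectors and no new distance computations are needed; this is the speed-up the abstract attributes to the dissimilarity representation.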
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bruno, E., Moenne-Loccoz, N., Marchand-Maillet, S. (2006). Learning User Queries in Multimodal Dissimilarity Spaces. In: Detyniecki, M., Jose, J.M., Nürnberger, A., van Rijsbergen, C.J. (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback. AMR 2005. Lecture Notes in Computer Science, vol 3877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11670834_14
DOI: https://doi.org/10.1007/11670834_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32174-3
Online ISBN: 978-3-540-32175-0
eBook Packages: Computer Science, Computer Science (R0)