Multimedia Systems, Volume 20, Issue 4, pp 389–413

RELIEF-MM: effective modality weighting for multimedia information retrieval

  • Turgay Yilmaz
  • Adnan Yazici
  • Masaru Kitsuregawa
Regular Paper


Fusing multimodal information in multimedia data usually improves retrieval performance. A major issue in multimodal fusion is determining the best modalities. To combine the modalities more effectively, we propose a RELIEF-based modality weighting approach named RELIEF-MM. The original RELIEF algorithm is extended to address its weaknesses in several areas: class-specific feature selection, multi-labeled and noisy data, unbalanced datasets, and use of the algorithm with classifier predictions. RELIEF-MM employs an improved weight estimation function that exploits the representation and reliability capabilities of modalities, in addition to their discrimination capability, without increasing the computational complexity. Comprehensive experiments on the TRECVID 2007, TRECVID 2008 and CCV datasets validate RELIEF-MM as an efficient, accurate and robust method of modality weighting for multimedia data.
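For readers unfamiliar with the starting point, the following is a minimal sketch of the original two-class RELIEF weight estimation (Kira and Rendell, 1992), not of RELIEF-MM itself. Treating each column as one modality's prediction score is an assumption made here for illustration, as are the function name `relief_weights`, the toy scores and the choice to visit every instance once instead of sampling instances at random.

```python
def relief_weights(X, y):
    """Original RELIEF for two classes: returns one weight per column of X.

    Illustrative sketch only; assumes every class has at least two instances.
    """
    n, d = len(X), len(X[0])
    # Scale each column to [0, 1] so per-column differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Xs = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def diff(a, b):
        # Per-column absolute difference between instances a and b.
        return [abs(Xs[a][j] - Xs[b][j]) for j in range(d)]

    def nearest(i, same_class):
        # Nearest other instance (Manhattan distance) from the requested class.
        candidates = [k for k in range(n)
                      if k != i and (y[k] == y[i]) == same_class]
        return min(candidates, key=lambda k: sum(diff(i, k)))

    w = [0.0] * d
    for i in range(n):  # the original algorithm samples m instances at random
        d_hit = diff(i, nearest(i, True))
        d_miss = diff(i, nearest(i, False))
        for j in range(d):
            # Reward separation from the nearest miss,
            # penalise distance to the nearest hit.
            w[j] += (d_miss[j] - d_hit[j]) / n
    return w


# Hypothetical scores: column 0 separates the classes, column 1 does not.
scores = [[0.90, 0.2], [0.80, 0.9], [0.85, 0.5],
          [0.10, 0.3], [0.20, 0.8], [0.15, 0.6]]
labels = [1, 1, 1, 0, 0, 0]
weights = relief_weights(scores, labels)
print(weights)
```

In this toy setting the discriminative column receives the larger weight. The extensions listed in the abstract (class-specific weights, multi-label data, unbalanced datasets, reliability and representation terms) are not covered by this sketch.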


RELIEF · Feature weighting · Multimodal fusion · Multimedia information retrieval



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Turgay Yilmaz (1, 2)
  • Adnan Yazici (1)
  • Masaru Kitsuregawa (2, 3)
  1. Computer Engineering Department, Middle East Technical University, Ankara, Turkey
  2. Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
  3. National Institute of Informatics, Tokyo, Japan
