Learning in high-dimensional multimedia data: the state of the art

  • Special Issue Paper
Multimedia Systems

Abstract

During the last decade, the deluge of multimedia data has impacted a wide range of research areas, including multimedia retrieval, 3D tracking, database management, data mining, machine learning, social media analysis, and medical imaging. Machine learning is widely used in multimedia applications to build models for tasks such as classification and regression, and the learning principle consists in designing the models based on the information contained in the multimedia dataset. While many paradigms exist and are widely used in machine learning, most of them suffer from the ‘curse of dimensionality’, which means that counterintuitive phenomena appear when data are represented in a high-dimensional space. Given the high dimensionality and the high complexity of multimedia data, it is important to investigate new machine learning algorithms to facilitate multimedia data analysis. An intuitive way to deal with the impact of high dimensionality is to reduce the dimensionality. Alternatively, some researchers have devoted themselves to designing effective learning schemes that operate directly on high-dimensional data. In this survey, we cover feature transformation, feature selection and feature encoding, three approaches to mitigating the consequences of the curse of dimensionality. Next, we briefly introduce some recent progress in effective learning algorithms. Finally, we outline promising future trends in multimedia learning.
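One commonly cited instance of the curse of dimensionality is distance concentration: as the dimension grows, the nearest and farthest neighbours of a query become almost equally far away. The sketch below is only an illustration under stated assumptions and is not taken from the survey: the helper name `relative_contrast`, the uniform sampling in the unit cube, the number of points and the seed are all hypothetical choices.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query to n_points random points, all drawn uniformly from the
    unit cube [0, 1]^dim. (Illustrative helper, not from the survey.)"""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast between farthest and nearest neighbour should shrink
    # as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

When the relative contrast approaches zero, nearest-neighbour queries lose discriminative power, which is one motivation for reducing dimensionality before learning.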

Acknowledgments

The work of Lianli Gao has been partially supported by NSFC (Grant No. 61502080) and by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2014J063). The work of Junming Shao has been partially supported by NSFC (Grant Nos. 61403062, 61433014) and by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2014J053).

Author information

Corresponding author

Correspondence to Jingkuan Song.

About this article

Cite this article

Gao, L., Song, J., Liu, X. et al. Learning in high-dimensional multimedia data: the state of the art. Multimedia Systems 23, 303–313 (2017). https://doi.org/10.1007/s00530-015-0494-1

  • DOI: https://doi.org/10.1007/s00530-015-0494-1
