
Generative View-Correlation Adaptation for Semi-supervised Multi-view Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)

Abstract

Multi-view learning (MVL) explores data extracted from multiple sources, under the assumption that complementary information across views can be revealed to further improve learning performance. There are two main challenges. First, it is difficult to effectively combine data from different views while fully preserving the view-specific information. Second, multi-view datasets are usually small, so models can easily overfit. To address these challenges, we propose a novel View-Correlation Adaptation (VCA) framework in a semi-supervised fashion. A semi-supervised data augmentation method is designed to generate extra features and labels based on both labeled and unlabeled samples. In addition, a cross-view adversarial training strategy is proposed to exploit the structural information of one view to help the representation learning of the other view. Moreover, a simple and effective fusion network is proposed for the late-fusion stage. In our model, all networks are jointly trained in an end-to-end fashion. Extensive experiments demonstrate that our approach is effective and stable compared with other state-of-the-art methods (code is available at: https://github.com/wenwen0319/GVCA).
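
The abstract only names the mechanisms, so below is a minimal PyTorch-style sketch of how the two core ideas could look, not the authors' released implementation (their code is at the GitHub link above). It assumes mixup-style interpolation for the semi-supervised augmentation and a gradient-reversal view discriminator for the cross-view adversarial training; all names here (augment_features, ViewDiscriminator, alpha) are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

def augment_features(x_labeled, y_labeled, x_unlabeled, y_pseudo, alpha=0.75):
    # Generate extra (feature, label) pairs by interpolating a labeled batch
    # with an unlabeled one; y_labeled is one-hot, y_pseudo holds the model's
    # soft predictions on the unlabeled batch. Assumes the unlabeled batch is
    # at least as large as the labeled one.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)  # keep the labeled side dominant
    idx = torch.randperm(x_unlabeled.size(0))[: x_labeled.size(0)]
    x_mix = lam * x_labeled + (1.0 - lam) * x_unlabeled[idx]
    y_mix = lam * y_labeled + (1.0 - lam) * y_pseudo[idx]
    return x_mix, y_mix

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass, sign-flipped gradient in the backward
    # pass: the standard trick for adversarial feature alignment.
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

class ViewDiscriminator(nn.Module):
    # Classifies which view a feature vector came from. Because gradients
    # are reversed, the view encoders are trained to fool it, so structure
    # from one view steers representation learning in the other.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(),
                                 nn.Linear(dim // 2, 2))
    def forward(self, feat):
        return self.net(GradReverse.apply(feat))

# Toy usage with two encoded views (e.g., RGB and depth features):
disc = ViewDiscriminator(dim=128)
f_rgb, f_depth = torch.randn(8, 128), torch.randn(8, 128)
view_logits = disc(torch.cat([f_rgb, f_depth]))
view_ids = torch.cat([torch.zeros(8), torch.ones(8)]).long()
adv_loss = F.cross_entropy(view_logits, view_ids)

Backpropagating adv_loss through real view encoders would push the two views' feature distributions toward each other, while the mixed (feature, label) pairs from augment_features regularize training on the small labeled set.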

Keywords

Multi-view learning · Data augmentation · Semi-supervised learning

Acknowledgement

This research is supported by the U.S. Army Research Office Award W911NF-17-1-0367.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Northeastern University, Boston, USA
  2. Indiana University-Purdue University Indianapolis, Indianapolis, USA
