Advertisement

Neural Graph Matching Networks for Fewshot 3D Action Recognition

  • Michelle GuoEmail author
  • Edward Chou
  • De-An Huang
  • Shuran Song
  • Serena Yeung
  • Li Fei-Fei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

We propose Neural Graph Matching (NGM) Networks, a novel framework that can learn to recognize a previous unseen 3D action class with only a few examples. We achieve this by leveraging the inherent structure of 3D data through a graphical representation. This allows us to modularize our model and lead to strong data-efficiency in few-shot learning. More specifically, NGM Networks jointly learn a graph generator and a graph matching metric function in an end-to-end fashion to directly optimize the few-shot learning objective. We evaluate NGM on two 3D action recognition datasets, CAD-120 and PiGraphs, and show that learning to generate and match graphs both lead to significant improvement of few-shot 3D action recognition over the holistic baselines.

Supplementary material

474172_1_En_40_MOESM1_ESM.pdf (3.4 mb)
Supplementary material 1 (pdf 3514 KB)

References

  1. 1.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)Google Scholar
  2. 2.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)Google Scholar
  3. 3.
    Caba Heilbron, F., Escorcia, V., Ghanem, B., Carlos Niebles, J.: ActivityNet: a large-scale video benchmark for human activity understanding, pp. 961–970 (2015)Google Scholar
  4. 4.
    Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: CVPR (2014)Google Scholar
  5. 5.
    Luo, Z., Peng, B., Huang, D.A., Alahi, A., Fei-Fei, L.: Unsupervised learning of long-term motion dynamics for videos. In: CVPR (2017)Google Scholar
  6. 6.
    Misra, I., Zitnick, C.L., Hebert, M.: Shuffle and learn: unsupervised learning using temporal order verification. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 527–544. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46448-0_32CrossRefGoogle Scholar
  7. 7.
    Lai, K., Bo, L., Fox, D.: Unsupervised feature learning for 3D scene labeling. In: ICRA (2014)Google Scholar
  8. 8.
    Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
  9. 9.
    Koppula, H.S., Gupta, R., Saxena, A.: Learning human activities and object affordances from RGB-D videos. Int. J. Robot. Res. 32, 951–970 (2013)CrossRefGoogle Scholar
  10. 10.
    Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+ D: a large scale dataset for 3D human activity analysis. In: CVPR (2016)Google Scholar
  11. 11.
    Song, S., Lichtenberg, S.P., Xiao, J.: SUN RGB-D: a RGB-D scene understanding benchmark suite. In: CVPR (2015)Google Scholar
  12. 12.
    Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, pp. 3630–3638 (2016)Google Scholar
  13. 13.
    Garcia, V., Bruna, J.: Few-shot learning with graph neural networks. In: ICLR (2018)Google Scholar
  14. 14.
    Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. arXiv preprint arXiv:1703.05175 (2017)
  15. 15.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)Google Scholar
  16. 16.
    Masci, J., Boscaini, D., Bronstein, M., Vandergheynst, P.: Geodesic convolutional neural networks on riemannian manifolds. In: ICCV Workshops (2015)Google Scholar
  17. 17.
    Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: CVPR (2015)Google Scholar
  18. 18.
    Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. arXiv preprint arXiv:1612.00593 (2016)
  19. 19.
    Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Neural module networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 39–48 (2016)Google Scholar
  20. 20.
    Johnson, J., et al.: Inferring and executing programs for visual reasoning. arXiv preprint arXiv:1705.03633 (2017)
  21. 21.
    Hu, R., Andreas, J., Rohrbach, M., Darrell, T., Saenko, K.: Learning to reason: end-to-end module networks for visual question answering. CoRR, abs/1704.05526 3 (2017)Google Scholar
  22. 22.
    Savva, M., Chang, A.X., Hanrahan, P., Fisher, M., Nießner, M.: PiGraphs: learning interaction snapshots from observations. ACM Trans. Graph. (TOG) 35(4), 139 (2016)CrossRefGoogle Scholar
  23. 23.
    Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)Google Scholar
  24. 24.
    Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning (2016)Google Scholar
  25. 25.
    Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: ICML (2016)Google Scholar
  26. 26.
    Santoro, A., Raposo, D., Barrett, D.G., Malinowski, M., Pascanu, R., Battaglia, P., Lillicrap, T.: A simple neural network module for relational reasoning. arXiv preprint arXiv:1706.01427 (2017)
  27. 27.
    Oreifej, O., Liu, Z.: HON4D: histogram of oriented 4D normals for activity recognition from depth sequences. In: CVPR (2013)Google Scholar
  28. 28.
    Rahmani, H., Mahmood, A., Huynh, D.Q., Mian, A.: HOPC: histogram of oriented principal components of 3D pointclouds for action recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 742–757. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10605-2_48CrossRefGoogle Scholar
  29. 29.
    Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: CVPR (2012)Google Scholar
  30. 30.
    Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3D skeletons as points in a lie group. In: CVPR (2014)Google Scholar
  31. 31.
    Li, C., Wang, P., Wang, S., Hou, Y., Li, W.: Skeleton-based action recognition using LSTM and CNN. arXiv preprint arXiv:1707.02356 (2017)
  32. 32.
    Liu, M., Chen, C., Meng, F.M., Liu, H.: 3D action recognition using multi-temporal skeleton visualization. In: CVPR 2017, p. 391 (2017)Google Scholar
  33. 33.
    Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. arXiv preprint arXiv:1801.07455 (2018)
  34. 34.
    Ikizler, N., Forsyth, D.A.: Searching for complex human activities with no visual examples. Int. J. Comput. Vis. 80, 337–357 (2008)CrossRefGoogle Scholar
  35. 35.
    Gu, C., et al.: AVA: a video dataset of spatio-temporally localized atomic visual actions. CoRR, CoRR:1705.08421 (2017)Google Scholar
  36. 36.
    Johnson, J., et al.: Image retrieval using scene graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3668–3678 (2015)Google Scholar
  37. 37.
    Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)Google Scholar
  38. 38.
    Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. International World Wide Web Conferences Steering Committee (2015)Google Scholar
  39. 39.
    Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)Google Scholar
  40. 40.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)Google Scholar
  41. 41.
    Kearnes, S., McCloskey, K., Berndl, M., Pande, V., Riley, P.: Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30(8), 595–608 (2016)CrossRefGoogle Scholar
  42. 42.
    Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
  43. 43.
    Garcia, V., Bruna, J.: Few-shot learning with graph neural networks. arXiv preprint arXiv:1711.04043 (2017)
  44. 44.
    He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)Google Scholar
  45. 45.
    Lu, C., Krishna, R., Bernstein, M., Fei-Fei, L.: Visual relationship detection with language priors. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 852–869. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46448-0_51CrossRefGoogle Scholar
  46. 46.
    Ullmann, J.R.: An algorithm for subgraph isomorphism. J. ACM (JACM) 23(1), 31–42 (1976)MathSciNetCrossRefGoogle Scholar
  47. 47.
    Riesen, K., Jiang, X., Bunke, H.: Exact and inexact graph matching: Methodology and applications. In: Aggarwal, C., Wang, H. (eds.) Managing and Mining Graph Data, pp. 217–247. Springer, Boston (2010)CrossRefGoogle Scholar
  48. 48.
    Morrison, P., Zou, J.J.: Inexact graph matching using a hierarchy of matching processes. Comput. Vis. Media 1(4), 291–307 (2015)CrossRefGoogle Scholar
  49. 49.
    Cai, H., Zheng, V.W., Chang, K.: A comprehensive survey of graph embedding: problems, techniques and applications. IEEE Trans. Knowl. Data Eng. (2018)Google Scholar
  50. 50.
    Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: NIPS, pp. 3844–3852 (2016)Google Scholar
  51. 51.
    Jain, A., Zamir, A.R., Savarese, S., Saxena, A.: Structural-RNN: deep learning on spatio-temporal graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5308–5317 (2016)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.Computer Science DepartmentStanford UniversityStanfordUSA
  2. 2.Computer Science DepartmentPrinceton UniversityPrincetonUSA

Personalised recommendations