Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)


Matching two different sets of items, known as the heterogeneous set-to-set matching problem, has recently received attention as a promising problem. The difficulties are twofold: extracting features that identify the correct pair of sets, and preserving the two types of exchangeability that set-to-set matching requires, namely that the two sets in a pair, as well as the items within each set, should be exchangeable. In this study, we propose a novel deep learning architecture that addresses these difficulties, together with an efficient training framework for set-to-set matching. We evaluate the methods through experiments based on two industrial applications: fashion set recommendation and group re-identification. In these experiments, the proposed method provides significant improvements over state-of-the-art methods, thereby validating our architecture for the heterogeneous set matching problem.
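
The two exchangeability requirements can be illustrated with a toy scoring function. The sketch below is not the architecture proposed in the paper; the names `pool` and `match_score`, the sum-pooling step, and the small integer feature vectors are all assumptions chosen only to make the two properties concrete: the score does not change when the two sets are swapped, nor when the items inside either set are reordered.

```python
from itertools import permutations


def pool(items):
    """Sum-pool a collection of feature vectors; the result ignores item order."""
    dim = len(items[0])
    return [sum(v[d] for v in items) for d in range(dim)]


def match_score(set_x, set_y):
    """Toy matching score (hypothetical): dot product of the two pooled vectors.

    Swapping set_x and set_y, or reordering items inside either set,
    leaves the score unchanged.
    """
    px, py = pool(set_x), pool(set_y)
    return sum(a * b for a, b in zip(px, py))


# Integer features (made up for illustration) keep the equality checks exact.
set_x = [[1, 2], [3, 0], [-1, 4]]
set_y = [[2, 2], [0, 5]]

base = match_score(set_x, set_y)

# Exchangeability of the pair of sets: s(X, Y) == s(Y, X).
swap_ok = match_score(set_y, set_x) == base

# Exchangeability of the items in each set: every reordering gives the same score.
perm_ok = all(
    match_score(list(p), list(q)) == base
    for p in permutations(set_x)
    for q in permutations(set_y)
)
```

Sum-pooling followed by a symmetric product is only the simplest function with both properties; it discards interactions between individual items of the two sets, which a learned matching model would presumably need to capture.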


Keywords: Set to set matching, Deep learning, Permutation invariance

Supplementary material

Supplementary material 1 (pdf 252 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. ZOZO Research, Shibuya, Japan
  2. The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Japan
  3. Wakayama University, Wakayama, Japan
  4. The Institute of Statistical Mathematics, Tachikawa, Japan
