MessyTable: Instance Association in Multiple Camera Views

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)


We present an interesting and challenging dataset that features a large number of scenes with messy tables captured from multiple camera views. Each scene in this dataset is highly complex, containing multiple object instances that may be identical, stacked, or occluded by other instances. The key challenge is to associate all instances given the RGB images of all views. This seemingly simple task surprisingly defeats many popular methods and heuristics that are commonly assumed to perform well in object association. The dataset challenges existing methods in mining subtle appearance differences, reasoning from context, and fusing appearance with geometric cues to establish associations. We report interesting findings with several popular baselines, and discuss how this dataset could inspire new problems and catalyse more robust formulations for tackling real-world instance association. (Project page:
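A common starting point for the association task described above is a purely appearance-based baseline: embed each detected instance into a feature space, then solve a one-to-one assignment between views. The sketch below is not the paper's method; it is a minimal illustrative baseline assuming L2-normalised embeddings are already available (the `associate_instances` helper and its `dist_threshold` parameter are hypothetical names chosen for this example).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_instances(feat_a, feat_b, dist_threshold=0.5):
    """One-to-one association of instances between two camera views.

    feat_a: (N, D) array of L2-normalised appearance embeddings from view A.
    feat_b: (M, D) array of embeddings from view B.
    Returns a list of matched index pairs (i, j).
    """
    # Cosine distance between every cross-view pair of instances.
    cost = 1.0 - feat_a @ feat_b.T
    # Globally optimal one-to-one assignment (Hungarian algorithm).
    rows, cols = linear_sum_assignment(cost)
    # Reject weak matches: instances visible in only one view stay unmatched.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < dist_threshold]
```

Such a baseline illustrates exactly where the dataset bites: identical-looking instances produce near-degenerate cost matrices, so appearance alone is insufficient and geometric or contextual cues must be fused into the cost before assignment.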



This research was supported by SenseTime-NTU Collaboration Project, Singapore MOE AcRF Tier 1 (2018-T1-002-056), NTU SUG, and NTU NAP.

Supplementary material

Supplementary material 1: 504452_1_En_1_MOESM1_ESM.pdf (11.4 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. SenseTime Research, Tai Po, Hong Kong
  2. Nanyang Technological University, Singapore
