Abstract
Automatic Check-Out (ACO) aims to accurately predict the presence and count of each product category in check-out images. A major challenge is the significant domain gap between the training data (single-product exemplars) and the test data (check-out images). To mitigate this gap, we propose a method, termed PSP, that performs Prototype-based classifier learning from Single-Product exemplars. PSP first obtains a prototype representation of each product category from single-product exemplars, exploiting the advantages of prototypes for representing category semantics. Based on the prototypes, it then generates categorical classifiers together with a background classifier, so that product proposals derived from check-out images can be both assigned to fine-grained product categories and distinguished from background. To further improve ACO accuracy, we develop discriminative re-ranking, which adjusts the predicted scores of product proposals to bring more discriminative ability into classifier learning and provides a reasonable sorting of those scores that accounts for the fine-grained nature of the products. Moreover, a multi-label recognition loss is employed to model the co-occurrence of products in check-out images. Experiments are conducted on the large-scale RPC dataset. PSP achieves an ACO result of 86.69%, a 6.18% improvement over the state of the art, which demonstrates its superiority. Our code is available at https://github.com/Hao-Chen-NJUST/PSP.
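To make the prototype idea concrete, the following is a minimal sketch and not the PSP implementation. It assumes that a prototype is simply the mean of a category's exemplar feature vectors, that proposals are scored by cosine similarity, and that a fixed similarity threshold stands in for the learned background classifier. The function names, the toy 3-dimensional features, and the threshold value are all hypothetical; discriminative re-ranking and the multi-label loss are not modeled.

```python
import math


def mean_prototype(features):
    """Average equal-length feature vectors into one prototype (assumed definition)."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_prototypes(exemplars):
    """Map each category label to the prototype of its single-product exemplars."""
    return {label: mean_prototype(feats) for label, feats in exemplars.items()}


def classify(proposal, prototypes, background_threshold=0.5):
    """Return the best-matching category, or "background" if no score is high enough.

    The threshold is a stand-in assumption for a learned background classifier.
    """
    scores = {label: cosine(proposal, p) for label, p in prototypes.items()}
    best = max(scores, key=scores.get)
    if scores[best] < background_threshold:
        return "background", scores
    return best, scores


def count_products(proposals, prototypes, background_threshold=0.5):
    """Count non-background proposals per category, i.e. a toy check-out list."""
    counts = {}
    for proposal in proposals:
        label, _ = classify(proposal, prototypes, background_threshold)
        if label != "background":
            counts[label] = counts.get(label, 0) + 1
    return counts


# Toy single-product exemplar features (hypothetical values).
exemplars = {
    "cola": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "chips": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
# Toy proposal features from one check-out image; the last one matches no product.
proposals = [[0.95, 0.05, 0.0], [0.05, 1.0, 0.1], [0.9, 0.0, 0.1], [0.0, 0.0, 1.0]]

prototypes = build_prototypes(exemplars)
counts = count_products(proposals, prototypes)
print(counts)
```

In this toy setting the prototypes depend only on single-product exemplars, while the proposals come from the check-out image, which mirrors the train/test split described in the abstract.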
Keywords
- Automatic check-out
- Prototype
- Classifier learning
X.-S. Wei and Y. Shen are also with Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, Nanjing University of Science and Technology, China. This work is supported by National Key R&D Program of China (2021YFA1001100), Natural Science Foundation of China under Grant (61871226), Natural Science Foundation of Jiangsu Province of China under Grant (BK20210340), the Fundamental Research Funds for the Central Universities (No. 30920041111, No. NJ2022028), CAAI-Huawei MindSpore Open Fund, Beijing Academy of Artificial Intelligence (BAAI), and Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_0464).
Acknowledgments
The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and Ascend AI Processor used for this research.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, H., Wei, XS., Zhang, F., Shen, Y., Xu, H., Xiao, L. (2022). Automatic Check-Out via Prototype-Based Classifier Learning from Single-Product Exemplars. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13685. Springer, Cham. https://doi.org/10.1007/978-3-031-19806-9_16
DOI: https://doi.org/10.1007/978-3-031-19806-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19805-2
Online ISBN: 978-3-031-19806-9
eBook Packages: Computer Science, Computer Science (R0)