
Lightweight identification of retail products based on improved convolutional neural network

Multimedia Tools and Applications

Abstract

Because many retail products look alike, identifying them with both high accuracy and low computational cost is a major challenge in smart retail scenarios. In this paper, we propose a lightweight retail product identification and localization method based on an improved convolutional neural network. First, we use group convolution and depthwise separable convolution to optimize the structure of the backbone network and reduce computation. Second, we adjust the multiscale structure to the optimal scales and use the k-means clustering algorithm to re-cluster six anchors of different sizes. Third, we introduce spatial pyramid pooling (SPP) to replace pooling by convolution, which effectively improves robustness against image distortions such as cropping and scaling. Finally, we use mosaic data augmentation to further improve the robustness of the network. Experiments on the RPC dataset show that, compared with YOLOv5, the proposed model reduces the number of parameters by a factor of 6.4 and FLOPs by a factor of 9. On the DeepBlue Retail Dataset, the corresponding reductions relative to YOLOv5 are a factor of 7.8 in parameters and 9.3 in FLOPs. In a real-time evaluation on the same hardware, the proposed model runs at 123 FPS in the forward inference test, versus 58 FPS for YOLOv5 under the same conditions.
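The savings from group convolution and depthwise separable convolution follow from the standard weight-count formulas for a k × k layer. The Python sketch below computes them; the 3 × 3, 256-to-256-channel layer with a 20 × 20 output is an illustrative assumption rather than a layer taken from the paper, biases are ignored, and the helper names are hypothetical.

```python
def conv_params(k: int, c_in: int, c_out: int, groups: int = 1) -> int:
    """Weights of a k x k convolution split into `groups` groups (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    # each of the c_out filters only sees c_in / groups input channels
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (groups = c_in) followed by a 1 x 1 pointwise conv."""
    depthwise = conv_params(k, c_in, c_in, groups=c_in)
    pointwise = conv_params(1, c_in, c_out)
    return depthwise + pointwise


def conv_macs(params: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates: every weight is applied once per output position."""
    return params * h_out * w_out


if __name__ == "__main__":
    k, c, h = 3, 256, 20  # illustrative layer: 3x3, 256 -> 256 channels, 20x20 output
    for name, p in [
        ("standard", conv_params(k, c, c)),
        ("group (g=4)", conv_params(k, c, c, groups=4)),
        ("depthwise separable", depthwise_separable_params(k, c, c)),
    ]:
        print(f"{name:20s} params={p:>8,d} MACs={conv_macs(p, h, h):>12,d}")
```

For this layer the standard convolution has 589,824 weights, four groups cut that to 147,456, and the depthwise separable version needs 67,840, roughly 1/c_out + 1/k² ≈ 0.115 of the standard cost.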
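Re-clustering anchors with k-means is usually done on the widths and heights of the ground-truth boxes. The sketch below follows the YOLOv2 convention of using d = 1 - IoU as the clustering distance; that distance, the deterministic initialization and the mean update are assumptions, since the abstract only states that k-means is used to obtain six anchors. `kmeans_anchors` and `wh_iou` are hypothetical helper names.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box, in pixels


def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at a common center (only width and height matter)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[Box], k: int = 6, iters: int = 100) -> List[Box]:
    """Cluster box sizes into k anchors using the distance d = 1 - IoU."""
    if len(boxes) < k:
        raise ValueError("need at least k boxes")
    # deterministic initialization: k boxes spread evenly through the area ordering
    by_area = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [by_area[i * len(by_area) // k] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            # smallest 1 - IoU is the same as largest IoU
            nearest = max(range(k), key=lambda j: wh_iou(b, centroids[j]))
            clusters[nearest].append(b)
        # new centroid = mean width/height; an empty cluster keeps its old centroid
        updated = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return sorted(centroids, key=lambda b: b[0] * b[1])
```

On real data, `boxes` would hold the ground-truth box sizes of the training set, typically rescaled to the network input resolution before clustering.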
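In the original spatial pyramid pooling of He et al., a feature map of any size is max-pooled into fixed n × n grids, so the output length depends only on the pyramid levels and not on the input size, which is what makes it tolerant of cropping and rescaling. The single-channel sketch below assumes levels (1, 2, 4), giving 21 values per channel. YOLO-family detectors such as YOLOv4 and YOLOv5 use a different SPP block (parallel stride-1 max-pools, typically with kernels 5, 9 and 13, concatenated along channels); which variant the authors adopt is not stated in the abstract, and `spp` is a hypothetical helper name.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[float]]  # a single channel: H rows of W values


def spp(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool one channel into n x n bins for each pyramid level n."""
    h, w = len(fmap), len(fmap[0])
    out: List[float] = []
    for n in levels:
        if n > h or n > w:
            raise ValueError("pyramid level is finer than the feature map")
        for i in range(n):
            # bin edges floor(i*H/n) .. ceil((i+1)*H/n) cover the map without gaps
            r0, r1 = i * h // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = j * w // n, math.ceil((j + 1) * w / n)
                out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out
```

For a C-channel map the per-channel vectors would be concatenated, giving 21·C values whatever the spatial size of the input.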
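Mosaic augmentation, popularized by YOLOv4, stitches four training images around a center point into a single training sample. The core bookkeeping is moving each image's boxes into mosaic coordinates. The sketch below covers only that step, assuming each image is placed with one corner touching the center, boxes are (x1, y1, x2, y2) in pixels, and boxes clipped to less than `min_size` pixels are dropped; pixel copying, the random choice of center and any later crop or affine transform are omitted, and `mosaic_boxes` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def mosaic_boxes(
    boxes_per_image: Sequence[Sequence[BBox]],
    sizes: Sequence[Tuple[int, int]],
    center: Tuple[float, float],
    canvas: int,
    min_size: float = 2.0,
) -> List[BBox]:
    """Map the boxes of four images into a canvas x canvas mosaic around `center`."""
    if len(boxes_per_image) != 4 or len(sizes) != 4:
        raise ValueError("mosaic needs exactly four images")
    xc, yc = center
    out: List[BBox] = []
    for idx, (boxes, (w, h)) in enumerate(zip(boxes_per_image, sizes)):
        # image idx touches the center with one corner:
        # 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
        dx = xc - w if idx in (0, 2) else xc
        dy = yc - h if idx in (0, 1) else yc
        for x1, y1, x2, y2 in boxes:
            # shift into mosaic coordinates, then clip to the canvas
            nx1, nx2 = (min(max(v + dx, 0), canvas) for v in (x1, x2))
            ny1, ny2 = (min(max(v + dy, 0), canvas) for v in (y1, y2))
            # drop boxes that were (almost) entirely cut off by the canvas edge
            if nx2 - nx1 >= min_size and ny2 - ny1 >= min_size:
                out.append((nx1, ny1, nx2, ny2))
    return out
```

With four 320 × 320 images and the center at (320, 320) on a 640 × 640 canvas, each image fills exactly one quadrant; moving the center shifts the quadrants and crops whatever falls outside the canvas.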


Figures: Fig. 1 to Fig. 10



Funding

No funding was received for conducting this study.

Author information

Corresponding author

Correspondence to Liye Zhao.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, J., Huang, C., Zhao, L. et al. Lightweight identification of retail products based on improved convolutional neural network. Multimed Tools Appl 81, 31313–31328 (2022). https://doi.org/10.1007/s11042-022-12872-6


  • DOI: https://doi.org/10.1007/s11042-022-12872-6
