Abstract
Retail product classification has practical value in commerce, for example in helping visually impaired shoppers or in evaluating product placement strategies. However, few datasets are available for retail product classification, and in some of them the training and evaluation data follow very different distributions, which is a major challenge in itself. In addition, only a few studies have addressed this problem, and their results leave room for improvement. This paper attempts to improve on previous approaches to retail product classification under very distinct training and evaluation data distributions by using convolutional neural network (CNN) models inspired by CNN models that perform well in general image classification and that can later be fine-tuned for other computer vision tasks such as object detection. The results show that VGG-16 achieves 66.9167% accuracy, and a new modified VGG-16 model named VGG-16-D attains 66.83% accuracy with 85% fewer parameters than VGG-16, outperforming most existing approaches across several comparison baselines.
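To give a sense of scale for the parameter reduction reported above, the following sketch counts the trainable parameters of the standard VGG-16 layout (configuration D of Simonyan and Zisserman) in plain Python. It is an illustration under stated assumptions, not the authors' setup: the 224x224 input, the 1000-class output layer, and the helper names `conv_params`, `fc_params`, and `vgg16_params` are all assumptions introduced here. The VGG-16-D architecture itself is not described in this section, so it is not modeled.

```python
# Standard VGG-16 layout: integers are output channels of 3x3 convolutions,
# "M" marks a 2x2 max-pooling stage (no trainable parameters).
VGG16_CONV_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                     512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(layout, in_channels=3, kernel=3):
    """Count weights and biases of the convolutional layers."""
    total = 0
    for item in layout:
        if item == "M":
            continue
        # weights: in_channels * k * k * out_channels, plus one bias per filter
        total += in_channels * kernel * kernel * item + item
        in_channels = item
    return total


def fc_params(sizes):
    """Count weights and biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def vgg16_params(num_classes=1000, input_size=224):
    """Total parameters; num_classes and input_size are assumptions."""
    spatial = input_size // 2 ** 5      # five pooling stages halve the size
    flat = 512 * spatial * spatial      # flattened feature vector length
    dense = fc_params([flat, 4096, 4096, num_classes])
    return conv_params(VGG16_CONV_LAYOUT) + dense


if __name__ == "__main__":
    total = vgg16_params()
    print(total)                        # 138357544
    # Budget left after an 85% reduction, under the assumptions above only.
    print(round(total * 0.15))
```

Under these assumptions the fully connected layers account for most of the roughly 138 million parameters, and an 85% reduction would leave a budget of roughly 21 million. The actual counts in the paper depend on its input size and number of classes, which are not given in this section.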
Ethics declarations
COMPLIANCE WITH ETHICAL STANDARDS
This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the PRIA Editorial Board decides not to accept it for publication.
Conflict of Interest
The process of writing and the content of the article do not give grounds for raising the issue of a conflict of interest.
Additional information
Jonathan is a Master of Computer Science candidate in the BINUS Graduate Program at Bina Nusantara University. He has published several papers on computer vision, pattern recognition, and software engineering in various international journals. His research interests include computer vision, pattern recognition, deep learning, and software engineering.
Gede Putra Kusuma received a PhD degree in Electrical and Electronic Engineering from Nanyang Technological University (NTU), Singapore, in 2013. He is currently a Lecturer and Research Coordinator in the Computer Science Department, Bina Nusantara University, Indonesia. Before joining Bina Nusantara University, he worked as a Research Scientist at I2R–A*STAR, Singapore. His research interests include pattern recognition, machine learning, face recognition, appearance-based object recognition, mobile learning, and gamification of learning.
About this article
Cite this article
Jonathan, Kusuma, G.P. Retail Product Classification on Distinct Distribution of Training and Evaluation Data. Pattern Recognit. Image Anal. 32, 142–152 (2022). https://doi.org/10.1134/S105466182104012X
DOI: https://doi.org/10.1134/S105466182104012X