Deep Learning-Based Improved Object Recognition in Warehouses

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10749)

Abstract

In recent years, research has migrated from model-based object detection and classification to data-driven approaches. With more efficient computational resources, improved acquisition systems, and large volumes of training data, deep learning models have achieved accurate object category classification. Deep convolutional networks have an inherent ability to extract features automatically and are used for accurate category classification. This paper has three parts. First, we extract moving foregrounds using a mixture-of-Gaussians technique. Next, we improve the quality of the object foreground based on a pixel saliency map. Third, the improved foreground is assigned labels by a pre-trained deep learning detector. Altogether, the paper proposes a way to improve video-based object detection and classification for logistics in warehouses.
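The first stage, mixture-of-Gaussians foreground extraction, can be sketched as a per-pixel mixture model in NumPy. This is a minimal illustration of a Stauffer/Grimson-style background subtractor under stated assumptions, not the authors' implementation; the names and parameter values (`K`, `ALPHA`, `w_bg`) are chosen for the sketch, and the saliency-refinement and detector stages of the pipeline are omitted.

```python
import numpy as np

# Per-pixel mixture-of-Gaussians background model (Stauffer/Grimson style).
K, ALPHA, SIG2_0 = 3, 0.05, 36.0  # Gaussians per pixel, learning rate, initial variance

def init_model(first_frame):
    """Seed every Gaussian of every pixel with the first grayscale frame."""
    mu = np.repeat(first_frame[None].astype(float), K, axis=0)  # (K, H, W) means
    var = np.full(mu.shape, SIG2_0)                             # (K, H, W) variances
    w = np.full(mu.shape, 1.0 / K)                              # (K, H, W) mixture weights
    return [mu, var, w]

def apply_mog(frame, model, w_bg=0.25):
    """Update the mixture with one frame; return a boolean foreground mask."""
    mu, var, w = model
    x = frame.astype(float)[None]                     # (1, H, W)
    match = (x - mu) ** 2 < 6.25 * var                # within 2.5 sigma of a Gaussian
    sel = match & (np.cumsum(match, axis=0) == 1)     # first matching Gaussian only
    w[:] = (1.0 - ALPHA) * w + ALPHA * sel            # grow weight of the matched Gaussian
    rho = ALPHA * sel
    mu[:] = (1.0 - rho) * mu + rho * x                # pull mean toward the observation
    var[:] = (1.0 - rho) * var + rho * (x - mu) ** 2  # adapt variance
    w /= w.sum(axis=0, keepdims=True)                 # renormalize weights
    # Foreground: no sufficiently weighted ("background") Gaussian explains the pixel.
    # (A full implementation would also replace the weakest Gaussian on a mismatch.)
    return ~(match & (w > w_bg)).any(axis=0)
```

Feeding a static scene for a few frames and then a frame containing a bright patch flags only the patch as foreground; in the paper's pipeline, such a mask would then be refined with a pixel saliency map before classification.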


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Electrical and Computer Engineering, Auckland University of Technology, Auckland, New Zealand
  2. Crown Lift Trucks Ltd., Auckland, New Zealand
