Multimedia Tools and Applications, Volume 76, Issue 5, pp 6595–6622

Customer behavior classification using surveillance camera for marketing



The analysis of customer behavior from surveillance camera footage is an important open topic in marketing. Traditionally, retailers use cash register or credit card records to analyze customers' buying behavior. However, these records cannot reveal the behavior of a customer who shows interest in front of a merchandise shelf but does not buy. Such behavior can be recorded by a surveillance camera and then analyzed. We propose a system that classifies the following customer behaviors in front of a shelf: no interest, viewing, turning the body to the shelf, touching, picking and returning to the shelf, and picking and putting into the basket. In this order, the behaviors indicate the customer's increasing interest in the products. The proposed system integrates multiple cues, namely head orientation, body orientation, and arm action, for customer behavior recognition. It discretizes the customer's head and body orientation into 8 directions to estimate whether the customer is looking at or turning toward the merchandise shelf. A semi-supervised learning method is applied to optimize the training dataset and to generate an accurate classifier. In addition, a temporal constraint and a human physical model constraint are considered in the joint estimation of body and head orientation. For arm action recognition, a novel Combined Hand Feature (CHF), which includes the hand trajectory, the tracking status, and the relative position between the hand and the shopping basket, is proposed to classify different arm actions. Hand tracking is performed by an improved particle filter. The CHF is classified by a Dynamic Bayesian Network (DBN) to output the type of arm action. A series of experiments demonstrates the effectiveness of the proposed techniques and the performance of the developed system.
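The 8-direction discretization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the bins are laid out, so the 45° bin width centered on multiples of 45°, the function names `discretize_orientation` and `is_facing_shelf`, and the one-bin tolerance are all assumptions made for illustration only.

```python
NUM_BINS = 8                   # the abstract states 8 discrete directions
BIN_WIDTH = 360.0 / NUM_BINS   # 45 degrees per bin (assumed uniform layout)


def discretize_orientation(angle_deg: float) -> int:
    """Map a continuous orientation in degrees to a bin index in 0..7.

    Assumption: bin 0 is centered on 0 degrees, bin 1 on 45 degrees, and so on.
    """
    wrapped = angle_deg % 360.0  # normalize to [0, 360)
    # Shift by half a bin so that each bin is centered on a multiple of 45.
    return int((wrapped + BIN_WIDTH / 2) // BIN_WIDTH) % NUM_BINS


def is_facing_shelf(bin_index: int, shelf_bin: int, tolerance: int = 1) -> bool:
    """Return True if the bin lies within `tolerance` bins of the shelf direction.

    Assumption: the shelf direction is known as a bin index, and the
    tolerance is a hypothetical parameter, not a value from the paper.
    """
    diff = abs(bin_index - shelf_bin) % NUM_BINS
    circular_distance = min(diff, NUM_BINS - diff)  # wrap around the circle
    return circular_distance <= tolerance


if __name__ == "__main__":
    head_bin = discretize_orientation(350.0)
    print(head_bin, is_facing_shelf(head_bin, shelf_bin=0))
```

In such a scheme, the head and body orientations would each be discretized separately and then compared with the shelf direction, for instance to separate "viewing" (head toward the shelf) from "turning body to shelf" (body toward the shelf).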


Keywords: Surveillance camera, Customer behavior, Orientation estimation, Arm action classification



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
  2. Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
