Egocentric Object Recognition Leveraging the 3D Shape of the Grasping Hand

  • Yizhou Lin
  • Gang Hua
  • Philippos Mordohai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8927)


We present a systematic study of the relationship between the 3D shape of a hand that is about to grasp an object and recognition of the object to be grasped. In this paper, we investigate one direction of this relationship, from the shape of the hand to object recognition, for unimpaired users. Our work shows that the 3D shape of a grasping hand, observed from an egocentric point of view, can help improve recognition of the objects being grasped. Previous work has attempted to exploit hand interactions or gaze information in the egocentric setting to guide object segmentation; however, all such analyses are conducted in 2D. We hypothesize that the 3D shape of a grasping hand is highly correlated with the physical attributes of the object being grasped and can therefore provide valuable visual information for object recognition. We validate this hypothesis by first building a 3D egocentric vision pipeline that segments the grasping hands and reconstructs dense 3D point clouds of them. Visual descriptors are then extracted from the point clouds and fed into an object recognition system that recognizes the object being grasped. Our experiments demonstrate that the 3D hand shape can substantially improve visual recognition accuracy compared with a baseline that uses only 2D image features.
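The abstract does not say which 3D descriptor or which classifier the pipeline uses. As a rough illustration of its last two stages (describing a reconstructed hand point cloud, then combining that description with 2D image features for recognition), here is a minimal Python sketch under stated assumptions: the hand has already been segmented and reconstructed into a list of (x, y, z) points; the descriptor is a hypothetical radial-distance histogram (point-to-centroid distances, scale-normalized), not the authors' descriptor; 2D and 3D features are fused by simple concatenation; and a nearest-centroid classifier stands in for the paper's recognition system. All function names and toy labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: a segmented, reconstructed hand is a list of (x, y, z) points.
Point = Tuple[float, float, float]


def radial_histogram(cloud: Sequence[Point], bins: int = 8) -> List[float]:
    """Hypothetical 3D hand-shape descriptor (not the paper's): a histogram
    of point-to-centroid distances, divided by the largest distance so it is
    scale invariant, and by the point count so the bins sum to 1."""
    if not cloud:
        raise ValueError("empty point cloud")
    n = len(cloud)
    centroid = tuple(sum(p[k] for p in cloud) / n for k in range(3))
    dists = [math.dist(p, centroid) for p in cloud]
    dmax = max(dists) or 1.0  # all points coincide: avoid dividing by zero
    hist = [0.0] * bins
    for d in dists:
        # d / dmax lies in [0, 1]; a point exactly at dmax goes into the last bin
        hist[min(int(d / dmax * bins), bins - 1)] += 1.0
    return [h / n for h in hist]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (a zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(feat_2d: Sequence[float], cloud: Sequence[Point], w3d: float = 1.0) -> List[float]:
    """Early fusion (an assumption): concatenate the unit-length 2D image
    feature with the unit-length 3D descriptor, weighted by w3d."""
    shape = l2_normalize(radial_histogram(cloud))
    return l2_normalize(feat_2d) + [w3d * x for x in shape]


def nearest_centroid(train: Dict[str, List[List[float]]], query: Sequence[float]) -> str:
    """Stand-in classifier: return the label whose mean training vector is
    closest (Euclidean distance) to the query vector."""
    if not train:
        raise ValueError("no training data")
    best_label, best_dist = "", math.inf
    for label, feats in train.items():
        centroid = [sum(col) / len(feats) for col in zip(*feats)]
        d = math.dist(centroid, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


if __name__ == "__main__":
    # Toy data with hypothetical labels: two grasp shapes whose 2D features
    # are identical, so the 2D part alone cannot tell them apart.
    ring = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
    line = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
    same_2d = [1.0, 0.0]
    train = {"mug": [fuse(same_2d, ring)], "bottle": [fuse(same_2d, line)]}
    # Query with enlarged copies of each hand shape (descriptor is scale invariant).
    print(nearest_centroid(train, fuse(same_2d, [(2 * x, 2 * y, 2 * z) for x, y, z in ring])))
    print(nearest_centroid(train, fuse(same_2d, [(3 * x, 3 * y, 3 * z) for x, y, z in line])))
```

In the toy example both classes share the same 2D feature, so any decision between them comes from the hand-shape term, which mirrors (in a very simplified way) the 2D-only baseline comparison described above. Dividing by the maximum distance makes the sketch insensitive to hand size and camera distance, at the cost of discarding absolute scale, which may itself carry information about the size of the grasped object.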


Keywords: Mobile and wearable systems; Egocentric and first-person vision; Activity monitoring systems; Rehabilitation aids



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, Stevens Institute of Technology, Hoboken, USA
