Abstract
This paper presents a novel approach for real-time egocentric activity recognition in which component atomic events are characterised in terms of binary relationships between parts of the body and manipulated objects. The key contribution is to summarise, within a histogram, the relationships that hold over a fixed time interval. This histogram is then classified into one of a number of atomic events. The relationships encode both the types of body parts and objects involved (e.g. wrist, hammer) together with a quantised representation of their distance apart and the normalised rate of change in this distance. The quantisation and classifier are both configured in a prior learning phase from training data. An activity is represented by a Markov model over atomic events. We show the application of the method in the prediction of the next atomic event within a manual procedure (e.g. assembling a simple device) and the detection of deviations from an expected procedure. This could be used, for example, in training operators in the use or servicing of a piece of equipment, or the assembly of a device from components. We evaluate our approach (‘Bag-of-Relations’) on two datasets: ‘labelling and packaging bottles’ and ‘hammering nails and driving screws’, and show superior performance to existing Bag-of-Features methods that work with histograms derived from image features [1]. Finally, we show that the combination of data from vision and inertial (IMU) sensors outperforms either modality alone.
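To make the abstract's representation concrete, the following is a minimal sketch of the Bag-of-Relations idea, not the authors' implementation: pairwise distances between tracked body parts and objects are quantised, together with their normalised rate of change, and accumulated into a histogram over a fixed time window. The part and object lists, bin edges and window handling below are illustrative assumptions; in the paper the quantisation and the atomic-event classifier are learned from training data.

    # Illustrative sketch of a 'Bag-of-Relations' histogram (assumptions, not the paper's code).
    import numpy as np

    PARTS = ["left_wrist", "right_wrist"]          # assumed tracked body parts
    OBJECTS = ["hammer", "screwdriver", "bottle"]  # assumed manipulated objects
    DIST_BINS = np.array([0.05, 0.15, 0.40])       # metres; the paper learns this quantisation
    RATE_BINS = np.array([-0.01, 0.01])            # approaching / static / receding

    def bag_of_relations(tracks, window):
        """Histogram of quantised (part, object, distance, rate) relations.

        tracks: dict mapping (part, object) -> 1-D array of per-frame distances.
        window: slice selecting the fixed time interval to summarise.
        Returns a flat, normalised histogram vector for an atomic-event classifier.
        """
        n_d, n_r = len(DIST_BINS) + 1, len(RATE_BINS) + 1
        hist = np.zeros((len(PARTS), len(OBJECTS), n_d, n_r))
        for (part, obj), dist in tracks.items():
            d = np.asarray(dist, dtype=float)[window]
            rate = np.gradient(d) / (np.abs(d) + 1e-6)   # normalised rate of change of distance
            for di, ri in zip(np.digitize(d, DIST_BINS), np.digitize(rate, RATE_BINS)):
                hist[PARTS.index(part), OBJECTS.index(obj), di, ri] += 1
        return hist.ravel() / max(hist.sum(), 1.0)       # normalise over the window

Each per-window histogram would then be classified into an atomic event (e.g. with an SVM, as in [39, 40]), and a Markov model over the sequence of atomic-event labels supports prediction of the next event and detection of deviations from the expected procedure.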
References
Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR (2008)
Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding 104, 90–126 (2006)
Turaga, P.K., Chellappa, R., Subrahmanian, V.S., Udrea, O.: Machine recognition of human activities: A survey. IEEE Trans. Circuits Syst. Video Techn. 18, 1473–1488 (2008)
Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: A review. ACM Comput. Surv. 43, 1–16 (2011)
Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: A local SVM approach. In: ICPR, pp. 32–36 (2004)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: ICCV, pp. 1395–1402 (2005)
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: A large video database for human motion recognition. In: ICCV, pp. 2556–2563 (2011)
Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos “in the wild”. In: CVPR, pp. 1996–2003 (2009)
Gupta, A., Davis, L.S.: Objects in action: An approach for combining action understanding and object perception. In: CVPR (2007)
Fathi, A., Ren, X., Rehg, J.M.: Learning to recognize objects in egocentric activities. In: CVPR, pp. 3281–3288 (2011)
Kitani, K.M., Okabe, T., Sato, Y., Sugimoto, A.: Fast unsupervised ego-action learning for first-person sports videos. In: CVPR, pp. 3241–3248 (2011)
Fathi, A., Farhadi, A., Rehg, J.M.: Understanding egocentric activities. In: ICCV, pp. 407–414 (2011)
Aghazadeh, O., Sullivan, J., Carlsson, S.: Novelty detection from an ego-centric perspective. In: CVPR, pp. 3297–3304 (2011)
Wanstall, B.: HUD on the Head for Combat Pilots. Interavia 44, 334–338 (1989)
Damen, D., Bunnun, P., Calway, A., Mayol-Cuevas, W.: Real-time learning and detection of 3D texture-less objects: A scalable approach. In: BMVC (2012)
Pinhanez, C., Bobick, A.: Human action detection using PNF propagation of temporal constraints. In: CVPR (1998)
Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In: ICCV, pp. 1593–1600 (2009)
Sridhar, M., Cohn, A.G., Hogg, D.C.: Unsupervised learning of event classes from video. In: AAAI (2010)
Bleser, G., Hendeby, G., Miezal, M.: Using egocentric vision to achieve robust inertial body tracking under magnetic disturbances. In: ISMAR, pp. 103–109 (2011)
Reiss, A., Hendeby, G., Bleser, G., Stricker, D.: Activity Recognition Using Biomechanical Model Based Pose Estimation. In: Lukowicz, P., Kunze, K., Kortuem, G. (eds.) EuroSSC 2010. LNCS, vol. 6446, pp. 42–55. Springer, Heidelberg (2010)
Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23, 257–267 (2001)
Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: ICCV, pp. 726–733 (2003)
Ryoo, M.S.: Human activity prediction: Early recognition of ongoing activities from streaming videos. In: ICCV, pp. 1036–1043 (2011)
Lan, T., Wang, Y., Yang, W., Mori, G.: Beyond actions: Discriminative models for contextual group activities. In: NIPS, pp. 1216–1224 (2010)
Shi, Y., Huang, Y., Minnen, D., Bobick, A., Essa, I.: Propagation networks for recognition of partially ordered sequential action. In: CVPR, pp. 862–869 (2004)
Veres, G., Grabner, H., Middleton, L., Van Gool, L.: Automatic Workflow Monitoring in Industrial Environments. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010, Part I. LNCS, vol. 6492, pp. 200–213. Springer, Heidelberg (2011)
Behera, A., Cohn, A.G., Hogg, D.C.: Workflow Activity Monitoring Using Dynamics of Pair-Wise Qualitative Spatial Relations. In: Schoeffmann, K., Merialdo, B., Hauptmann, A.G., Ngo, C.-W., Andreopoulos, Y., Breiteneder, C. (eds.) MMM 2012. LNCS, vol. 7131, pp. 196–209. Springer, Heidelberg (2012)
Worgan, S.F., Behera, A., Cohn, A.G., Hogg, D.C.: Exploiting Petri-net structure for activity classification and user instruction within an industrial setting. In: ICMI, pp. 113–120 (2011)
Starner, T., Pentland, A.: Real-time American sign language recognition from video using hidden Markov models. In: Proc. of Int’l Symposium on Computer Vision, pp. 265–270 (1995)
Ward, J.A., Lukowicz, P., Tröster, G., Starner, T.E.: Activity recognition of assembly tasks using body-worn microphones and accelerometers. IEEE Trans. PAMI 28, 1553–1567 (2006)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001)
Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. In: CVPR, pp. 3539–3546 (2010)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Behera, A., Hogg, D.C., Cohn, A.G. (2013). Egocentric Activity Monitoring and Recovery. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37431-9_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37430-2
Online ISBN: 978-3-642-37431-9