Temporal Segmentation of Egocentric Videos to Highlight Personal Locations of Interest

  • Antonino Furnari
  • Giovanni Maria Farinella
  • Sebastiano Battiato
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9913)


With the increasing availability of wearable cameras, the acquisition of egocentric videos is becoming common in many scenarios. However, the absence of explicit structure in such videos (e.g., video chapters) makes their exploitation difficult. We propose to segment unstructured egocentric videos to highlight the presence of personal locations of interest specified by the end-user. Given the large variability of the visual content acquired by such devices, it is necessary to design explicit rejection mechanisms able to detect negatives (i.e., frames not related to any considered location) while learning only from positive samples at training time. To study the problem, we collected a dataset of egocentric videos containing 10 personal locations of interest. We propose a method to segment egocentric videos by discriminating among the personal locations of interest, rejecting negative frames, and enforcing temporal coherence between neighboring predictions.
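The pipeline described above (per-frame discrimination, rejection of negatives learned only from positives, and temporal smoothing) can be illustrated with a minimal sketch. This is not the authors' method: it substitutes a simple nearest-centroid classifier with a distance-based rejection threshold estimated from positive data only, and majority-vote smoothing over a sliding temporal window. All function names and parameters below are illustrative assumptions.

```python
import numpy as np

def fit_centroids(features, labels):
    """Fit one centroid per location and a rejection threshold
    using positive training frames only (no negatives needed)."""
    classes = np.unique(labels)
    centroids = {c: features[labels == c].mean(axis=0) for c in classes}
    # Reject frames farther from every centroid than the 95th
    # percentile of positive-to-own-centroid distances.
    dists = np.array([np.linalg.norm(f - centroids[l])
                      for f, l in zip(features, labels)])
    threshold = np.percentile(dists, 95)
    return centroids, threshold

def classify_frames(features, centroids, threshold, negative_label=-1):
    """Assign each frame to its nearest location, or reject it
    as a negative if it is too far from all centroids."""
    preds = []
    for f in features:
        d = {c: np.linalg.norm(f - m) for c, m in centroids.items()}
        best = min(d, key=d.get)
        preds.append(best if d[best] <= threshold else negative_label)
    return np.array(preds)

def smooth_predictions(preds, window=5):
    """Enforce temporal coherence by majority vote over a
    sliding window of neighboring frame predictions."""
    half = window // 2
    out = preds.copy()
    for i in range(len(preds)):
        segment = preds[max(0, i - half): i + half + 1]
        values, counts = np.unique(segment, return_counts=True)
        out[i] = values[np.argmax(counts)]
    return out
```

In this sketch, smoothing removes isolated misclassifications: a single spurious label surrounded by consistent predictions is overwritten by the local majority, yielding temporally coherent segments.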


Keywords: First person vision · Egocentric video · Context-based analysis · Aware computing · Video segmentation

Supplementary material

Supplementary material 1 (PDF, 570 KB): 431902_1_En_34_MOESM1_ESM.pdf



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Antonino Furnari (1)
  • Giovanni Maria Farinella (1)
  • Sebastiano Battiato (1)

  1. Department of Mathematics and Computer Science, University of Catania, Catania, Italy
