Abstract
In this paper, we present a new method for temporal segmentation of egocentric video that integrates a statistical mean change detector and agglomerative clustering (AC) within an energy-minimization framework. Because most AC methods tend to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous performance guarantees. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that disambiguates their decisions and completes the segmentation, taking into account the temporal continuity of the video frame descriptors. We present experiments on egocentric sets of more than 13,000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
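To make the mean-change detection idea concrete, the following is a minimal sketch of an ADWIN-style test on a one-dimensional stream. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `adwin_change_points`, the default `delta`, and the restriction to scalar values in [0, 1] are all hypothetical choices, and the sketch checks every split of the window directly instead of using the bucket-based compression of the original algorithm. The cut threshold follows the Hoeffding-style bound commonly given for ADWIN (Bifet and Gavaldà).

```python
import math


def adwin_change_points(stream, delta=0.01):
    """Return stream indices where a simplified ADWIN-style test cuts the window.

    Assumes every value lies in [0, 1]. Each new value triggers a check of
    all splits of the current window, so this is far slower than the
    bucket-based original and is meant only as an illustration.
    """
    window = []    # values currently considered to share one mean
    start = 0      # index in `stream` of window[0]
    changes = []   # detected change points (index of the first kept value)

    for x in stream:
        window.append(x)
        shrunk = True
        # Keep shrinking while some split shows a significant mean difference.
        while shrunk and len(window) >= 2:
            shrunk = False
            n = len(window)
            total = sum(window)
            delta_prime = delta / n   # confidence shared across the window
            head_sum = 0.0
            for split in range(1, n):
                head_sum += window[split - 1]
                n0, n1 = split, n - split
                mean0 = head_sum / n0
                mean1 = (total - head_sum) / n1
                # Harmonic-mean style term combining the two sub-window sizes.
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
                if abs(mean0 - mean1) > eps_cut:
                    # Drop the older sub-window and record the boundary.
                    window = window[split:]
                    start += split
                    changes.append(start)
                    shrunk = True
                    break
    return changes


if __name__ == "__main__":
    signal = [0.1] * 50 + [0.9] * 50
    print(adwin_change_points(signal))
```

On the toy signal above (50 low values followed by 50 high values), the detector needs several high values before the difference in means exceeds the threshold, and it then reports the boundary at index 50. In a segmentation setting, a detector of this kind would presumably be run on frame descriptors rather than on a scalar signal; how the paper reduces descriptors to the detector's input is not stated in the abstract.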
Acknowledgments
This work was partially funded by TIN2012-38187-C03-01 and SGR 1219.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., Radeva, P. (2015). R-Clustering for Egocentric Video Segmentation. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_37
DOI: https://doi.org/10.1007/978-3-319-19390-8_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19389-2
Online ISBN: 978-3-319-19390-8
eBook Packages: Computer Science, Computer Science (R0)