Multimedia Tools and Applications, Volume 69, Issue 3, pp 743–771

Hierarchical Hidden Markov Model in detecting activities of daily living in wearable videos for studies of dementia

  • Svebor Karaman
  • Jenny Benois-Pineau (corresponding author)
  • Vladislavs Dovgalecs
  • Rémi Mégret
  • Julien Pinquier
  • Régine André-Obrecht
  • Yann Gaëstel
  • Jean-François Dartigues


This paper presents a method for indexing activities of daily living in videos acquired from wearable cameras. It addresses the problem of analyzing the complex multimedia data acquired from wearable devices, which has recently become a growing concern as the amount of such data increases. In the context of dementia diagnosis, patients' activities are recorded at home using a lightweight wearable device, to be viewed later by medical practitioners. The recording mode poses great challenges, since the video data consists of a single sequence shot in which strong motion and sharp lighting changes often appear. Because the recordings are long, tools for efficient navigation in terms of activities of interest are crucial. Our work introduces a video structuring approach that combines automatic motion-based segmentation of the video with activity recognition by a hierarchical two-level Hidden Markov Model. We define a multi-modal description space over visual and audio features, including mid-level features such as motion, location, speech and noise detections. We show their complementarity, both globally and for specific activities. Experiments on real data obtained from recordings of several patients at home show the difficulty of the task and the promising results of the proposed approach.
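The abstract does not give the model topology, the features or the training procedure, so the following is only a minimal sketch of how a two-level Hidden Markov Model can be decoded, not the authors' implementation. It assumes the hierarchy is flattened into an ordinary HMM (top level: activities, bottom level: a left-to-right chain of sub-states per activity) and that each video segment has already been quantized into a discrete symbol. The function names `flatten` and `viterbi`, the two toy activities and all probabilities are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def flatten(top, m, stay):
    """Flatten a two-level HMM into one transition table (assumed topology).

    top[a][b]: probability of entering activity b when leaving activity a.
    m:         number of sub-states per activity (left-to-right chain).
    stay:      self-loop probability of every sub-state.
    Flat state index is a * m + k for sub-state k of activity a.
    """
    n_act = len(top)
    n = n_act * m
    trans = [[0.0] * n for _ in range(n)]
    for a in range(n_act):
        for k in range(m):
            s = a * m + k
            trans[s][s] += stay
            if k + 1 < m:
                trans[s][s + 1] += 1.0 - stay  # next sub-state, same activity
            else:
                for b in range(n_act):  # leave activity a, enter b's first sub-state
                    trans[s][b * m] += (1.0 - stay) * top[a][b]
    return trans


def viterbi(obs, init, trans, emit):
    """Most likely flat state sequence for discrete observations (log domain)."""
    n = len(init)
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, ptr, score = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            ptr.append(best)
            score.append(prev[best] + log(trans[best][j]) + log(emit[j][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last segment
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy setup (hypothetical): 2 activities, 2 sub-states each, 2 symbols.
M = 2
top = [[0.0, 1.0], [1.0, 0.0]]          # activities alternate
trans = flatten(top, M, stay=0.6)
init = [0.5, 0.0, 0.5, 0.0]             # start in the first sub-state of an activity
emit = [[0.9, 0.1], [0.9, 0.1],         # activity 0 mostly emits symbol 0
        [0.1, 0.9], [0.1, 0.9]]         # activity 1 mostly emits symbol 1
obs = [0, 0, 0, 1, 1, 1]                # one symbol per video segment

path = viterbi(obs, init, trans, emit)
labels = [s // M for s in path]         # map flat states back to activities
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Flattening keeps the example short and lets a standard Viterbi pass recover the activity label of each segment. A dedicated hierarchical decoder would be needed if sub-models were shared between activities.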


Keywords: Activities of daily living; Wearable videos; Video indexing; Hidden Markov Model



This work is partly supported by a grant from the ANR (Agence Nationale de la Recherche) with reference ANR-09-BLAN-0165-02, within the IMMED project.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Svebor Karaman (1)
  • Jenny Benois-Pineau (1), corresponding author
  • Vladislavs Dovgalecs (2)
  • Rémi Mégret (2)
  • Julien Pinquier (3)
  • Régine André-Obrecht (3)
  • Yann Gaëstel (4)
  • Jean-François Dartigues (4)

  1. LaBRI, University of Bordeaux, Talence Cedex, France
  2. IMS, University of Bordeaux, Talence, France
  3. IRIT, University of Toulouse, Toulouse Cedex 9, France
  4. INSERM U.897, University Victor Ségalen Bordeaux 2, Bordeaux, France
