Multimedia Tools and Applications, Volume 49, Issue 1, pp 119–144

Everyday concept detection in visual lifelogs: validation, relationships and trends

  • Daragh Byrne
  • Aiden R. Doherty
  • Cees G. M. Snoek
  • Gareth J. F. Jones
  • Alan F. Smeaton


Abstract

The Microsoft SenseCam is a small, lightweight wearable camera that passively captures photos and other sensor readings throughout a user's day-to-day activities. It captures on average 3,000 images in a typical day, equating to almost 1 million images per year. It can be used to aid memory by creating a personal multimedia lifelog, or visual recording, of the wearer's life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. In this work, we explore the applicability of semantic concept detection, a method often used in video retrieval, to the domain of visual lifelogs. Our concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning, and thereby determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts to a lifelog collection of 257,518 SenseCam images from 5 users. The results were evaluated on a subset of 95,907 images to determine the detection accuracy for each semantic concept. We conducted further analysis of the temporal consistency, co-occurrence and relationships among the detected concepts to more extensively investigate the robustness of the detectors in this domain.
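The per-concept detection approach described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: it uses scikit-learn's SVC (a wrapper around LIBSVM) with Platt-scaled probability outputs, and random stand-in feature vectors in place of the real low-level visual features extracted from SenseCam images. All names and dimensions here are hypothetical.

```python
# Sketch of one binary concept detector (e.g. "indoors"), assuming each
# image is already represented by a fixed-length low-level feature vector.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical training set: 200 images, 64-dimensional visual features,
# with a binary annotation per image for the concept of interest.
features = rng.normal(size=(200, 64))
labels = (features[:, 0] > 0).astype(int)  # toy ground-truth annotations

# One supervised SVM per concept; probability=True fits a Platt-style
# sigmoid so the detector outputs P(concept present | image).
detector = SVC(kernel="rbf", probability=True, random_state=0)
detector.fit(features, labels)

# Score unseen images: column 1 is the probability the concept is present.
new_images = rng.normal(size=(5, 64))
probs = detector.predict_proba(new_images)[:, 1]
print(probs)
```

In a full system, one such detector would be trained per concept (27 here), and each image scored by every detector to yield a concept-probability vector usable for retrieval and the temporal/co-occurrence analyses discussed.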


Keywords

Microsoft SenseCam · Lifelog · Passive photos · Concept detection · Supervised learning



Acknowledgements

We are grateful to the AceMedia project and Microsoft Research for support. This work is supported by the Irish Research Council for Science, Engineering and Technology, by Science Foundation Ireland under grant 07/CE/I1147, and by the EU IST-CHORUS project. We would also like to extend our thanks to the participants who made their personal lifelog collections available for these experiments, and who partook in the annotation effort.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Daragh Byrne (1, corresponding author)
  • Aiden R. Doherty (1)
  • Cees G. M. Snoek (2)
  • Gareth J. F. Jones (3)
  • Alan F. Smeaton (1)

  1. CLARITY: Centre for Sensor Web Technologies, Dublin City University, Dublin 9, Ireland
  2. Intelligent Systems Lab Amsterdam, University of Amsterdam, Amsterdam, The Netherlands
  3. Centre for Digital Video Processing, Dublin City University, Glasnevin, Ireland
