Multimedia Tools and Applications

Volume 74, Issue 13, pp 4617–4639

Building 3D event logs for video investigation

  • Trung Kien Dang
  • Marcel Worring
  • The Duy Bui


In scene investigation, recording a video log with a handheld camera is more convenient and more complete than taking photos and notes. With video analysis and computer vision techniques, it is possible to build a spatio-temporal representation of the investigation from such a log. This representation gives a better overview than a set of photos and makes an investigation more accessible. We develop methods for building it and present an interface for navigating the result. The processing includes (i) segmenting a log into events using novel structure and motion features, which makes the log easier to access in the time dimension, and (ii) mapping video frames to a 3D model of the scene so that the log can be navigated in space. Our results show that, using the proposed features, we can recognize more than 70 percent of all frames correctly and, more importantly, find all the events. We then provide a method to semi-interactively map those events to a 3D model of the scene. With this method we can map more than 80 percent of the events. The result is a 3D event log that captures the investigation and supports applications such as revisiting the scene, examining the investigation itself, or hypothesis testing.
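The difference between frame-level recognition and event-level detection reported above can be illustrated with a minimal Python sketch. It is not the authors' method: the label names (`overview`, `detail`), the toy sequences, and the rule that an event counts as found when at least one of its frames carries the correct label are all assumptions made for illustration.

```python
from itertools import groupby


def frames_to_events(labels):
    """Collapse runs of identical frame labels into (label, start, end) events.

    `end` is exclusive. This is a plain run-length grouping, used here only
    as a stand-in for a real event segmentation.
    """
    events = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        events.append((label, start, start + length))
        start += length
    return events


def frame_accuracy(predicted, truth):
    """Fraction of frames whose predicted label equals the true label."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    if not truth:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


def events_found(predicted, truth):
    """Return (found, total) over the true events.

    Assumption: a true event is 'found' if at least one frame inside its
    span was given the correct label.
    """
    true_events = frames_to_events(truth)
    found = sum(
        1
        for label, start, end in true_events
        if any(p == label for p in predicted[start:end])
    )
    return found, len(true_events)


# Hypothetical ten-frame log with three true events.
truth = ["overview"] * 4 + ["detail"] * 4 + ["overview"] * 2
predicted = [
    "overview", "overview", "detail", "overview",
    "detail", "overview", "detail", "detail",
    "overview", "detail",
]

print(frame_accuracy(predicted, truth))
print(events_found(predicted, truth))
```

In this toy log several individual frames are mislabeled, yet every true event still contains correctly labeled frames, which is why a moderate frame accuracy can coexist with complete event coverage.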


Scene investigation, Video analysis, Story navigation, 3D model



We thank Jurrien Bijhold and the Netherlands Forensic Institute for providing the data and bringing in domain knowledge, and the police investigators for participating in the experiment. This work is supported by the Research Grant from Vietnam’s National Foundation for Science and Technology Development (NAFOSTED), No. 102.02-2011.13.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Trung Kien Dang (1, 2)
  • Marcel Worring (1)
  • The Duy Bui (2)

  1. University of Amsterdam, Amsterdam, The Netherlands
  2. University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam
