Multimedia Tools and Applications

Volume 75, Issue 13, pp 7831–7854

SmartCamera: a low-cost and intelligent camera management system

  • Amin Roudaki
  • Jun Kong
  • Shane Reetz


Intelligent camera management systems have been developed to record meetings automatically for videoconferencing. These systems offer benefits such as reduced production cost and convenient documentation of events. However, automatically recorded videos are generally not visually engaging. This paper presents a novel approach that intelligently controls camera shots and angles to make the video more visually interesting. We use 3D infrared images captured by a Kinect sensor to recognize active speakers and their positions in a meeting. A movable camera, built by mounting a wireless PTZ (pan-tilt-zoom) camera on a motorized rail, can automatically reposition itself to keep an active speaker centered in the frame. Without interrupting the meeting, a speaker can seamlessly switch video sources through gesture-based commands. We have summarized and implemented a set of heuristic rules that simulate a human director. These rules can be edited visually through a graphical user interface, and because the virtual director is customizable, the system is applicable to a variety of scenarios. We conducted a user study, and the evaluation results confirmed the quality of the automatically recorded videos.
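The abstract describes two mechanisms without giving their details: moving a rail-mounted PTZ camera so that the active speaker is centered, and choosing shots with heuristic "director" rules. The following is a minimal Python sketch of both ideas under stated assumptions, not the authors' implementation. It assumes the Kinect speaker position has already been transformed into a rail-aligned coordinate frame in metres, that the rail is straight, and that the names `frame_speaker`, `choose_shot`, the shot labels, and the 4-second minimum shot length are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Speaker:
    # Active-speaker position in metres, ASSUMED already mapped from Kinect
    # coordinates into a frame whose x-axis runs along the (straight) rail.
    x: float  # offset along the rail axis
    z: float  # perpendicular distance from the rail (must be > 0)


@dataclass
class CameraCommand:
    rail_pos: float  # target carriage position on the rail, in metres
    pan_deg: float   # pan angle; 0 means facing straight out from the rail


def frame_speaker(s: Speaker, rail_length: float) -> CameraCommand:
    """Center the speaker: drive the carriage toward the speaker, then pan to cover any remaining offset."""
    # Clamp to the rail ends; beyond them the carriage cannot follow.
    rail_pos = min(max(s.x, 0.0), rail_length)
    # The residual lateral offset is covered by panning toward the speaker.
    pan = math.degrees(math.atan2(s.x - rail_pos, s.z))
    return CameraCommand(rail_pos, round(pan, 2))


@dataclass
class DirectorState:
    speaker: Optional[Speaker]  # None when nobody is talking
    seconds_since_cut: float    # how long the current shot has been on screen


@dataclass
class Rule:
    name: str
    condition: Callable[[DirectorState], bool]
    shot: str  # hypothetical shot label, e.g. "hold", "close_up"


MIN_SHOT_SECONDS = 4.0  # hypothetical minimum shot length to avoid jittery cuts

# Rules are checked in priority order; the first matching rule wins.
RULES: List[Rule] = [
    Rule("avoid rapid cuts", lambda st: st.seconds_since_cut < MIN_SHOT_SECONDS, "hold"),
    Rule("follow speaker", lambda st: st.speaker is not None, "close_up"),
]


def choose_shot(state: DirectorState, rules: List[Rule] = RULES, default: str = "wide") -> str:
    """Return the shot of the first rule whose condition holds, else a wide shot."""
    for rule in rules:
        if rule.condition(state):
            return rule.shot
    return default


if __name__ == "__main__":
    state = DirectorState(Speaker(x=1.2, z=2.5), seconds_since_cut=6.0)
    shot = choose_shot(state)
    print(shot)  # close_up
    if shot == "close_up" and state.speaker is not None:
        print(frame_speaker(state.speaker, rail_length=3.0))
```

Translating the carriage before panning keeps the camera roughly frontal to the speaker whenever the speaker is within the rail's span; panning only absorbs the offset beyond the rail ends. The priority-ordered rule list loosely mirrors the idea of user-editable director rules, but the paper's actual rule representation and editing interface are not specified in this abstract.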


Keywords: Automatic camera management · Video conferencing · 3D camera



We thank the volunteer participants in this investigation. The authors would like to thank the anonymous reviewers for their insightful and constructive comments, which helped to significantly improve the presentation. This work was supported in part by NSF under grant CNS-1126570.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Computer Science, North Dakota State University, Fargo, USA
