Using Detected Visual Objects to Index Video Database

  • Xingzhong Du
  • Hongzhi Yin
  • Zi Huang
  • Yi Yang
  • Xiaofang Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9877)


In this paper, we focus on how to use visual objects to index the videos. Two tables are constructed for this purpose, namely the unique object table and the occurrence table. The former table stores the unique objects which appear in the videos, while the latter table stores the occurrence information of these unique objects in the videos. In previous works, these two tables are generated manually by a top-down process. That is, the unique object table is given by the experts at first, then the occurrence table is generated by the annotators according to the unique object table. Obviously, such process which heavily depends on human labors limits the scalability especially when the data are dynamic or large-scale. To improve this, we propose to perform a bottom-up process to generate these two tables. The novelties are: we use object detector instead of human annotation to create the occurrence table; we propose a hybrid method which consists of local merge, global merge and propagation to generate the unique object table and fix the occurrence table. In fact, there are another three candidate methods for implementing the bottom-up process, namely, recognizing-based, matching-based and tracking-based methods. Through analyzing their mechanism and evaluating their accuracy, we find that they are not suitable for the bottom-up process. The proposed hybrid method leverages the advantages of the matching-based and tracking-based methods. Our experiments show that the hybrid method is more accurate and efficient than the candidate methods, which indicates that it is more suitable for the proposed bottom-up process.


Index Visual object Video database 



This research was jointly supported by the ARC project (Grant No. DP150103008) and the ARC DECRA project (Grant No. DE160100308).


  1. 1.
    Adali, S., Candan, K.S., Chen, S., Erol, K., Subrahmanian, V.S.: The advanced video information system: data structures and query processing. MMS 4(4), 172–186 (1996)Google Scholar
  2. 2.
    Breitenstein, M.D., Reichlin, F., Leibe, B., Koller-Meier, E., Gool, L.J.V.: Robust tracking-by-detection using a detector confidence particle filter. In: ICCV, pp. 1515–1522 (2009)Google Scholar
  3. 3.
    Dönderler, M.E., Saykol, E., Arslan, U., Ulusoy, Ö., Güdükbay, U.: Bilvideo: design and implementation of a video database management system. MTA 27(1), 79–104 (2005)Google Scholar
  4. 4.
    Dönderler, M.E., Ulusoy, Ö., Güdükbay, U.: A rule-based video database system architecture. Inf. Sci. 143(1–4), 13–45 (2002)CrossRefzbMATHGoogle Scholar
  5. 5.
    Dönderler, M.E., Ulusoy, Ö., Güdükbay, U.: Rule-based spatiotemporal query processing for video databases. VLDB J. 13(1), 86–103 (2004)CrossRefGoogle Scholar
  6. 6.
    Flickner, M., Sawhney, H.S., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., Yanker, P.: Query by image and video content: the QBIC system. IEEE Comput. 28(9), 23–32 (1995)CrossRefGoogle Scholar
  7. 7.
    Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)Google Scholar
  8. 8.
    Hjelsvold, R., Midtstraum, R.: Modelling and querying video data. In: VLDB, pp. 686–694 (1994)Google Scholar
  9. 9.
    Hu, W., Xie, N., Li, L., Zeng, X., Maybank, S.J.: A survey on visual content-based video indexing and retrieval. SMC C 41(6), 797–819 (2011)Google Scholar
  10. 10.
    Huang, Z., Shen, H.T., Shao, J., Cui, B., Zhou, X.: Practical online near-duplicate subsequence detection for continuous video streams. TMM 12(5), 386–398 (2010)Google Scholar
  11. 11.
    Huang, Z., Shen, H.T., Shao, J., Zhou, X., Cui, B.: Bounded coordinate system indexing for real-time video clip search. TOIS 27(3), 17 (2009)CrossRefGoogle Scholar
  12. 12.
    Köprülü, M., Cicekli, N.K., Yazici, A.: Spatio-temporal querying in video databases. Inf. Sci. 160(1–4), 131–152 (2004)CrossRefGoogle Scholar
  13. 13.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)Google Scholar
  14. 14.
    Kuo, T.C.T., Chen, A.L.P.: Content-based query processing for video databases. TMM 2(1), 1–13 (2000)Google Scholar
  15. 15.
    Kuznetsova, A., Ju Hwang, S., Rosenhahn, B., Sigal, L.: Expanding object detector’s horizon: incremental learning framework for object detection in videos. In: CVPR, pp. 28–36 (2015)Google Scholar
  16. 16.
    Le, T.-L., Thonnat, M., Boucher, A., Brémond, F.: A query language combining object features and semantic events for surveillance video retrieval. In: Satoh, S., Nack, F., Etoh, M. (eds.) MMM 2008. LNCS, vol. 4903, pp. 307–317. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  17. 17.
    Oomoto, E., Tanaka, K.: OVID: design and implementation of a video-object database system. TKDE 5(4), 629–643 (1993)Google Scholar
  18. 18.
    Petkovic, M., Jonker, W.: Content-Based Video Retrieval - A Database Perspective, Multimedia systems and applications, vol. 25. Springer (2003)Google Scholar
  19. 19.
    Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)Google Scholar
  20. 20.
    Shen, H.T., Shao, J., Huang, Z., Zhou, X.: Effective and efficient query processing for video subsequence identification. TKDE 21(3), 321–334 (2009)Google Scholar
  21. 21.
    Szegedy, C., Toshev, A., Erhan, D.: Deep neural networks for object detection. In: NIPS, pp. 2553–2561 (2013)Google Scholar
  22. 22.
    Ulusoy, Ö., Güdükbay, U., Dönderler, M.E., Saykol, E., Alper, C.: Bilvideo video database management system. In: VLDB, pp. 1373–1376 (2004)Google Scholar
  23. 23.
    Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV, pp. 3551–3558 (2013)Google Scholar
  24. 24.
    Wu, Y., Lim, J., Yang, M.: Object tracking benchmark. PAMI 37(9), 1834–1848 (2015)CrossRefGoogle Scholar
  25. 25.
    Yang, Y., Huang, Z., Shen, H.T., Zhou, X.: Mining multi-tag association for image tagging. WWWJ 14(2), 133–156 (2011)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Xingzhong Du
    • 1
  • Hongzhi Yin
    • 1
  • Zi Huang
    • 1
  • Yi Yang
    • 2
  • Xiaofang Zhou
    • 1
  1. 1.The University of QueenslandBrisbaneAustralia
  2. 2.University of Technology SydneyUltimoAustralia

Personalised recommendations