Ontology-Based Structured Video Annotation for Content-Based Video Retrieval via Spatiotemporal Reasoning

Chapter in the Intelligent Systems Reference Library book series (ISRL, volume 145)

Abstract

The ever-growing popularity and ubiquity of videos call for efficient automated mechanisms for processing video content. This is challenging because of the wide gap between what software agents can obtain via signal processing and what humans can comprehend based on cognition, knowledge, and experience: automatically extracted low-level video features typically do not correspond to the concepts, persons, and events depicted in a video. To narrow this Semantic Gap, the depicted concepts and their spatial relations can be described in a machine-interpretable form using formal definitions from structured data resources, while rule-based mechanisms can efficiently capture the temporal characteristics of actions and video events.
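To give a concrete flavor of such machine-interpretable descriptions, the minimal sketch below (a hypothetical illustration, not the ontology proposed in this chapter) uses Python with rdflib to state that a video scene depicts a car to the left of a building, grounds the depicted concepts in DBpedia resources as structured Linked Data definitions, and records the temporal interval over which the depiction holds. The ex: namespace, property names, and timestamps are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

# Hypothetical annotation vocabulary (not the chapter's own ontology)
EX = Namespace("http://example.org/video-annotation#")
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.bind("ex", EX)
g.bind("dbr", DBR)

scene = EX["scene_042"]
car, building = EX["object_car"], EX["object_building"]

# A video scene and the concepts it depicts, grounded in Linked Data (DBpedia)
g.add((scene, RDF.type, EX.VideoScene))
g.add((scene, EX.depicts, car))
g.add((scene, EX.depicts, building))
g.add((car, RDF.type, DBR.Car))
g.add((building, RDF.type, DBR.Building))

# A spatial relation between the depicted concepts
g.add((car, EX.leftOf, building))

# Temporal extent of the depiction, usable by interval-based (Allen-style) rules
g.add((scene, EX.startTime, Literal("00:01:12.000", datatype=XSD.string)))
g.add((scene, EX.endTime, Literal("00:01:18.500", datatype=XSD.string)))

print(g.serialize(format="turtle"))
```

A rule-based layer could then operate over such statements, for example an Allen-style temporal rule inferring that one annotated action precedes another whenever the end time of the first interval is earlier than the start time of the second.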


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

Flinders University, Adelaide, Australia
