Embedding Deep Learning Models into Hypermedia Applications

  • Antonio José G. Busson
  • Álan Livio V. Guedes
  • Sérgio Colcher
  • Ruy Luiz Milidiú
  • Edward Hermann Haeusler


Deep learning research has enabled significant advances in several areas of multimedia, especially in tasks related to speech processing, hearing, and computer vision. Recent usage scenarios in the hypermedia domain already use such deep learning tasks to build applications that are sensitive to the semantics of their media content. However, such scenarios are usually developed from scratch, and current hypermedia standards such as HTML do not fully support this kind of development. To support it, we propose that a hypermedia language should be extended to: (1) describe learning using structured media datasets; (2) recognize the content semantics of media elements at presentation time; and (3) use the recognized semantic elements as events during the multimedia presentation. To illustrate our approach, we extended the NCL language, and its underlying model NCM, to support these features. NCL (Nested Context Language) is the declarative language for developing interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. As a result, we present a usage scenario that highlights how the extended NCL supports the development of content-aware hypermedia presentations, attesting to the expressiveness and applicability of the model.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Antonio José G. Busson (1), email author
  • Álan Livio V. Guedes (1)
  • Sérgio Colcher (1)
  • Ruy Luiz Milidiú (1)
  • Edward Hermann Haeusler (1)

  1. Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil
