Microservices Architecture for Content-Based Indexing of Video Shots

  • Remigiusz Baran
  • Pavol Partila
  • Rafał Wilk
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 833)


This paper presents three content-based video indexing microservices that index video shots for the IMCOP Content Discovery Platform. These three services, together with numerous others, cooperate within the IMCOP platform to describe, enrich and relate multimedia data with respect to their audio, textual and visual content. Owing to the analysis they perform, the IMCOP platform can discover, recommend and deliver personalized multimedia content to its prospective recipients.

Because these recipients may also require personalized video content, services such as those presented here are essential. They are designed, respectively, to discriminate between characters in videos and to index video shots based on text and on speech. The paper presents the goals of these services, their approaches, and how they comply with the objectives of the IMCOP microservices architecture. The research procedures and the results of the experiments carried out to verify their high accuracy are also reported and discussed.


Video indexing; Text detection and recognition; Speech recognition; Face recognition; IMCOP platform



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Electronics and Electrical Engineering, Kielce University of Technology, Kielce, Poland
  2. Department of Telecommunications, VSB-Technical University of Ostrava, Ostrava, Czech Republic
  3. Department of Teleinformatics, University of Computer Engineering and Telecommunications, Kielce, Poland
