SVM Based Approach to Text Description from Video Sceneries

Part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 15)


Humans use language, written or spoken, to describe the visual world around them, and interest in generating text descriptions of video continues to grow. This paper presents a system that produces English descriptions of complex video samples: a framework that outputs a description for a video of any length containing multiple objects. The system is divided into two modules, training and testing. The training module extracts the distinctive features of each video, pairs them with the description found for that video, and stores both in a database. In the testing module, a video sample undergoes frame extraction, preprocessing, segmentation, and feature extraction; the extracted features are compared with those computed in the training module to identify and classify the video action, and a language model then generates the text description. Sentences are generated from the detected objects. For this assessment, a dataset of 250 samples from 50 domains was collected from YouTube. The system achieves an accuracy of 90%, with the minimum processing time recorded for object 2.
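The training/testing split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the preview does not say how the SVM was trained, so the sketch swaps in a linear SVM trained with the Pegasos stochastic sub-gradient method and combines one binary SVM per action (one-vs-rest). The two-dimensional feature vectors (an illustrative motion energy and edge density), the action labels, the `TRAINING_DB` "database", and the one-sentence `TEMPLATES` standing in for the language model are all hypothetical.

```python
import random

# Hypothetical training "database": each entry pairs a feature vector
# ([motion_energy, edge_density], both illustrative) with an action label.
TRAINING_DB = [
    ([0.10, 0.90], "sitting"), ([0.15, 0.85], "sitting"), ([0.05, 0.95], "sitting"),
    ([0.50, 0.10], "walking"), ([0.45, 0.15], "walking"), ([0.55, 0.05], "walking"),
    ([0.90, 0.90], "running"), ([0.85, 0.95], "running"), ([0.95, 0.85], "running"),
]

# Hypothetical stand-in for the language model: one sentence per action.
TEMPLATES = {
    "sitting": "A person is sitting.",
    "walking": "A person is walking.",
    "running": "A person is running.",
}


def augment(x):
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_binary_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient on the hinge loss).
    X: list of feature vectors; y: labels in {+1, -1}."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = augment(X[i])
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(db):
    """Training module: one binary SVM per action label."""
    X = [features for features, _ in db]
    models = {}
    for label in sorted({lab for _, lab in db}):
        y = [1 if lab == label else -1 for _, lab in db]
        models[label] = train_binary_svm(X, y)
    return models


def classify(models, features):
    """Testing module: choose the label whose SVM gives the highest score."""
    xa = augment(features)
    return max(models, key=lambda lab: sum(w * x for w, x in zip(models[lab], xa)))


def describe(models, features):
    """Map the predicted action to its template sentence."""
    return TEMPLATES[classify(models, features)]


models = train_one_vs_rest(TRAINING_DB)
print(describe(models, [0.92, 0.88]))
```

Appending a constant 1.0 to every vector lets the bias be regularized and updated like any other weight, which keeps the Pegasos update to two lines; a production system would more likely call an established SVM library than hand-roll the solver.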


  • Text description
  • SVM classification
  • Video pre-processing and edge detection
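The keywords list edge detection as part of video pre-processing, but this preview does not name the detector. The sketch below assumes a grayscale conversion with ITU-R BT.601 luma weights followed by 3×3 Sobel gradients, and reduces each frame to a single hypothetical `edge_density` feature: the fraction of pixels whose gradient magnitude exceeds an illustrative threshold of 100.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def to_grayscale(frame_rgb):
    """Convert a frame given as rows of (R, G, B) tuples to luma values
    using the ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame_rgb]


def sobel_magnitude(gray):
    """Gradient magnitude from 3x3 Sobel kernels; border pixels stay 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = gy = 0.0
            for di in range(3):
                for dj in range(3):
                    p = gray[i + di - 1][j + dj - 1]
                    gx += SOBEL_X[di][dj] * p
                    gy += SOBEL_Y[di][dj] * p
            out[i][j] = math.hypot(gx, gy)
    return out


def edge_density(gray, threshold=100.0):
    """Hypothetical per-frame feature: fraction of pixels whose Sobel
    magnitude exceeds the (illustrative) threshold."""
    mag = sobel_magnitude(gray)
    total = len(mag) * len(mag[0])
    strong = sum(1 for row in mag for v in row if v > threshold)
    return strong / total


black, white = (0, 0, 0), (255, 255, 255)
frame = [[black, black, white, white, white] for _ in range(5)]
print(edge_density(to_grayscale(frame)))
```

Pure Python keeps the example self-contained; on real video frames a vectorised library such as OpenCV (`cv2.Sobel`, `cv2.Canny`) would be far faster.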

  • DOI: 10.1007/978-3-030-46939-9_52
  • Chapter length: 8 pages
  • ISBN: 978-3-030-46939-9



Author information


Corresponding author

Correspondence to Prasad Khot.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kagalkar, R.M., Khot, P., Bhaumik, R., Potdar, S., Maruf, D. (2020). SVM Based Approach to Text Description from Video Sceneries. In: Jyothi, S., Mamatha, D., Satapathy, S., Raju, K., Favorskaya, M. (eds) Advances in Computational and Bio-Engineering. CBE 2019. Learning and Analytics in Intelligent Systems, vol 15. Springer, Cham.
