Journal of Intelligent Information Systems, Volume 36, Issue 1, pp 117–130

Using web sources for improving video categorization

  • José M. Perea-Ortega
  • Arturo Montejo-Ráez
  • M. Teresa Martín-Valdivia
  • L. Alfonso Ureña-López
Article

DOI: 10.1007/s10844-010-0123-6

Cite this article as:
Perea-Ortega, J.M., Montejo-Ráez, A., Martín-Valdivia, M.T. et al. J Intell Inf Syst (2011) 36: 117. doi:10.1007/s10844-010-0123-6

Abstract

This paper presents several experiments on video categorization using a supervised learning approach, with the VideoCLEF 2008 evaluation forum as the experimental framework. An analysis of the VideoCLEF corpus showed that video transcriptions are not the best source of information for identifying the topic of video streams. Therefore, two web-based corpora were generated to add further sources of information, integrating documents from Wikipedia articles and Google searches. A number of supervised categorization experiments were carried out on the VideoCLEF test data. Several machine learning algorithms were tested to validate the effect of the corpus on the final results: Naïve Bayes, k-nearest neighbors (KNN), Support Vector Machines (SVM) and the J48 decision tree. The results show that the web can be a useful source of information for generating classification models for video data.
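The general idea of training a text classifier on web-derived documents and then applying it to video transcripts can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple multinomial Naïve Bayes with Laplace smoothing, and the category names, the training texts in `web_corpus`, and the `transcript` string are all hypothetical placeholders rather than VideoCLEF data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-category document and word counts from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest Laplace-smoothed log posterior."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical stand-ins for web-derived training documents, one per category.
web_corpus = [
    ("orchestra concert composer symphony violin", "music"),
    ("painter museum exhibition canvas sculpture", "visual arts"),
    ("castle medieval war king archive", "history"),
]

model = train_naive_bayes(web_corpus)

# Hypothetical stand-in for a speech transcript of a video.
transcript = "the composer talks about the symphony and the concert"
predicted = classify(transcript, model)
```

In this toy setting the transcript shares three words with the "music" training text and none with the others, so `predicted` is "music". The other algorithms named in the abstract (KNN, SVM, J48) would replace `train_naive_bayes` and `classify` while keeping the same train-on-web, test-on-transcript arrangement.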

Keywords

  • Video categorization
  • Supervised learning
  • Automatic Speech Recognition transcriptions

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • José M. Perea-Ortega (1)
  • Arturo Montejo-Ráez (1)
  • M. Teresa Martín-Valdivia (1)
  • L. Alfonso Ureña-López (1)
  1. SINAI Research Group, Computer Science Department, University of Jaén, Jaén, Spain