Volume 36, Issue 1, pp 117-130
Date: 23 Apr 2010

Using web sources for improving video categorization


This paper presents several experiments on video categorization using a supervised learning approach. The VideoCLEF 2008 evaluation forum was chosen as the experimental framework. An analysis of the VideoCLEF corpus showed that video transcriptions are not the best source of information for identifying the topic of a video stream. Two web-based corpora were therefore generated to add further sources of information, integrating documents from Wikipedia articles and Google searches. A number of supervised categorization experiments were carried out on the VideoCLEF test data. Several machine learning algorithms were tested to validate the effect of the corpus on the final results: Naïve Bayes, K-nearest neighbors (KNN), Support Vector Machine (SVM) and the J48 decision tree. The results show that the web can be a useful source of information for generating classification models for video data.
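
As an illustration of the general idea only, the following minimal sketch trains a multinomial Naïve Bayes classifier on web-derived documents and applies it to a video transcription. It is not the authors' implementation: the category names, the toy documents, the whitespace tokenizer and the `NaiveBayes` class are all assumptions made for this example, and the paper's actual corpora and preprocessing are not described in the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.class_counts[label] / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical stand-ins for web-derived training documents, one per category.
web_docs = [
    ("museum painting artist gallery exhibition sculpture", "art"),
    ("orchestra concert composer violin opera song", "music"),
    ("war century king empire revolution archive", "history"),
]
docs, labels = zip(*web_docs)
model = NaiveBayes().fit(docs, labels)

# Hypothetical stand-in for a video transcription to be categorized.
prediction = model.predict("the composer wrote an opera for the orchestra")
print(prediction)
```

In this sketch the model is built entirely from external documents, so the transcription is only needed at prediction time. The other classifiers named in the abstract (KNN, SVM, J48) could be substituted behind the same `fit`/`predict` interface.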