International Journal on Digital Libraries, Volume 3, Issue 3, pp 261–275

Text-based approaches for non-topical image categorization

  • Carl L. Sable
  • Vasileios Hatzivassiloglou

Abstract.

The rapid expansion of multimedia digital collections brings to the fore the need for classifying not only text documents but their embedded non-textual parts as well. We propose a model for basing classification of multimedia on broad, non-topical features, and show how information on targeted nearby pieces of text can be used to effectively classify photographs on a first such feature, distinguishing between indoor and outdoor images. We examine several variations to a TF*IDF-based approach for this task, empirically analyze their effects, and evaluate our system on a large collection of images from current news newsgroups. In addition, we investigate alternative classification and evaluation methods, and the effects that secondary features have on indoor/outdoor classification. Using density estimation over the raw TF*IDF values, we obtain a classification accuracy of 82%, a number that outperforms baseline estimates and earlier, image-based approaches, at least in the domain of news articles, and that nears the accuracy of humans who perform the same task with access to comparable information.

Key words: Image categorization – High-level image features – Text similarity features – Probabilistic TF*IDF – Evaluation in the presence of uncertainty 
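
The abstract describes classifying images from nearby text with a TF*IDF-based approach. As a rough illustration only, the following minimal sketch scores a caption against per-category TF*IDF vectors using cosine similarity. It is not the authors' system: the toy captions, the smoothed IDF formula, the function names, and the use of cosine similarity are all assumptions made for this example, and the density estimation step mentioned in the abstract is not shown.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def idf_table(docs):
    """Smoothed inverse document frequency for each word in the training texts."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse TF*IDF vector; words unseen in training are ignored."""
    tf = Counter(tokenize(text))
    return {word: count * idf[word] for word, count in tf.items() if word in idf}

def cosine(a, b):
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(labeled):
    """Build one TF*IDF vector per category from the concatenated category texts."""
    idf = idf_table([text for text, _ in labeled])
    by_category = {}
    for text, label in labeled:
        by_category.setdefault(label, []).append(text)
    vectors = {label: tfidf_vector(" ".join(texts), idf)
               for label, texts in by_category.items()}
    return idf, vectors

def classify(text, idf, vectors):
    """Return the best-scoring category and the similarity to every category."""
    query = tfidf_vector(text, idf)
    scores = {label: cosine(query, vec) for label, vec in vectors.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy captions, invented for this sketch.
labeled = [
    ("the senator speaks at a press conference in his office", "indoor"),
    ("delegates vote inside the assembly hall", "indoor"),
    ("firefighters battle a blaze on the hillside", "outdoor"),
    ("protesters march along the street outside the embassy", "outdoor"),
]
idf, vectors = train(labeled)
label, scores = classify("a crowd gathers on the street", idf, vectors)
print(label, scores)
```

The raw similarity scores returned by `classify` are the kind of values over which a probabilistic step, such as density estimation, could be layered; how the paper does this is not specified in the abstract.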

Copyright information

© Springer-Verlag 2000

Authors and Affiliations

  • Carl L. Sable (1)
  • Vasileios Hatzivassiloglou (1)
  1. Department of Computer Science, 450 Computer Science Building, Columbia University, 1214 Amsterdam Avenue, New York, NY 10027, USA; E-mail: {sable,vh}@cs.columbia.edu
