Chapter

Information Reuse and Integration in Academia and Industry

pp 281-298

Integration of Semantics Information and Clustering in Binary-Class Classification for Handling Imbalanced Multimedia Data

  • Chao Chen, Department of Electrical and Computer Engineering, University of Miami (email author)
  • Mei-Ling Shyu, Department of Electrical and Computer Engineering, University of Miami

Abstract

It is well acknowledged that data imbalance is one of the major challenges in classification: the ratio of positive data instances to negative data instances is often very small, especially for multimedia data. One solution is to use clustering in binary-class classification to partition the majority class (also called the negative class) into several subsets, each of which is merged with the minority class (also called the positive class) to form a much more balanced subset of the original data set. However, one major drawback of clustering is that constructing the clusters is time-consuming. Because multimedia data (such as video and image data) carry rich semantics, using video semantics (i.e., semantic concepts as class labels) to form negative subsets can (i) effectively construct several groups whose data instances are semantically related, and (ii) significantly reduce the number of data instances participating in the clustering step. Therefore, in this chapter, a novel binary-class classification framework that integrates video semantics information and clustering is proposed to address the data imbalance issue. Experiments are conducted to compare the proposed framework with other techniques commonly used to learn from imbalanced data sets. The experimental results on some highly imbalanced video data sets demonstrate that the proposed classification framework outperforms these comparison approaches by about 3–16 %.
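
The subset-construction idea in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names `kmeans` and `build_balanced_subsets`, the choice of k-means as the clustering technique, and the rule that only negatives without a semantic concept label go through clustering are all assumptions made for illustration.

```python
from collections import defaultdict
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (assumed clustering step); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Recompute each center as the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def build_balanced_subsets(positives, negatives, k=2):
    """Form one training subset per negative group (hypothetical helper).

    positives: list of feature vectors (minority class).
    negatives: list of (feature_vector, concept_or_None) pairs (majority class).
    Returns a list of (X, y) pairs, where y uses 1 = positive and 0 = negative.
    """
    groups = defaultdict(list)
    leftovers = []
    for features, concept in negatives:
        if concept is not None:
            groups[concept].append(features)  # grouped by semantic concept
        else:
            leftovers.append(features)  # assumption: only these are clustered
    negative_groups = list(groups.values())

    if leftovers:
        k = min(k, len(leftovers))
        assign = kmeans(leftovers, k)
        for c in range(k):
            cluster = [p for p, a in zip(leftovers, assign) if a == c]
            if cluster:
                negative_groups.append(cluster)

    # Merge every negative group with the full positive class.
    subsets = []
    for group in negative_groups:
        X = list(positives) + group
        y = [1] * len(positives) + [0] * len(group)
        subsets.append((X, y))
    return subsets
```

Each returned subset pairs the full positive class with one negative group, so a separate binary classifier could be trained on each subset; how the resulting classifiers are combined is not described in the abstract and is left out of this sketch.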