Chapter

Data Mining: Foundations and Practice

Volume 118 of the series Studies in Computational Intelligence pp 539-562

Using Association Rules for Classification from Databases Having Class Label Ambiguities: A Belief Theoretic Method

  • S. P. Subasingha, Department of Electrical and Computer Engineering, University of Miami
  • J. Zhang, Hemispheric Center for Environmental Technology (HCET), Florida International University
  • K. Premaratne, Department of Electrical and Computer Engineering, University of Miami
  • M.-L. Shyu, Department of Electrical and Computer Engineering, University of Miami
  • M. Kubat, Department of Electrical and Computer Engineering, University of Miami
  • K. K. R. G. K. Hewawasam, Department of Electrical and Computer Engineering, University of Miami

Summary

This chapter introduces a belief theoretic method for classification from databases having class label ambiguities. The method uses a set of association rules extracted from such a database. It is assumed that a training data set with an adequate number of pre-classified instances is available, where each instance is assigned an integer class label. We use a modified association rule mining (ARM) technique to extract the interesting rules from the training data set, and a belief theoretic classifier based on the extracted rules to classify the incoming feature vectors. The ambiguity modelling capability of belief theory enables our classifier to perform better in the presence of class label ambiguities. The classifier can also address unbalanced or highly skewed training data sets by ensuring that an approximately equal number of rules is generated for each class.

These capabilities make our classifier well suited for applications where (1) different experts may have conflicting opinions about the class label to be assigned to a specific training data instance; and (2) the majority of the training data instances are likely to represent a few classes, giving rise to highly skewed databases. The proposed classifier would therefore be useful in security monitoring and threat classification environments, where conflicting expert opinions about the threat level are common and only a few training data instances would be considered to pose a heightened threat level.

Several experiments are conducted to evaluate the proposed classifier. These experiments use several databases from the UCI data repository and data sets collected from the airport terminal simulation platform developed at the Distributed Decision Environments (DDE) Laboratory at the Department of Electrical and Computer Engineering, University of Miami. The experimental results show that, while the proposed classifier's performance is comparable to some existing classifiers when the databases have no class label ambiguities, it provides superior classification accuracy and better efficiency when class label ambiguities are present.
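
The summary does not spell out how the rule-based evidence is fused, so the following is only a minimal sketch of how belief theory can represent an ambiguous class label and combine evidence from two rules. It assumes Dempster's rule of combination and a pignistic decision step, both standard belief theoretic tools that may differ from the chapter's actual procedure. The names `combine`, `pignistic_decision`, `rule_a`, and `rule_b`, and all mass values, are hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule (an assumed choice).

    A mass function maps a frozenset of class labels (a focal element)
    to its mass. A focal element with several labels models ambiguity.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory label sets.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the fused masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def pignistic_decision(m):
    """Split each mass evenly over its labels and pick the best label."""
    scores = {}
    for focal, w in m.items():
        for label in focal:
            scores[label] = scores.get(label, 0.0) + w / len(focal)
    return max(scores, key=scores.get), scores


# Hypothetical evidence from two matched rules over integer labels 1 and 2.
# The set {1, 2} carries mass that cannot be committed to a single class.
rule_a = {frozenset({1}): 0.6, frozenset({1, 2}): 0.4}
rule_b = {frozenset({2}): 0.5, frozenset({1, 2}): 0.5}

fused = combine(rule_a, rule_b)
label, scores = pignistic_decision(fused)
print(label, scores)
```

In this sketch, mass left on `{1, 2}` stays uncommitted until the decision step, which is one way a classifier can avoid forcing a single label onto training instances that experts disagree about.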