
Enhancement of kernel dependency estimation with information generalization and a case study on skewed data

Applied Intelligence

Abstract

Kernel dependency estimation (KDE) is a learning framework for finding the dependencies between two general classes of objects. Although it has been used successfully in many types of applications, its properties have not been fully studied. In this paper, we discuss two practical issues with KDE. The first is that KDE produces a real-valued output for each label, which differs from the binary value required by the 1-of-k coding scheme. Thus, a gap usually exists between the real value predicted by KDE and the ground-truth binary value. One common practice to reduce this gap is to apply thresholding strategies. We provide an alternative approach that adds a second-level classifier using a special degenerate form of stacked generalization. The second issue is that performance degrades when KDE is applied to classification with skewed data. Our experiments show that standard KDE is not an appropriate approach for skewed data; we then provide a solution to handle skewed data.
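The gap between real-valued outputs and 1-of-k codes can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the score vectors are invented, the function names are hypothetical, and the nearest-centroid rule is only a stand-in for a second-level classifier. It is not the degenerate form of stacked generalization used in the paper, which the abstract does not specify.

```python
# Illustrative sketch only: scores, names, and the nearest-centroid
# second-level model are assumptions, not the paper's method.

def threshold_decode(scores, t=0.5):
    """Binarize each real-valued label score with a fixed threshold."""
    return [1 if s >= t else 0 for s in scores]

def argmax_decode(scores):
    """Force a valid 1-of-k code by selecting the highest-scoring label."""
    best = max(range(len(scores)), key=lambda j: scores[j])
    return [1 if j == best else 0 for j in range(len(scores))]

def fit_centroids(score_vectors, labels):
    """Second-level model: mean first-level score vector per class."""
    sums, counts = {}, {}
    for vec, y in zip(score_vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(centroids, vec):
    """Assign the class whose centroid is closest in squared distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda y: sqdist(centroids[y], vec))

# A fixed threshold may select no label or several labels at once.
none_selected = threshold_decode([0.42, 0.38, 0.10])
two_selected = threshold_decode([0.60, 0.55, 0.10])

# Invented first-level scores in which class 1 is systematically
# under-scored, so argmax on the raw scores always picks class 0.
train_scores = [[0.70, 0.20], [0.80, 0.30], [0.50, 0.40], [0.55, 0.45]]
train_labels = [0, 0, 1, 1]
centroids = fit_centroids(train_scores, train_labels)

query = [0.52, 0.41]
raw_choice = argmax_decode(query).index(1)          # decision from raw scores
stacked_choice = predict_centroid(centroids, query)  # decision from level 2
```

In this toy setting the second-level model learns the systematic bias in the first-level scores, which a fixed threshold or a plain argmax cannot correct. A proper stacked setup would normally train the second level on held-out first-level predictions rather than on the scores of the training instances themselves.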


Notes

  1. Note that \(v_{t}\) are not used by SG.

  2. Note that having sufficient training instances to tell \(\hat{y}_{j}\) from \(\hat{y}_{i}\) does not mean having sufficient training instances to know \(R(\hat{y}_{j} \mid \hat{y}_{i})\).

  3. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

  4. http://sci2s.ugr.es/keel/multilabel.php#sub10


Author information

Corresponding author

Correspondence to Chia-Hui Chang.

Cite this article

Chen, Q., Chang, CH. Enhancement of kernel dependency estimation with information generalization and a case study on skewed data. Appl Intell 41, 582–593 (2014). https://doi.org/10.1007/s10489-014-0539-8

  • DOI: https://doi.org/10.1007/s10489-014-0539-8
