Abstract
The k nearest neighbour (kNN) algorithm assigns a query instance the most frequent class among its k nearest neighbours in the training instance space. Under an imbalanced class distribution, where positive training instances are rare, the neighbourhood of a query instance is often dominated by negative instances, so the query is likely to be assigned to the negative majority class. In this paper we propose a Positive-biased Nearest Neighbour (PNN) algorithm, which dynamically forms the local neighbourhood of a query instance and carefully adjusts the classification decision based on the class distribution in that neighbourhood. Extensive experiments on real-world imbalanced datasets show that PNN performs well for imbalanced classification. PNN often outperforms recent kNN-based imbalanced classification algorithms while significantly reducing their extra computation cost.
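The abstract does not spell out PNN's exact decision rule, but the general idea of biasing a kNN vote toward the rare positive class can be sketched as follows. This is a minimal illustration, assuming a simple weighted-vote scheme with a hypothetical `pos_weight` parameter; it is not the authors' PNN algorithm.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5, pos_label=1, pos_weight=1.0):
    """kNN classification with an optional positive-class bias.

    pos_weight > 1 inflates each positive neighbour's vote, so a query
    can be classified positive even when positives are a local minority.
    A generic sketch of positive-biased voting, not the paper's PNN rule.
    """
    # Euclidean distance from the query to every training instance
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Weighted majority vote over the neighbourhood
    votes = {}
    for i in nearest:
        w = pos_weight if y_train[i] == pos_label else 1.0
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

# Toy imbalanced neighbourhood: 2 positives near the query, 3 negatives
X = np.array([[0.1, 0], [0, 0.1], [0.5, 0], [0, 0.5], [0.5, 0.5],
              [2, 2], [3, 3]])
y = np.array([1, 1, 0, 0, 0, 0, 0])
q = np.array([0.0, 0.0])

knn_predict(X, y, q, k=5)                  # plain vote: majority class 0
knn_predict(X, y, q, k=5, pos_weight=2.0)  # biased vote: minority class 1
```

With plain majority voting the two nearby positives are outvoted by three negatives; doubling the positive vote weight flips the decision, which is the kind of effect a positive-biased neighbourhood rule aims for.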
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhang, X., Li, Y. (2013). A Positive-biased Nearest Neighbour Algorithm for Imbalanced Classification. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science(), vol 7819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37456-2_25
Print ISBN: 978-3-642-37455-5
Online ISBN: 978-3-642-37456-2