Fast Rare Category Detection Using Nearest Centroid Neighborhood

  • Song Wang
  • Hao Huang
  • Yunjun Gao
  • Tieyun Qian
  • Liang Hong
  • Zhiyong Peng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9931)


Rare category detection is an open challenge in data mining. Existing approaches often suffer from inappropriate investigation scopes, high time complexity, or limited applicability, which degrade their performance and reduce their usability. In this paper, we present FRANC, an effective and efficient solution for rare category detection. It adopts an investigation scope based on k-nearest centroid neighbors with an automatically selected k, which helps the algorithm capture the real changes in local density and data distribution caused by the presence of rare categories. Our proposed pruning method makes the identification of k-nearest centroid neighbors, the most computationally expensive step in FRANC, much faster for each data example. Extensive experimental results on real data sets demonstrate the effectiveness and efficiency of FRANC.
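To make the notion of k-nearest centroid neighbors concrete, the following is a minimal sketch of the standard greedy definition from the nearest centroid neighborhood literature: the first neighbor is the nearest point to the query, and each subsequent neighbor is the point that keeps the centroid of all selected neighbors closest to the query. It is an unpruned, brute-force illustration only. It does not reproduce FRANC's pruning method or its automatic selection of k, and the function name and parameters are hypothetical.

```python
import math


def k_nearest_centroid_neighbors(query, points, k):
    """Greedy brute-force k-nearest centroid neighbor search (illustrative).

    Returns the indices into `points` of the selected neighbors, in the
    order they were chosen. Cost is O(k * n * d); no pruning is applied.
    """
    if not 0 <= k <= len(points):
        raise ValueError("k must be between 0 and len(points)")

    dim = len(query)
    chosen = []                      # indices selected so far
    running_sum = [0.0] * dim        # coordinate-wise sum of chosen points
    remaining = set(range(len(points)))

    for step in range(1, k + 1):
        best_idx, best_dist = None, math.inf
        # Iterate in index order so that ties resolve to the lowest index.
        for i in sorted(remaining):
            # Centroid of the already chosen neighbors plus candidate i.
            centroid = [(running_sum[d] + points[i][d]) / step
                        for d in range(dim)]
            dist = math.dist(centroid, query)
            if dist < best_dist:
                best_idx, best_dist = i, dist
        chosen.append(best_idx)
        remaining.remove(best_idx)
        running_sum = [running_sum[d] + points[best_idx][d]
                       for d in range(dim)]

    return chosen


# Toy data (assumed for illustration): two points to the right of the
# query and one farther away on the left.
query = (0.0, 0.0)
points = [(1.0, 0.0), (1.2, 0.0), (-2.0, 0.0)]
print(k_nearest_centroid_neighbors(query, points, 2))
```

On this toy input the second selected neighbor is the point on the left rather than the second-closest point on the right, because adding it pulls the centroid toward the query. This is the sense in which a centroid-based neighborhood accounts for how neighbors are distributed around a point, not only how close they are.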


Gaussian Mixture Model; Candidate Data; Optimal Bandwidth; Pruning Method; Extensive Experimental Result



This work was supported in part by NSFC Grants (61502347, 61522208, 61572376, 61303025, 61379033, and 61232002), the Fundamental Research Funds for the Central Universities (2015XZZX005-07, 2015XZZX004-18, and 2042015kf0038), the Research Funds for Introduced Talents of Wuhan University, and the International Academic Cooperation Training Program of Wuhan University.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
  2. College of Computer Science, Zhejiang University, Hangzhou, China
  3. School of Information Management, Wuhan University, Wuhan, China
