Fast Rare Category Detection Using Nearest Centroid Neighborhood
Rare category detection is an open challenge in data mining. Existing approaches often suffer from inappropriate investigation scopes, high time complexity, or limited applicable conditions, which degrade their performance and reduce their usability. In this paper, we present FRANC, an effective and efficient solution for rare category detection. FRANC adopts an investigation scope based on k-nearest centroid neighbors with an automatically selected k, which helps the algorithm capture the real changes in local density and data distribution caused by the presence of rare categories. Our proposed pruning method substantially speeds up the identification of k-nearest centroid neighbors for each data example, which is the most computationally expensive step in FRANC. Extensive experiments on real data sets demonstrate the effectiveness and efficiency of FRANC.
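The abstract does not define the k-nearest centroid neighborhood, so the following is only a minimal sketch under the assumption that FRANC builds on the standard greedy definition: the first neighbor is the ordinary nearest neighbor, and each later neighbor is the point whose inclusion brings the centroid of the selected set closest to the query. The function name `k_nearest_centroid_neighbors` is hypothetical, k is passed in by hand, and neither the automatic selection of k nor the pruning method mentioned above is reproduced here.

```python
import numpy as np


def k_nearest_centroid_neighbors(points, query, k):
    """Return indices of the k nearest centroid neighbors of `query`.

    Brute-force greedy selection (assumed standard definition, not
    necessarily the exact procedure used by FRANC).
    """
    points = np.asarray(points, dtype=float)
    query = np.asarray(query, dtype=float)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    remaining = list(range(len(points)))  # candidate indices not yet chosen
    chosen = []                           # selected neighbor indices, in order
    running_sum = np.zeros_like(query)    # sum of the selected neighbors

    for _ in range(k):
        candidates = points[remaining]
        # Centroid of (already chosen neighbors + each candidate in turn).
        centroids = (running_sum + candidates) / (len(chosen) + 1)
        distances = np.linalg.norm(centroids - query, axis=1)
        best = int(np.argmin(distances))
        index = remaining.pop(best)
        chosen.append(index)
        running_sum += points[index]

    return chosen


# Toy data: two points to the right of the query, one to the left, one above.
toy_points = [[1.0, 0.0], [1.2, 0.0], [-2.0, 0.0], [0.0, 3.0]]
toy_neighbors = k_nearest_centroid_neighbors(toy_points, [0.0, 0.0], 2)
```

On this toy data the plain 2-nearest neighbors are points 0 and 1, which lie on the same side of the query, whereas the centroid-based selection returns points 0 and 2, which surround it. This spatial balance is what distinguishes a centroid neighborhood from a purely distance-based one. The sketch costs O(nk) distance evaluations per query, which is the kind of cost a pruning step would aim to reduce.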
Keywords: Gaussian Mixture Model, Candidate Data, Optimal Bandwidth, Pruning Method, Extensive Experimental Result
This work was supported in part by NSFC Grants (61502347, 61522208, 61572376, 61303025, 61379033, and 61232002), the Fundamental Research Funds for the Central Universities (2015XZZX005-07, 2015XZZX004-18, and 2042015kf0038), the Research Funds for Introduced Talents of Wuhan University, and the International Academic Cooperation Training Program of Wuhan University.