Rare Category Exploration on Linear Time Complexity
Rare Category Exploration (RCE) discovers the remaining data examples of a rare category from a seed. Existing approaches to this problem often have high time complexity and are applicable to rare categories with compact, spherical shapes rather than arbitrary shapes. In this paper, we present FREE, an effective and efficient RCE solution that explores rare categories of arbitrary shapes with linear time complexity w.r.t. data set size. FREE first decomposes a data set into equal-sized cells, on which it performs wavelet transform and data density analysis to find the coarse shape of a rare category, and then refines the coarse shape via an M\(k\)NN-based metric. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.
Keywords: Feature Space, Data Density, Seed Cell, Arbitrary Shape, Local Cluster
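The abstract's first stage (equal-sized cells plus density analysis starting from a seed) can be illustrated with a minimal sketch. This is not the authors' FREE algorithm: it omits the wavelet transform and the M\(k\)NN-based refinement entirely, and the names `cell_of`, `coarse_region`, `cell_size`, and `min_count` are hypothetical, chosen only for illustration. It shows why a grid makes one-pass density counting possible: each point is assigned to a cell in constant time, and a region of arbitrary shape is grown from the seed's cell through adjacent dense cells.

```python
from collections import Counter, deque
from itertools import product
from math import floor


def cell_of(point, cell_size):
    """Map a point to the integer index of its equal-sized grid cell."""
    return tuple(floor(x / cell_size) for x in point)


def coarse_region(points, seed, cell_size, min_count):
    """Grow a set of adjacent dense cells outward from the seed's cell.

    Illustrative assumption: a cell is "dense" if it holds at least
    min_count points. The paper's actual density criterion is not
    given in the abstract.
    """
    # One pass over the data: count the points that fall in each cell.
    counts = Counter(cell_of(p, cell_size) for p in points)

    start = cell_of(seed, cell_size)
    # Neighbor offsets: every cell sharing a face, edge, or corner.
    offsets = [o for o in product((-1, 0, 1), repeat=len(start)) if any(o)]

    region, queue = {start}, deque([start])
    while queue:
        cell = queue.popleft()
        for off in offsets:
            nxt = tuple(c + d for c, d in zip(cell, off))
            if nxt not in region and counts.get(nxt, 0) >= min_count:
                region.add(nxt)
                queue.append(nxt)
    return region
```

The counting step touches each point once, so it is linear in the number of points. The neighbor enumeration grows as 3^d with dimension d, so this sketch is only practical for low-dimensional feature spaces.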