
Rare Category Exploration on Linear Time Complexity

  • Zhenguang Liu
  • Hao Huang (email author)
  • Qinming He
  • Kevin Chiew
  • Yunjun Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9050)

Abstract

Rare Category Exploration (RCE) discovers the remaining data examples of a rare category from a seed. Existing approaches to this problem often have high time complexity and are applicable only to rare categories with compact, spherical shapes rather than arbitrary shapes. In this paper, we present FREE, an effective and efficient RCE solution that explores rare categories of arbitrary shapes in time linear in the data set size. FREE first decomposes a data set into equal-sized cells, on which it performs a wavelet transform and data density analysis to find the coarse shape of a rare category, and then refines the coarse shape via an M\(k\)NN-based metric. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.
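
The first step named in the abstract, decomposing the data set into equal-sized cells and measuring their density, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cell_of` and `cell_densities` and the parameter `cell_width` are hypothetical, and the wavelet transform and M\(k\)NN-based refinement are omitted. The sketch only shows why per-cell counting needs a single pass over the data, so its cost grows linearly with the number of points.

```python
from collections import Counter
from math import floor


def cell_of(point, cell_width):
    """Return the integer grid coordinates of the cell containing `point`.

    `cell_width` is a hypothetical parameter: the side length shared by
    all cells, so every cell has the same size.
    """
    return tuple(floor(x / cell_width) for x in point)


def cell_densities(points, cell_width):
    """Count the points falling in each equal-sized cell.

    One pass over the data: O(N * d) for N points of dimension d.
    Only non-empty cells are stored.
    """
    counts = Counter()
    for p in points:
        counts[cell_of(p, cell_width)] += 1
    return counts


# Toy 2-D data; the first point plays the role of the seed.
points = [(0.1, 0.2), (0.3, 0.4), (0.15, 0.25), (2.5, 2.5), (0.9, 0.1)]
densities = cell_densities(points, cell_width=0.5)
seed_cell = cell_of(points[0], 0.5)
```

Here `densities[seed_cell]` gives the number of points sharing the seed's cell, which is the kind of per-cell statistic a density analysis could start from.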

Keywords

Feature Space; Data Density; Seed Cell; Arbitrary Shape; Local Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Zhenguang Liu (1)
  • Hao Huang (1, 2), email author
  • Qinming He (1)
  • Kevin Chiew (3)
  • Yunjun Gao (1)
  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
  3. Singapore Branch, Handal Indah Sdn Bhd, Johor Bahru, Malaysia
