Scalable Outlying-Inlying Aspects Discovery via Feature Ranking

  • Nguyen Xuan Vinh
  • Jeffrey Chan
  • James Bailey
  • Christopher Leckie
  • Kotagiri Ramamohanarao
  • Jian Pei
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9078)

Abstract

In outlying aspects mining, given a query object, we aim to identify the features that make the query most outlying. Recent works tackle this problem with two different strategies. (i) Feature selection approaches select the features that best distinguish two classes: the query point vs. the rest of the data. (ii) Score-and-search approaches define an outlyingness score, then search for the subspaces in which the query point attains the best score. In this paper, we first present a theoretical result connecting the two types of approaches. Second, we present OARank, a hybrid framework that combines the efficiency of feature-selection-based approaches with the effectiveness and versatility of score-and-search-based methods. Our approach is orders of magnitude faster than previously proposed score-and-search methods while being slightly more effective, making it suitable for mining large data sets.
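
To make the score-and-search strategy concrete, here is a minimal Python sketch under stated assumptions. The outlyingness score is assumed to be the rank of the query's kernel density among all points (rank 1 = lowest density = most outlying), estimated with a product-Gaussian kernel and one fixed bandwidth. The function names (kde_density, density_rank, score_and_search, rank_then_search) and the toy data are hypothetical. rank_then_search is only a naive illustration of "rank features first, then search a pruned space"; it is not the paper's OARank method, whose ranking step (per the keywords) involves quadratic programming.

```python
import itertools
import math
from typing import Iterable, List, Optional, Sequence, Tuple

Row = Sequence[float]


def kde_density(point: Row, data: List[Row], dims: Sequence[int], bandwidth: float) -> float:
    """Product-Gaussian kernel density estimate at `point`, using only features in `dims`."""
    total = 0.0
    for row in data:
        sq = sum(((point[d] - row[d]) / bandwidth) ** 2 for d in dims)
        total += math.exp(-0.5 * sq)
    norm = len(data) * (math.sqrt(2.0 * math.pi) * bandwidth) ** len(dims)
    return total / norm


def density_rank(query_idx: int, data: List[Row], dims: Sequence[int], bandwidth: float) -> int:
    """1-based rank of the query's density among all points (1 = sparsest = most outlying)."""
    densities = [kde_density(row, data, dims, bandwidth) for row in data]
    q = densities[query_idx]
    return 1 + sum(1 for d in densities if d < q)


def score_and_search(query_idx: int, data: List[Row], features: Iterable[int],
                     max_size: int, bandwidth: float) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Exhaustively score every subspace of `features` with up to `max_size` dimensions.

    Subspaces are visited from smallest to largest and only a strictly better rank
    replaces the incumbent, so ties are resolved in favour of smaller subspaces.
    """
    features = list(features)
    best = None
    for k in range(1, max_size + 1):
        for subspace in itertools.combinations(features, k):
            r = density_rank(query_idx, data, subspace, bandwidth)
            if best is None or r < best[0]:
                best = (r, subspace)
    return best


def rank_then_search(query_idx: int, data: List[Row], top_k: int,
                     max_size: int, bandwidth: float) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Naive hybrid (hypothetical, NOT OARank): rank single features, then search only the top_k."""
    n_features = len(data[0])
    ranked = sorted(range(n_features),
                    key=lambda d: density_rank(query_idx, data, (d,), bandwidth))
    return score_and_search(query_idx, data, sorted(ranked[:top_k]), max_size, bandwidth)


if __name__ == "__main__":
    toy = [
        [0.1, 0.2, 0.0],
        [0.4, 0.9, 0.1],
        [0.8, 0.3, -0.1],
        [0.3, 0.6, 0.05],
        [0.9, 0.1, 0.0],
        [0.5, 0.5, 5.0],  # query: ordinary in features 0 and 1, extreme in feature 2
    ]
    print(score_and_search(5, toy, range(3), max_size=2, bandwidth=0.5))  # → (1, (2,))
    print(rank_then_search(5, toy, top_k=2, max_size=2, bandwidth=0.5))   # → (1, (2,))
```

On the toy data the query (row 5) deviates only in feature 2, so both searches report rank 1 in the one-dimensional subspace (2,). Each candidate subspace costs O(n^2) kernel evaluations in this sketch, and the number of candidate subspaces grows combinatorially with max_size, which is why pruning the candidate features before searching pays off.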

Keywords

Outlying aspects mining, Feature selection, Feature ranking, Quadratic programming

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nguyen Xuan Vinh (1)
  • Jeffrey Chan (1)
  • James Bailey (1)
  • Christopher Leckie (1)
  • Kotagiri Ramamohanarao (1)
  • Jian Pei (2)

  1. The University of Melbourne, Melbourne, Australia
  2. Simon Fraser University, Burnaby, Canada
