Journal of Computer Science and Technology, Volume 29, Issue 3, pp. 408–422

Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

  • Fatemeh Azmandian
  • Ayse Yilmazer
  • Jennifer G. Dy
  • Javed A. Aslam
  • David R. Kaeli
Regular Paper


Abstract

Acquiring a set of features that emphasizes the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection that is geared towards the final goal of outlier detection. The proposed method seeks the subset of features that captures the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguishable by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm over popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and performs well on highly imbalanced datasets. Furthermore, because the feature selection is highly parallelizable, we implement the algorithm on a graphics processing unit (GPU) and obtain significant speedup over the serial version. The benefits of the GPU implementation are two-fold: its performance scales well with both the number of features and the number of data points.
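The idea of a filter criterion that rewards feature subsets under which normal points look dense and outliers look sparse can be illustrated with a small sketch. This is a hypothetical reconstruction, not the authors' exact method: the Gaussian kernel, the fixed bandwidth, the ratio-of-mean-densities score, and the greedy forward search are all illustrative assumptions.

```python
import numpy as np

def kernel_density(points, normal_sub, bandwidth=1.0):
    """Mean Gaussian-kernel density of each row of `points`,
    estimated from the normal sample restricted to the chosen features."""
    # pairwise squared distances, shape (len(points), len(normal_sub))
    d2 = ((points[:, None, :] - normal_sub[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)

def density_ratio_score(X_normal, X_outlier, features, bandwidth=1.0):
    """Score a feature subset: large when normal points lie in dense
    regions of the normal data while outliers lie in sparse regions."""
    Xn = X_normal[:, features]
    Xo = X_outlier[:, features]
    dens_normal = kernel_density(Xn, Xn, bandwidth).mean()
    dens_outlier = kernel_density(Xo, Xn, bandwidth).mean()
    return dens_normal / (dens_outlier + 1e-12)  # guard against zero density

def greedy_select(X_normal, X_outlier, k, bandwidth=1.0):
    """Greedy forward search for k features maximizing the ratio score."""
    selected, remaining = [], list(range(X_normal.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f: density_ratio_score(
            X_normal, X_outlier, selected + [f], bandwidth))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because each candidate feature's score is computed from independent pairwise distances, the scoring step is embarrassingly parallel across both features and data points, which is what makes a GPU implementation attractive.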


Keywords: feature selection; outlier detection; imbalanced data; GPU acceleration



Supplementary material

ESM 1: 11390_2014_1439_MOESM1_ESM.pdf (PDF, 76 kB)



Copyright information

© Springer Science+Business Media New York & Science Press, China 2014

Authors and Affiliations

  • Fatemeh Azmandian (1)
  • Ayse Yilmazer (1)
  • Jennifer G. Dy (1)
  • Javed A. Aslam (2)
  • David R. Kaeli (1)

  1. Department of Electrical and Computer Engineering, Northeastern University, Boston, U.S.A.
  2. College of Computer and Information Science, Northeastern University, Boston, U.S.A.
