Convex and nonconvex nonparametric frontier-based classification methods for anomaly detection

Original Article · OR Spectrum

Abstract

Effective methods for determining the boundary of the normal class are very useful for detecting anomalies in commercial or security applications, a problem known as anomaly detection. This contribution proposes a nonparametric frontier-based classification (NPFC) method for anomaly detection. By relaxing the convexity assumption commonly used in the literature, a nonconvex-NPFC method is constructed, and the nonconvex nonparametric frontier turns out to provide a more conservative boundary enveloping the normal class. By reflecting the monotonic relation between the characteristic variables and class membership, the proposed NPFC method takes a more general form in which both input-like and output-like characteristic variables are incorporated. In addition, by allowing some of the training observations to be misclassified, the convex- and nonconvex-NPFC methods are extended from a hard nonparametric frontier to a soft one, which likewise provides a more conservative boundary enclosing the normal class. Both simulation studies and a real-life data set are used to evaluate and compare the proposed NPFC methods with some well-established methods in the literature. The results show that the proposed NPFC methods, and especially the nonconvex-NPFC methods, have competitive classification performance and consistent advantages in detecting abnormal samples.
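
The basic building block behind such a frontier can be sketched in a few lines. The following is a minimal, hypothetical illustration of a nonconvex, free-disposal-hull-style membership check in which a test point counts as enveloped when some normal training observation weakly dominates it; the dominance convention, function name, and toy data are assumptions for illustration and do not reproduce the models proposed in the paper.

    import numpy as np

    def fdh_is_normal(x_test, y_test, X_train, Y_train):
        # Nonconvex (FDH-style) check: the test point is enveloped if at least one
        # normal training observation weakly dominates it, i.e. has no larger values
        # on every input-like variable and no smaller values on every output-like one.
        # Illustrative convention only, not the exact models proposed in the paper.
        dominates = np.all(X_train <= x_test, axis=1) & np.all(Y_train >= y_test, axis=1)
        return bool(dominates.any())

    # Toy usage with two input-like and one output-like characteristic variable.
    X_train = np.array([[1.0, 2.0], [0.5, 3.0]])
    Y_train = np.array([[4.0], [2.5]])
    print(fdh_is_normal(np.array([1.2, 2.5]), np.array([3.5]), X_train, Y_train))  # True
    print(fdh_is_normal(np.array([0.4, 1.0]), np.array([5.0]), X_train, Y_train))  # False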


Notes

  1. The data set is collected from the Statlib data archive at: http://lib.stat.cmu.edu/datasets/


Acknowledgements

We appreciate the constructive comments of the Editor-in-Chief, G. Voigt, and two anonymous referees, which helped to improve this contribution. The usual disclaimer applies. Q. Jin appreciates financial support of the National Natural Science Foundation of China (No. 72201127).

Author information

Corresponding author

Correspondence to Qianying Jin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Discussions on the choices of cut-off super-efficiency

In this contribution, a negative cut-off super-efficiency, denoted by \(c_{\Lambda }\), is introduced to identify noisy and less important training observations. A larger value of \(c_{\Lambda }\) means that more training observations are identified as noisy and of low importance and are therefore excluded from the construction of the soft nonparametric frontier.
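
As a small illustration of how the cut-off acts, the sketch below flags the training observations whose super-efficiency score falls below \(c_{\Lambda }\); the helper name and the example scores are hypothetical, since in the method the scores come from solving model (7) for the training observations.

    import numpy as np

    def exclude_noisy(super_efficiency, cutoff):
        # Observations whose super-efficiency falls below the negative cut-off
        # c_Lambda are flagged as noisy/of low importance and are left out when
        # the soft nonparametric frontier is built.
        return np.asarray(super_efficiency) < cutoff

    # Hypothetical scores: raising the cut-off from -0.2 to -0.1 excludes more points.
    scores = [-0.02, -0.05, -0.08, -0.15, -0.25]
    print(exclude_noisy(scores, -0.2))  # [False False False False  True]
    print(exclude_noisy(scores, -0.1))  # [False False False  True  True]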

For both the C and NC cases in Sect. 3.2, the simulation is executed 100 times. Here, one example from the C case is extracted to show the different C and NC soft nonparametric frontiers constructed under different choices of \(c_{\Lambda }\).

The resulting frontiers under different choices of \(c_{\Lambda }\) are displayed in Figs. 5 and 6. In every sub-figure, the normal training observations are represented by blue crosses, and those identified as noisy and of low importance are additionally marked by red circles. The latter are excluded when constructing the corresponding soft nonparametric frontier, which is represented by the blue solid lines.

Fig. 5: C soft nonparametric frontiers with different choices of \(c_{\Lambda }\)

Fig. 6: NC soft nonparametric frontiers with different choices of \(c_{\Lambda }\)

Figures 5 and 6 lead to similar observations. As \(c_{\Lambda }\) increases, more training observations are identified as noisy and of low importance, and the soft nonparametric frontier accordingly becomes more conservative. Compared with the C soft nonparametric frontier, the NC soft nonparametric frontier is more conservative, since more training observations are excluded.

A proper choice of \(c_{\Lambda }\) is important for identifying noisy and less important training observations as accurately as possible. In the following, the suggested way of deciding the value of \(c_{\Lambda }\) is explained using the above simulation example.

Fig. 7: Illustrative diagram of deciding a cut-off super-efficiency

Solving model (7) under the C case for the 700 normal training observations identifies 5 of them as frontier observations. Their values of \(\delta _{\text {super},\Lambda ^\text {C}}^*\), ordered from largest to smallest, are represented by blue diamonds in Fig. 7a. It is observed that 4 out of the 5 frontier observations have a \(\delta _{\text {super},\Lambda ^\text {C}}^*\) larger than \(-0.1\); only one frontier observation has \(\delta _{\text {super},\Lambda ^\text {C}}^*=-0.1537\). Thus, \(c_{\Lambda ^\text {C}}=-0.1\) is chosen.

Similarly, solving model (7) under the NC case for the same normal training observations identifies 11 of them as frontier observations. Their values of \(\delta _{\text {super},\Lambda ^\text {NC}}^*\), ordered from largest to smallest, are shown in Fig. 7b. It is observed that 10 out of the 11 frontier observations have a \(\delta _{\text {super},\Lambda ^\text {NC}}^*\) larger than \(-0.2\); only one frontier observation has \(\delta _{\text {super},\Lambda ^\text {NC}}^*=-0.2494\). Thus, \(c_{\Lambda ^\text {NC}}=-0.2\) is chosen.
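
The visual inspection of Fig. 7 can be mimicked by a simple largest-gap rule: order the frontier observations' super-efficiency scores from largest to smallest, locate the widest drop between consecutive scores, and place the cut-off inside that gap. The sketch below is only a hypothetical stand-in for that inspection; apart from the reported value \(-0.1537\), the example scores are made up.

    import numpy as np

    def gap_cutoff(super_efficiency, round_to=1):
        # Sort the super-efficiency scores in descending order, find the widest
        # drop between consecutive scores and place the cut-off in the middle of
        # that gap (rounded for readability).
        scores = np.sort(np.asarray(super_efficiency, dtype=float))[::-1]
        gaps = scores[:-1] - scores[1:]    # positive drops between neighbours
        widest = int(np.argmax(gaps))      # index of the widest gap
        return round((scores[widest] + scores[widest + 1]) / 2.0, round_to)

    # C case of Fig. 7a: four hypothetical scores above -0.1 plus the reported -0.1537.
    print(gap_cutoff([-0.01, -0.03, -0.05, -0.07, -0.1537]))  # -0.1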

The above illustrates one suggested way of deciding the value of \(c_{\Lambda }\); we do not claim that it is the optimal one. It is worthwhile for future research to explore other methods of deciding a proper \(c_{\Lambda }\).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jin, Q., Kerstens, K. & Van de Woestyne, I. Convex and nonconvex nonparametric frontier-based classification methods for anomaly detection. OR Spectrum (2024). https://doi.org/10.1007/s00291-024-00751-5

