Quadcriteria Optimization of Binary Classifiers: Error Rates, Coverage, and Complexity
This paper presents a four-objective evolutionary multiobjective optimization study of binary classifiers, targeting their error rates (false positives and false negatives), reliability, and complexity. The email anti-spam filtering problem serves as the example.
The two major goals of the optimization are to minimize the error rates, that is, the false negative rate and the false positive rate. Our approach considers three-way classification: the binary classifier may also decline to classify an instance when there is not enough evidence to assign it to one of the two classes. In this case the instance is marked as suspicious but is still presented to the user. The number of unclassified (suspicious) instances should be minimized, as long as this does not lead to errors; this is termed the coverage objective. The size of the set (ensemble) of rules needed for the anti-spam filter to operate in optimal conditions is addressed as a fourth objective. These objectives are in general conflicting, which is why we treat the problem as a 4-objective (quadcriteria) optimization problem. We assess the performance of a set of state-of-the-art evolutionary multiobjective optimization algorithms: NSGA-II, SPEA2, and the hypervolume indicator-based SMS-EMOA. Focusing on anti-spam filter optimization, we provide statistical comparisons of algorithm performance on several benchmarks and a range of performance indicators. Moreover, the resulting 4-D Pareto hyper-surface is discussed in the context of binary classifier optimization.
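To make the four objectives concrete, the following minimal Python sketch evaluates one candidate filter and checks Pareto dominance between objective vectors. It is an illustration under stated assumptions, not the encoding used in the study: the abstract does not specify how a filter is represented, so the rule weights, the two thresholds `lower` and `upper`, the function names, and the choice to count rules with nonzero weight as the complexity measure are all hypothetical. A false positive is taken to be a legitimate message classified as spam.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, float, int]


def evaluate(weights: Sequence[float], lower: float, upper: float,
             rule_hits: Sequence[Sequence[int]],
             labels: Sequence[int]) -> Objectives:
    """Return (fp_rate, fn_rate, unclassified_rate, n_rules), all minimized.

    Assumed (hypothetical) encoding: each message gets a score equal to the
    sum of the weights of the rules it triggers. Scores >= upper are labelled
    spam, scores <= lower are labelled legitimate, and anything in between is
    left unclassified (suspicious). labels use 1 for spam, 0 for legitimate;
    both classes are assumed to be present in the data.
    """
    fp = fn = unclassified = 0
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    for hits, is_spam in zip(rule_hits, labels):
        score = sum(w for w, h in zip(weights, hits) if h)
        if score >= upper:
            if not is_spam:
                fp += 1            # legitimate message flagged as spam
        elif score <= lower:
            if is_spam:
                fn += 1            # spam message let through
        else:
            unclassified += 1      # not enough evidence either way
    n_rules = sum(1 for w in weights if w != 0.0)  # assumed complexity measure
    return (fp / n_ham, fn / n_spam, unclassified / len(labels), n_rules)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy data: 4 messages, 3 rules; rule_hits[i][j] == 1 if rule j fires on i.
rule_hits = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
labels = [1, 0, 1, 0]
a = evaluate([2.0, -1.0, 0.0], 0.0, 2.0, rule_hits, labels)
b = evaluate([2.0, 0.0, 0.0], 1.0, 1.0, rule_hits, labels)
front = pareto_front([a, b])
```

On this toy data the second filter uses fewer rules and leaves nothing unclassified, so it dominates the first. On realistic corpora the objectives typically trade off against each other, which is what motivates approximating the whole Pareto front with algorithms such as NSGA-II, SPEA2, or SMS-EMOA.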
Keywords: Binary classification; Three-way classification; Parsimony; Evolutionary multi-objective optimization; Parallel coordinates
This work was partially funded by the [14VI05] Contract-Programme from the University of Vigo. Iryna Yevseyeva acknowledges the Engineering and Physical Sciences Research Council (EPSRC), UK, and Government Communications Headquarters (GCHQ), UK, for funding the Choice Architecture for Information Security (ChAISe) project EP/K006568/1 as part of the Cyber Research Institute.