Abstract
In the application of machine learning to real-life decision-making systems, e.g., credit scoring and criminal justice, the prediction outcomes might discriminate against people with sensitive attributes, leading to unfairness. The commonly used strategy in fair machine learning is to include fairness as a constraint or a penalization term in the minimization of the prediction loss, which ultimately limits the information given to decision-makers. In this paper, we introduce a new approach for handling fairness by formulating a stochastic multi-objective optimization problem for which the corresponding Pareto fronts uniquely and comprehensively define the accuracy-fairness trade-offs. We then apply a stochastic approximation-type method to efficiently obtain well-spread and accurate Pareto fronts, and by doing so we can handle training data arriving in a streaming way.
Notes
Our implementation code is available at https://github.com/sul217/MOO_Fairness.
References
Alexandropoulos SAN, Aridas CK, Kotsiantis SB, Vrahatis MN (2019) Multi-objective evolutionary optimization algorithms for machine learning: a recent survey. In: Approximation and optimization, Springer, pp 35–55
Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev 104:671
Bi J (2003) Multi-objective programming in SVMs. In: Proceedings of the 20th international conference on machine learning, pp 35–42
Braga AP, Takahashi RH, Costa MA, de Albuquerque Teixeira R (2006) Multi-objective algorithms for neural networks learning. In: Multi-objective machine learning, Springer, pp 151–171
Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21(2):277–292
Calders T, Kamiran F, Pechenizkiy M (2009) Building classifiers with independency constraints. In: 2009 IEEE international conference on data mining workshops, IEEE, pp 13–18
Calmon F, Wei D, Vinzamuri B, Ramamurthy KN, Varshney KR (2017) Optimized preprocessing for discrimination prevention. In: Advances in Neural Information Processing Systems, pp 3992–4001
Custódio ALL, Madeira JA, Vaz AIF, Vicente LN (2011) Direct multisearch for multiobjective optimization. SIAM J Optim 21(3):1109–1140
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evolut Comput 6(2):182–197
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213
Dua D, Graff C (2017) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Fliege J, Svaiter BF (2000) Steepest descent methods for multicriteria optimization. Math Methods Oper Res 51(3):479–494
Fliege J, Vaz AIF, Vicente LN (2019) Complexity of gradient descent for multiobjective optimization. Optim Methods Softw 34(5):949–959
Fonseca CM, Paquete L, López-Ibáñez M (2006) An improved dimension-sweep algorithm for the hypervolume indicator. In: 2006 IEEE international congress on evolutionary computation, IEEE, pp 1157–1163
Haimes YV (1971) On a bicriterion formulation of the problems of integrated system identification and system optimization. IEEE Trans Syst Man Cybern 1(3):296–297
Handl J, Knowles J (2004) Evolutionary multiobjective clustering. In: International conference on parallel problem solving from nature, Springer, pp 1081–1091
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. In: Advances in neural information processing systems, pp 3315–3323
Igel C (2005) Multi-objective model selection for support vector machines. In: International conference on evolutionary multi-criterion optimization, Springer, pp 534–546
Jin Y (2006) Multi-objective machine learning, vol 16. Springer Science & Business Media, Berlin
Jin Y, Sendhoff B (2008) Pareto-based multiobjective machine learning: an overview and case studies. IEEE Trans Syst Man Cybern Part C (Appl Rev) 38(3):397–415
Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 35–50
Kamishima T, Akaho S, Sakuma J (2011) Fairness-aware learning through regularization approach. In: 2011 IEEE 11th international conference on data mining workshops, IEEE, pp 643–650
Kaoutar S, Mohamed E (2017) Multi-criteria optimization of neural networks using multi-objective genetic algorithm. In: 2017 Intelligent systems and computer vision (ISCV), IEEE, pp 1–4
Kelly J (2020) Women now hold more jobs than men in the U.S. workforce. https://www.forbes.com/sites/jackkelly/2020/01/13/womennowholdmorejobsthanmen
Kim D (2004) Structural risk minimization on decision trees using an evolutionary multiobjective optimization. In: European conference on genetic programming, Springer, pp 338–348
Kohavi R (1996) Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the second international conference on knowledge discovery and data mining, KDD'96, AAAI Press, pp 202–207
Kokshenev I, Braga AP (2008) A multi-objective learning algorithm for RBF neural network. In: 2008 10th Brazilian symposium on neural networks, IEEE, pp 9–14
Kraft D (1998) A software package for sequential quadratic programming. Forschungsbericht, Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt
Larson J, Mattu S, Kirchner L, Angwin J (2016a) How we analyzed the COMPAS recidivism algorithm. ProPublica
Larson J, Mattu S, Kirchner L, Angwin J (2016b) ProPublica COMPAS dataset. https://github.com/propublica/compasanalysis
Law MH, Topchy AP, Jain AK (2004) Multiobjective data clustering. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR 2004, vol 2, IEEE, pp 424–430
Liu S, Vicente LN (2021) The stochastic multi-gradient algorithm for multi-objective optimization and its application to supervised machine learning. Ann Oper Res. https://doi.org/10.1007/s10479-021-04033-z
Mercier Q, Poirion F, Désidéri JA (2018) A stochastic multiple gradient descent algorithm. European J. Oper. Res. 271(3):808–817
Munoz C, Smith M, Patil D (2016) Big data: a report on algorithmic systems, opportunity, and civil rights. Executive Office of the President
Navon A, Shamsian A, Chechik G, Fetaya E (2021) Learning the Pareto front with hypernetworks. In: International conference on learning representations
Pedreshi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 560–568
Pleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger KQ (2017) On fairness and calibration. In: Advances in neural information processing systems, pp 5680–5689
Podesta J, Pritzker P, Moniz EJ, Holdren J, Zients J (2014) Big data: seizing opportunities, preserving values. Technical Report, Executive Office of the President
Reiners M, Klamroth K, Stiglmayr M (2020) Efficient and sparse neural networks by pruning weights in a multiobjective learning approach. arXiv preprint arXiv:2008.13590
Sener O, Koltun V (2018) Multi-task learning as multi-objective optimization. In: Proceedings of the 32nd international conference on neural information processing systems, pp 525–536
Senhaji K, Ramchoun H, Ettaouil M (2020) Training feedforward neural network via multi-objective optimization model using non-smooth L1/2 regularization. Neurocomputing 410:1–11
Senhaji K, Ramchoun H, Ettaouil M (2017) Multilayer perceptron: NSGA II for a new multiobjective learning method for training and model complexity. In: First international conference on real time intelligent systems, pp 154–167. Springer
Varghese NV, Mahmoud QH (2020) A survey of multi-task deep reinforcement learning. Electronics 9:1363
Williamson RC, Menon AK (2019) Fairness risk measures. In: International conference on machine learning, pp 6786–6797
Woodworth B, Gunasekar S, Ohannessian MI, Srebro N (2017) Learning non-discriminatory predictors. In: Conference on learning theory, pp 1920–1953
Yusiong JPT, Naval PC (2006) Training neural networks using multiobjective particle swarm optimization. In: International conference on natural computation, pp 879–888. Springer
Zafar MB, Valera I, Rodriguez MG, Gummadi KP (2017a) Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In: Proceedings of the 26th international conference on world wide web, pp 1171–1180. International World Wide Web Conferences Steering Committee
Zafar MB, Valera I, Rodriguez MG, Gummadi KP (2017b) Fairness constraints: mechanisms for fair classification. In: Artificial intelligence and statistics, pp 962–970
Zemel R, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. In: International conference on machine learning, pp 325–333
Zhang Y, Yang Q (2021) A survey on multi-task learning. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2021.3070203
Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans Evolut Comput 3:257–271
L. N. Vicente: Support for this author was partially provided by the Centre for Mathematics of the University of Coimbra under grant FCT/MCTES UIDB/MAT/00324/2020.
Appendices
A. The stochastic multi-gradient (SMG) algorithm
B. Description and illustration of the Pareto-front stochastic multi-gradient algorithm
A formal description of the PF-SMG algorithm is given in Algorithm 2.
An illustration is provided in Fig. 6. The blue curve represents the true Pareto front. The PF-SMG algorithm first randomly generates a list of starting feasible points (see the blue points in Fig. 6a). For each point in the current list, a certain number of perturbed points (see the green circles in Fig. 6a) are added to the list, after which multiple runs of the SMG algorithm are applied to each point in the current list. The newly generated points are marked by red circles in Fig. 6b. At the end of the current iteration, a new list for the next iteration is obtained by removing all the dominated points. As the algorithm proceeds, the front moves towards the true Pareto front.
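The loop just described can be sketched in a few lines of Python. This is an illustrative reading of the PF-SMG iteration, not the authors' implementation: the SMG step itself is abstracted into a user-supplied callback `smg_run`, and the names `objectives`, `n_perturb`, and `noise` are placeholders.

```python
import random

def dominates(f, g):
    """f dominates g if f is no worse in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def nondominated(points, objectives):
    """Keep only the points whose objective vectors are not dominated."""
    vals = [tuple(objectives(x)) for x in points]
    return [x for x, f in zip(points, vals)
            if not any(dominates(g, f) for g in vals if g != f)]

def pf_smg(start_list, objectives, smg_run, num_iters=5, n_perturb=2, noise=0.1):
    """Sketch of the PF-SMG loop: perturb, apply SMG, drop dominated points."""
    current = list(start_list)
    for _ in range(num_iters):
        # add perturbed copies of every point in the current list
        perturbed = [[xi + random.gauss(0, noise) for xi in x]
                     for x in current for _ in range(n_perturb)]
        # apply (one or more runs of) the SMG algorithm to each point
        moved = [smg_run(x) for x in current + perturbed]
        # the next list keeps only the nondominated points
        current = nondominated(current + moved, objectives)
    return current
```

With a cheap stand-in for the SMG step (a gradient step on a random convex combination of the objectives), the returned list is mutually nondominated by construction.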
The complexity rates for determining a point in the Pareto front using stochastic multi-gradient are reported in Liu & Vicente (2021). However, in multi-objective optimization, as far as we know, there are no convergence or complexity results for determining the whole Pareto front (under reasonable assumptions that do not reduce to evaluating the objective functions on a set that is dense in the decision space).
C. Metrics for Pareto front comparison
Let \({\mathcal {A}}\) denote the set of algorithms/solvers and \({\mathcal {T}}\) denote the set of test problems. The Purity metric measures the accuracy of an approximated Pareto front. Let \(F({\mathcal {P}}_{a, t})\) denote the approximated Pareto front of problem t computed by algorithm a. We approximate the “true” Pareto front \(F({\mathcal {P}}_t)\) of problem t by the set of all nondominated points in \(\cup _{a \in {\mathcal {A}}} F({\mathcal {P}}_{a, t})\). Then, the Purity of the Pareto front computed by algorithm a for problem t is the ratio \(r_{a, t} = |F({\mathcal {P}}_{a, t}) \cap F({\mathcal {P}}_t)|/|F({\mathcal {P}}_{a, t})| \in [0, 1]\), which gives the percentage of “true” nondominated solutions among all the nondominated points generated by algorithm a. A higher ratio corresponds to a more accurate Pareto front.
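The Purity computation can be sketched as follows; the dictionary layout (algorithm name mapped to a list of objective vectors) and the function names are illustrative, not the paper's code.

```python
def purity(fronts):
    """Purity of each algorithm's front: fraction of its points that lie on
    the "true" front, approximated by the nondominated points of the union
    of all fronts.  `fronts` maps algorithm name -> list of tuples."""
    def dominates(f, g):
        return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

    union = [f for front in fronts.values() for f in front]
    true_front = {f for f in union if not any(dominates(g, f) for g in union)}
    return {a: len([f for f in front if f in true_front]) / len(front)
            for a, front in fronts.items()}
```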
The Spread metric is designed to measure the extent of the point spread in a computed Pareto front, which requires the computation of extreme points in the objective function space \({\mathbb {R}}^m\). Among the m objective functions, we select as the pair of extreme points a pair of nondominated points in \({\mathcal {P}}_t\) with the highest pairwise distance (measured using \(f_i\)). More specifically, for a particular algorithm a, let \((x_{\min }^i, x_{\max }^i) \in {\mathcal {P}}_{a, t}\) denote the pair of nondominated points where \(x_{\min }^i = {{\,\mathrm{argmin}\,}}_{x \in {\mathcal {P}}_{a, t}} f_i(x)\) and \(x_{\max }^i = {{\,\mathrm{argmax}\,}}_{x \in {\mathcal {P}}_{a, t}} f_i(x)\). Then, the pair of extreme points is \((x_{\min }^k, x_{\max }^k)\) with \(k = {{\,\mathrm{argmax}\,}}_{i = 1, \ldots , m} f_i(x_{\max }^i) - f_i(x_{\min }^i)\).
The first Spread formula calculates the maximum size of the holes in a Pareto front. Assume algorithm a generates an approximated Pareto front with M points, indexed by \(1, \ldots , M\), to which the extreme points \(F(x_{\min }^k)\), \(F(x_{\max }^k)\), indexed by 0 and \(M+1\), are added. Denote the maximum size of the holes by \(\Gamma \). We have
\(\Gamma = \max _{i \in \{1, \ldots , m\}} \max _{j \in \{0, \ldots , M\}} \delta _{i,j},\)
where \(\delta _{i,j} = f_{i,j + 1} - f_{i, j}\), and we assume the values of each objective function \(f_i\) are sorted in increasing order.
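A minimal sketch of the Γ computation, assuming the front is given as a list of objective vectors with the extreme points already included (the function name is illustrative):

```python
def gamma_spread(front):
    """Maximum hole size Gamma: per objective, sort the values and take the
    largest consecutive gap; Gamma is the largest such gap over all
    objectives.  `front` is a list of objective vectors (tuples)."""
    m = len(front[0])
    gaps = []
    for i in range(m):
        vals = sorted(p[i] for p in front)
        gaps.append(max(b - a for a, b in zip(vals, vals[1:])))
    return max(gaps)
```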
The second formula was proposed by Deb et al. (2002) for the case \(m = 2\) (and further extended to the case \(m \ge 2\) in Custódio et al. (2011)) and indicates how well the points are distributed in a Pareto front. Denote the point spread by \(\Delta \). It is computed by the following formula:
\(\Delta = \max _{i \in \{1, \ldots , m\}} \frac{\delta _{i, 0} + \delta _{i, M} + \sum _{j=1}^{M-1} |\delta _{i, j} - \bar{\delta }_i|}{\delta _{i, 0} + \delta _{i, M} + (M-1) \bar{\delta }_i},\)
where \(\bar{\delta }_i\), \(i = 1, \ldots , m\), is the average of \(\delta _{i, j}\) over \(j = 1, \ldots , M-1\). Note that the lower \(\Gamma \) and \(\Delta \) are, the better distributed the Pareto front is.
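The Δ formula translates directly into code. The sketch below assumes the front is passed as a list of \(M+2\) objective vectors ordered along the front, with the two extreme points first and last (which makes each objective monotone along the list for \(m = 2\) nondominated fronts); the function name is illustrative.

```python
def delta_spread(front):
    """Point-spread metric Delta (Deb et al. 2002; Custodio et al. 2011
    extension).  `front` lists M+2 objective vectors ordered along the
    front, extreme points first and last."""
    m = len(front[0])
    ratios = []
    for i in range(m):
        vals = [p[i] for p in front]
        if vals[0] > vals[-1]:          # objective decreases along the front
            vals = vals[::-1]
        delta = [b - a for a, b in zip(vals, vals[1:])]   # delta_{i,0} .. delta_{i,M}
        interior = delta[1:-1]                            # delta_{i,1} .. delta_{i,M-1}
        dbar = sum(interior) / len(interior)
        num = delta[0] + delta[-1] + sum(abs(d - dbar) for d in interior)
        den = delta[0] + delta[-1] + len(interior) * dbar
        ratios.append(num / den)
    return max(ratios)
```

For a perfectly uniform front the numerator's deviation term vanishes, so Δ reduces to \(2/(M+1)\) rather than 0.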
Hypervolume (Zitzler & Thiele 1999) is another classical performance indicator, taking into account both the quality of the individual Pareto points and their overall coverage of the objective space. It essentially calculates the area/volume dominated by the given set of nondominated solutions with respect to a reference point. Figure 7 illustrates a bi-objective case in which the area dominated by a set of points \(\{p^{(1)}, p^{(2)}, p^{(3)}\}\) with respect to the reference point r is shown in grey. In our experiments, we calculate the hypervolume using the Pymoo package (see https://pymoo.org/misc/indicators.html).
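For the bi-objective case of Fig. 7, the hypervolume reduces to a sweep over the points sorted by the first objective. This is a self-contained sketch for illustration (minimization, nondominated input assumed), not the Pymoo routine used in the experiments.

```python
def hypervolume_2d(points, ref):
    """Area dominated by nondominated `points` (minimization) w.r.t. the
    reference point `ref`.  Sorting by f1 makes f2 strictly decreasing on a
    nondominated set, so the dominated region splits into vertical strips."""
    pts = sorted(points)                         # increasing f1, decreasing f2
    hv = 0.0
    for k, (f1, f2) in enumerate(pts):
        right = pts[k + 1][0] if k + 1 < len(pts) else ref[0]
        hv += (right - f1) * (ref[1] - f2)       # strip [f1, right] x [f2, ref2]
    return hv
```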
D. Datasets generation and preprocessing
The synthetic data consist of 20 sets of 2,000 binary classification instances randomly generated from the same distribution settings specified in Zafar et al. (2017b, Section 4): a uniform distribution for generating the binary labels Y, two different Gaussian distributions for generating the 2-dimensional nonsensitive features Z, and a Bernoulli distribution for generating the binary sensitive attribute A.
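A generator in this style can be sketched as below. The means, covariances, and Bernoulli probabilities are illustrative placeholders, not the exact values of Zafar et al. (2017b); only the overall recipe (uniform labels, class-conditional Gaussians, label-dependent Bernoulli) follows the text.

```python
import numpy as np

def generate_synthetic(n=2000, seed=0):
    """Zafar et al. (2017b)-style synthetic data sketch.  The distribution
    parameters below are assumptions for illustration only."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                    # uniform binary labels Y
    mean = {0: np.array([-2.0, -2.0]), 1: np.array([2.0, 2.0])}
    cov = {0: np.array([[10.0, 1.0], [1.0, 3.0]]),
           1: np.array([[5.0, 1.0], [1.0, 5.0]])}
    # class-conditional 2-d Gaussian nonsensitive features Z
    z = np.stack([rng.multivariate_normal(mean[c], cov[c]) for c in y])
    p = np.where(y == 1, 0.8, 0.2)                    # label-dependent Bernoulli
    a = rng.binomial(1, p)                            # binary sensitive attribute A
    return z, y, a
```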
The data preprocessing details for the Adult Income dataset are given below.

1. First, we combine all instances in adult.data and adult.test and remove those with missing values for some attributes.
2. We consider the following list of features: Age, Workclass, Education, Education number, Marital Status, Occupation, Relationship, Race, Sex, Capital gain, Capital loss, Hours per week, and Country. In the same way as Zafar et al. (2017a) did for the attribute Country, we reduced its dimension by merging all non-United-States countries into one group (Tables 1, 2, 3). Similarly for the feature Education, where “Preschool”, “1st-4th”, “5th-6th”, and “7th-8th” are merged into one group, and “9th”, “10th”, “11th”, and “12th” into another.
3. Last, we one-hot encoded all the categorical attributes, and we normalized the continuous-valued attributes.
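The last preprocessing step can be sketched with two small helpers (illustrative names; any standard encoder, e.g. from scikit-learn, would do the same job):

```python
def one_hot(values):
    """One-hot encode a categorical column (list of category labels).
    Returns the encoded rows and the sorted category order used."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1.0 if index[v] == i else 0.0 for i in range(len(categories))]
            for v in values]
    return rows, categories

def normalize(values):
    """Standardize a continuous column to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0          # guard against a constant column
    return [(v - mean) / std for v in values]
```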
In terms of gender, the dataset contains \(67.5\%\) males (\(31.3\%\) high income) and \(32.5\%\) females (\(11.4\%\) high income). Similarly, the demographic compositions in terms of race are \(2.88\%\) Asian (\(28.3\%\)), \(0.96\%\) American-Indian (\(12.2\%\)), \(86.03\%\) White (\(26.2\%\)), \(9.35\%\) Black (\(1.2\%\)), and \(0.78\%\) Other (\(12.7\%\)), where the numbers in brackets are the percentages of high-income instances.
E. More numerical results
E.1 Disparate impact w.r.t. binary sensitive attribute
See Fig. 8.
E.2 Disparate impact w.r.t. multivalued sensitive attribute
See Fig. 9.
E.3 Streaming data
See Fig. 10.
Cite this article
Liu, S., Vicente, L.N. Accuracy and fairness trade-offs in machine learning: a stochastic multi-objective approach. Comput Manag Sci 19, 513–537 (2022). https://doi.org/10.1007/s10287-022-00425-z