
Co-clustering for Fair Recommendation

Conference paper. In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021).
Abstract

Collaborative filtering relies on a sparse rating matrix, where each user rates a few products, to propose recommendations. The approach consists of approximating this sparse rating matrix with a simple model whose regularities make it possible to fill in the missing entries. The latent block model is a generative co-clustering model that can provide such an approximation. In this paper, we show that exogenous sensitive attributes can be incorporated into this model to make fair recommendations. Since users are characterized only by their ratings and their sensitive attribute, fairness is measured here by a parity criterion. We propose a definition of fairness specific to recommender systems, requiring item rankings to be independent of the users' sensitive attribute. We show that our model ensures approximately fair recommendations provided that the classification of users approximately respects statistical parity.


Notes

  1. \(\gamma = (\boldsymbol{\tau }^{\left( U\right) }, \boldsymbol{\tau }^{\left( V\right) },\boldsymbol{\nu }^{\left( A\right) },\boldsymbol{\rho }^{\left( A\right) }, \boldsymbol{\nu }^{\left( B\right) }, \boldsymbol{\rho }^{\left( B\right) }, \boldsymbol{\nu }^{\left( C\right) }, \boldsymbol{\rho }^{\left( C\right) })\).

  2. \(\gamma = (\boldsymbol{\tau }^{\left( U\right) }, \boldsymbol{\tau }^{\left( V\right) },\boldsymbol{\nu }^{\left( A\right) },\boldsymbol{\rho }^{\left( A\right) }, \boldsymbol{\nu }^{\left( B\right) }, \boldsymbol{\rho }^{\left( B\right) }, \boldsymbol{\nu }^{\left( C\right) }, \boldsymbol{\rho }^{\left( C\right) })\).



Appendices

Co-clustering for Fair Recommendation. Supplementary Material

A Computation of the Variational Log-Likelihood Criterion

The criterion we want to optimize is:

$$\begin{aligned} \mathcal {J}{\left( q_{\gamma }, \theta \right) } = \mathcal {H}(q_{\gamma }) + \mathbb {E}_{q_{\gamma }}\left[ \mathcal {L}{\left( \boldsymbol{R}, \boldsymbol{U}, \boldsymbol{V}, \boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}; \theta \right) }\right] . \end{aligned}$$
(S1)

We restrict the variational distribution \(q_{\gamma }\) to a fully factorized form:

$$\begin{aligned} q_{\gamma }&=\textstyle \prod _{i=1}^{{n_1}}{\mathcal {M}{\left( 1;\tau ^{\left( U\right) }_i\right) }}\;\times \;\; \prod _{j=1}^{{n_2}}{\mathcal {M}{\left( 1;\tau ^{\left( V\right) }_j\right) }} \\&\textstyle \quad \times \prod _{i=1}^{{n_1}}{\mathcal {N}{\left( \nu ^{\left( A\right) }_i,\rho ^{\left( A\right) }_i\right) }}\times \prod _{j=1}^{{n_2}}{\mathcal {N}{\left( \nu ^{\left( B\right) }_j,\rho ^{\left( B\right) }_j\right) }} \nonumber \\ {}&\quad \times \textstyle \prod _{j=1}^{{n_2}}{\mathcal {N}{\left( \nu ^{\left( C\right) }_j,\rho ^{\left( C\right) }_j\right) }}\nonumber \end{aligned}$$
(S2)

where \(\gamma \) denotes the concatenation of the parameters of the variational distribution \(q_{\gamma }\) (Footnote 2). The entropy is additive across independent variables, so we get:

$$\begin{aligned} \mathcal {H}{\left( q_{\gamma }\right) }= \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{U}\right) }\right) } + \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{V}\right) }\right) } + \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{A}\right) }\right) } + \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{B}\right) }\right) } + \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{C}\right) }\right) } , \end{aligned}$$

with the following terms:

$$\begin{aligned} \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{U}\right) }\right) }&= - \sum _{iq}{ \tau ^{\left( U\right) }_{iq} \log \tau ^{\left( U\right) }_{iq}} \\ \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{V}\right) }\right) }&= - \sum _{jl}{ \tau ^{\left( V\right) }_{jl} \log \tau ^{\left( V\right) }_{jl}} \\ \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{A}\right) }\right) }&= \frac{1}{2} \sum _{i}\log \rho ^{\left( A\right) }_i+ \frac{{n_1}}{2}{\left( \log 2\pi +1\right) } \\ \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{B}\right) }\right) }&= \frac{1}{2} \sum _{j}\log \rho ^{\left( B\right) }_j+ \frac{{n_2}}{2}{\left( \log 2\pi +1\right) } \\ \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{C}\right) }\right) }&= \frac{1}{2} \sum _{j}\log \rho ^{\left( C\right) }_j+ \frac{{n_2}}{2}{\left( \log 2\pi +1\right) } \\ \end{aligned}$$
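For concreteness, here is a minimal NumPy sketch of these entropy terms; the array names (tau_U, tau_V, rho_A, ...) are our own conventions and not taken from the paper's implementation.

```python
import numpy as np

def entropy_factorized_q(tau_U, tau_V, rho_A, rho_B, rho_C, eps=1e-12):
    """Entropy of the fully factorized variational distribution q_gamma.

    tau_U: (n1, k1) user-cluster membership probabilities
    tau_V: (n2, k2) item-cluster membership probabilities
    rho_A: (n1,) variational variances of the user effects A_i
    rho_B, rho_C: (n2,) variational variances of the item effects B_j, C_j
    """
    n1, n2 = tau_U.shape[0], tau_V.shape[0]
    # Entropy of the categorical (multinomial) factors
    h_U = -np.sum(tau_U * np.log(tau_U + eps))
    h_V = -np.sum(tau_V * np.log(tau_V + eps))
    # Entropy of the Gaussian factors: (1/2) sum log rho + (n/2)(log 2*pi + 1)
    h_A = 0.5 * np.sum(np.log(rho_A)) + 0.5 * n1 * (np.log(2 * np.pi) + 1)
    h_B = 0.5 * np.sum(np.log(rho_B)) + 0.5 * n2 * (np.log(2 * np.pi) + 1)
    h_C = 0.5 * np.sum(np.log(rho_C)) + 0.5 * n2 * (np.log(2 * np.pi) + 1)
    return h_U + h_V + h_A + h_B + h_C
```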

The independence of the latent variables allows us to rewrite the expectation of the complete log-likelihood as:

$$\begin{aligned} \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{R}, \boldsymbol{U}, \boldsymbol{V}, \boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}\right) }\right] } =\;&\mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{U}\right) }\right] } + \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{V}\right) }\right] }\\&+ \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{A}\right) }\right] } + \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{B}\right) }\right] } + \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{C}\right) }\right] }\\&+ \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \left. \boldsymbol{R}\right| \boldsymbol{U}, \boldsymbol{V}, \boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}\right) }\right] } , \end{aligned}$$

with the following terms:

$$\begin{aligned} \mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \boldsymbol{U}\right) }&= \mathbb {E}_{q_{\gamma }} {\left[ \sum _{iq}{U_{iq} \log \alpha _q}\right] } = \sum _{iq} { \tau ^{\left( U\right) }_{iq} \log \alpha _q} \\ \mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \boldsymbol{V}\right) }&= \mathbb {E}_{q_{\gamma }} {\left[ \sum _{jl}{V_{jl} \log \beta _l}\right] } = \sum _{jl} { \tau ^{\left( V\right) }_{jl} \log \beta _l}\\ \end{aligned}$$
$$\begin{aligned} \mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \boldsymbol{A}\right) }&= - \frac{{n_1}}{2} \log 2\pi - \frac{{n_1}}{2} \log \sigma ^2_{A}- \frac{1}{2\sigma ^2_{A}} \sum _{i}{\mathbb {E}_{q_{\gamma }}A_i^2} \\&= - \frac{{n_1}}{2} \log 2\pi - \frac{{n_1}}{2} \log \sigma ^2_{A}- \frac{1}{2\sigma ^2_{A}} \sum _{i}{\left( {\left( \nu ^{\left( A\right) }_i\right) }^2 + \rho ^{\left( A\right) }_i\right) } \\ \mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \boldsymbol{B}\right) }&= - \frac{{n_2}}{2} \log 2\pi - \frac{{n_2}}{2} \log \sigma ^2_{B}- \frac{1}{2\sigma ^2_{B}} \sum _{j}{\left( {\left( \nu ^{\left( B\right) }_j\right) }^2 + \rho ^{\left( B\right) }_j\right) } \\ \mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \boldsymbol{C}\right) }&= - \frac{{n_2}}{2} \log 2\pi - \frac{{n_2}}{2} \log \sigma ^2_{C}- \frac{1}{2\sigma ^2_{C}} \sum _{j}{\left( {\left( \nu ^{\left( C\right) }_j\right) }^2 + \rho ^{\left( C\right) }_j\right) }\\ \end{aligned}$$

and as the entries of the data matrix \(\boldsymbol{R}\) are conditionally independent given the latent variables:

$$\begin{aligned}&\mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \left. \boldsymbol{R}\right| \boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}, \boldsymbol{U}, \boldsymbol{V}\right) } = \mathbb {E}_{q_{\gamma }} \mathcal {L}{\left( \left. \boldsymbol{R}^{(\text {o})}\right| \boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}, \boldsymbol{U}, \boldsymbol{V}\right) } + \mathcal {L}{\left( \boldsymbol{R}^{{\left( \lnot o\right) }}\right) } \end{aligned}$$
(S3)

where \(\boldsymbol{R}^{(\text {o})}\) denotes the set of observed ratings and \(\boldsymbol{R}^{{\left( \lnot o\right) }}\) the set of non-observed ratings, for which \(R_{ij}=\text {NA}\). From Eq. S3, it is clear that maximizing \(\mathbb {E}_{q_{\gamma }} \mathcal {L}(\boldsymbol{R}^{(\lnot o)})\) is not necessary to infer the model parameters used for prediction, so ignoring the non-observed data is legitimate. The expectation of the conditional log-likelihood (the first term on the right-hand side of Eq. S3) is numerically estimated by sampling from \(q_{\gamma }\).
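As an illustration of this sampling estimate, the sketch below assumes a Gaussian emission \(R_{ij} \mid \cdot \sim \mathcal {N}(\mu _{U_i V_j} + A_i + B_j, \sigma ^2)\) and omits the sensitive-effect term \(C_j\) for brevity; the paper's actual emission model may differ.

```python
import numpy as np

def expected_cond_loglik(R_obs, mu, tau_U, tau_V, nu_A, rho_A, nu_B, rho_B,
                         sigma2=1.0, n_samples=10, rng=None):
    """Monte Carlo estimate of E_q[log p(R^(o) | U, V, A, B)].

    R_obs is a list of observed triplets (i, j, r_ij); mu is the (k1, k2)
    matrix of block means; nu_* and rho_* are NumPy arrays of variational
    means and variances. A Gaussian emission is assumed for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    k1, k2 = mu.shape
    total = 0.0
    for _ in range(n_samples):
        # Draw one configuration of the latent variables from q_gamma
        U = np.array([rng.choice(k1, p=t) for t in tau_U])
        V = np.array([rng.choice(k2, p=t) for t in tau_V])
        A = nu_A + np.sqrt(rho_A) * rng.standard_normal(nu_A.shape)
        B = nu_B + np.sqrt(rho_B) * rng.standard_normal(nu_B.shape)
        for i, j, r in R_obs:
            m = mu[U[i], V[j]] + A[i] + B[j]
            total += -0.5 * (np.log(2 * np.pi * sigma2) + (r - m) ** 2 / sigma2)
    return total / n_samples
```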

Stochastic Gradient Optimization. To optimize the criterion with stochastic gradient descent, we express the variational log-likelihood criterion on a single rating:

$$\begin{aligned} \mathcal {J}{\left( R_{ij};q_{\gamma }, \theta \right) }&= \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \left. R^{(\text {o})}_{ij}\right| \boldsymbol{U}_i, \boldsymbol{V}_j, A_i, B_j, C_j\right) }\right] } \\&\quad +\frac{1}{{n_2}}{\left( \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{U}_i\right) }\right) } +\mathcal {H}{\left( q_{\gamma }{\left( A_i\right) }\right) } +\mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{U}_i\right) }\right] } + \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( A_i\right) }\right] } \right) } \\&\quad + \frac{1}{{n_1}} {\left( \mathcal {H}{\left( q_{\gamma }{\left( \boldsymbol{V}_j\right) }\right) } +\mathcal {H}{\left( q_{\gamma }{\left( B_j\right) }\right) } + \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( \boldsymbol{V}_j\right) }\right] } +\mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( B_j\right) }\right] } \right) } \\&\quad + \frac{1}{{n_1}} {\left( \mathcal {H}{\left( q_{\gamma }{\left( C_j\right) }\right) } + \mathbb {E}_{q_{\gamma }}{\left[ \mathcal {L}{\left( C_j\right) }\right] } \right) } \end{aligned}$$

so that summing \(\mathcal {J}{\left( R_{ij};q_{\gamma }, \theta \right) }\) over all entries recovers the full criterion: the terms attached to user \(i\) appear in the \(n_2\) ratings of that user and are weighted by \(1/n_2\), while the terms attached to item \(j\) appear in the \(n_1\) ratings of that item and are weighted by \(1/n_1\).

A batch of data, \(\boldsymbol{R}_{(i:i+n),(j:j+n)}\), consists of an \((n\times n)\) sub-matrix randomly sampled from the original matrix \(\boldsymbol{R}\).
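A minimal sketch of such a batch sampler follows; drawing row and column subsets uniformly without replacement is our assumption, since the paper does not detail the sampling scheme.

```python
import numpy as np

def sample_batch(R, n, rng=None):
    """Sample an (n x n) sub-matrix batch from the rating matrix R by
    drawing n random rows and n random columns without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    rows = rng.choice(R.shape[0], size=n, replace=False)
    cols = rng.choice(R.shape[1], size=n, replace=False)
    return rows, cols, R[np.ix_(rows, cols)]
```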

B Clustering \(\varepsilon \)-parity and \(\varepsilon \)-fair Recommendation for Arbitrary Discrete Sensitive Attribute

Definition S1

(Clustering \(\varepsilon \)-parity, arbitrary discrete sensitive attribute). The clustering of users is said to respect \(\varepsilon \)-parity with respect to the discrete attribute \(s\in {\mathcal S}\) iff:

$$\begin{aligned} \forall (t,t') \in {\mathcal S}^2,\ \forall q,\left|\frac{\#\left\{ i|s_i= t\wedge u_{iq} = 1\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{\#\left\{ i|s_i= t'\wedge u_{iq} = 1\right\} }{\#\left\{ i|s_i= t'\right\} } \right|\le \varepsilon , \end{aligned}$$
(S4)

where \(\varepsilon \in \mathbb {R}_+\) measures the gap to exact parity, \(u_{iq}\) is the (hard) membership of user \(i\) to cluster \(q\), and \(\#\left\{ i|\varOmega \right\} \) denotes the number of users satisfying condition \(\varOmega \), that is, the cardinality of the corresponding set.
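The smallest \(\varepsilon \) satisfying Definition S1 can be computed directly from hard cluster assignments, as in the following sketch (the function name and interface are ours):

```python
import numpy as np

def clustering_parity_gap(s, d, k1):
    """Smallest epsilon for which the clustering respects epsilon-parity
    (Definition S1). s: (n1,) sensitive attribute values; d: (n1,) hard
    cluster assignments in {0, ..., k1 - 1}."""
    s, d = np.asarray(s), np.asarray(d)
    # props[t, q] = #{i : s_i = t and d_i = q} / #{i : s_i = t}
    props = np.array([[np.mean(d[s == t] == q) for q in range(k1)]
                      for t in np.unique(s)])
    # Largest gap over clusters q and pairs of attribute values (t, t')
    return float(np.max(props.max(axis=0) - props.min(axis=0)))
```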

Definition S2

(\(\varepsilon \)-fair recommendation, arbitrary discrete sensitive attribute). A recommender system is said to be \(\varepsilon \)-fair with respect to the discrete attribute \(s\in {\mathcal S}\) if for any two items \(j\) and \(j'\):

$$\begin{aligned} \forall (t,t') \in {\mathcal S}^2,\left|\frac{\#\left\{ i|s_i= t\wedge (\hat{R}_{ij}>\hat{R}_{ij'})\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{\#\left\{ i|s_i= t'\wedge (\hat{R}_{ij}>\hat{R}_{ij'})\right\} }{\#\left\{ i|s_i= t'\right\} } \right|\le \varepsilon , \end{aligned}$$
(S5)

where \(\varepsilon \in \mathbb {R}_+\) measures the gap to exact fairness.
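Analogously, the gap of Definition S2 for a given item pair can be computed from the predicted scores; a complete check takes the maximum over all item pairs (again a sketch with our own naming):

```python
import numpy as np

def fairness_gap(s, R_hat, j, j_prime):
    """Largest proportion gap of Definition S2 for the item pair (j, j').
    s: (n1,) sensitive attribute values; R_hat: (n1, n2) predicted scores."""
    s = np.asarray(s)
    prefer = R_hat[:, j] > R_hat[:, j_prime]      # users preferring j to j'
    rates = [np.mean(prefer[s == t]) for t in np.unique(s)]
    return float(max(rates) - min(rates))
```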

C Proof of Theorem 1

Theorem 1

(Fair recommendation from clustering parity). If the clustering of users in \({k_1}\) groups respects \(\varepsilon \)-parity (Definition 3 or Definition S1) then the recommender system relying on the relevance score defined in Eq. (7) is \(({k_1}\varepsilon )\)-fair (Definition 1 or Definition S2).

Proof

Suppose that \(\boldsymbol{\tau }^{\left( U\right) }\), the maximum a posteriori of \(\boldsymbol{U}\), is a binary matrix; \(\boldsymbol{\tau }^{\left( U\right) }\) is thus an \({n_1}\times {k_1}\) indicator matrix of row-class memberships. Then, given user \(i\), item \(j\) is said to be preferred to item \(j'\) if \(\hat{R}_{ij} > \hat{R}_{ij'}\), that is:

$$\begin{aligned} \hat{R}_{ij}> \hat{R}_{ij'}&\iff \boldsymbol{\tau }^{\left( U\right) }_{i}\hat{\boldsymbol{\mu }}{\boldsymbol{\tau }^{\left( V\right) }_{{j}}}^T + \nu ^{\left( A\right) }_i+ \nu ^{\left( B\right) }_{j}> \boldsymbol{\tau }^{\left( U\right) }_{i}\hat{\boldsymbol{\mu }}{\boldsymbol{\tau }^{\left( V\right) }_{{j'}}}^T + \nu ^{\left( A\right) }_i+ \nu ^{\left( B\right) }_{j'} \nonumber \\&\iff \boldsymbol{\tau }^{\left( U\right) }_i\hat{\boldsymbol{\mu }} {\left( \boldsymbol{\tau }^{\left( V\right) }_{j} - \boldsymbol{\tau }^{\left( V\right) }_{j'}\right) }^T> \nu ^{\left( B\right) }_{j'} - \nu ^{\left( B\right) }_{j} \nonumber \\&\iff \boldsymbol{\tau }^{\left( U\right) }_i\boldsymbol{a}> b\nonumber \\&\iff \boldsymbol{a}_{d_{i}} > b , \end{aligned}$$
(S6)

with \(\boldsymbol{a} \in \mathbb {R}^{{k_1}}\) defined by \(\boldsymbol{a}=\hat{\boldsymbol{\mu }} {\left( \boldsymbol{\tau }^{\left( V\right) }_{j} - \boldsymbol{\tau }^{\left( V\right) }_{j'}\right) }^T\), \(b \in \mathbb {R}\) defined by \(b = \nu ^{\left( B\right) }_{j'} - \nu ^{\left( B\right) }_{j}\) and \(d_{i}\in \{1,\cdots ,{k_1}\}\) being the group indicator of user \(i\): \(\tau ^{\left( U\right) }_{i, d_{i}} = 1\).

Assume \(\varepsilon \)-parity; from Definition S1 (Definition 3 is a particular case of Definition S1), we have

$$\begin{aligned}&\forall (t,t'),\qquad \forall q,\quad \left|\frac{\#\left\{ i|s_i= t\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{\#\left\{ i|s_i= t'\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t'\right\} } \right|\le \varepsilon \end{aligned}$$

therefore,

$$\begin{aligned}&\forall (t,t'),\;\forall q,\;\left|\mathbbm {1}_{\boldsymbol{a}_{q}> b} \frac{\#\left\{ i|s_i= t\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t\right\} } - \mathbbm {1}_{\boldsymbol{a}_{q}> b}\frac{\#\left\{ i|s_i= t'\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t'\right\} } \right|\le \varepsilon \mathbbm {1}_{\boldsymbol{a}_{q} > b} \end{aligned}$$

By summing over all groups, we get:

$$\begin{aligned} \forall (t,t'),\;\sum _q\left|\frac{\mathbbm {1}_{\boldsymbol{a}_{q}> b} \#\left\{ i|s_i= t\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{\mathbbm {1}_{\boldsymbol{a}_{q}> b} \#\left\{ i|s_i= t'\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t'\right\} } \right|\!\le \! \varepsilon \sum _q\mathbbm {1}_{\boldsymbol{a}_{q} > b} \end{aligned}$$

and from the triangle inequality,

$$\begin{aligned} \forall (t,t'), \left|\frac{\sum _q\mathbbm {1}_{\boldsymbol{a}_{q}> b} \#\left\{ i|s_i= t\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{\sum _q\mathbbm {1}_{\boldsymbol{a}_{q}> b} \#\left\{ i|s_i= t'\wedge d_{i}= q\right\} }{\#\left\{ i|s_i= t'\right\} } \right|&\le \varepsilon \sum _q\mathbbm {1}_{\boldsymbol{a}_{q}> b} \\ \forall (t,t'), \qquad \qquad \left|\frac{ \#\left\{ i|s_i= t\wedge \boldsymbol{a}_{d_{i}}> b\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{ \#\left\{ i|s_i= t'\wedge \boldsymbol{a}_{d_{i}} > b\right\} }{\#\left\{ i|s_i= t'\right\} } \right|&\le \varepsilon {k_1}\\ \end{aligned}$$

Applying (S6), we obtain the result:

$$\begin{aligned} \forall (t,t'),\quad \left|\frac{\#\left\{ i|s_i= t\wedge (\hat{R}_{ij}>\hat{R}_{ij'})\right\} }{\#\left\{ i|s_i= t\right\} } - \frac{\#\left\{ i|s_i= t'\wedge (\hat{R}_{ij}>\hat{R}_{ij'})\right\} }{\#\left\{ i|s_i= t'\right\} } \right|&\le \varepsilon {k_1}\end{aligned}$$

   \(\square \)
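The bound of Theorem 1 can also be checked numerically. The toy simulation below reuses the two sketches given after Definitions S1 and S2, draws cluster assignments independently of the sensitive attribute (so the parity gap \(\varepsilon \) is small), scores items with the relevance model of Eq. (S6), omitting the user effect \(\nu ^{\left( A\right) }_i\), which cancels in pairwise comparisons, and verifies the \({k_1}\varepsilon \) bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, k1, k2 = 2000, 30, 5, 4
s = rng.integers(0, 2, size=n1)        # binary sensitive attribute
d = rng.integers(0, k1, size=n1)       # clusters drawn independently of s
mu_hat = rng.standard_normal((k1, k2)) # estimated block-mean matrix
V = rng.integers(0, k2, size=n2)       # hard item-cluster assignments
nu_B = rng.standard_normal(n2)         # item effects nu^(B)_j
R_hat = mu_hat[d][:, V] + nu_B         # scores mu_{d_i, V_j} + nu^(B)_j

eps = clustering_parity_gap(s, d, k1)  # sketch after Definition S1
gaps = [fairness_gap(s, R_hat, j, jp)  # sketch after Definition S2
        for j in range(n2) for jp in range(n2) if j != jp]
assert max(gaps) <= k1 * eps + 1e-12   # Theorem 1: (k1 * eps)-fairness
print(f"parity eps = {eps:.4f}, max fairness gap = {max(gaps):.4f}")
```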

D Supplemental Results for MovieLens 1M

1.1 D.1 Gender as Sensitive Attribute

Supplemental Analysis of the Model. We list in Tables 2 and 3 the most extreme movies according to the inferred values of their latent variable \(C_j\). The variable \(C_j\) encodes the difference in opinion between the sensitive groups, not the overall opinion: a movie may well be liked by most people but liked even more by males. Table 2 lists movies for which females have a better opinion than males, and Table 3 lists movies for which males have a better opinion than females.

Table 2. List of movies with the largest gap in opinion between females and males for which females have a better opinion than males

Higher Number of Groups. We did not optimize the hyper-parameters of the compared models; we present here additional experiments to illustrate that the conclusions of Sect. 4 carry over to other hyper-parameter settings. Using a substantially larger number of groups (\({k_1}=50\) user groups and \({k_2}=50\) item groups) or a larger dimension of latent factors for SVD (also 50), the statistical gender parity measures given in Table 4 and the recommendation performance given in Fig. 7 are qualitatively similar to those given in Table 1 and Fig. 5.

Table 3. List of movies with the largest gap in opinion between females and males for which males have a better opinion than females
Fig. 7. Normalized Discounted Cumulative Gain estimated on MovieLens-1M with \({k_1}={k_2}=50\) groups for the clustering methods and 50 factors for the SVD.

Table 4. Measures of gender statistical parity. The number of user groups is \({k_1}=50\). The \(\chi ^2\) statistic (with 49 degrees of freedom) is averaged over the five replicates of the experiment. A high value of the \(\chi ^2\) statistic (or a low p-value) leads to the rejection of the statistical parity hypothesis.
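The \(\chi ^2\) statistics reported in Tables 4 and 5 test the independence of the sensitive attribute and the hard cluster assignment; a sketch using SciPy's chi2_contingency (our choice of implementation) is given below.

```python
import numpy as np
from scipy.stats import chi2_contingency

def parity_chi2(s, d):
    """Chi-squared test of independence between the sensitive attribute s
    and the hard cluster assignment d. For two genders and k1 = 50 clusters
    the degrees of freedom are (2 - 1) * (50 - 1) = 49, as in Table 4."""
    groups, clusters = np.unique(s), np.unique(d)
    # Contingency table of user counts per (attribute value, cluster) pair
    table = np.array([[np.sum((s == t) & (d == q)) for q in clusters]
                      for t in groups])
    stat, p_value, dof, _ = chi2_contingency(table)
    return stat, p_value, dof
```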

1.2 D.2 Age as Sensitive Attribute

The users' ages are recorded in the following intervals: 'Under 18', '18–24', '25–34', '35–44', '45–49', '50–55' and '56+'. The counts of users in each age category are displayed in Fig. 8.

Fig. 8. Count of users in each age category.

User age is treated as sensitive: the seven age categories are one-hot encoded into seven binary sensitive attributes \(s^{1}_i, \cdots , s^{7}_i\), with associated item latent variables \(C^{1}_j, \cdots , C^{7}_j\). We use the protocol described in Sect. 4, with the exception that our Parity-LBM is initialized from estimates obtained with the Standard-LBM. Table 5 presents the \(\chi ^2\) statistics computed from the contingency table of user age counts in each group. The methods that do not incorporate the sensitive attribute in the model produce groups that depend on age, whereas statistical parity is a reasonable assumption for the groups produced by our Parity-LBM.
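A minimal sketch of this one-hot encoding (the category labels are copied from the MovieLens metadata; the function name is ours):

```python
import numpy as np

AGE_CATEGORIES = ['Under 18', '18-24', '25-34', '35-44', '45-49', '50-55', '56+']

def one_hot_age(ages):
    """Encode a list of age-category labels into an (n1, 7) binary matrix
    whose columns are the sensitive attributes s^1_i, ..., s^7_i."""
    index = {c: k for k, c in enumerate(AGE_CATEGORIES)}
    S = np.zeros((len(ages), len(AGE_CATEGORIES)), dtype=int)
    for i, a in enumerate(ages):
        S[i, index[a]] = 1
    return S
```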

Table 5. Measures of statistical parity with respect to age category. The number of user groups is \({k_1}=15\). A high value of the \(\chi ^2\) statistic (or a low p-value) leads to the rejection of the statistical parity hypothesis. The \(\chi ^2\) statistic (with 14 degrees of freedom) is averaged over the five folds of the cross-validation.

Finally, we illustrate the interpretability of the estimates of the movie latent variables \(C^{1}_j, \cdots ,C^{7}_j\). For each age category \(k\), we select the thirty movies with the largest values of \(C^{k}_j\); these are the movies with the largest positive opinion bias for users in that age category. Figure 9 displays boxplots of the release years of these films for all user age categories. The greater spread in the distribution for older users indicates that they have a comparatively higher opinion of older movies than younger users do. With user age as the sensitive attribute, the recommendations do not account for these differences.

Fig. 9. Release years of the thirty most extreme movies according to the inferred positive values of the latent variables \(C^{1}_j, \cdots , C^{7}_j\). Each latent variable \(C^{k}_j\) is matched with its corresponding user age category.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Frisch, G., Leger, JB., Grandvalet, Y. (2021). Co-clustering for Fair Recommendation. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_44

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2
