
Numerical study of reciprocal recommendation with domain matching

  • Kotaro Sudo
  • Naoya Osugi
  • Takafumi Kanamori
Original Paper · Computational statistics and machine learning

Abstract

Reciprocal recommendation is the task of finding preferable matches among users in two distinct groups. Popular examples of reciprocal recommendation include online job recruiting and online dating services. In this paper, we propose a new method of reciprocal recommendation that uses a graph embedding technique. In particular, we use cross-domain matching correlation analysis (CDMCA) as a graph embedding method. In CDMCA, feature vectors in different domains are mapped into a common representation space, and reciprocal recommendation is conducted in the common mapped space. Numerical experiments show that the CDMCA with a similarity-based weighting scheme provides a high-quality reciprocal recommendation.

Keywords

Reciprocal recommendation · Cross-domain matching correlation analysis · Jaccard similarity

1 Introduction

Recommendation systems are currently used in various fields such as e-commerce. In particular, reciprocal recommendation has lately been attracting attention. Suppose that users are divided into two groups. A reciprocal recommendation system finds preferable matches between users in those groups in consideration of both their characteristics and their preferences.

Popular examples of reciprocal recommendation include online job recruiting and online dating services. The two groups involved in job recruiting are the recruiters and the job seekers. Each user has characteristics and preferences for users in the other group. Some recruiters and job seekers might have contact with each other. A contact might be regarded as an expression of interest (EOI). Based on the characteristics of the users provided on an online recruiting site and the history of EOIs, the reciprocal recommendation system provides favorable matches between recruiters and job seekers. On an online dating site, the two groups typically consist of males and females, and the purpose is to find preferable partners for each user. Details of reciprocal recommendation are found in Li and Li (2012), Pizzato et al. (2013), Xia et al. (2015), Yu et al. (2011, 2013), Pizzato et al. (2010), Tu et al. (2014), Hong et al. (2013), Brun et al. (2011), Wang et al. (2010), Hopcroft et al. (2011) and references therein.

In this paper, we propose a reciprocal recommendation algorithm using a graph embedding technique. In particular, we use cross-domain matching correlation analysis (CDMCA), which was proposed by Shimodaira (2015), as the graph embedding method. CDMCA is used to investigate the relationships among observed data vectors in multiple domains. We show that CDMCA can be applied to the detection of preferable matches in reciprocal recommendation problems. In CDMCA, data vectors in distinct domains are mapped into a common representation space, and statistical inference is conducted in the mapped space. The mapped points can be directly compared to each other, even when the two groups are originally expressed in terms of different features. Hence, we can find preferable matches between two groups using such convenient properties of CDMCA. In numerical experiments, we investigate whether the CDMCA-based approach attains a higher prediction accuracy than existing ones. CDMCA is regarded as an extension of classical data-description methods such as principal component analysis and canonical correlation analysis. Hence, we also obtain a visualization of the global structure among all users in the common representation space.

The remainder of the paper is organized as follows. In Sect. 2, we introduce related work on reciprocal recommendation algorithms. The problem setup is explained in Sect. 3. In Sect. 4, we propose a new method for reciprocal recommendation, and consider some variants of the proposed method. Some measures of ranking quality are introduced in Sect. 5. Section 6 is devoted to numerical experiments, and Sect. 7 draws conclusions.

2 Related work

There are many algorithms for reciprocal recommendation. Most of them are designed based on the assumption that similar people are drawn to similar people. Reciprocal recommendation algorithms are classified roughly into three types: content-based, graph-based, and hybrid methods (Kim et al. 2012).

In a content-based method, a user’s profile and ratings are used to score the users in the other group, while data from other users in the same group are not actively used. RECON (Pizzato et al. 2010) is a representative content-based method. Suppose that a user \(x_i\) sent an EOI to users in the other group. The recommendation algorithm assigns a high score to users having profiles similar to those who received the EOI from \(x_i\). The scores are calculated reciprocally and used to select preferable matches.

Graph-based methods focus on the graph structure corresponding to EOIs. In this approach, data from other users are utilized to measure the similarity between users in the same group. First, the history of EOIs is represented by a directed graph, in which each node corresponds to a user. Second, the similarity of two users is measured based on the graph structure around each user. The Jaccard similarity is commonly used in reciprocal recommendation (Xia et al. 2015). Based on the similarity measure, the system provides recommendations to each user. Collaborative filtering for reciprocal recommendation also falls into this category (Cai et al. 2010; Schafer et al. 2007). Sophisticated statistical modeling using stochastic blockmodels has also been used to detect the clustering structure among nodes in the graph (Gao et al. 2017; Rohe et al. 2011). Naive graph-based methods do not utilize information in users’ profiles.

The hybrid approach takes both users’ profiles and the graph structure into account. As a hybrid method, the content-collaborative reciprocal recommender (CCR) was proposed as an extension of RECON (Akehurst et al. 2011). Kim et al. (2012) proposed a hybrid method to address the cold-start problem in Internet services.

In this paper, we propose a hybrid method. First, the graph is defined based on the set of users and the history of EOIs, and the similarities between users in the graph are calculated. Based on the similarities, users’ profiles are mapped to a common representation space. CDMCA with the similarity measure is applied to find the representation of each user in the common space. If there are no user profiles, the map yields a graph embedding. Then, preferable-match recommendation is conducted in the common space. We show that CDMCA provides a simple way to incorporate information on features such as users’ profiles into the graph-based method.

As shown above, there are some hybrid methods using both the graph structure and user profiles. The important point is how to utilize such information to provide a high-quality recommendation. Our method differs from existing methods in the following respects.
  • Xia et al. (2015) proposed a graph-based reciprocal recommendation system. The similarity between users is measured by the Jaccard similarity, and users’ profiles are basically not used. The authors showed in their numerical experiments that their graph-based recommendation methods achieved higher prediction performance than content-based methods such as RECON. Our method also uses the Jaccard similarity as the similarity measure among users. However, it is not directly used to yield the recommendation. We show that CDMCA provides an effective way of incorporating information on users’ profiles into similarity-based methods to raise the prediction performance.

  • In the CCR (Akehurst et al. 2011), the similarity between users is measured based on their profiles, and the candidates for recommendation are chosen using the interactions between users, where the interaction is expressed as a bipartite graph. As for the similarity measure, our method does not use the users’ profiles. The main difference between the CCR and our method is that our method provides not only recommendations but also the global structure among all users in the common space. In other words, our method simultaneously takes the relationships among all users into account when recommendations are provided. On the other hand, the CCR uses a local structure based on the similarity.

  • Kim et al. (2012) proposed a hybrid method to resolve the so-called cold-start problem, that is, to give an appropriate recommendation to new users who do not yet have a rich history of EOIs. To solve this problem, the profile-based similarity and a rule-based similarity constructed using subgroup interaction patterns are incorporated into the reciprocal recommendation system.

    In this paper, we focus on the prediction accuracy of the recommendation for existing users, and we do not deal with the cold-start problem.

Our method is regarded as a preprocessing technique for conducting CDMCA. When CDMCA is used to analyze real-world data, the weight is often defined by the empirical distribution of co-occurrences (Fukui et al. 2016; Nori et al. 2012). In numerical experiments, we investigate which feature vectors and weights are efficient in CDMCA for reciprocal recommendation. We show that the feature vector defined from users’ profiles including information on EOIs, combined with the weight based on the similarity measure, outperforms the other methods. Additionally, we report that other popular methods such as RECON are sensitive to the amount of information in users’ profiles. Systematic numerical experiments show that our approach using CDMCA is rather robust to the characteristics of the observations. We show that our method efficiently utilizes both users’ profiles and EOIs to achieve high prediction accuracy.

3 Problem setup

Reciprocal recommendation is formulated as the prediction of edges in a graph. We define the sets of users, X and Y, as \(X=\{x_1,\ldots ,x_n\},\,Y=\{y_1,\ldots ,y_m\}\), and G as the directed bipartite graph \(G=(X,Y,E)\), where E is the set of directed edges from X to Y or from Y to X. The edge \(e=(x,y)\) (resp. \((y,x)\)) represents the directed edge from x to y (resp. from y to x). In addition, each node of the graph G has a feature vector that corresponds to a user’s profile. We define the features of \(x_i\in X\) and \(y_j\in {Y}\) as \({\varvec{x}}_i\in {\mathbb {R}}^{d_X}\) and \({\varvec{y}}_j\in {\mathbb {R}}^{d_Y}\), respectively.

The observed data consist of the directed bipartite graph \(G=(X,Y,E)\) and the features \({\varvec{x}}_i,{\varvec{y}}_j,\,i=1,\ldots ,n,\,j=1,\ldots ,m\). In real data analysis such as job recruiting, X and Y are the sets of job seekers and companies, respectively. The edge \(e=(x,y)\in {X}\times {Y}\) represents a request for information about the company y from the job seeker x, and the reverse edge \((y,x)\) represents a similar event, such as a character reference from y to x. The feature \({\varvec{x}}_i\) includes characteristics of \(x_i\in {X}\), such as his or her major specialty, and \({\varvec{y}}_j\) represents such characteristics as the type of industry or firm size of \(y_j\in {Y}\).

Note that a user x can express an interest in y only when x can detect the existence of y. Typically, the numbers of nodes, n and m, are quite large. Hence, a user in X can observe only a small portion of Y, and vice versa. Therefore, there might be a more preferable matching that has not yet been observed. The primary purpose of reciprocal recommendation is to predict such potential edges of the graph G from the observed data. For an online job recruiting or dating site, the prediction accuracy of such events is the key to improving the quality of its services.

4 Reciprocal recommendation with CDMCA

We propose a reciprocal recommendation method that uses cross-domain matching correlation analysis (CDMCA) as its principle technique.

4.1 CDMCA

Suppose that related data \(z, z'\) and \(z''\) are observed. For example, z might be an image of a horse, \(z'\) the tag or keyword for describing horses, and \(z''\) the link to a website that discusses horses. The expression of each piece of information, \(z, z'\) and \(z''\), might be different. Such data are referred to as multi-domain data. The task is to find a map of the multi-domain data into a common space \({\mathbb {R}}^K\) such that samples that are likely to co-occur are located near each other. As a result, we can detect unknown pairs that are likely to co-occur in the common space.

CDMCA provides such a map for multi-domain data. Suppose that there are D domains (or D groups) and that the samples \(x_i^d,\,i=1,\ldots , n_d\) are observed from the dth domain. Let us define the weight \(w_{ij}^{dd'}\) between two samples \(x_i^d\) and \(x_j^{d'}\). The sample \(x_i^d\) is mapped to a point \({\varvec{f}}_{i}^d\in {\mathbb {R}}^K\). To find \({\varvec{f}}_i^d\), CDMCA minimizes the squared error
$$\begin{aligned} \sum _{d,d'}\sum _{i,j}w_{ij}^{dd'}\Vert {\varvec{f}}_i^d-{\varvec{f}}_j^{d'}\Vert ^2 \end{aligned}$$
(1)
under appropriate constraints, where \(\Vert {\varvec{a}}\Vert\) is the Euclidean norm \(\sqrt{{\varvec{a}}^\mathrm{T}{\varvec{a}}}\). When the multi-domain data \(x_i^d\) have features \({\varvec{x}}_i^d\in {\mathbb {R}}^{k_d}\), the map is defined as a linear transformation
$$\begin{aligned} {\varvec{f}}_{i}^d=A^{(d)} {\varvec{x}}_i^d. \end{aligned}$$
The matrices \(A^{(d)}\in {\mathbb {R}}^{K\times k_d},\,d=1,\ldots ,D\) are found by minimizing the squared error under specified constraints.

4.2 General algorithm for reciprocal recommendation

General reciprocal recommendation involves two domains, X and Y. Here, CDMCA with \(D=2\) is used. This is nothing but cross-view graph embedding (Huang et al. 2012). The weight \(w_{ij}^{dd'}\) is determined from the bipartite graph G. Later, we introduce means of determining the weight from the graph G. For \(i=1,\ldots ,n,\,j=1,\ldots ,m\), let \({\varvec{f}}_i\in {\mathbb {R}}^K\) and \({\varvec{g}}_j\in {\mathbb {R}}^K\) be the representation vectors of \(x_i\in {X}\) and \(y_j\in {Y}\), respectively. The weight between \(x_i\) and \(y_j\) is denoted as \(w_{ij}\), and the matrices W and \(\widetilde{W}\) are defined as
$$\begin{aligned} W=(w_{ij})\in {\mathbb {R}}^{n\times m},\quad \widetilde{W} = \begin{pmatrix} O & W \\ W^T & O \end{pmatrix}\in {\mathbb {R}}^{(n+m)\times (n+m)}. \end{aligned}$$
Since G is a bipartite graph, the weights in the same group are set to zero. We define the matrix H as
$$\begin{aligned} H= \begin{pmatrix} {\varvec{f}}_1&\cdots&{\varvec{f}}_n&{\varvec{g}}_1&\cdots&{\varvec{g}}_m \end{pmatrix}\in {\mathbb {R}}^{K\times (n+m)}, \end{aligned}$$
the parameters of which are to be found by minimizing the squared error. Then, the squared error is expressed as
$$\begin{aligned} \sum _{i,j}w_{ij}\Vert {\varvec{f}}_i-{\varvec{g}}_j\Vert ^2&= {\mathrm {tr}}\left[ H \left( \begin{pmatrix} D_X & O \\ O & D_Y \end{pmatrix} - \begin{pmatrix} O & W \\ W^\mathrm{T} & O \end{pmatrix} \right) H^\mathrm{T}\right] \nonumber \\&= {\mathrm {tr}}\, H\widetilde{D}H^\mathrm{T}- {\mathrm {tr}}\, H\widetilde{W}H^\mathrm{T}, \end{aligned}$$
(2)
where \(D_X={\mathrm {diag}}(W{\varvec{1}}_m)\in {\mathbb {R}}^{n\times n}, D_Y={\mathrm {diag}}(W^T{\varvec{1}}_n)\in {\mathbb {R}}^{m\times m}\), and
$$\begin{aligned} \widetilde{D} = \begin{pmatrix} D_X & O \\ O & D_Y \end{pmatrix}. \end{aligned}$$
Here, \({\mathrm {diag}}({\varvec{v}})\in {\mathbb {R}}^{n\times n}\) is the diagonal matrix whose diagonal elements are given by the vector \({\varvec{v}}\in {\mathbb {R}}^n\).
Now we minimize the squared error (2). A common choice is to fix the first term of (2) by a constraint. Then, the representation vectors are obtained by maximizing the second term,
$$\begin{aligned} \max _{H\in {\mathbb {R}}^{K\times (n+m)}} {\mathrm {tr}}\, H\widetilde{W}H^\mathrm{T}\quad {\mathrm {s.t.}}\ H\widetilde{D}H^\mathrm{T}=I, \end{aligned}$$
(3)
where I is the identity matrix. The optimal solution is found by solving the generalized eigenvalue problem
$$\begin{aligned} \widetilde{W}{\varvec{h}}_i=\lambda _i \widetilde{D}{\varvec{h}}_i \end{aligned}$$
with the eigenvalue \(\lambda _i\) and eigenvector \({\varvec{h}}_i\). The top K eigenvalues are chosen, and the corresponding eigenvectors \({\varvec{h}}_1,\ldots ,{\varvec{h}}_K\in {\mathbb {R}}^{n+m}\) are obtained with the result that the optimal solution of (3) is \(H=\begin{pmatrix}{\varvec{h}}_1&\cdots&{\varvec{h}}_K \end{pmatrix}^T\). When multi-domain data have features, the map on each domain is defined as
$$\begin{aligned} {\varvec{x}}_i\mapsto A{\varvec{x}}_i\in {\mathbb {R}}^K,\quad \text {and}\quad {\varvec{y}}_j\mapsto B{\varvec{y}}_j\in {\mathbb {R}}^K. \end{aligned}$$
We define the matrix H as
$$\begin{aligned} H&=\begin{pmatrix} A{\varvec{x}}_1&\cdots&A{\varvec{x}}_n&B{\varvec{y}}_1&\cdots&B{\varvec{y}}_m \end{pmatrix} \\&= \begin{pmatrix} A&B \end{pmatrix} \begin{pmatrix} {\varvec{x}}_1 & \cdots & {\varvec{x}}_n & {\varvec{0}} & \cdots & {\varvec{0}} \\ {\varvec{0}} & \cdots & {\varvec{0}} & {\varvec{y}}_1 & \cdots & {\varvec{y}}_m \end{pmatrix} =C Z. \end{aligned}$$
Now, let us consider the optimization problem,
$$\begin{aligned} \max _{C\in {\mathbb {R}}^{K\times (k_X+k_Y)}} {\mathrm {tr}}\, CZ\widetilde{W}Z^\mathrm{T} C^\mathrm{T}\quad {\mathrm {s.t.}}\ CZ\widetilde{D}Z^\mathrm{T}C^\mathrm{T}=I. \end{aligned}$$
(4)
In the same manner as above, the optimal solution of (4) is obtained by solving the generalized eigenvalue problem. Note that the matrix \(Z\widetilde{D}Z^T\) must be non-singular for problem (4). If \(Z\widetilde{D}Z^T\) is not invertible, we introduce a regularization term and the constraint is replaced with
$$\begin{aligned} C(Z\widetilde{D}Z^\mathrm{T}+\varepsilon I)C^\mathrm{T}=I, \end{aligned}$$
where \(\varepsilon\) is set to a small positive real number such as \(10^{-6}\).

Eventually, we obtain the representation vectors in the common space. The distance between \(x_i\in {X}\) and \(y_j\in {Y}\) is measured by \(d_{ij}:=\Vert {\varvec{f}}_i-{\varvec{g}}_j\Vert\). The user \(y_j\) is recommended to \(x_i\) in ascending order of \(d_{ij},j=1,\ldots ,m\). Likewise, user \(x_i\) in X is recommended to \(y_j\in {Y}\) in ascending order of \(d_{ij},i=1,\ldots ,n\).
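The whole procedure is small enough to sketch in code. The following is a minimal Python/NumPy sketch of the above algorithm under the assumption of dense matrices; the function names cdmca_embed and recommend are ours, not the paper's. Setting Z to the identity matrix reduces problem (4) to problem (3), i.e., the “no feature” case.

```python
import numpy as np
from scipy.linalg import eigh

def cdmca_embed(W, X, Y, K, eps=1e-6):
    """Solve problem (4). W: (n, m) weight matrix; X: (n, k_X) and
    Y: (m, k_Y) stack the profile vectors row-wise. Returns A, B."""
    n, m = W.shape
    # Block matrices of Sect. 4.2.
    Wt = np.block([[np.zeros((n, n)), W],
                   [W.T, np.zeros((m, m))]])
    Dt = np.diag(np.concatenate([W.sum(axis=1), W.sum(axis=0)]))
    # Z places each feature vector in its own domain block.
    Z = np.block([[X.T, np.zeros((X.shape[1], m))],
                  [np.zeros((Y.shape[1], n)), Y.T]])
    M = Z @ Wt @ Z.T
    N = Z @ Dt @ Z.T + eps * np.eye(Z.shape[0])   # regularized constraint
    # Generalized eigenproblem M c = lambda N c; eigh returns ascending
    # eigenvalues, with eigenvectors normalized so that C N C^T = I.
    vals, vecs = eigh(M, N)
    C = vecs[:, np.argsort(vals)[::-1][:K]].T     # C = (A  B), top-K rows
    return C[:, :X.shape[1]], C[:, X.shape[1]:]

def recommend(A, B, X, Y):
    """Rank users in Y for each user in X by distance in the common space."""
    F, G = X @ A.T, Y @ B.T                       # representation vectors
    d = np.linalg.norm(F[:, None, :] - G[None, :, :], axis=2)
    return np.argsort(d, axis=1)                  # row i: Y in ascending d_ij
```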

4.3 Definition of weights

There are several means of determining the weight matrix W. We discuss two weighting schemes. One is defined by the empirical distribution of the edges E in G, and the other is defined by the Jaccard similarity, which appears in some existing algorithms for reciprocal recommendation.

First, we define the weight determined from the empirical distribution of the graph edges. Given the graph \(G=(X,Y,E)\), let \(J\in {\mathbb {R}}^{(n+m)\times (n+m)}\) be the adjacency matrix. Then, the weight \(\widetilde{W}\) is defined as \(\widetilde{W}=J+J^T\). In other words, the weight \(w_{ij}\) between \(x_i\) and \(y_j\) is given by
$$\begin{aligned} w_{ij} = {\left\{ \begin{array}{ll} 2, & (x_i,y_j)\in {E}\ \& \ (y_j,x_i)\in {E},\\ 1, & (x_i,y_j)\in {E}\ \& \ (y_j,x_i)\not \in {E},\\ 1, & (x_i,y_j)\not \in {E}\ \& \ (y_j,x_i)\in {E},\\ 0, & (x_i,y_j)\not \in {E}\ \& \ (y_j,x_i)\not \in {E}. \end{array}\right. } \end{aligned}$$
(5)
The weight \(w_{ji}\) is defined similarly. The empirical weight has been used in matching between images and tags and in multinomial relation prediction for social data; see Fukui et al. (2016) and Nori et al. (2012).
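As a concrete illustration, here is a minimal sketch of the empirical weight (5), assuming the EOIs are given as lists of index pairs; the function name is ours.

```python
import numpy as np

def empirical_weight(edges_x_to_y, edges_y_to_x, n, m):
    """W[i, j] counts the directed edges between x_i and y_j, as in (5)."""
    W = np.zeros((n, m))
    for i, j in edges_x_to_y:      # (x_i, y_j) in E
        W[i, j] += 1.0
    for j, i in edges_y_to_x:      # (y_j, x_i) in E
        W[i, j] += 1.0
    return W                       # entries lie in {0, 1, 2}
```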

The above empirical weight is considered insufficient to formalize the assumption that similar people are drawn to similar people. Hence, let us consider a second weighting scheme that uses a similarity measure between graph nodes. Given a directed graph, the similarity between two nodes can be defined by comparing the topological structures near these nodes.

In this paper, we use the Jaccard similarity (Real and Vargas 1996). SimRank (Jeh and Widom 2002) is another common graph-based similarity measure, defined from a random walk on the graph. An iterative algorithm is required to compute SimRank, whereas direct and efficient computation is possible for the Jaccard similarity. In addition, we confirmed in preliminary experiments that both similarity measures provide almost the same prediction accuracy for reciprocal recommendation. Hence, the Jaccard similarity is employed in our method.

Let us introduce some terminology regarding the graph \(G=(V,E)\). Given a node, say \(v\in {V}\), the out-neighborhood of v is defined as
$$\begin{aligned} n^{\mathrm {out}}(v)=\{v'\in {V}\,|\,(v,v')\in E\} \end{aligned}$$
and the in-neighborhood of v is defined as
$$\begin{aligned} n^{\mathrm {in}}(v)=\{v'\in {V}\,|\,(v',v)\in E\}. \end{aligned}$$
Then, the Jaccard similarity between nodes \(v,v'\in {V}\) is defined as
$$\begin{aligned} {\mathrm {sim}}^{\text {out}}(v,v') = \frac{|n^{\mathrm {out}}(v)\cap n^{\mathrm {out}}(v')|}{|n^{\mathrm {out}}(v)\cup n^{\mathrm {out}}(v')|}, \end{aligned}$$
for the out-neighborhood, and
$$\begin{aligned} {\mathrm {sim}}^{\text {in}}(v,v') = \frac{|n^{\mathrm {in}}(v)\cap n^{\mathrm {in}}(v')|}{|n^{\mathrm {in}}(v)\cup n^{\mathrm {in}}(v')|} \end{aligned}$$
for the in-neighborhood, where |S| is the cardinality of the set S. If the denominator in the above equation is zero, then the Jaccard similarity is defined as zero. The Jaccard similarity compares the topological structure of the neighborhoods of two nodes. Let us define \(s_{ij}^{\mathrm {out}}\) and \(\bar{s}_{ji}^{\mathrm {out}}\) as the average similarities,
$$\begin{aligned} s_{ij}^{\mathrm {out}} = \frac{1}{|n^{\mathrm {in}}(y_j)|}\sum _{x\in n^{\mathrm {in}}(y_j)}{\mathrm {sim}}^{\text {out}}(x_i,x),\ \ \bar{s}_{ji}^{\mathrm {out}} = \frac{1}{|n^{\mathrm {in}}(x_i)|}\sum _{y\in n^{\mathrm {in}}(x_i)}{\mathrm {sim}}^{\text {out}}(y_j,y). \end{aligned}$$
According to the assumption that similar people are drawn to similar people, a large \(s_{ij}^{\mathrm {out}}\) indicates that \(x_i\) is likely to send \(y_j\) an EOI, and a large \(\bar{s}_{ji}^{\mathrm {out}}\) indicates the same in the reverse direction. The weight based on the out-neighborhood is defined by the harmonic mean of \(s_{ij}^{\mathrm {out}}\) and \(\bar{s}_{ji}^{\mathrm {out}}\),
$$\begin{aligned} w_{ij}^{\mathrm {out}}:=2\left( \frac{1}{s_{ij}^{\mathrm {out}}} +\frac{1}{\bar{s}_{ji}^{\mathrm {out}}}\right) ^{-1}. \end{aligned}$$
The harmonic mean of two positive numbers has the property that it becomes much smaller than the arithmetic mean if one of the two values is small. The weight \(w_{ij}^{\mathrm {in}}\) is defined similarly by replacing “out” with “in.” Finally, the weight \(w_{ij}\) is defined as
$$\begin{aligned} w_{ij}=w_{ij}^{\mathrm {out}}+w_{ij}^{\mathrm {in}}. \end{aligned}$$
(6)
We propose a version of CDMCA with the above weight defined from the Jaccard similarity for reciprocal recommendation.
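For concreteness, the following is a minimal sketch of the out-weight \(w_{ij}^{\mathrm {out}}\), assuming the in- and out-neighborhoods have been precomputed as Python sets indexed by user; the helper names are ours. The in-weight \(w_{ij}^{\mathrm {in}}\) is obtained by swapping the roles of the in- and out-neighborhoods, and \(w_{ij}\) is their sum as in (6).

```python
def jaccard(a, b):
    """Jaccard similarity of two neighborhood sets; zero if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weight_out(i, j, n_out_x, n_out_y, n_in_x, n_in_y):
    """w_ij^out: harmonic mean of s_ij^out and s-bar_ji^out."""
    senders_to_yj = n_in_y[j]      # users in X who sent y_j an EOI
    senders_to_xi = n_in_x[i]      # users in Y who sent x_i an EOI
    s = (sum(jaccard(n_out_x[i], n_out_x[x]) for x in senders_to_yj)
         / len(senders_to_yj)) if senders_to_yj else 0.0
    s_bar = (sum(jaccard(n_out_y[j], n_out_y[y]) for y in senders_to_xi)
             / len(senders_to_xi)) if senders_to_xi else 0.0
    if s == 0.0 or s_bar == 0.0:   # guard against empty neighborhoods
        return 0.0
    return 2.0 / (1.0 / s + 1.0 / s_bar)
```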

Xia et al. (2015) proposed a graph-based method using the weight \(w_{ij}^{\mathrm {out}}\) or \(w_{ij}^{\mathrm {in}}\). Given \(x_i\in {X}\), their algorithm recommends \(y_j\in {Y}\) with a large \(w_{ij}^{\mathrm {out}}\) or a large \(w_{ij}^{\mathrm {in}}\). The recommendation algorithm by Xia et al. (2015) showed high performance for reciprocal recommendation. In our method, the weight is not directly used for the recommendation; rather, it is used to express the relative closeness among users in CDMCA.

5 Evaluation of prediction accuracy

Let us introduce our method for evaluating the accuracy of reciprocal recommendation. Suppose that the kth preferable user for the user \(x_i\) is given as \(y_{j_k}\) for \(k=1,\ldots ,m\). In our method, this order is determined by the distance from \({\varvec{f}}_i\) to \({\varvec{g}}_1,\ldots ,{\varvec{g}}_m\) in ascending order,
$$\begin{aligned} \Vert {\varvec{f}}_i-{\varvec{g}}_{j_1}\Vert \le \Vert {\varvec{f}}_i-{\varvec{g}}_{j_2}\Vert \le \cdots \le \Vert {\varvec{f}}_i-{\varvec{g}}_{j_m}\Vert . \end{aligned}$$
In the same manner, the \(\ell\)th preferable user for the user \(y_j\) is denoted as \(x_{i_\ell },\,\ell =1,\ldots ,n\). The preference relation among users is determined once the vectors \({\varvec{f}}_i\) and \({\varvec{g}}_j\) are given.
Let \(\widetilde{G}=(X,Y,\widetilde{E})\) be the graph that represents the test samples. Let \(T_{x}\) for \(x\in {X}\) be the subset of Y defined as
$$\begin{aligned} T_x=\left\{ y\in {Y}\,:\,(x,y),(y,x)\in \widetilde{E}\right\} . \end{aligned}$$
The subset \(T_y\) for \(y\in {Y}\) is defined in the same manner. From the definition, we find that bidirectional mutual actions are taken for \(y\in {T}_x\) and \(x\in {T}_y\).
Suppose that the recommendation to x is given as \(y_{j_1},\ldots ,y_{j_m}\) in order of the preference obtained from the training data. Let us define \(z_k\) for \(k=1,\ldots ,m\) to be
$$\begin{aligned} z_k= {\left\{ \begin{array}{ll} 1, &\quad y_{j_k}\in T_x,\\ 0, &\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
If the recommendation to x is highly accurate, \(z_k\) tends to be one for small k’s. The average precision \(\nu _x\), defined as
$$\begin{aligned} \nu _x = \frac{1}{\sum _{i=1}^{m}z_i}\left( \sum _{i=1}^m\frac{z_i}{i}\sum _{k=1}^i z_k\right) , \end{aligned}$$
measures the accuracy of the recommendation for \(x\in {X}\). Kishida (2005) proved that if the recommendation to x is made entirely at random, the mean value of the average precision is given as
$$\begin{aligned} {\mathbb {E}}[\nu _x]=\frac{|T_x|-1+\frac{1}{m}\sum _{i=1}^{m}\frac{m-|T_x|}{i}}{m-1}. \end{aligned}$$
(7)
The accuracy of the recommendation is assessed by comparison with the above mean value. The average value of \(\nu _x\) or \(\nu _y\) is referred to as the mean average precision (MAP) (Schütze et al. 2008). For example, the average value on the union \(X\cup {Y}\) is calculated by
$$\begin{aligned} {\mathrm {MAP}} = \frac{\sum _{x:T_x\ne \emptyset }\nu _x + \sum _{y:T_y\ne \emptyset }\nu _y}{|\{x:T_x\ne \emptyset \}|+|\{y:T_y\ne \emptyset \}|}, \end{aligned}$$
which measures the prediction accuracy for the recommendation to all members in \(X\cup {Y}\). Likewise, the average only on X or Y is also used to evaluate the quality of the recommendation on each group.
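As an illustration, here is a minimal sketch of \(\nu _x\) and the MAP, assuming each user’s test relevance is given as the 0/1 sequence \(z_1,\ldots ,z_m\) in recommended order; the function names are ours.

```python
def average_precision(z):
    """nu_x for one user; z is the 0/1 relevance list in recommended order."""
    hits = sum(z)
    if hits == 0:
        return None                       # user excluded from the average
    cum, ap = 0, 0.0
    for i, z_i in enumerate(z, start=1):  # i runs over ranks 1..m
        cum += z_i
        if z_i:
            ap += cum / i                 # precision at each relevant rank
    return ap / hits

def mean_average_precision(relevance_lists):
    aps = [a for a in map(average_precision, relevance_lists) if a is not None]
    return sum(aps) / len(aps)
```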

In addition to the MAP, we can use the normalized discounted cumulative gain at k, or NDCG@k for short (Järvelin and Kekäläinen 2002), to measure the top-k prediction accuracy; see Shalev-Shwartz and Ben-David (2014, Chap. 17.4) for details regarding the NDCG. The calculation of NDCG@k requires test data with the top-k preference order. We use NDCG@k to assess the recommendation accuracy for synthetic data and MAP for real-world data.
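For reference, a small sketch of NDCG@k under the standard logarithmic discount; this is one common definition, and the evaluation used in the experiments may differ in details.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k recommended items."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k):
    """gains: true relevance gains in recommended order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```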

6 Numerical experiments

Xia et al. (2015) reported that the graph-based method using the Jaccard similarity measure outperformed other methods such as RECON (Pizzato et al. 2010). In this paper, we compare the performance of our proposed algorithms mainly with Xia’s method, RECON, and link prediction by SVM.

We compare the following methods. For each method, its abbreviation is shown in parentheses.
  1. Random recommendation (random): The recommendation to each user is given totally at random.

  2. Recommendation with Jaccard similarity (Jac-similarity): The recommendation to \(x_i\) is given as the \(y_j\)’s with large \(w_{ij}\) in (6). The feature vector is not taken into account.

  3. RECON (Pizzato et al. 2010): The preference of \(x_i\) is expressed as a probability distribution over the feature vectors of Y. Let \({\varvec{y}}_j=(\bar{y}_{j1}, \ldots , \bar{y}_{ja})\in {\mathbb {R}}^a\) be the feature vector of \(y_j\in {Y}\). Probability densities for continuous variables and probability functions for discrete variables, \(p_{ia'}(\bar{y}_{a'}),\,1\le a'\le a\), represent the preference of the user \(x_i\) for the \(a'\)-th feature of \(y\in {Y}\), and the probability is estimated from the data \(\{{\varvec{y}}_j\,:\,(x_i,y_j)\in {E}\}\). Similarly, the probability \(q_{jb'}(\bar{x}_{b'})\) for \({\varvec{x}}=(\bar{x}_1,\ldots ,\bar{x}_b)\) is estimated from \(\{{\varvec{x}}_i\,:\,(y_j,x_i)\in {E}\}\). The weight \(w_{ij}\) is defined as the harmonic mean of \(\sum _{a'}p_{ia'}(\bar{y}_{ja'})/a\) and \(\sum _{b'}q_{jb'}(\bar{x}_{ib'})/b\). This weight is used to provide the reciprocal recommendation. In the numerical experiments, the probability is estimated with a histogram using the function \(\mathtt{hist}\) in R (R Core Team 2017).

  4. Link prediction using a support vector machine (SVM) (Xia et al. 2014; Leskovec et al. 2010): The problem is formulated as link prediction on the graph. The positively labeled data consist of all concatenated profiles \(({\varvec{x}}_i, {\varvec{y}}_j)\) with a reciprocal edge between \(x_i\) and \(y_j\). The negatively labeled training data are randomly sampled from all pairs having only one edge, so that the dataset has equal numbers of reciprocal and non-reciprocal edges. Then, a binary classification method is applied to obtain a classifier that predicts hidden edges for a given pair of profiles. Here, we used the linear SVM with the option of estimating label probabilities. Libsvm (Chang and Lin 2011) is used for the SVM.

  5. Recommendation using CDMCA:

     (a) The following weights are examined.

       • emp-CDMCA: The weight \(w_{ij}\) for CDMCA is defined from the empirical distribution of graph edges (5).

       • Jac-CDMCA: The weight \(w_{ij}\) is determined from the Jaccard similarity (6).

     (b) The following feature vectors are examined (a construction sketch is given after this list).

       • no feature: The method using the representation vectors \({\varvec{f}}_i, {\varvec{g}}_j\in {\mathbb {R}}^K\) determined by (3).

       • profile: The feature vector of \(x_i\) is defined as its profile vector. The representation vector in the common space \({\mathbb {R}}^K\) is given by \({\varvec{f}}_i=A{\varvec{x}}_i, {\varvec{g}}_j=B{\varvec{y}}_j\in {\mathbb {R}}^K\), where the transformation matrices A and B are determined by solving (4).

       • edge: The feature vector is defined from the edges. More precisely, for the bipartite graph \(G=(X,Y,E)\), define \(M^X_{ij}\) and \(M^Y_{ji}\) for \(i=1,\ldots ,n,\ j=1,\ldots ,m\) as
         $$M^X_{ij}= {\left\{ \begin{array}{ll} 1 & (x_i,y_j)\in {E},\\ 0 & (x_i,y_j)\notin {E}, \end{array}\right. } \quad M^Y_{ji}= {\left\{ \begin{array}{ll} 1 & (y_j,x_i)\in {E},\\ 0 & (y_j,x_i)\notin {E}. \end{array}\right. }$$
         The feature vector of \(x_i\in {X}\) is then defined from the edge information as
         $$\begin{aligned} {\varvec{x}}_i=\left( M_{i1}^X,\ldots ,M_{im}^X,M_{1i}^Y,\ldots ,M_{mi}^Y\right) . \end{aligned}$$
         The feature vector of \(y_j\in {Y}\) is defined similarly. The transformation matrices A and B for the above feature vectors are obtained by solving (4).

       • profile + edge: In addition to the profile, information on the edges is included in the feature vector, i.e., the feature vector of \(x_i\in {X}\) is defined by the concatenation of the profile and the edge information, \(({\varvec{x}}_i, M_{i1}^X,\ldots ,M_{im}^X,M_{1i}^Y,\ldots ,M_{mi}^Y)\). The feature vector of \(y_j\in {Y}\) is defined similarly. The transformation matrices A and B are obtained by solving (4).

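As mentioned in item 5(b), the following is a minimal NumPy sketch of the “edge” and “profile + edge” feature construction, where M_X and M_Y are the 0/1 edge-indicator matrices defined above and P_X, P_Y stack the profile vectors row-wise; the function names are ours.

```python
import numpy as np

def edge_features(M_X, M_Y):
    """x_i's feature: the i-th row of M_X followed by the i-th column of M_Y."""
    feats_x = np.hstack([M_X, M_Y.T])   # shape (n, 2m)
    feats_y = np.hstack([M_Y, M_X.T])   # shape (m, 2n)
    return feats_x, feats_y

def profile_edge_features(P_X, P_Y, M_X, M_Y):
    """Prepend each user's profile vector to the edge features."""
    fx, fy = edge_features(M_X, M_Y)
    return np.hstack([P_X, fx]), np.hstack([P_Y, fy])
```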
For each method, the results for synthetic data and real-world data are reported below.

6.1 Numerical experiments with synthetic data

In this section, we use synthetic data to evaluate the accuracy of learning methods for reciprocal recommendation. The number of users in X and Y is set to 2000. The feature vector of \(x_i\) consists of \(x_i\)’s profile and preference vectors, \({\varvec{x}}_i\) and \({\varvec{x}}_i'\), respectively. Similarly, the feature of \(y_j\in {Y}\) is expressed by the profile \({\varvec{y}}_j\) and preference \({\varvec{y}}_j'\). The dimension of these vectors is set to 100.

The preference of \(x_i\) for \(y_j\) is determined by the Euclidean distance from \(x_i\)’s preference vector to \(y_j\)’s profile, i.e., \(\Vert {\varvec{x}}_i'-{\varvec{y}}_j\Vert\). A smaller distance corresponds to a larger occurrence probability of the edge \((x_i,y_j)\). Similarly, the occurrence probability of the edge \((y_j,x_i)\) depends on the distance \(\Vert {\varvec{x}}_i-{\varvec{y}}_j'\Vert\). This setting means that the distance between one user’s preference and the other’s profile is closely related to the probability that the former will send an EOI to the latter. In the numerical simulation, information on the profile is observable, but the preference is not observed directly. The individual preference can be observed only through the structure of the bipartite graph G. This setting is similar to practical situations such as online job recruiting or online dating.

Let us explain the data-generation process. All profile and preference vectors, \({\varvec{x}}_i, {\varvec{x}}_i', {\varvec{y}}_j, {\varvec{y}}_j'\), are independently and identically distributed according to the normal mixture distribution,
$$\begin{aligned} \frac{1}{2}N_{100}({\varvec{0}},I)+\frac{1}{2}N_{100}({\varvec{1}},I), \end{aligned}$$
where \(N_d(\varvec{\mu }, I)\) is the d-dimensional multivariate normal distribution with mean vector \(\varvec{\mu }\) and variance–covariance matrix I.

The graph \(G=(X,Y,E)\) is generated as follows. The sets of users, X and Y, are fixed. For each user \(x_i\in {X}\), choose S users \(y_{j_1},\ldots ,y_{j_S}\) from Y at random, and calculate the distances from \({\varvec{x}}_i'\), i.e., \(\Vert {\varvec{x}}_i'-{\varvec{y}}_{j_1}\Vert ,\ldots ,\Vert {\varvec{x}}_i'-{\varvec{y}}_{j_S}\Vert\). Then, select the s nearest points from \({\varvec{x}}_i'\) and add the edges from \(x_i\) to the corresponding users in Y. Similarly, add edges from each \(y_j\) to the graph. Repeating the same procedure for all nodes yields the bipartite graph G. The observed data consist of the profiles of users \(\{{\varvec{x}}_i\}_{i=1,\ldots ,n},\,\{{\varvec{y}}_j\}_{j=1,\ldots ,m}\), and the generated graph G.
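A minimal sketch of this data-generation process, assuming NumPy; the function names and the fixed random seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_sample(size, dim=100):
    """Draws from the mixture (1/2) N(0, I) + (1/2) N(1, I)."""
    means = rng.integers(0, 2, size=size)[:, None]   # component mean 0 or 1
    return rng.standard_normal((size, dim)) + means

def sample_edges(pref_x, prof_y, S=100, s=10):
    """For each x_i: draw S random candidates in Y and keep the s nearest."""
    n, m = pref_x.shape[0], prof_y.shape[0]
    edges = []
    for i in range(n):
        cand = rng.choice(m, size=S, replace=False)
        d = np.linalg.norm(pref_x[i] - prof_y[cand], axis=1)
        edges.extend((i, int(j)) for j in cand[np.argsort(d)[:s]])
    return edges   # edges from X to Y; edges from Y to X are built likewise
```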

Note that the profiles and preferences are distributed according to the normal mixture distribution. As a result, both X and Y are classified roughly into two groups, each of which corresponds to a component of the normal mixture distribution. In the numerical experiments, we evaluate the prediction accuracy of the reciprocal recommendation while varying the values of S and s.

Figure 1 shows a part of the weight matrices \((w_{ij})_{i,j}\) constructed from the Jaccard similarity and from the empirical distribution of graph edges, in which the parameter setup is \(n=500, m=400\) and \(S=200, s=10\). We can see that the Jaccard similarity yields smoother weights compared to the empirical distribution of graph edges. For the same setup, Fig. 2 shows numerical results using CDMCA-based methods. The red (resp. blue) points indicate the representation vectors of X (resp. Y). Jac-CDMCA with “no feature” or “profile + edge” roughly detects four clusters, while emp-CDMCA and Jac-CDMCA with “profile” cannot reproduce such a clustering structure. This result illustrates that the similarity measure is more useful than the empirical weight for detecting a cluster structure in reciprocal recommendation.
Fig. 1

Plot of a part of the weights \(w_{ij}\). Left panel: weight based on empirical distribution. Right panel: weight based on Jaccard similarity

Fig. 2

Representation vectors obtained using CDMCA-based methods. The upper (resp. lower) panels show the results using emp-CDMCA (resp. Jac-CDMCA). The sample size is \(n=500, m=400\) and the parameters S and s are set to \(S=200, s=10\)

Table 1

NDCG@10, NDCG@100 and MAP scores when \(S=100\) and \(s=10\)

| \(S=100, s=10\) | NDCG@10 (sd) | NDCG@100 (sd) | \({\mathrm {MAP}}\) (sd) |
| --- | --- | --- | --- |
| Observation: EOIs | | | |
| Random | 0.5008 (0.0019) | 0.5106 (0.0006) | 1.0000 |
| Jac-similarity | 0.8481 (0.0037) | 0.8147 (0.0038) | 5.0398 (1.2596) |
| emp-CDMCA: no feature (\(K=10\)) | 0.7981 (0.0021) | 0.7580 (0.0022) | 3.7111 (2.1675) |
| emp-CDMCA: no feature (\(K=100\)) | 0.7846 (0.0019) | 0.6787 (0.0032) | 4.1656 (0.8672) |
| emp-CDMCA: edge (\(K=10\)) | 0.8146 (0.0026) | 0.7880 (0.0023) | 3.7128 (2.1603) |
| emp-CDMCA: edge (\(K=100\)) | 0.7930 (0.0023) | 0.7047 (0.0032) | 4.4671 (0.9277) |
| Jac-CDMCA: no feature (\(K=10\)) | 0.8456 (0.0023) | 0.8638 (0.0022) | 4.5077 (2.4220) |
| Jac-CDMCA: no feature (\(K=100\)) | 0.8315 (0.0112) | 0.6372 (0.0086) | 3.9470 (2.0980) |
| Jac-CDMCA: edge (\(K=10\)) | 0.8566 (0.0020) | 0.8752 (0.0018) | 4.3848 (2.0979) |
| Jac-CDMCA: edge (\(K=100\)) | 0.8841 (0.0017) | 0.8952 (0.0021) | 5.1161 (1.9870) |
| Observation: EOIs and 100-dimensional profiles | | | |
| RECON | 0.9254 (0.0006) | 0.8991 (0.0007) | 8.3056 (2.6240) |
| SVM | 0.4819 (0.0341) | 0.4973 (0.0100) | 0.7981 (0.2732) |
| emp-CDMCA: profile (\(K=10\)) | 0.5290 (0.0049) | 0.5369 (0.0049) | 1.7915 (1.6864) |
| emp-CDMCA: profile (\(K=100\)) | 0.5925 (0.0081) | 0.5860 (0.0055) | 2.2022 (0.5903) |
| emp-CDMCA: profile + edge (\(K=10\)) | 0.8192 (0.0021) | 0.7955 (0.0022) | 3.7680 (2.2583) |
| emp-CDMCA: profile + edge (\(K=100\)) | 0.7971 (0.0019) | 0.7130 (0.0032) | 4.5301 (0.9327) |
| Jac-CDMCA: profile (\(K=10\)) | 0.5489 (0.0064) | 0.5543 (0.0059) | 1.1071 (0.6081) |
| Jac-CDMCA: profile (\(K=100\)) | 0.5931 (0.0060) | 0.5867 (0.0048) | 2.5514 (0.9715) |
| Jac-CDMCA: profile + edge (\(K=10\)) | 0.8654 (0.0017) | 0.8847 (0.0016) | 4.4069 (2.0644) |
| Jac-CDMCA: profile + edge (\(K=100\)) | 0.8933 (0.0013) | 0.9049 (0.0012) | 5.2372 (1.9939) |
| Observation: EOIs and 2-dimensional profiles | | | |
| RECON | 0.5977 (0.0060) | 0.6182 (0.0045) | 1.8705 (0.7409) |
| SVM | 0.4709 (0.0232) | 0.4950 (0.0077) | 0.8015 (0.5272) |
| emp-CDMCA: profile (\(K=10\)) | 0.5164 (0.0316) | 0.5187 (0.0126) | 0.8546 (0.2423) |
| emp-CDMCA: profile (\(K=100\)) | 0.4998 (0.0318) | 0.5132 (0.0110) | 0.8546 (0.2423) |
| emp-CDMCA: profile + edge (\(K=10\)) | 0.8153 (0.0027) | 0.7892 (0.0024) | 3.7799 (2.2405) |
| emp-CDMCA: profile + edge (\(K=100\)) | 0.7935 (0.0023) | 0.7061 (0.0033) | 4.4742 (0.9246) |
| Jac-CDMCA: profile (\(K=10\)) | 0.5167 (0.0315) | 0.5186 (0.0128) | 0.8775 (0.2966) |
| Jac-CDMCA: profile (\(K=100\)) | 0.4995 (0.0320) | 0.5132 (0.0107) | 0.8775 (0.2966) |
| Jac-CDMCA: profile + edge (\(K=10\)) | 0.8586 (0.0020) | 0.8771 (0.0017) | 4.3633 (2.1101) |
| Jac-CDMCA: profile + edge (\(K=100\)) | 0.8856 (0.0018) | 0.8970 (0.0018) | 5.1389 (1.9855) |

Boldface indicates the methods achieving top-two recommendation accuracy among the compared methods

Table 2

NDCG@10, NDCG@100 and MAP scores when \(S=200\) and \(s=10\)

| \(S=200, s=10\) | NDCG@10 (sd) | NDCG@100 (sd) | \({\mathrm {MAP}}\) (sd) |
| --- | --- | --- | --- |
| Observation: EOIs | | | |
| Random | 0.5010 (0.0029) | 0.5013 (0.0020) | 1.0000 |
| Jac-similarity | 0.7995 (0.0079) | 0.8025 (0.0078) | 10.1382 (2.9361) |
| emp-CDMCA: no feature (\(K=10\)) | 0.7907 (0.0023) | 0.8150 (0.0024) | 3.1311 (1.0609) |
| emp-CDMCA: no feature (\(K=100\)) | 0.7497 (0.0020) | 0.6893 (0.0036) | 7.5639 (2.1329) |
| emp-CDMCA: edge (\(K=10\)) | 0.8083 (0.0024) | 0.7785 (0.0022) | 3.3235 (1.0517) |
| emp-CDMCA: edge (\(K=100\)) | 0.8071 (0.0027) | 0.7086 (0.0041) | 7.9797 (2.2437) |
| Jac-CDMCA: no feature (\(K=10\)) | 0.8168 (0.0043) | 0.7778 (0.0160) | 4.6325 (1.6588) |
| Jac-CDMCA: no feature (\(K=100\)) | 0.8347 (0.0039) | 0.5550 (0.0297) | 6.6214 (3.2465) |
| Jac-CDMCA: edge (\(K=10\)) | 0.8494 (0.0030) | 0.8701 (0.0025) | 4.5774 (1.7458) |
| Jac-CDMCA: edge (\(K=100\)) | 0.8826 (0.0018) | 0.8925 (0.0022) | 8.5750 (3.2427) |
| Observation: EOIs and 100-dimensional profiles | | | |
| RECON | 0.9378 (0.0007) | 0.9052 (0.0006) | 14.893 (4.0983) |
| SVM | 0.4706 (0.0347) | 0.4955 (0.0091) | 0.8052 (0.4815) |
| emp-CDMCA: profile (\(K=10\)) | 0.5308 (0.0044) | 0.6045 (0.0097) | 1.2064 (0.5259) |
| emp-CDMCA: profile (\(K=100\)) | 0.5375 (0.0044) | 0.5942 (0.0061) | 2.8875 (1.3146) |
| emp-CDMCA: profile + edge (\(K=10\)) | 0.8113 (0.0027) | 0.8239 (0.0024) | 3.3675 (1.0718) |
| emp-CDMCA: profile + edge (\(K=100\)) | 0.7805 (0.0022) | 0.7182 (0.0035) | 8.0314 (2.0999) |
| Jac-CDMCA: profile (\(K=10\)) | 0.5487 (0.0058) | 0.5995 (0.0087) | 1.2073 (0.5747) |
| Jac-CDMCA: profile (\(K=100\)) | 0.5539 (0.0052) | 0.5910 (0.0055) | 2.7778 (1.1072) |
| Jac-CDMCA: profile + edge (\(K=10\)) | 0.8647 (0.0026) | 0.9011 (0.0043) | 4.6163 (1.7864) |
| Jac-CDMCA: profile + edge (\(K=100\)) | 0.8859 (0.0032) | 0.9087 (0.0045) | 8.7847 (3.2997) |
| Observation: EOIs and 2-dimensional profiles | | | |
| RECON | 0.5968 (0.0069) | 0.6183 (0.0048) | 2.2477 (0.9037) |
| SVM | 0.4691 (0.0291) | 0.4950 (0.0094) | 0.8049 (0.4405) |
| emp-CDMCA: profile (\(K=10\)) | 0.5062 (0.0333) | 0.5160 (0.0122) | 0.8939 (0.3376) |
| emp-CDMCA: profile (\(K=100\)) | 0.5140 (0.0325) | 0.5168 (0.0114) | 0.8939 (0.3376) |
| emp-CDMCA: profile + edge (\(K=10\)) | 0.8092 (0.0024) | 0.7799 (0.0022) | 3.3256 (1.0460) |
| emp-CDMCA: profile + edge (\(K=100\)) | 0.8075 (0.0026) | 0.7100 (0.0041) | 7.9999 (2.2396) |
| Jac-CDMCA: profile (\(K=10\)) | 0.5061 (0.0334) | 0.5162 (0.0122) | 0.8864 (0.3357) |
| Jac-CDMCA: profile (\(K=100\)) | 0.5135 (0.0329) | 0.5166 (0.0112) | 0.8864 (0.3357) |
| Jac-CDMCA: profile + edge (\(K=10\)) | 0.8534 (0.0026) | 0.8740 (0.0022) | 4.5922 (1.7493) |
| Jac-CDMCA: profile + edge (\(K=100\)) | 0.8851 (0.0018) | 0.8950 (0.0022) | 8.5986 (3.2522) |

Boldface indicates the methods achieving top-two recommendation accuracy among the compared methods

Tables 1 and 2 show the values of NDCG@10, NDCG@100, and the MAP score for each recommendation method. Here, the MAP score of each method is normalized by that of the random recommendation. In the numerical experiments, the number of users is \(n=m=2000\), and the parameter s is set to \(s=10\). The parameter S was set to 100 in Table 1 and to 200 in Table 2. For each method using CDMCA, \(K=10\) and \(K=100\) are examined as the dimensions of the representation vectors.
  • The upper panel shows the results when only EOIs were observed.

  • The middle panel corresponds to the results when users’ 100-dimensional profiles and EOIs were available.

  • The bottom panel shows the results when the EOIs and only the first two elements of the 100-element user profiles were observed.

Overall, Jac-CDMCA showed high prediction performance in comparison to the other methods. As for the projection dimension, a larger K tends to provide better results. Although a larger S implies less noise in the graph edges, the recommendation accuracy is not significantly affected by S. CDMCA with the empirical weight, i.e., emp-CDMCA, is not necessarily superior to the other methods. In addition, Jac-similarity achieved high evaluation values.

There exists a trade-off between computational cost and prediction performance. CDMCA with “edge” and “profile + edge” features, whose dimension is roughly proportional to the sample sizes n and m, shows better performance, while CDMCA with “profile” vectors, whose dimension is fixed, requires less computational cost. In Sect. 7, we show some possible means of resolving this computational issue.

RECON showed high prediction accuracy when the full profiles were observed. On the other hand, when only a part of the profiles was available, the prediction with RECON was almost a random guess. This means that RECON does not efficiently utilize information on the observed edges. We see that the profile data \({\varvec{x}}_i\) and \({\varvec{y}}_j\) alone do not contain sufficient information to predict the edges. Hence, CDMCA with only profiles and link prediction with the SVM do not yield high prediction accuracy in this problem setup.

6.2 Numerical experiments with real data

6.2.1 Data description

We used real-world data collected from an online dating site. Here, X and Y are the sets of males and females, respectively. The edge \((x_i,y_j)\) indicates that \(x_i\) sent a message to \(y_j\) as an EOI, and \((y_j,x_i)\) is an EOI from \(y_j\) to \(x_i\). The sample size is \(n=\) 15,925 and \(m=\) 16,659 after removing users who did not send or receive a message. The data were gathered from 2016-01-03 to 2017-06-05. We used the 1,308,126 messages from 2016-01-03 to 2016-10-31 as the training data. The test data consist of the 177,450 messages from 2016-11-01 to 2017-06-05.

The profile of each user contains 25 features such as the user’s age, height, weight, education level, and income. The profile consists of two types of data, quantitative and categorical. The profile vectors, \({\varvec{x}}_i\) and \({\varvec{y}}_j\), were constructed by concatenating the values of the quantitative data with one-hot vectors for the categorical data. As a result, the number of elements in the profile vector reached 214.
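A small sketch of this profile-vector construction, assuming the raw profiles are held in a pandas DataFrame; the column names in the usage line are hypothetical.

```python
import pandas as pd

def profile_matrix(df, numeric_cols, categorical_cols):
    """Concatenate quantitative columns with one-hot encoded categories."""
    one_hot = pd.get_dummies(df[categorical_cols].astype("category"))
    return pd.concat([df[numeric_cols], one_hot], axis=1).to_numpy(dtype=float)

# e.g. profile_matrix(users, ["age", "height", "weight"],
#                     ["education_level", "income_band"])   # hypothetical names
```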

6.2.2 Evaluation of prediction accuracy

We used random samples of the training data to evaluate the average accuracy of each method. First, half of the users were sampled randomly from the user sets X and Y. The profile vectors of the sampled users and the corresponding bipartite subgraph were used as the training data. For each method, the recommendation accuracy over the test subgraph was evaluated using the MAP score. Finally, the mean and standard deviation of the MAP scores over 20 runs with different random seeds were summarized. Here, the recommendations to males and to females were considered separately, since Xia et al. (2015) observed that male and female users behave differently when it comes to looking for potential dates.

The results are presented in Table 3. Here, the MAP score of each method is normalized by that of the random recommendation. Thus, the MAP score of a learning method that outperforms random recommendation is larger than one. In this sense, all learning methods provided meaningful results.

Jac-CDMCA using the feature vector consisting of profiles and edge information provided high prediction accuracy for recommendations to both males and females. For the real-world data under consideration, not only the graph structure of the EOIs but also the users’ profiles contribute to improving the prediction accuracy. On the other hand, RECON did not perform well. Hence, the users’ profiles did not necessarily have a strong correlation with the graph edges.

For the CDMCA methods, the variance of the MAP score was relatively small, so that the case \(K=50\) can be distinguished from \(K=100\) or 200. Hence, a validation method is expected to work effectively for choosing an appropriate dimension K.

In the experiments, Jac-similarity achieved a high MAP score even though it used only information on edges. This result is consistent with that reported by Xia et al. (2015). Jac-CDMCA with (only) edges with \(K=200\) also achieved a high MAP score. This result implies that building algorithms that efficiently use edge information is not a trivial problem.
Table 3

Normalized MAP scores for reciprocal recommendation methods

| Method | Recom. to male: normalized \({\mathrm {MAP}}\) (sd) | Recom. to female: normalized \({\mathrm {MAP}}\) (sd) |
| --- | --- | --- |
| Random | 1.0000 | 1.0000 |
| Jac-similarity | 4.3043 (0.3305) | 6.3515 (0.4457) |
| RECON | 1.8695 (1.0301) | 1.4928 (0.4595) |
| SVM | 1.4635 (0.4153) | 3.0005 (0.9404) |
| emp-CDMCA: no feature (\(K=50\)) | 3.1833 (0.5147) | 4.6297 (0.7593) |
| emp-CDMCA: no feature (\(K=100\)) | 4.4533 (0.5621) | 5.5899 (0.7628) |
| emp-CDMCA: no feature (\(K=200\)) | 4.5098 (0.4893) | 5.4042 (0.4980) |
| emp-CDMCA: edge (\(K=50\)) | 4.1455 (0.6869) | 4.6931 (0.7495) |
| emp-CDMCA: edge (\(K=100\)) | 4.4548 (0.6472) | 5.5023 (0.7474) |
| emp-CDMCA: edge (\(K=200\)) | 4.3322 (0.5245) | 5.3294 (0.5298) |
| emp-CDMCA: profile (\(K=50\)) | 2.3433 (0.5929) | 2.3603 (0.7052) |
| emp-CDMCA: profile (\(K=100\)) | 1.8989 (0.3595) | 2.4250 (0.7006) |
| emp-CDMCA: profile (\(K=200\)) | 1.8805 (0.4187) | 2.2853 (0.6885) |
| emp-CDMCA: profile + edge (\(K=50\)) | 3.3315 (0.4756) | 4.6747 (0.7275) |
| emp-CDMCA: profile + edge (\(K=100\)) | 4.6285 (0.6634) | 5.5904 (0.7552) |
| emp-CDMCA: profile + edge (\(K=200\)) | 4.7582 (0.4446) | 5.6503 (0.5047) |
| Jac-CDMCA: no feature (\(K=50\)) | 3.1833 (0.5147) | 4.6919 (0.8362) |
| Jac-CDMCA: no feature (\(K=100\)) | 5.0179 (0.9455) | 7.2605 (1.0341) |
| Jac-CDMCA: no feature (\(K=200\)) | 4.7314 (0.5215) | 7.3155 (0.9215) |
| Jac-CDMCA: edge (\(K=50\)) | 3.2852 (0.4521) | 4.9024 (0.8118) |
| Jac-CDMCA: edge (\(K=100\)) | 4.9871 (0.8218) | 7.3295 (0.9791) |
| Jac-CDMCA: edge (\(K=200\)) | 5.3776 (0.6231) | 7.9695 (0.9730) |
| Jac-CDMCA: profile (\(K=50\)) | 2.3433 (0.5929) | 3.0380 (0.7923) |
| Jac-CDMCA: profile (\(K=100\)) | 2.3293 (0.6598) | 2.7047 (0.7211) |
| Jac-CDMCA: profile (\(K=200\)) | 2.2035 (0.6275) | 2.5859 (0.6335) |
| Jac-CDMCA: profile + edge (\(K=50\)) | 3.3315 (0.4756) | 4.9772 (0.8282) |
| Jac-CDMCA: profile + edge (\(K=100\)) | 4.9395 (0.8424) | 7.5375 (0.9651) |
| Jac-CDMCA: profile + edge (\(K=200\)) | 5.4811 (0.5942) | 8.2074 (0.9941) |

Boldface denotes the methods achieving top-two recommendation accuracy

7 Concluding remarks

We proposed learning methods for reciprocal recommendation, in which CDMCA was used with the Jaccard similarity. Through numerical experiments, we found that the feature vector consisting of the user profile and edge information is important for achieving high prediction accuracy.

As a future research direction, it is important to develop a non-linear cross-domain matching to achieve higher prediction accuracy. CDMCA with a non-linear mapping is obtained by applying the kernel method (Schölkopf and Smola 2002). A computationally efficient non-linear CDMCA is therefore required. CDMCA requires the solution of generalized eigenvalue problems, and there are computationally efficient methods for solving such problems (Saibaba et al. 2015). Moreover, since the Jaccard similarity is regarded as a non-negative definite kernel, one can use low-rank approximation methods for the kernel Gram matrix, such as the Nyström approximation (Williams and Seeger 2001). Our preliminary experiments showed that the application of appropriate numerical methods will widen the scope of our methods.
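For illustration, here is a minimal sketch of a rank-restricted Nyström factorization \(K\approx LL^{\mathrm T}\) of a positive semi-definite Gram matrix, which could be combined with such eigenvalue solvers; this is our sketch of the generic technique, not an implementation from the paper.

```python
import numpy as np

def nystrom_factor(C, landmark_idx, r=None):
    """C: (n, l) block of Gram-matrix columns at l landmark points.
    Returns L such that K approx= L @ L.T (rank at most r)."""
    Wl = C[landmark_idx, :]            # (l, l) landmark block of the Gram matrix
    U, s, _ = np.linalg.svd(Wl)        # eigendecomposition for PSD Wl
    if r is not None:
        U, s = U[:, :r], s[:r]
    # L @ L.T == C @ pinv(Wl) @ C.T, the standard Nystrom approximation
    return C @ (U / np.sqrt(np.maximum(s, 1e-12)))
```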

Additionally, theoretical analysis of the reciprocal recommendation is important. In past studies, CDMCA with empirical weights has been used for other problems such as the image-tag matching problem (Fukui et al. 2016) and higher-order relation prediction (Nori et al. 2012). In our study, weights defined by the Jaccard similarity showed better performance than a naive weighting scheme. A theoretical justification of our results is expected to result in substantial progress toward designing more efficient learning algorithms not only for recommendation tasks, but also for multi-relational data analysis.


Acknowledgements

TK was supported by KAKENHI 16K00044, 15H03636, and 15H01678.

Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  1. Akehurst, J., Koprinska, I., Yacef, K., Pizzato, L. A. S., Kay, J., & Rej, T. (2011). CCR—A content-collaborative reciprocal recommender for online dating. In T. Walsh (Ed.), IJCAI (pp. 2199–2204).
  2. Brun, A., Castagnos, S., & Boyer, A. (2011). Social recommendations: Mentor and leader detection to alleviate the cold-start problem in collaborative filtering. In I.-H. Ting, T. P. Hong, & L. S. Wang (Eds.), Social network mining, analysis and research trends: Techniques and applications (pp. 270–290). IGI Global.
  3. Cai, X., Bain, M., Krzywicki, A., Wobcke, W., Kim, Y. S., Compton, P., & Mahidadia, A. (2010). Collaborative filtering for people to people recommendation in social networks. In J. Li (Ed.), AI 2010: Advances in Artificial Intelligence, LNCS (Vol. 6464, pp. 476–485). Berlin: Springer.
  4. Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  5. Fukui, K., Okuno, A., & Shimodaira, H. (2016). Image and tag retrieval by leveraging image-group links with multi-domain graph embedding. In 2016 IEEE International Conference on Image Processing (ICIP) (pp. 221–225).
  6. Gao, C., Ma, Z., Zhang, A. Y., & Zhou, H. H. (2017). Achieving optimal misclassification proportion in stochastic block models. Journal of Machine Learning Research, 18, 1–45.
  7. Hong, W., Zheng, S., Wang, H., & Shi, J. (2013). A job recommender system based on user clustering. Journal of Computers, 8(8), 1960–1967.
  8. Hopcroft, J., Lou, T., & Tang, J. (2011). Who will follow you back?: Reciprocal relationship prediction. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11 (pp. 1137–1146). New York: ACM.
  9. Huang, Z., Shan, S., Zhang, H., Lao, S., & Chen, X. (2012). Cross-view graph embedding. In Computer Vision—ACCV (pp. 770–781).
  10. Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
  11. Jeh, G., & Widom, J. (2002). SimRank: A measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02 (pp. 538–543). New York: ACM.
  12. Kim, Y. S., Krzywicki, A., Wobcke, W., Mahidadia, A., Compton, P., Cai, X., & Bain, M. (2012). Hybrid techniques to address cold start problems for people to people recommendation in social networks. In PRICAI 2012: Trends in Artificial Intelligence (pp. 206–217).
  13. Kishida, K. (2005). Property of average precision as performance measure for retrieval experiment. Tech. rep. NII-2005-014E, National Institute of Informatics.
  14. Leskovec, J., Huttenlocher, D., & Kleinberg, J. (2010). Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10 (pp. 641–650). New York: ACM.
  15. Li, L., & Li, T. (2012). MEET: A generalized framework for reciprocal recommender systems. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12 (pp. 35–44). New York: ACM.
  16. Nori, N., Bollegala, D., & Kashima, H. (2012). Multinomial relation prediction in social data: A dimension reduction approach. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. Toronto, Ontario, Canada.
  17. Pizzato, L., Rej, T., Chung, T., Koprinska, I., & Kay, J. (2010). RECON: A reciprocal recommender for online dating. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10 (pp. 207–214). New York: ACM.
  18. Pizzato, L., Rej, T., Akehurst, J., Koprinska, I., Yacef, K., & Kay, J. (2013). Recommending people to people: The nature of reciprocal recommenders with a case study in online dating. User Modeling and User-Adapted Interaction, 23(5), 447–488.
  19. R Core Team. (2017). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.
  20. Real, R., & Vargas, J. M. (1996). The probabilistic basis of Jaccard’s index of similarity. Systematic Biology, 45(3), 380–385.
  21. Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4), 1878–1915.
  22. Saibaba, A. K., Lee, J., & Kitanidis, P. K. (2015). Randomized algorithms for generalized Hermitian eigenvalue problems with application to computing Karhunen–Loève expansion. Numerical Linear Algebra with Applications, 23, 314–339.
  23. Schafer, J. B., Frankowski, D., Herlocker, J., & Sen, S. (2007). Collaborative filtering recommender systems. In P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The Adaptive Web, LNCS (Vol. 4321, pp. 291–324). Berlin: Springer.
  24. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge, MA: MIT Press.
  25. Schütze, H., Manning, C. D., & Raghavan, P. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
  26. Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. New York, NY: Cambridge University Press.
  27. Shimodaira, H. (2015). A simple coding for cross-domain matching with dimension reduction via spectral graph embedding. arXiv:1412.8380.
  28. Tu, K., Ribeiro, B., Jensen, D., Towsley, D., Liu, B., Jiang, H., & Wang, X. (2014). Online dating recommendations: Matching markets and learning preferences. In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14 Companion (pp. 787–792). New York: ACM.
  29. Wang, C., Han, J., Jia, Y., Tang, J., Zhang, D., Yu, Y., & Guo, J. (2010). Mining advisor-advisee relationships from research publication networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10 (pp. 203–212). New York: ACM.
  30. Williams, C. K. I., & Seeger, M. (2001). Using the Nyström method to speed up kernel machines. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in Neural Information Processing Systems (Vol. 13, pp. 682–688). MIT Press.
  31. Xia, P., Jiang, H., Wang, X., Chen, C., & Liu, B. (2014). Predicting user replying behavior on a large online dating site. In International AAAI Conference on Web and Social Media.
  32. Xia, P., Liu, B., Sun, Y., & Chen, C. (2015). Reciprocal recommendation system for online dating. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’15 (pp. 234–241).
  33. Yu, H., Liu, C., & Zhang, F. (2011). Reciprocal recommendation algorithm for the field of recruitment. Journal of Information & Computational Science, 8(16), 4061–4068.
  34. Yu, M., Zhao, K., Yen, J., & Kreager, D. (2013). Recommendation in reciprocal and bipartite social networks—a case study of online dating. In Social Computing, Behavioral-Cultural Modeling and Prediction, SBP 2013 (pp. 231–239).

Copyright information

© Japanese Federation of Statistical Science Associations 2019

Authors and Affiliations

  1. NS Solutions Corporation, Tokyo, Japan
  2. Recruit Technologies Co., Ltd., Tokyo, Japan
  3. Tokyo Institute of Technology/RIKEN AIP, Tokyo, Japan
