Numerical study of reciprocal recommendation with domain matching
Abstract
Reciprocal recommendation is the task of finding preferable matches among users in two distinct groups. Popular examples of reciprocal recommendation include online job recruiting and online dating services. In this paper, we propose a new method of reciprocal recommendation that uses a graph embedding technique. In particular, we use cross-domain matching correlation analysis (CDMCA) as the graph embedding method. In CDMCA, feature vectors in different domains are mapped into a common representation space, and reciprocal recommendation is conducted in that common space. Numerical experiments show that CDMCA with a similarity-based weighting scheme provides high-quality reciprocal recommendations.
Keywords
Reciprocal recommendation · Cross-domain matching correlation analysis · Jaccard similarity
1 Introduction
Recommendation systems are currently used in various fields such as e-commerce. In particular, reciprocal recommendation has lately been attracting attention. Suppose that users are divided into two groups. A reciprocal recommendation system finds preferable matches between users in those groups, taking into account both their characteristics and their preferences.
Popular examples of reciprocal recommendation include online job recruiting and online dating services. The two groups involved in job recruiting are the recruiters and the job seekers. Each user has characteristics and preferences for users in the other group. Some recruiters and job seekers might have contacts with each other. A contact might be regarded as an expression of interest (EOI). Based on the characteristics of the users provided on an online recruiting site and the history of the EOIs, a reciprocal recommendation system provides favorable matches between recruiters and job seekers. On an online dating site, the two groups typically consist of males and females, and the purpose is to find preferable partners for each. Details of reciprocal recommendation are found in Li and Li (2012), Pizzato et al. (2013), Xia et al. (2015), Yu et al. (2011, 2013), Pizzato et al. (2010), Tu et al. (2014), Hong et al. (2013), Brun et al. (2011), Wang et al. (2010), Hopcroft et al. (2011) and references therein.
In this paper, we propose a reciprocal recommendation algorithm using a graph embedding technique. In particular, we use cross-domain matching correlation analysis (CDMCA), proposed by Shimodaira (2015), as the graph embedding method. CDMCA is used to investigate the relationships among observed data vectors in multiple domains. We show that CDMCA can be applied to the detection of preferable matches in reciprocal recommendation problems. In CDMCA, data vectors in distinct domains are mapped into a common representation space, and statistical inference is conducted in the mapped space. The mapped points can be directly compared to each other, even when the two groups are originally expressed in terms of different features. Hence, we can find preferable matches between the two groups using this convenient property of CDMCA. In numerical experiments, we investigate whether the CDMCA-based approach attains higher prediction accuracy than existing ones. CDMCA is regarded as an extension of classical data-description methods such as principal component analysis and canonical correlation analysis. Hence, we also obtain a visualization of the global structure among all users in the common representation space.
The remainder of the paper is organized as follows. In Sect. 2, we introduce related work on reciprocal recommendation algorithms. The problem setup is explained in Sect. 3. In Sect. 4, we propose a new method for reciprocal recommendation, and consider some variants of the proposed method. Some measures of ranking quality are introduced in Sect. 5. Section 6 is devoted to numerical experiments, and Sect. 7 draws conclusions.
2 Related work
There are many algorithms for reciprocal recommendation. Most are designed around the assumption that similar people are drawn to similar people. Reciprocal recommendation algorithms are roughly classified into three types of methods: content-based, graph-based, and hybrid (Kim et al. 2012).
In a content-based method, a user's profile and ratings are used to score the users in the other group, while data from other users in the same group is not actively used. RECON (Pizzato et al. 2010) is a representative content-based method. Suppose that a user \(x_i\) sent an EOI to users in the other group. The recommendation algorithm assigns a high score to users whose profiles are similar to those who received the EOI from \(x_i\). The scores are calculated reciprocally and used to select preferable matches.
Graph-based methods focus on the graph structure corresponding to EOIs. In this approach, data from other users is utilized to measure a similarity between users in the same group. First, the history of the EOIs is represented by a directed graph, in which each node corresponds to a user. Second, the similarity of two users is measured based on the graph structure around each user. Jaccard similarity is commonly used in reciprocal recommendation (Xia et al. 2015). Based on the similarity measure, the system provides recommendations to each user. Collaborative filtering for reciprocal recommendation also falls into this category (Cai et al. 2010; Schafer et al. 2007). Sophisticated statistical modeling using stochastic blockmodels has also been used to detect the clustering structure among nodes in the graph (Gao et al. 2017; Rohe et al. 2011). Naive graph-based methods do not utilize information on users' profiles.
The hybrid approach takes both users' profiles and the graph structure into account. As a hybrid method, the content-collaborative reciprocal recommender (CCR) was proposed as an extension of RECON (Akehurst et al. 2011). Kim et al. (2012) proposed a hybrid method to address the cold-start problem on Internet services.
In this paper, we propose a hybrid method. First, the graph is defined based on the set of users and the history of the EOIs, and the similarities between users in the graph are calculated. Based on these similarities, users' profiles are mapped to a common representation space. CDMCA with the similarity measure is applied to find the representation of each user in the common space. If no user profiles are available, the map reduces to a pure graph embedding. Then, preferable-match recommendation is conducted in the common space. We show that CDMCA provides a simple way to incorporate feature information, such as users' profiles, into the graph-based method.

Xia et al. (2015) proposed a graph-based reciprocal recommendation system. The similarity between users is measured by the Jaccard similarity, and users' profiles are basically not used. The authors showed in their numerical experiments that their graph-based recommendation methods achieved higher prediction performance than content-based methods such as RECON. Our method also uses the Jaccard similarity as the similarity measure among users. However, it is not used directly to yield the recommendation. We show that CDMCA provides an effective way of incorporating information on users' profiles into similarity-based methods to raise the prediction performance.

In the CCR (Akehurst et al. 2011), the similarity between users is measured based on their profiles, and the candidates for recommendation are chosen using the interactions between users, where the interactions are expressed as a bipartite graph. As for the similarity measure, our method does not use the users' profiles. The main difference between the CCR and our method is that our method provides not only recommendations but also the global structure among all users in the common space. In other words, our method simultaneously takes the relationships among all users into account when recommendations are provided. The CCR, on the other hand, uses only a local structure based on the similarity.

Kim et al. (2012) proposed a hybrid method to resolve the so-called cold-start problem, that is, to give an appropriate recommendation to new users who do not yet have a rich history of EOIs. To solve this problem, a profile-based similarity and a rule-based similarity constructed from subgroup interaction patterns are incorporated into the reciprocal recommendation system.
In this paper, we focus on the prediction accuracy of the recommendation for existing users, and we do not deal with the cold-start problem.
3 Problem setup
Reciprocal recommendation is formulated as the prediction of edges in a graph. We define the sets of users, X and Y, as \(X=\{x_1,\ldots ,x_n\},\,Y=\{y_1,\ldots ,y_m\}\), and G as the directed bipartite graph \(G=(X,Y,E)\), where E is the set of directed edges from X to Y or from Y to X. The edge \(e=(x,y)\) (resp. (y, x)) represents the directed edge from x to y (resp. from y to x). In addition, each node of the graph G has a feature vector that corresponds to a user's profile. We define the features of \(x_i\in X\) and \(y_j\in {Y}\) as \({\varvec{x}}_i\in {\mathbb {R}}^{d_X}\) and \({\varvec{y}}_j\in {\mathbb {R}}^{d_Y}\), respectively.
The observed data consist of the directed bipartite graph \(G=(X,Y,E)\) and the features \({\varvec{x}}_i,{\varvec{y}}_j,\,i=1,\ldots ,n,\,j=1,\ldots ,m\). In real data analysis such as job recruiting, X and Y are the sets of job seekers and companies, respectively. The edge \(e=(x,y)\in {X}\times {Y}\) represents a request for information about the company y from the job seeker x, and the reverse edge (y, x) represents a similar event, such as a character reference from y to x. The feature \({\varvec{x}}_i\) includes characteristics of \(x_i\in {X}\), such as his or her major specialty, and \({\varvec{y}}_j\) represents such characteristics as the type of industry or firm size of \(y_j\in {Y}\).
Note that a user x can express an interest in y only when x is aware of the existence of y. Typically, the numbers of nodes, n and m, are quite large. Hence, a user in X can observe only a small portion of Y, and vice versa. Therefore, there might be preferable matches that have not yet been observed. The primary purpose of reciprocal recommendation is to predict such potential edges of the graph G from the observed data. For an online job recruiting or dating site, the accuracy of such predictions is key to improving the quality of the service.
4 Reciprocal recommendation with CDMCA
We propose a reciprocal recommendation method that uses cross-domain matching correlation analysis (CDMCA) as its principal technique.
4.1 CDMCA
Suppose that related data \(z, z'\) and \(z''\) are observed. For example, z might be an image of a horse, \(z'\) a tag or keyword describing horses, and \(z''\) a link to a website that discusses horses. The representation of each piece of information, \(z, z'\) and \(z''\), might be different. Such data are referred to as multi-domain data. The task is to find a map of the multi-domain data into a common space \({\mathbb {R}}^K\) such that samples that are likely to co-occur are located near each other. As a result, we can detect unknown pairs that are likely to co-occur in the common space.
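To make this concrete, the following Python sketch illustrates a CCA-style two-domain version of such an embedding. It is only a minimal numerical illustration under assumed conventions: rows of X and Y hold the two domains' feature vectors, W is a nonnegative matching-weight matrix linking them, and the normalization mirrors weighted CCA. The exact CDMCA objective and constraints are those of Shimodaira (2015) and are not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

def two_domain_embedding(X, Y, W, K, eps=1e-6):
    """CCA-style sketch: find linear maps A, B placing weighted-linked
    rows of X and Y close together in R^K via a generalized eigenproblem."""
    Dx = np.diag(W.sum(axis=1))       # weighted degrees on the X side
    Dy = np.diag(W.sum(axis=0))       # weighted degrees on the Y side
    Cxy = X.T @ W @ Y                 # weighted cross-moment matrix
    Cxx = X.T @ Dx @ X + eps * np.eye(X.shape[1])   # regularized scale terms
    Cyy = Y.T @ Dy @ Y + eps * np.eye(Y.shape[1])
    dX, dY = X.shape[1], Y.shape[1]
    # Block generalized eigenproblem:  [0 Cxy; Cxy' 0] v = lam [Cxx 0; 0 Cyy] v
    Amat = np.block([[np.zeros((dX, dX)), Cxy],
                     [Cxy.T, np.zeros((dY, dY))]])
    Bmat = np.block([[Cxx, np.zeros((dX, dY))],
                     [np.zeros((dY, dX)), Cyy]])
    vals, vecs = eigh(Amat, Bmat)
    top = vecs[:, np.argsort(vals)[::-1][:K]]  # K leading eigenvectors
    return top[:dX], top[dX:]          # A (dX x K) and B (dY x K)
```

The rows of the two domains are then compared through \(A^{\top}{\varvec{x}}_i\) and \(B^{\top}{\varvec{y}}_j\) in the common space.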
4.2 General algorithm for reciprocal recommendation
Eventually, we obtain the representation vectors in the common space. The distance between \(x_i\in {X}\) and \(y_j\in {Y}\) is measured by \(d_{ij}:=\Vert {\varvec{f}}_i-{\varvec{g}}_j\Vert\). The user \(y_j\) is recommended to \(x_i\) in ascending order of \(d_{ij},\,j=1,\ldots ,m\). Likewise, the user \(x_i\in {X}\) is recommended to \(y_j\in {Y}\) in ascending order of \(d_{ij},\,i=1,\ldots ,n\).
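This ranking step can be sketched as follows, assuming F and G are arrays whose rows hold the representation vectors f_i and g_j (the function name is illustrative):

```python
import numpy as np

def recommend_for_x(F, G, i, top_k):
    """Rank users y_j for user x_i by the distance d_ij = ||f_i - g_j||
    in the common representation space."""
    d = np.linalg.norm(G - F[i], axis=1)   # d_ij for j = 1..m
    return np.argsort(d)[:top_k]           # indices of the top_k nearest y_j
```

The reciprocal direction is obtained by simply swapping the roles of F and G.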
4.3 Definition of weights
There are several ways of determining the weight matrix W. We discuss two weighting schemes. One is defined by the empirical distribution of the edges E in G, and the other is defined by the Jaccard similarity, which appears in some existing algorithms for reciprocal recommendation.
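Equation (5) is not reproduced in this excerpt; a plausible reading of the empirical scheme is that \(w_{ij}\) is proportional to the number of observed EOIs between \(x_i\) and \(y_j\). The following sketch implements that reading (the function name and edge encoding are assumptions for illustration):

```python
import numpy as np

def empirical_weights(n, m, edges):
    """Empirical weighting sketch: w_ij is proportional to the number of
    observed EOIs between x_i and y_j; `edges` holds index pairs (i, j)."""
    W = np.zeros((n, m))
    for i, j in edges:
        W[i, j] += 1.0
    return W / W.sum()   # normalize to an empirical distribution over pairs
```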
The above empirical weight is considered to be insufficient to formalize the assumption that similar people are drawn to similar people. Hence, let us consider the second weighting scheme that uses a similarity measure between graph nodes. Given a directed graph, the similarity between two nodes can be defined by comparing the topological structures near these nodes.
In this paper, we use the Jaccard similarity (Real and Vargas 1996). SimRank (Jeh and Widom 2002) is another common graph-based similarity measure, defined from a random walk on the graph. An iterative algorithm is needed to compute SimRank, whereas the Jaccard similarity can be computed directly and efficiently. In addition, our preliminary experiments confirmed that both similarity measures provide almost the same prediction accuracy for reciprocal recommendation. Hence, the Jaccard similarity is employed in our method.
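For two nodes in the same group, the Jaccard similarity compares their neighbor sets in the directed graph, e.g., the sets of users to whom they sent EOIs. A minimal sketch:

```python
def jaccard(neigh_a, neigh_b):
    """Jaccard similarity of two users' neighbour sets, e.g. the sets of
    users they sent EOIs to: |A & B| / |A | B|."""
    a, b = set(neigh_a), set(neigh_b)
    if not a and not b:
        return 0.0          # convention for two isolated nodes
    return len(a & b) / len(a | b)
```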
Xia et al. (2015) proposed a graph-based method using the weight \(w_{ij}^{\mathrm {out}}\) or \(w_{ij}^{\mathrm {in}}\). Given \(x_i\in {X}\), the algorithm recommends \(y_j\in {Y}\) with a large \(w_{ij}^{\mathrm {out}}\) or a large \(w_{ij}^{\mathrm {in}}\). The recommendation algorithm of Xia et al. (2015) showed high performance for reciprocal recommendation. In our method, the weight is not used directly for the recommendation; rather, it is used to express the relative closeness among users in CDMCA.
5 Evaluation of prediction accuracy
In addition to the MAP, we can use the normalized discounted cumulative gain@k, or NDCG@k for short (Järvelin and Kekäläinen 2002), to measure the top-k prediction accuracy; see Shalev-Shwartz and Ben-David (2014, Chap. 17.4) for details regarding the NDCG. The calculation of the NDCG@k requires test data with the top-k preference order. We use the NDCG@k to assess the recommendation accuracy for synthetic data, and the MAP for real-world data.
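A minimal sketch of NDCG@k, using the linear-gain form of the DCG (some definitions use the gain \(2^{\mathrm{rel}}-1\) instead):

```python
import numpy as np

def ndcg_at_k(relevance, k):
    """NDCG@k for one user: `relevance` lists the graded relevance of the
    recommended items in ranked order; the score is DCG@k / ideal DCG@k."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))   # 1/log2(rank+1)
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```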
6 Numerical experiments
Xia et al. (2015) reported that the graph-based method using the Jaccard similarity measure outperformed other methods such as RECON (Pizzato et al. 2010). In this paper, we compare the performance of our proposed algorithms mainly with Xia's method, RECON, and link prediction by an SVM.
1. Random recommendation (random): The recommendation to each user is given completely at random.
2. Recommendation with Jaccard similarity (Jacsimilarity): The recommendation to \(x_i\) consists of the \(y_j\)'s with large \(w_{ij}\) in (6). The feature vectors are not taken into account.
3. RECON (Pizzato et al. 2010): The preference of \(x_i\) is expressed as a probability distribution over the feature vectors of Y. Let \({\varvec{y}}_j=(\bar{y}_{j1}, \ldots , \bar{y}_{ja})\in {\mathbb {R}}^a\) be the feature vector of \(y_j\in {Y}\). Probability densities for continuous variables and probability functions for discrete variables, \(p_{ia'}(\bar{y}_{a'}),\,1\le a'\le a\), represent the preference of the user \(x_i\) for the \(a'\)th feature of \(y\in {Y}\), and each probability is estimated from the data \(\{{\varvec{y}}_j\,:\,(x_i,y_j)\in {E}\}\). Similarly, the probability \(q_{jb'}(\bar{x}_{b'})\) for \({\varvec{x}}=(\bar{x}_1,\ldots ,\bar{x}_b)\) is estimated from \(\{{\varvec{x}}_i\,:\,(y_j,x_i)\in {E}\}\). The weight \(w_{ij}\) is defined as the harmonic mean of \(\sum _{a'}p_{ia'}(\bar{y}_{ja'})/a\) and \(\sum _{b'}q_{jb'}(\bar{x}_{ib'})/b\). This weight is used to provide the reciprocal recommendation. In the numerical experiments, the probabilities are estimated by histograms using the function \(\mathtt{hist}\) in R (R Core Team 2017).
4. Link prediction using a support vector machine (Xia et al. 2014; Leskovec et al. 2010): The problem is formulated as link prediction on the graph. The positively labeled data consist of all the concatenated profiles \(({\varvec{x}}_i, {\varvec{y}}_j)\) with a reciprocal edge between \(x_i\) and \(y_j\). The negatively labeled training data are randomly sampled from all pairs having only one edge, so that the dataset has equal numbers of reciprocal and non-reciprocal edges. A binary classification method is then applied to obtain a classifier that predicts hidden edges for a given pair of profiles. Here we used the linear SVM with the option of estimating the label probability. LIBSVM (Chang and Lin 2011) is used for the SVM.
5. Recommendation using CDMCA:
(a) Two weighting schemes are examined:
(empCDMCA) The weight \(w_{ij}\) for CDMCA is defined from the empirical distribution of graph edges (5).
(JacCDMCA) The weight \(w_{ij}\) is determined from the Jaccard similarity (6).
(b) The following feature vectors are examined.

no feature: The method using the representation vectors \({\varvec{f}}_i, {\varvec{g}}_j\in {\mathbb {R}}^K\) determined by (3).

profile: The feature vector of \(x_i\) is defined as its profile vector. The representation vector in the common space \({\mathbb {R}}^K\) is given by \({\varvec{f}}_i=A{\varvec{x}}_i, {\varvec{g}}_j=B{\varvec{y}}_j\in {\mathbb {R}}^K\), where the transformation matrices A and B are determined by solving (4).
edge: The feature vector is defined from the edges. More precisely, let us define the indicators \(M^X_{ij}\) and \(M^Y_{ji}\) for \(i=1,\ldots ,n,\ j=1,\ldots ,m\) of the bipartite graph \(G=(X,Y,E)\) as
$$M^X_{ij}= {\left\{ \begin{array}{ll} 1 & (x_i,y_j)\in {E},\\ 0 & (x_i,y_j)\notin {E}, \end{array}\right. } \quad M^Y_{ji}= {\left\{ \begin{array}{ll} 1 & (y_j,x_i)\in {E},\\ 0 & (y_j,x_i)\notin {E}. \end{array}\right. }$$
The feature vector of \(x_i\in {X}\) is then defined as
$$\begin{aligned} {\varvec{x}}_i=\left( M_{i1}^X,\ldots ,M_{im}^X,M_{1i}^Y,\ldots ,M_{mi}^Y\right) . \end{aligned}$$
The feature vector of \(y_j\in {Y}\) is defined similarly. The transformation matrices A and B for the above feature vectors are obtained by solving (4).

profile + edge: In addition to the profile, information on edges is included in the feature vector, i.e., the feature vector of \(x_i\in {X}\) is defined by the concatenation of the profile and the information on edges, \(({\varvec{x}}_i, M_{i1}^X,\ldots ,M_{im}^X,M_{1i}^Y,\ldots ,M_{mi}^Y)\). The feature vector of \(y_j\in {Y}\) is defined similarly. The transformation matrices A and B for the above feature vectors are obtained by solving (4).

For each method, results are reported for both synthetic and real-world data.
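The construction of the "edge" feature vectors above can be sketched as follows, assuming two edge lists that hold index pairs for the two directions (the function name and encoding are illustrative):

```python
import numpy as np

def edge_feature_vectors(n, m, edges_xy, edges_yx):
    """Build the `edge' feature vectors: x_i is represented by the
    concatenated indicators (M^X_{i1..im}, M^Y_{1i..mi}).  edges_xy holds
    pairs (i, j) for x_i -> y_j; edges_yx holds pairs (j, i) for y_j -> x_i."""
    MX = np.zeros((n, m))
    MY = np.zeros((m, n))
    for i, j in edges_xy:
        MX[i, j] = 1.0
    for j, i in edges_yx:
        MY[j, i] = 1.0
    return np.hstack([MX, MY.T])   # row i is the feature vector of x_i
```

The feature vectors for the Y side are obtained symmetrically by swapping the roles of the two edge lists.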
6.1 Numerical experiments with synthetic data
In this section, we use synthetic data to evaluate the accuracy of learning methods for reciprocal recommendation. The numbers of users in X and Y are both set to 2000. The feature vector of \(x_i\) consists of \(x_i\)'s profile and preference vectors, \({\varvec{x}}_i\) and \({\varvec{x}}_i'\), respectively. Similarly, the feature of \(y_j\in {Y}\) is expressed by the profile \({\varvec{y}}_j\) and the preference \({\varvec{y}}_j'\). The dimension of these vectors is set to 100.
The preference of \(x_i\) for \(y_j\) is determined by the Euclidean distance from \(x_i\)'s preference to \(y_j\)'s profile, i.e., \(\Vert {\varvec{x}}_i'-{\varvec{y}}_j\Vert\). A smaller distance corresponds to a larger occurrence probability of the edge \((x_i,y_j)\). Similarly, the occurrence probability of the edge \((y_j,x_i)\) depends on the distance \(\Vert {\varvec{x}}_i-{\varvec{y}}_j'\Vert\). This setting indicates that the distance between one's preference and the other's profile is closely related to the probability that one will show an EOI to the other. In the numerical simulation, information on the profile is observable, but the preference is not observed directly. The individual preference can be observed only through the structure of the bipartite graph G. This setting is similar to that in practical situations such as online job recruiting or online dating.
The graph \(G=(X,Y,E)\) is generated as follows. The sets of users, X and Y, are fixed. For each user \(x_i\in {X}\), choose S users \(y_{j_1},\ldots ,y_{j_S}\) from Y at random, and calculate the distances from \({\varvec{x}}_i'\), i.e., \(\Vert {\varvec{x}}_i'-{\varvec{y}}_{j_1}\Vert ,\ldots ,\Vert {\varvec{x}}_i'-{\varvec{y}}_{j_S}\Vert\). Then, select the s points nearest to \({\varvec{x}}_i'\) and add the edges from \(x_i\) to the corresponding users in Y. Similarly, add edges from each \(y_j\) to the graph. Repeating the same procedure for all nodes yields the bipartite graph G. The observed data consist of the profiles of the users, \(\{{\varvec{x}}_i\}_{i=1,\ldots ,n},\,\{{\varvec{y}}_j\}_{j=1,\ldots ,m}\), and the generated graph G.
Note that the profiles and preferences are drawn from a normal mixture distribution. As a result, both X and Y are classified roughly into two groups, each corresponding to a component of the normal mixture distribution. In the numerical experiments, we evaluate the prediction accuracy of the reciprocal recommendation while varying the values of S and s.
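The edge-generation step for one direction can be sketched as follows (function name and argument conventions are illustrative; the reverse direction swaps the roles of the two groups):

```python
import numpy as np

def generate_edges(pref_X, prof_Y, S, s, seed=0):
    """Sketch of the synthetic edge generation: each x_i samples S
    candidate users from Y and sends EOIs to the s users whose profiles
    are nearest to x_i's preference vector."""
    rng = np.random.default_rng(seed)
    n, m = pref_X.shape[0], prof_Y.shape[0]
    edges = []
    for i in range(n):
        cand = rng.choice(m, size=S, replace=False)              # observed portion of Y
        dist = np.linalg.norm(prof_Y[cand] - pref_X[i], axis=1)  # ||x_i' - y_j||
        for j in cand[np.argsort(dist)[:s]]:                     # s nearest candidates
            edges.append((i, int(j)))                            # directed edge (x_i, y_j)
    return edges
```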
Table 1 NDCG@10, NDCG@100 and MAP score when \(S=100\) and \(s=10\)
\(S=100, s=10\)  

NDCG@10 (sd)  NDCG@100 (sd)  \({\mathrm {MAP}}\) (sd)  
Observation: EOIs  
Random  0.5008 (0.0019)  0.5106 (0.0006)  1.0000 
Jacsimilarity  0.8481 (0.0037)  0.8147 (0.0038)  5.0398 (1.2596) 
empCDMCA: no feature (\(K=10\))  0.7981 (0.0021)  0.7580 (0.0022)  3.7111 (2.1675) 
empCDMCA: no feature (\(K=100\))  0.7846 (0.0019)  0.6787 (0.0032)  4.1656 (0.8672) 
empCDMCA: edge (\(K=10\))  0.8146 (0.0026)  0.7880 (0.0023)  3.7128 (2.1603) 
empCDMCA: edge (\(K=100\))  0.7930 (0.0023)  0.7047 (0.0032)  4.4671 (0.9277) 
JacCDMCA: no feature (\(K=10\))  0.8456 (0.0023)  0.8638 (0.0022)  4.5077 (2.4220) 
JacCDMCA: no feature (\(K=100\))  0.8315 (0.0112)  0.6372 (0.0086)  3.9470 (2.0980) 
JacCDMCA: edge (\(K=10\))  0.8566 (0.0020)  0.8752 (0.0018)  4.3848 (2.0979) 
JacCDMCA: edge (\(K=100\))  0.8841 (0.0017)  0.8952 (0.0021)  5.1161 (1.9870) 
Observation: EOIs and 100-dim profiles  
RECON  0.9254 (0.0006)  0.8991 (0.0007)  8.3056 (2.6240) 
SVM  0.4819 (0.0341)  0.4973 (0.0100)  0.7981 (0.2732) 
empCDMCA: profile (\(K=10\))  0.5290 (0.0049)  0.5369 (0.0049)  1.7915 (1.6864) 
empCDMCA: profile (\(K=100\))  0.5925 (0.0081)  0.5860 (0.0055)  2.2022 (0.5903) 
empCDMCA: profile+edge (\(K=10\))  0.8192 (0.0021)  0.7955 (0.0022)  3.7680 (2.2583) 
empCDMCA: profile+edge (\(K=100\))  0.7971 (0.0019)  0.7130 (0.0032)  4.5301 (0.9327) 
JacCDMCA: profile (\(K=10\))  0.5489 (0.0064)  0.5543 (0.0059)  1.1071 (0.6081) 
JacCDMCA: profile (\(K=100\))  0.5931 (0.0060)  0.5867 (0.0048)  2.5514 (0.9715) 
JacCDMCA: profile + edge (\(K=10\))  0.8654 (0.0017)  0.8847 (0.0016)  4.4069 (2.0644) 
JacCDMCA: profile + edge (\(K=100\))  0.8933 (0.0013)  0.9049 (0.0012)  5.2372 (1.9939) 
Observation: EOIs and 2-dim profiles  
RECON  0.5977 (0.0060)  0.6182 (0.0045)  1.8705 (0.7409) 
SVM  0.4709 (0.0232)  0.4950 (0.0077)  0.8015 (0.5272) 
empCDMCA: profile (\(K=10\))  0.5164 (0.0316)  0.5187 (0.0126)  0.8546 (0.2423) 
empCDMCA: profile (\(K=100\))  0.4998 (0.0318)  0.5132 (0.0110)  0.8546 (0.2423) 
empCDMCA: profile + edge (\(K=10\))  0.8153 (0.0027)  0.7892 (0.0024)  3.7799 (2.2405) 
empCDMCA: profile + edge (\(K=100\))  0.7935 (0.0023)  0.7061 (0.0033)  4.4742 (0.9246) 
JacCDMCA: profile (\(K=10\))  0.5167 (0.0315)  0.5186 (0.0128)  0.8775 (0.2966) 
JacCDMCA: profile (\(K=100\))  0.4995 (0.0320)  0.5132 (0.0107)  0.8775 (0.2966) 
JacCDMCA: profile + edge (\(K=10\))  0.8586 (0.0020)  0.8771 (0.0017)  4.3633 (2.1101) 
JacCDMCA: profile + edge (\(K=100\))  0.8856 (0.0018)  0.8970 (0.0018)  5.1389 (1.9855) 
Table 2 NDCG@10, NDCG@100 and MAP score when \(S=200\) and \(s=10\)
\(S=200, s=10\)  

NDCG@10 (sd)  NDCG@100 (sd)  \({\mathrm {MAP}}\) (sd)  
Observation: EOIs  
Random  0.5010 (0.0029)  0.5013 (0.0020)  1.0000 
Jacsimilarity  0.7995 (0.0079)  0.8025 (0.0078)  10.1382 (2.9361) 
empCDMCA: no feature (\(K=10\))  0.7907 (0.0023)  0.8150 (0.0024)  3.1311 (1.0609) 
empCDMCA: no feature (\(K=100\))  0.7497 (0.0020)  0.6893 (0.0036)  7.5639 (2.1329) 
empCDMCA: edge (\(K=10\))  0.8083 (0.0024)  0.7785 (0.0022)  3.3235 (1.0517) 
empCDMCA: edge (\(K=100\))  0.8071 (0.0027)  0.7086 (0.0041)  7.9797 (2.2437) 
JacCDMCA: no feature (\(K=10\))  0.8168 (0.0043)  0.7778 (0.0160)  4.6325 (1.6588) 
JacCDMCA: no feature (\(K=100\))  0.8347 (0.0039)  0.5550 (0.0297)  6.6214 (3.2465) 
JacCDMCA: edge (\(K=10\))  0.8494 (0.0030)  0.8701 (0.0025)  4.5774 (1.7458) 
JacCDMCA: edge (\(K=100\))  0.8826 (0.0018)  0.8925 (0.0022)  8.5750 (3.2427) 
Observation: EOIs and 100-dim profiles  
RECON  0.9378 (0.0007)  0.9052 (0.0006)  14.893 (4.0983) 
SVM  0.4706 (0.0347)  0.4955 (0.0091)  0.8052 (0.4815) 
empCDMCA: profile (\(K=10\))  0.5308 (0.0044)  0.6045 (0.0097)  1.2064 (0.5259) 
empCDMCA: profile (\(K=100\))  0.5375 (0.0044)  0.5942 (0.0061)  2.8875 (1.3146) 
empCDMCA: profile + edge (\(K=10\))  0.8113 (0.0027)  0.8239 (0.0024)  3.3675 (1.0718) 
empCDMCA: profile + edge (\(K=100\))  0.7805 (0.0022)  0.7182 (0.0035)  8.0314 (2.0999) 
JacCDMCA: profile (\(K=10\))  0.5487 (0.0058)  0.5995 (0.0087)  1.2073 (0.5747) 
JacCDMCA: profile (\(K=100\))  0.5539 (0.0052)  0.5910 (0.0055)  2.7778 (1.1072) 
JacCDMCA: profile + edge (\(K=10\))  0.8647 (0.0026)  0.9011 (0.0043)  4.6163 (1.7864) 
JacCDMCA: profile + edge (\(K=100\))  0.8859 (0.0032)  0.9087 (0.0045)  8.7847 (3.2997) 
Observation: EOIs and 2-dim profiles  
RECON  0.5968 (0.0069)  0.6183 (0.0048)  2.2477 (0.9037) 
SVM  0.4691 (0.0291)  0.4950 (0.0094)  0.8049 (0.4405) 
empCDMCA: profile (\(K=10\))  0.5062 (0.0333)  0.5160 (0.0122)  0.8939 (0.3376) 
empCDMCA: profile (\(K=100\))  0.5140 (0.0325)  0.5168 (0.0114)  0.8939 (0.3376) 
empCDMCA: profile+edge (\(K=10\))  0.8092 (0.0024)  0.7799 (0.0022)  3.3256 (1.0460) 
empCDMCA: profile+edge (\(K=100\))  0.8075 (0.0026)  0.7100 (0.0041)  7.9999 (2.2396) 
JacCDMCA: profile (\(K=10\))  0.5061 (0.0334)  0.5162 (0.0122)  0.8864 (0.3357) 
JacCDMCA: profile (\(K=100\))  0.5135 (0.0329)  0.5166 (0.0112)  0.8864 (0.3357) 
JacCDMCA: profile + edge (\(K=10\))  0.8534 (0.0026)  0.8740 (0.0022)  4.5922 (1.7493) 
JacCDMCA: profile + edge (\(K=100\))  0.8851 (0.0018)  0.8950 (0.0022)  8.5986 (3.2522) 

The upper panel shows the results when only EOIs were observed. The middle panel corresponds to the results when the users' 100-dimensional profiles and EOIs were available. The bottom panel shows the results when the EOIs and only the first two of the 100 profile elements were observed.
Overall, JacCDMCA showed high prediction performance in comparison to the other methods. As for the projection dimension, a larger K tends to provide better results. Although a larger S implies less noise in the graph edges, the recommendation accuracy is not significantly affected by S. CDMCA with the empirical weight, i.e., empCDMCA, is not necessarily superior to the other methods. In addition, Jacsimilarity achieved high evaluation values.
There is a trade-off between computational cost and prediction performance. CDMCA with the “edge” and “profile + edge” features, whose vector dimensions are roughly proportional to the sample sizes n and m, shows better performance, while CDMCA with the fixed-dimension “profile” vectors requires less computation. In Sect. 7, we discuss some possible means of resolving this computational issue.
RECON shows high prediction accuracy when the full profiles were observed. On the other hand, when only a part of the profiles was available, prediction with RECON was close to random guessing. This indicates that RECON does not make efficient use of the information in the observed edges. We also see that the profile data \({\varvec{x}}_i\) and \({\varvec{y}}_j\) alone do not contain sufficient information to predict the edges. Hence, CDMCA with only profiles and link prediction with the SVM do not yield high prediction accuracy in this problem setup.
6.2 Numerical experiments with real data
6.2.1 Data description
We use real-world data collected from an online dating site. Here, X and Y are the sets of males and females, respectively. The edge \((x_i,y_j)\) indicates that \(x_i\) sent a message to \(y_j\) as an EOI, and \((y_j,x_i)\) is an EOI from \(y_j\) to \(x_i\). The sample size is \(n=\) 15,925 and \(m=\) 16,659 after removing users who did not send or receive a message. The data were gathered from 2016-01-03 to 2017-06-05. We used 1,308,126 messages from 2016-01-03 to 2016-10-31 as the training data. The test data consist of 177,450 messages from 2016-11-01 to 2017-06-05.
The profile of each user contains 25 features such as the user's age, height, weight, education level, and income. The profile consists of two types of data, quantitative and categorical. The profile vectors, \({\varvec{x}}_i\) and \({\varvec{y}}_j\), were constructed by concatenating the values of the quantitative features with one-hot vectors for the categorical features. As a result, the number of elements in the profile vector reached 214.
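The profile-vector construction can be sketched as follows, using a single hypothetical categorical feature for illustration (the real data has 25 features and 214 resulting elements):

```python
import numpy as np

def build_profile_vector(quant, cat_value, categories):
    """Concatenate quantitative feature values with a one-hot encoding
    of a categorical feature value."""
    onehot = np.zeros(len(categories))
    onehot[categories.index(cat_value)] = 1.0
    return np.concatenate([np.asarray(quant, dtype=float), onehot])
```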
6.2.2 Evaluation of prediction accuracy
We used random samples of the training data to evaluate the average accuracy of each method. First, half of the users were sampled randomly from the user sets, X and Y. The profile vectors of the sampled users and the corresponding bipartite subgraph were used as the training data. For each method, the recommendation accuracy over the test subgraph was evaluated using the MAP score. Finally, the mean and standard deviation of the MAP scores over 20 runs with different random seeds were summarized. Here, recommendations to males and to females were considered separately, since Xia et al. (2015) observed that male and female users behave differently when looking for potential dates.
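The MAP (mean average precision) evaluation can be sketched as follows: for each user, the recommended list is marked 0/1 according to whether each item appears as a true edge in the test subgraph, and the average precision is averaged over users (the paper additionally normalizes by the MAP of random recommendation):

```python
import numpy as np

def average_precision(ranked_hits):
    """Average precision for one user: `ranked_hits` is a 0/1 list marking
    which ranked recommendations appear as true edges in the test data."""
    hits = np.asarray(ranked_hits, dtype=float)
    if hits.sum() == 0:
        return 0.0
    precision_at = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at * hits).sum() / hits.sum())

def mean_ap(all_ranked_hits):
    """MAP: average precision averaged over all users."""
    return float(np.mean([average_precision(h) for h in all_ranked_hits]))
```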
The results are presented in Table 3. Here, the MAP score of each method is normalized by that of the random recommendation. Thus, a learning method that outperforms random recommendation has a normalized MAP score larger than one. In this sense, all learning methods provided meaningful results.
JacCDMCA using the feature vector consisting of profiles and edge information provided high prediction accuracy for recommendations to both males and females. For the real-world data under consideration, not only the graph structure of the EOIs but also the users' profiles contribute to improving the prediction accuracy. On the other hand, RECON did not perform well. Hence, the users' profiles alone did not necessarily have a strong correlation with the graph edges.
For the CDMCA methods, the variance of the MAP score was small enough that the case \(K=50\) can be distinguished from \(K=100\) or 200. Hence, a validation method is expected to work effectively for choosing an appropriate dimension K.
Table 3 Normalized MAP score for reciprocal recommendation methods
Method  Normalized \({\mathrm {MAP}}\) (sd)  

Recom. to male  Recom. to female  
Random  1.0000  1.0000 
Jacsimilarity  4.3043 (0.3305)  6.3515 (0.4457) 
RECON  1.8695 (1.0301)  1.4928 (0.4595) 
SVM  1.4635 (0.4153)  3.0005 (0.9404) 
empCDMCA: no feature (\(K=50\))  3.1833 (0.5147)  4.6297 (0.7593) 
empCDMCA: no feature (\(K=100\))  4.4533 (0.5621)  5.5899 (0.7628) 
empCDMCA: no feature (\(K=200\))  4.5098 (0.4893)  5.4042 (0.4980) 
empCDMCA: edge (\(K=50\))  4.1455 (0.6869)  4.6931 (0.7495) 
empCDMCA: edge (\(K=100\))  4.4548 (0.6472)  5.5023 (0.7474) 
empCDMCA: edge (\(K=200\))  4.3322 (0.5245)  5.3294 (0.5298) 
empCDMCA: profile (\(K=50\))  2.3433 (0.5929)  2.3603 (0.7052) 
empCDMCA: profile (\(K=100\))  1.8989 (0.3595)  2.4250 (0.7006) 
empCDMCA: profile (\(K=200\))  1.8805 (0.4187)  2.2853 (0.6885) 
empCDMCA: profile + edge (\(K=50\))  3.3315 (0.4756)  4.6747 (0.7275) 
empCDMCA: profile + edge (\(K=100\))  4.6285 (0.6634)  5.5904 (0.7552) 
empCDMCA: profile + edge (\(K=200\))  4.7582 (0.4446)  5.6503 (0.5047) 
JacCDMCA: no feature (\(K=50\))  3.1833 (0.5147)  4.6919 (0.8362) 
JacCDMCA: no feature (\(K=100\))  5.0179 (0.9455)  7.2605 (1.0341) 
JacCDMCA: no feature (\(K=200\))  4.7314 (0.5215)  7.3155 (0.9215) 
JacCDMCA: edge (\(K=50\))  3.2852 (0.4521)  4.9024 (0.8118) 
JacCDMCA: edge (\(K=100\))  4.9871 (0.8218)  7.3295 (0.9791) 
JacCDMCA: edge (\(K=200\))  5.3776 (0.6231)  7.9695 (0.9730) 
JacCDMCA: profile (\(K=50\))  2.3433 (0.5929)  3.0380 (0.7923) 
JacCDMCA: profile (\(K=100\))  2.3293 (0.6598)  2.7047 (0.7211) 
JacCDMCA: profile (\(K=200\))  2.2035 (0.6275)  2.5859 (0.6335) 
JacCDMCA: profile + edge (\(K=50\))  3.3315 (0.4756)  4.9772 (0.8282) 
JacCDMCA: profile + edge (\(K=100\))  4.9395 (0.8424)  7.5375 (0.9651) 
JacCDMCA: profile + edge (\(K=200\))  5.4811 (0.5942)  8.2074 (0.9941) 
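As a minimal illustration of how a MAP score such as those in the table is computed (a sketch, not the authors' evaluation code; the rankings and relevance sets below are hypothetical): the average precision (AP) of one user's ranked recommendation list is the mean of precision@k over the ranks k at which relevant items appear, and MAP averages AP over users.

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: mean of precision@k over ranks k of relevant items."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Hypothetical recommendation lists and ground-truth relevant sets for two users.
rankings = {
    "u1": (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    "u2": (["b", "a", "d", "c"], {"a"}),       # AP = 1/2
}
map_score = sum(average_precision(r, rel) for r, rel in rankings.values()) / len(rankings)
print(round(map_score, 4))  # → 0.6667
```

The "Normalized MAP" reported in the table divides each method's MAP by that of the Random baseline, which is why the Random row equals 1.0000 by construction.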
7 Concluding remarks
We proposed learning methods for reciprocal recommendation in which CDMCA is combined with the Jaccard similarity. Through numerical experiments, we found that a feature vector combining user profiles and edge information is also important for achieving high prediction accuracy.
As a future research direction, it is important to develop nonlinear cross-domain matching to achieve higher prediction accuracy. CDMCA with a nonlinear mapping is obtained by applying the kernel method (Schölkopf and Smola 2002), but a computationally efficient nonlinear CDMCA is still required. CDMCA requires solving generalized eigenvalue problems, for which computationally efficient methods exist (Saibaba et al. 2015). Moreover, since the Jaccard similarity is regarded as a non-negative definite kernel, one can use low-rank approximation methods for the kernel Gram matrix, such as the Nyström approximation (Williams and Seeger 2001). Our preliminary experiments suggest that applying such numerical methods will widen the scope of our methods.
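To make the Nyström idea concrete, here is a minimal NumPy sketch (an illustration, not the authors' implementation): given a positive semi-definite Gram matrix K, sample m landmark columns C and the corresponding m-by-m landmark block W, and approximate K by C W⁺ Cᵀ. The toy Gram matrix below is hypothetical; when rank(K) is at most m and the landmarks are generic, the approximation is essentially exact.

```python
import numpy as np

def nystrom(K, m, rng):
    """Rank-m Nystrom approximation of a PSD Gram matrix K: K ~ C W^+ C^T."""
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # sample m landmark indices
    C = K[:, idx]                               # n x m block of sampled columns
    W = K[np.ix_(idx, idx)]                     # m x m block among the landmarks
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.random((6, 3))
K = X @ X.T                        # toy PSD Gram matrix of rank 3
K_approx = nystrom(K, m=3, rng=rng)
err = np.abs(K - K_approx).max()   # small, since rank(K) <= m here
```

For an n-by-n Jaccard Gram matrix, this reduces the cost from handling the full matrix to O(nm) storage and an m-by-m pseudoinverse, which is what makes the kernelized CDMCA variant computationally feasible at scale.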
Additionally, a theoretical analysis of reciprocal recommendation is important. In past studies, CDMCA with empirical weights has been used for other problems such as image-tag matching (Fukui et al. 2016) and higher-order relation prediction (Nori et al. 2012). In our study, weights defined by the Jaccard similarity outperformed a naive weighting scheme. A theoretical justification of these results is expected to lead to substantial progress in designing more efficient learning algorithms, not only for recommendation tasks but also for multi-relational data analysis.
Acknowledgements
TK was supported by KAKENHI 16K00044, 15H03636, and 15H01678.
Compliance with ethical standards
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
Akehurst, J., Koprinska, I., Yacef, K., Pizzato, L. A. S., Kay, J., & Rej, T. (2011). CCR—A content-collaborative reciprocal recommender for online dating. In T. Walsh (Ed.), IJCAI (pp. 2199–2204). IJCAI/AAAI.
Brun, A., Castagnos, S., & Boyer, A. (2011). Social recommendations: Mentor and leader detection to alleviate the cold-start problem in collaborative filtering. In I.-H. Ting, T.-P. Hong, & L. S. Wang (Eds.), Social network mining, analysis and research trends: Techniques and applications (pp. 270–290). IGI Global.
Cai, X., Bain, M., Krzywicki, A., Wobcke, W., Kim, Y. S., Compton, P., & Mahidadia, A. (2010). Collaborative filtering for people to people recommendation in social networks. In J. Li (Ed.), AI 2010: Advances in Artificial Intelligence, LNCS (Vol. 6464, pp. 476–485). Berlin: Springer.
Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Fukui, K., Okuno, A., & Shimodaira, H. (2016). Image and tag retrieval by leveraging image-group links with multi-domain graph embedding. In 2016 IEEE International Conference on Image Processing (ICIP) (pp. 221–225).
Gao, C., Ma, Z., Zhang, A. Y., & Zhou, H. H. (2017). Achieving optimal misclassification proportion in stochastic block models. Journal of Machine Learning Research, 18, 1–45.
Hong, W., Zheng, S., Wang, H., & Shi, J. (2013). A job recommender system based on user clustering. Journal of Computers, 8(8), 1960–1967.
Hopcroft, J., Lou, T., & Tang, J. (2011). Who will follow you back? Reciprocal relationship prediction. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11 (pp. 1137–1146). New York: ACM.
Huang, Z., Shan, S., Zhang, H., Lao, S., & Chen, X. (2012). Cross-view graph embedding. In Computer Vision – ACCV (pp. 770–781).
Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.
Jeh, G., & Widom, J. (2002). SimRank: A measure of structural-context similarity. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02 (pp. 538–543). New York: ACM.
Kim, Y. S., Krzywicki, A., Wobcke, W., Mahidadia, A., Compton, P., Cai, X., & Bain, M. (2012). Hybrid techniques to address cold start problems for people to people recommendation in social networks. In PRICAI 2012: Trends in Artificial Intelligence (pp. 206–217).
Kishida, K. (2005). Property of average precision as performance measure for retrieval experiment. Technical Report NII-2005-014E, National Institute of Informatics.
Leskovec, J., Huttenlocher, D., & Kleinberg, J. (2010). Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10 (pp. 641–650). New York: ACM.
Li, L., & Li, T. (2012). MEET: A generalized framework for reciprocal recommender systems. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12 (pp. 35–44). New York: ACM.
Nori, N., Bollegala, D., & Kashima, H. (2012). Multinomial relation prediction in social data: A dimension reduction approach. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. Toronto, Ontario, Canada.
Pizzato, L., Rej, T., Chung, T., Koprinska, I., & Kay, J. (2010). RECON: A reciprocal recommender for online dating. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10 (pp. 207–214). New York: ACM.
Pizzato, L., Rej, T., Akehurst, J., Koprinska, I., Yacef, K., & Kay, J. (2013). Recommending people to people: The nature of reciprocal recommenders with a case study in online dating. User Modeling and User-Adapted Interaction, 23(5), 447–488.
R Core Team. (2017). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.
Real, R., & Vargas, J. M. (1996). The probabilistic basis of Jaccard’s index of similarity. Systematic Biology, 45(3), 380–385.
Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4), 1878–1915.
Saibaba, A. K., Lee, J., & Kitanidis, P. K. (2015). Randomized algorithms for generalized Hermitian eigenvalue problems with application to computing Karhunen–Loève expansion. Numerical Linear Algebra with Applications, 23, 314–339.
Schafer, J. B., Frankowski, D., Herlocker, J., & Sen, S. (2007). Collaborative filtering recommender systems. In P. Brusilovsky, A. Kobsa, & W. Nejdl (Eds.), The Adaptive Web, LNCS (Vol. 4321, pp. 291–324). Berlin: Springer.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge, MA: MIT Press.
Schütze, H., Manning, C. D., & Raghavan, P. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. New York, NY: Cambridge University Press.
Shimodaira, H. (2015). A simple coding for cross-domain matching with dimension reduction via spectral graph embedding. arXiv:1412.8380.
Tu, K., Ribeiro, B., Jensen, D., Towsley, D., Liu, B., Jiang, H., & Wang, X. (2014). Online dating recommendations: Matching markets and learning preferences. In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14 Companion (pp. 787–792). New York: ACM.
Wang, C., Han, J., Jia, Y., Tang, J., Zhang, D., Yu, Y., & Guo, J. (2010). Mining advisor-advisee relationships from research publication networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10 (pp. 203–212). New York: ACM.
Williams, C. K. I., & Seeger, M. (2001). Using the Nyström method to speed up kernel machines. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in Neural Information Processing Systems (Vol. 13, pp. 682–688). MIT Press.
Xia, P., Jiang, H., Wang, X., Chen, C., & Liu, B. (2014). Predicting user replying behavior on a large online dating site. In International AAAI Conference on Web and Social Media.
Xia, P., Liu, B., Sun, Y., & Chen, C. (2015). Reciprocal recommendation system for online dating. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’15 (pp. 234–241). ACM.
Yu, H., Liu, C., & Zhang, F. (2011). Reciprocal recommendation algorithm for the field of recruitment. Journal of Information & Computational Science, 8(16), 4061–4068.
Yu, M., Zhao, K., Yen, J., & Kreager, D. (2013). Recommendation in reciprocal and bipartite social networks: A case study of online dating. In Social Computing, Behavioral-Cultural Modeling and Prediction, SBP 2013 (pp. 231–239).