HE-SNA: an efficient cross-platform network alignment scheme from privacy-aware perspective
Corresponding author: Kai Zhong (kaizhong0402@ahu.edu.cn)

User alignment across online social network platforms (OSNPs) is a growing concern with the rapid development of internet technology. In reality, users tend to register different accounts on multiple OSNPs, and the platforms are reluctant to share network structure and users' information for business and privacy reasons, which creates great obstacles to cross-platform user alignment. In view of this, we propose a homomorphic encryption-based social network alignment (HE-SNA) algorithm from the perspective of privacy leakage. Specifically, we first regard the OSNPs as a system containing multiple social networks, in which each participant owns part of the network, i.e., a separate private sub-network. Then, the alignment information is encrypted by the sub-networks and fused and decrypted by two third-party servers using an HE scheme, which effectively protects the private information of the sub-networks. Finally, each sub-network uses the fused alignment information sent back from the third-party server for user alignment. Experimental results show that HE-SNA can provide the sum of locally trained models to third-party servers without leaking the privacy of any single sub-network. Moreover, HE-SNA achieves better network alignment performance than using only the structural information and alignment data of a single private sub-network, while protecting its topology structure information.


Introduction
With the rise of various online social network platforms (OSNPs), people tend to register different social accounts on these platforms according to their personal preferences and needs [1]. However, user relationships on platforms are usually not publicly available because they may contain private information about the users, such as connections related to religious beliefs, specific identities, and financial accounts. Releasing such information without permission can violate the law and harm business interests. Moreover, many shopping sites are reluctant to disclose friendships between users for profit reasons. To integrate and complete users' information and thus provide better services, network alignment was proposed to find the same person behind accounts on different networks [2,3]. Network alignment plays an increasingly important role in network analysis, as it facilitates many downstream applications, such as recommender systems [4,5] and malicious entity detection [6].
In many real-world scenarios, a complex system may be recorded by multiple OSNPs, and privacy protection, commercial competition, and other factors keep these records separate. For example, a person may be logged into WeChat, Facebook, Douban, and Twitter at the same time. In this case, multiple OSNPs can be considered a system of multiple private online social networks [7,8], in which each participant owns part of the network (i.e., a separate private sub-network) and part of the user alignment data. Existing alignment methods do not consider privacy protection, which conflicts with the fact that the information of multiple private sub-networks may be unavailable because of trade secrets and privacy protection. A question therefore arises: is it possible to build a secure protocol framework in which multiple private sub-networks are used collaboratively without exposing the network structure and user privacy, so that the user alignment problem can be solved effectively? It is well known that the main revenue sources of online platforms are advertising and product promotion, and user alignment helps platforms monetize more broadly and precisely. However, because the relationships between users are partially recorded by different OSNPs that belong to different companies unwilling to disclose their social relationships, it is necessary to design a secure protocol framework that realizes user alignment and protects the data privacy of each platform simultaneously. The use of multiple private sub-networks to realize network alignment is illustrated in Fig. 1.
Cloud computing technology is becoming more mature as web technology evolves. The adoption of cloud computing shortens the product development cycle while saving the cost of purchasing and maintaining infrastructure [9,10]. Despite these potential benefits, security remains a major obstacle to cloud computing development from the consumers' perspective [11]. In recent years, computation on ciphertexts supported by homomorphic encryption (HE) has been widely used in privacy protection and large-scale computing scenarios [12]. Because HE allows computation on ciphertexts, a data holder can send encrypted private data to a third party (whether trusted or not), which processes the data in ciphertext form and returns the result to the data holder. In this process, the data remain confidential to the third party, so the data privacy of users is well protected.
In light of this, a privacy protection-driven network alignment scheme (named HE-SNA) is proposed in this work. The algorithm "collaboratively" uses the topology structures of multiple private sub-networks for alignment while protecting users' private information with the help of two third-party servers. Experimental results on different networks show that HE-SNA achieves network alignment performance much better than using only the structural information and alignment data of a single private sub-network. It is necessary to emphasize that we do not propose a new algorithm for cross-platform alignment; rather, we focus on designing a set of privacy protection protocols that better integrate information and leverage the capabilities of existing algorithms. The main contributions of this work are summarized as follows:
1. Considering that online network topology data are usually held by multiple platforms, a completely new problem is defined: how to collaboratively use the structure and data owned by different networks to realize alignment across multiple networks in a confidential manner.
2. To the best of our knowledge, this work is the first to consider the HE scheme for cross-platform network alignment applications. The proposed HE-SNA achieves better alignment performance than using only a single private sub-network while protecting its topology structure information.
3. Experimental results on different networks show that HE-SNA achieves good results in terms of adaptability to new problem scenarios and robustness of model performance.

Related works
In this section, two classical categories of network alignment methods are reviewed: user attribute-based methods and network topology-based methods. Besides, a brief history of homomorphic encryption (HE) and its applications is introduced, which is helpful for understanding the proposed HE-SNA method.

Network alignment
As described in the introduction, network alignment methods have recently attracted attention along with the rise of OSNPs and concerns such as privacy preservation. Recent advances in network alignment can be broadly divided into the following two categories.

Attribute feature based methods
This category converts profile information of users on different OSNPs, such as user name, age, gender, occupation, or address, into multidimensional vectors that characterize users in social networks. Considering the distribution discrepancy of user representations from different networks, Zheng et al. designed mapping functions across the latent representation spaces and addressed the representation distribution discrepancy through adversarial training between the mapping functions and the discriminators, as well as cycle-consistency training [13]. Nguyen et al. introduced NAWAL, a novel end-to-end unsupervised embedding-based network alignment framework emphasizing structural information, which demonstrates robustness against adversarial conditions [14]. Li et al. proposed a user identification solution across social networks based on username and display name (UISN-UD), which enables matching user accounts with high accessibility and a small amount of online data [15]. Liu

Network topology structure based methods
Since different people rarely share the same circle of friends, accounts on different social networks with similar circles of friends are likely to belong to the same person. Because relationships between users reflect the topological features of the network and are relatively easy to obtain, some scholars use the network topology to identify matched users [17]. Topology-based alignment equivalently transforms the social relations between users into a network topology and then matches users according to the similarity between nodes. Depending on whether matching data are used, these methods are mainly divided into two categories: unsupervised and supervised. Narayanan et al. first proposed identifying users based on network topology: starting from a small number of known seed nodes, new matching nodes are found through continuous iterative updates, which achieves user identification between two social networks [18]. Yan et al. proposed a meta-learning algorithm to guide the updating of pseudo anchor embeddings during network alignment, which makes the learning framework applicable to a wide spectrum of network alignment methods that preserve structural proximity [19]. Tang et al. proposed a degree penalty principle to calculate the matching degree of all unmatched node pairs and studied the importance of the scale-free characteristic of SMNs for inter-layer link prediction in the real world [20]. Their work verified that better user alignment can be achieved using the network topology.
Chen et al. designed a novel semi-supervised model, namely multilevel attribute embedding for semi-supervised user identity linkage (MAUIL), and demonstrated its superiority over other approaches through extensive experiments on two real-world datasets [21]. Nguyen et al. proposed an unsupervised alignment framework that emphasizes structural information: the model embeds network nodes into a low-dimensional space and then uses a generative adversarial deep neural network to extract structural features [14]. Since unsupervised methods do not rely on labeled data, their performance is generally lower than that of supervised ones. Owing to the heterogeneity of social networks and the sparsity of some users, network topology-based methods still leave much to be desired in terms of overall alignment performance.
With the rapid development of machine learning, a large number of machine learning-based methods have been applied to network alignment with fruitful results [22,23]. Among them, representation learning approaches use graph embeddings to solve the network alignment problem. Specifically, a user representation suited to the alignment task is first learned, and then a mapping function is defined to match users across networks. PALE [24], IONE [25], COSNET [26], LHNE [27], and TransLink [28] are classical representation learning methods.

Homomorphic encryption (HE) and its applications
In recent years, cloud computing has received much attention, and one of the problems encountered in its implementation is how to ensure data privacy [29,30]. Meanwhile, system security and cryptography provide a variety of security frameworks for privacy-preserving machine learning [31,32]. In cryptography, HE can solve this problem to a certain extent: it is an encryption scheme that allows computations to be performed directly on ciphertexts, such that decrypting the result of a computation on ciphertexts yields the same result as performing the computation on the corresponding plaintexts. In this way, a third party only needs to compute on ciphertexts, and the privacy of each participant is protected from the third party. Owing to this property, one can entrust a third party with processing data without revealing the data.
The concept of HE was first proposed by Rivest et al. in 1978 to construct an encryption mechanism that supported ciphertext retrieval [33]. It was later developed into the idea that computing on ciphertexts and then decrypting is equivalent to decrypting and then computing [34]. Owing to the advantages of HE in terms of computational cost, communication consumption, and security, more and more theoretical and applied research has been conducted [35,36]. For example, Paillier proposed a provably secure cryptosystem that allows additive operations on ciphertexts, which has been widely used in many applications [37]. In 2009, Gentry gave the first construction of a fully HE scheme, which supports arbitrary additions and multiplications on encrypted data and is a milestone in homomorphic cryptography [38].
Since then, HE technology has developed rapidly and has been widely applied. Dowlin et al. developed CryptoNets, an HE-based method that allows a cloud server to apply a trained neural network to encrypted queries [39]. Li et al. proposed a new HE framework over nonlinear rings that achieves one-way security based on the conjugate search problem [40]. Based on differential privacy and HE, Jia et al. presented distributed clustering and distributed random forest methods that protect the data of multiple parties under data sharing and model sharing [41]. Lu et al. designed a privacy-preserving Cox regression protocol, which allows researchers to train models on horizontally or vertically partitioned datasets while protecting sensitive data and the trained models [42].

Preliminaries
In general, the participants are independent of each other, and each has only partial information about the structure of the original network. Our aim is to improve user alignment performance by using two third-party servers that collaboratively use information from all sub-networks without exposing any sub-network's information. Because no sensitive information is exposed, the privacy of each participant is protected, which may encourage more participants to join. At the same time, the more parties involved, the more network structure information is used, and the better the expected alignment performance. In this section, two alignment matching metrics and the HE technology are introduced.

Matching degree metrics
The problem of network alignment has been thoroughly studied by many scholars with a number of matching degree metrics. In this paper, only two representative ones are selected to demonstrate the superiority of the proposed approach.
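A matched inter-layer node pair $(v^X_a, v^Y_b) \in P$, where P denotes the set of pre-aligned users, is called a common matching neighbor (CMN) [43] of the nodes $v^X_i$ and $v^Y_j$ if there is an intra-layer link between $v^X_a$ and $v^X_i$ and an intra-layer link also exists between $v^Y_b$ and $v^Y_j$, i.e., both $e^X_{ai}$ and $e^Y_{bj}$ exist. The CMN score of $v^X_i$ and $v^Y_j$ counts such pairs:
$$\mathrm{CMN}(v^X_i, v^Y_j) = \left|\left\{(v^X_a, v^Y_b) \in P : e^X_{ai} \in E^X,\ e^Y_{bj} \in E^Y\right\}\right|. \tag{1}$$
Not all common matching neighbors are equally informative. If, in a matched pair $(v^X_a, v^Y_b)$, the node $v^X_a$ has only one neighbor $v^X_i$ in $G^X$ and $v^Y_b$ has only one neighbor $v^Y_j$ in $G^Y$, then there is a high probability that the inter-layer nodes $v^X_i$ and $v^Y_j$ are the same user. Conversely, if the matched nodes $v^X_a$ and $v^Y_b$ have many neighbors, it is difficult to determine the inter-layer matching relationship among their neighbor nodes. Therefore, the degree penalty (IDP) metric [20] gives a greater matching weight to matched inter-layer node pairs with fewer neighbors:
$$\mathrm{IDP}(v^X_i, v^Y_j) = \sum_{\substack{(v^X_a, v^Y_b) \in P \\ e^X_{ai} \in E^X,\ e^Y_{bj} \in E^Y}} \log^{-1}\!\left(1 + k_{v^X_a}\right) \cdot \log^{-1}\!\left(1 + k_{v^Y_b}\right), \tag{2}$$
where $k_{v^X_a}$ and $k_{v^Y_b}$ denote the degrees of nodes $v^X_a$ and $v^Y_b$, respectively. Note that $\log(k_{v^X_a}) = 0$ when $k_{v^X_a} = 1$, so $\log^{-1}(k_{v^X_a})$ would be undefined; to overcome this problem, 1 is added inside each logarithm.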
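To make the two metrics concrete, the following Python sketch computes CMN and IDP scores and assembles a matching-degree matrix. It is an illustration under stated assumptions, not the authors' implementation: graphs are undirected adjacency sets, P is a set of (G^X node, G^Y node) pairs, the degree penalty is taken as 1 / (log(1 + k_a) · log(1 + k_b)), and the helper names (build_graph, cmn, idp, matching_matrix) are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]        # undirected graph as adjacency sets
Anchors = Set[Tuple[int, int]]     # P: pre-aligned pairs (a in G^X, b in G^Y)


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Symmetric adjacency sets from an undirected edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def cmn(gx: Graph, gy: Graph, P: Anchors, i: int, j: int) -> int:
    """Number of matched pairs (a, b) with a adjacent to i in G^X and b adjacent to j in G^Y."""
    return sum(1 for a, b in P if a in gx.get(i, ()) and b in gy.get(j, ()))


def idp(gx: Graph, gy: Graph, P: Anchors, i: int, j: int) -> float:
    """Like CMN, but each matched pair is weighted by
    1 / (log(1 + k_a) * log(1 + k_b)), so low-degree anchors count more."""
    score = 0.0
    for a, b in P:
        if a in gx.get(i, ()) and b in gy.get(j, ()):
            score += 1.0 / (math.log(1 + len(gx[a])) * math.log(1 + len(gy[b])))
    return score


def matching_matrix(gx: Graph, gy: Graph, P: Anchors,
                    metric: Callable[..., float]) -> List[List[float]]:
    """Matching-degree matrix S: rows follow sorted G^X nodes, columns sorted G^Y nodes."""
    return [[metric(gx, gy, P, i, j) for j in sorted(gy)] for i in sorted(gx)]
```

For example, with gx = build_graph([(0, 1), (0, 2), (1, 2), (2, 3)]), gy = build_graph([(10, 11), (10, 12), (11, 12), (12, 13)]) and P = {(1, 11), (3, 13)}, cmn(gx, gy, P, 2, 12) returns 2, while idp weights the degree-1 anchor (3, 13) more heavily than the degree-2 anchor (1, 11).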

Revisit of HE technology
HE is a cryptographic technique grounded in computational complexity theory. Its advantage is that computation can be performed on encrypted data (ciphertexts) without decrypting them first: decrypting the result of a computation on ciphertexts gives the same result as performing the computation on the plaintexts. HE schemes can be divided into three categories according to the types and number of operations allowed on the encrypted data: partially homomorphic encryption (PHE) allows only one type of operation, an unlimited number of times [37,44]; somewhat homomorphic encryption (SHE) allows some types of operations a limited number of times [45,46]; and fully homomorphic encryption (FHE) allows arbitrary operations an unlimited number of times [38,47].
The Paillier cryptosystem [37] is used in our work; it is a probabilistic encryption scheme based on the composite residuosity problem [48]. It has four main operations: KeyGen, Encrypt, Evaluate, and Decrypt. The KeyGen operation generates the public key $k_p$ and the private key $k_s$. First, two large prime numbers p and q are randomly selected such that $\mathrm{GCD}(pq, (p-1)(q-1)) = 1$, where GCD(·,·) denotes the greatest common divisor. Second, $n = pq$ and $\lambda = \mathrm{LCM}(p-1, q-1)$ are calculated, where LCM(·,·) denotes the least common multiple. Finally, $g \in \mathbb{Z}^*_{n^2}$ is randomly selected such that $\mathrm{GCD}(n, L(g^\lambda \bmod n^2)) = 1$, where $L(u) = (u-1)/n$ for every $u \in \mathbb{Z}^*_{n^2}$ with $u \equiv 1 \pmod n$. The keys are then $k_p = \{n, g\}$ and $k_s = \{p, q\}$ (from which λ is computed). The Encrypt operation encrypts a plaintext $m \in \mathbb{Z}_n$ by choosing a random $r \in \mathbb{Z}^*_n$ and computing $c = E(m) = g^m r^n \bmod n^2$. The Decrypt operation recovers the plaintext from a ciphertext c as
$$D(c) = \frac{L(c^\lambda \bmod n^2)}{L(g^\lambda \bmod n^2)} \bmod n,$$
where the division denotes multiplication by the modular inverse modulo n. The Evaluate operation takes ciphertexts as input and outputs evaluated ciphertexts. Paillier's scheme is a PHE algorithm whose Evaluate operation supports additive homomorphism:
$$D\big(E_{k_p}(m_1) \cdot E_{k_p}(m_2) \bmod n^2\big) = m_1 + m_2 \bmod n, \quad \forall m_1, m_2 \in M, \tag{3}$$
where $E_{k_p}$ is the encryption algorithm and M is the set of plaintext messages.
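The four operations can be illustrated with a toy Python implementation of Paillier that follows the KeyGen, Encrypt, Decrypt, and Evaluate descriptions above. It is a sketch only: the caller supplies tiny primes, randomness comes from the standard secrets module, and the function names are hypothetical; a real deployment would need primes of at least 1024 bits each and a vetted library.

```python
import math
import secrets
from typing import Tuple

PublicKey = Tuple[int, int]    # k_p = (n, g)
PrivateKey = Tuple[int, int]   # k_s = (p, q)


def L(u: int, n: int) -> int:
    """L(u) = (u - 1) / n, defined for u = 1 (mod n)."""
    return (u - 1) // n


def keygen(p: int, q: int) -> Tuple[PublicKey, PrivateKey]:
    """KeyGen with caller-supplied primes p, q (toy sizes only)."""
    n, n2 = p * q, (p * q) ** 2
    assert math.gcd(n, (p - 1) * (q - 1)) == 1
    lam = math.lcm(p - 1, q - 1)
    while True:
        g = secrets.randbelow(n2 - 2) + 2          # candidate g in [2, n^2 - 1]
        if math.gcd(g, n) == 1 and math.gcd(n, L(pow(g, lam, n2), n)) == 1:
            return (n, g), (p, q)


def encrypt(pk: PublicKey, m: int) -> int:
    """c = g^m * r^n mod n^2 with a fresh random r in Z*_n."""
    n, g = pk
    n2 = n * n
    r = 0
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pk: PublicKey, sk: PrivateKey, c: int) -> int:
    """m = L(c^lambda mod n^2) / L(g^lambda mod n^2) mod n (division = modular inverse)."""
    n, g = pk
    p, q = sk
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)
    mu = pow(L(pow(g, lam, n2), n), -1, n)
    return L(pow(c, lam, n2), n) * mu % n


def evaluate_add(pk: PublicKey, c1: int, c2: int) -> int:
    """Additive homomorphism: E(m1) * E(m2) mod n^2 decrypts to m1 + m2 mod n."""
    n, _ = pk
    return c1 * c2 % (n * n)
```

With keygen(293, 433), encrypting 17 and 25, combining the two ciphertexts with evaluate_add, and decrypting the result yields 42; the addition itself never touches a plaintext. The sketch needs Python 3.9 or later for math.lcm and the modular-inverse form of pow.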

The proposed HE-SNA method
In this section, we first give the motivations of this work, then present the key steps of the HE-SNA method, and finally provide the pseudo-code of the algorithm.

Motivations
Users tend to register different accounts on multiple OSNPs, so their data are generally scattered across platforms. Each private sub-network thus owns private data that could contribute to network alignment. Intuitively, centralizing the data of the same or similar users across different social network platforms could train an excellent network alignment model. Despite this, the use of private data in network alignment is restricted by privacy concerns and business competition.
For the new problem scenario described above, and inspired by the property that HE allows mathematical operations to be executed on ciphertexts, we design an HE-based network alignment method, HE-SNA, which fuses private sub-network information for better alignment while protecting users' privacy. In addition, this work is the first to link HE technology to network alignment across multiple social network platforms, and it demonstrates improved model robustness.

Security assumptions
The framework is composed of multiple sub-networks and two cloud servers that follow the protocol. Server 1 is responsible for fusing the matching degree information of the sub-networks. Server 2 is responsible for generating the public and private keys and for decrypting the ciphertext of the fused matching information. Each sub-network intends to protect its own private information from the cloud servers and the other sub-networks, but wants access to the alignment information of the other sub-networks for better user alignment.

The proposed HE-SNA method
As described earlier, network alignment is the process of integrating social accounts on different social network platforms. We consider two social networks $G^X$ and $G^Y$, where $G^X$ is an online layer occupied by different participants, each of which independently owns a sub-network and part of the alignment data. In most cases, each participant is independent and has only partial topological information about $G^X$, while the offline layer $G^Y$ is public and its structure is known. Our goal is to leverage the topological information of $G^X$ and the partial inter-layer alignment data held by the participants, and to collaborate with third-party servers to align the networks better without exposing any participant's information.
Fig. 2 The complete structure of the proposed protocol
The proposed protocol consists of six steps: (1) generation of sub-networks, (2) encryption of matrices, (3) transmission of information, (4) fusion of information, (5) decryption of matrix, and (6) alignment, as described in Fig. 2. The overall process is shown in Fig. 3, where HE-SNA is illustrated with $G^X$ divided into three sub-networks ($G^X_1$, $G^X_2$, $G^X_3$). The details are as follows:
Step 1: Generation of sub-networks
A social network is represented by $G(V, E)$, where V is the set of users, $N = |V|$ is the number of users, and E is the set of relationships between users. The social network $G^X(V^X, E^X)$ is occupied by d private participants, whose sub-networks are denoted $\{G^X_1, G^X_2, \ldots, G^X_d\}$, where $G^X_t$ is the t-th sub-network, $t \in \{1, 2, \ldots, d\}$. The structure of the social network $G^Y(V^Y, E^Y)$ is completely known. Each of the d sub-networks is aligned with $G^Y$ separately using a matching degree metric (e.g., CMN or IDP) to obtain the matching degree matrices $\{S^1, S^2, \ldots, S^d\}$, where $S^t$ is the matrix obtained by aligning $G^X_t$ with $G^Y$, and $S^t_{ij}$ is the matching degree of the i-th node in $G^X_t$ with the j-th node in $G^Y$.
Step 2: Encryption of matrices
Server 2, which holds both the public key $k_p$ and the private key $k_s$, distributes $k_p$ to every $G^X_t$, and $G^X_t$ obtains $E_{k_p}(S^t)$ by encrypting every element of $S^t$ with $k_p$. For the matching degree matrix $S^t$, each ciphertext element is computed as $E_{k_p}(S^t_{ij}) = (g^{S^t_{ij}} \times r^n) \bmod n^2$, where $r \in \mathbb{Z}^*_n$ is a fresh random integer.
Step 3: Transmission of information
The d sub-networks $\{G^X_1, G^X_2, \ldots, G^X_d\}$ send the ciphertext matrices $\{E_{k_p}(S^1), E_{k_p}(S^2), \ldots, E_{k_p}(S^d)\}$ to Server 1 (Server 1 knows neither $k_p$ nor $k_s$).
Step 4: Fusion of information
Server 1 fuses all the ciphertext matrices $\{E_{k_p}(S^1), E_{k_p}(S^2), \ldots, E_{k_p}(S^d)\}$ received from the sub-networks into a matrix V by element-wise modular multiplication:
$$V_{ij} = \prod_{t=1}^{d} E_{k_p}(S^t_{ij}) \bmod n^2. \tag{4}$$
Server 1 then sends V to Server 2. This is the most important step in HE-SNA, as it fuses the alignment data of all sub-networks.
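Step 5: Decryption of matrix
Server 2 uses the private key $k_s$ to decrypt V and obtains the matrix U, whose elements are calculated by
$$U_{ij} = \frac{L(V_{ij}^\lambda \bmod n^2)}{L(g^\lambda \bmod n^2)} \bmod n. \tag{5}$$
According to the additive homomorphism property, we have
$$U_{ij} = D_{k_s}(V_{ij}) = D_{k_s}\Big(\prod_{t=1}^{d} E_{k_p}(S^t_{ij}) \bmod n^2\Big) = \sum_{t=1}^{d} S^t_{ij} \bmod n, \tag{6}$$
where $D_{k_s}(\cdot)$ denotes the decryption scheme. Finally, Server 2 sends the matrix U to each sub-network.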
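Steps 2 to 5 can be traced end to end with the following Python sketch. It rests on assumptions that the protocol does not state: a toy Paillier instance with g = n + 1 (a standard choice that always satisfies the KeyGen check), a private key stored as (λ, μ) derived from p and q, and a hypothetical fixed-point factor SCALE, because Paillier encrypts integers modulo n while matching degrees such as IDP are real-valued. All names and the primes are illustrative.

```python
import math
import secrets
from typing import List, Tuple

Matrix = List[List[float]]
SCALE = 10_000  # hypothetical fixed-point factor: Paillier plaintexts are integers mod n


def keygen(p: int = 1_000_000_007, q: int = 998_244_353) -> Tuple[int, Tuple[int, int]]:
    """Server 2: toy keys with g = n + 1, so L(g^lambda mod n^2) = lambda mod n."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))   # public n (g = n + 1); private (lambda, mu)


def encrypt(n: int, m: int) -> int:
    """E(m) = g^m r^n mod n^2 with g = n + 1 and a fresh random r in Z*_n."""
    n2 = n * n
    r = 0
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n: int, sk: Tuple[int, int], c: int) -> int:
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def encrypt_matrix(n: int, S: Matrix) -> List[List[int]]:
    """Step 2 (each sub-network): encrypt every element of S^t after scaling."""
    return [[encrypt(n, round(s * SCALE)) for s in row] for row in S]


def fuse(n: int, ciphers: List[List[List[int]]]) -> List[List[int]]:
    """Step 4 (Server 1): element-wise product of the ciphertext matrices mod n^2."""
    n2 = n * n
    V = [[1] * len(ciphers[0][0]) for _ in ciphers[0]]
    for C in ciphers:
        for i, row in enumerate(C):
            for j, c in enumerate(row):
                V[i][j] = V[i][j] * c % n2
    return V


def decrypt_matrix(n: int, sk: Tuple[int, int], V: List[List[int]]) -> Matrix:
    """Step 5 (Server 2): decrypt and rescale, giving U_ij = sum over t of S^t_ij."""
    return [[decrypt(n, sk, v) / SCALE for v in row] for row in V]
```

For instance, if two sub-networks encrypt S1 = [[1.0, 0.0], [0.5, 2.0]] and S2 = [[0.0, 3.0], [1.5, 0.25]], Server 1 only multiplies ciphertexts, and Server 2 recovers U = [[1.0, 3.0], [2.0, 2.25]], the element-wise sum. Step 3 corresponds to passing the encrypted matrices to fuse.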
Step 6: Alignment
Each sub-network receives the fused matching degree matrix U from Server 2, where $U_{ij}$ is the sum, over all sub-networks, of the matching degrees obtained by aligning user i in $G^X$ with user j in $G^Y$ using the matching degree metric (e.g., CMN or IDP). For user i in $G^X_t$, the user j in $G^Y$ with the largest element $U_{ij}$ ($j = 1, 2, \ldots, N$) is taken as the user aligned with i, i.e., user i in $G^X_t$ and user j in $G^Y$ are regarded as the same person. With the help of the third-party servers, the alignment information is aggregated efficiently without leaking the information of any sub-network, which improves alignment performance. Finally, the pseudo-code of HE-SNA is given in Algorithm 1.
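Step 6 reduces to a row-wise argmax over U. A minimal Python sketch, assuming U is a dense list of rows whose row and column order match the given user identifiers, and that ties go to the first column (the text specifies neither tie-breaking nor one-to-one matching); the function name align is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def align(U: List[List[float]], x_ids: Sequence[Hashable],
          y_ids: Sequence[Hashable]) -> Dict[Hashable, Hashable]:
    """Map each user of G^X to the G^Y user with the largest fused score U[i][j].

    Ties go to the first (lowest-index) column, which is an arbitrary choice.
    """
    result: Dict[Hashable, Hashable] = {}
    for x, row in zip(x_ids, U):
        best_j = max(range(len(row)), key=row.__getitem__)   # argmax of the row
        result[x] = y_ids[best_j]
    return result
```

For U = [[0.2, 1.5, 0.0], [2.0, 0.1, 0.3], [0.0, 0.4, 0.9]] with users a, b, c in G^X and A, B, C in G^Y, the sketch maps a to B, b to A, and c to C. Because each row is handled independently, two users of G^X may be mapped to the same user of G^Y.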

Security analysis
Since privacy is an important security requirement, the proposed HE-SNA approach should meet it. Because the sub-networks are reluctant to share information with others, the proposed method uses two third-party servers to fuse the information of the sub-networks while protecting it from leakage. The sub-networks encrypt their matching degree matrices with $k_p$ and send them to Server 1 for fusion. Since Server 1 does not have $k_s$, it cannot decrypt them to obtain the real matching information. Moreover, encryption prevents leakage caused by malicious attacks during transmission. Server 1 fuses the encrypted matching degree matrices and sends the result to Server 2, which decrypts it with $k_s$. In this case, Server 2 obtains only the fused matching degrees of all sub-networks; the real matching information of any single sub-network is not available to it.

Data sets introduction
Since it is difficult in reality to obtain multiple private sub-networks of an online network, we regard a real online network as a system of "multiple private social networks" and generate several sub-networks from it using a special sampling strategy. Four real-world pairs of aligned networks are used: Douban online vs Douban offline [49], Twitter vs Foursquare [50], DBLP vs ACM [51], and Youtube vs Twitter1 [52]. Douban online, Twitter, DBLP, and Youtube are the online layers $G^X$, while Douban offline, Foursquare, ACM, and Twitter1 are the offline layers $G^Y$. Their basic structural information is listed in Table 1, and the sub-networks of each original network are obtained by a specific sampling scheme [50]. Without loss of generality, we take d = 2 and d = 3 as examples. Consider first the case of dividing the online layer $G^X$ into two sub-networks: for each edge in the original network, a random value $p \in [0, 1]$ is generated to determine whether the edge appears in neither, one, or both sub-networks. If $p \le 1 - 2\alpha_s + \alpha_s\alpha_0$, the edge is not retained in any sub-network. If $1 - 2\alpha_s + \alpha_s\alpha_0 < p \le 1 - \alpha_s$, the edge is retained only in the first sub-network. If $1 - \alpha_s < p \le 1 - \alpha_s\alpha_0$, the edge is retained only in the second sub-network. Otherwise, the edge is retained in both sub-networks. The parameter $\alpha_0$ measures the proportion of edges shared by the two sub-networks, and $\alpha_s$ measures the sparsity level of the sub-networks: each sub-network retains an edge with probability $\alpha_s$, and both retain it with probability $\alpha_s\alpha_0$. For the three-sub-network case, we introduce an additional parameter $\alpha_t$ to control the overlapping level between any two sub-networks (while $\alpha_0$ controls the overlapping level of all three sub-networks). Specifically, for each edge in the original network, a random value $p \in [0, 1]$ is generated. If $p \le 1 - 3\alpha_s + 3\alpha_s\alpha_t - \alpha_s\alpha_0$, the edge is not kept in any sub-network. If $1 - 3\alpha_s + 3\alpha_s\alpha_t - \alpha_s\alpha_0 < p \le 1 + 2\alpha_s\alpha_0 - 3\alpha_s\alpha_t$, the edge is kept in only one sub-network. If $1 + 2\alpha_s\alpha_0 - 3\alpha_s\alpha_t < p \le 1 - \alpha_s\alpha_0$, the edge is kept in two sub-networks. Otherwise, i.e., if $1 - \alpha_s\alpha_0 < p \le 1$, the edge is kept in all three sub-networks.
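The sampling rules above can be sketched in Python as follows. The thresholds are copied from the text; two details are assumptions: p is drawn with random.Random.random(), which samples [0, 1), and in the three-sub-network case the sub-network(s) receiving an edge kept in one or two of them are chosen uniformly at random, since the text does not say which. The function names are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def split_two(edges: Iterable[Edge], a_s: float, a_0: float,
              rng: random.Random) -> Tuple[Set[Edge], Set[Edge]]:
    """Two sub-networks: each edge goes to neither, the first, the second, or both."""
    g1: Set[Edge] = set()
    g2: Set[Edge] = set()
    for e in edges:
        p = rng.random()
        if p <= 1 - 2 * a_s + a_s * a_0:
            continue                      # kept in no sub-network
        elif p <= 1 - a_s:
            g1.add(e)                     # first sub-network only
        elif p <= 1 - a_s * a_0:
            g2.add(e)                     # second sub-network only
        else:
            g1.add(e)                     # both sub-networks
            g2.add(e)
    return g1, g2


def split_three(edges: Iterable[Edge], a_s: float, a_0: float, a_t: float,
                rng: random.Random) -> List[Set[Edge]]:
    """Three sub-networks; which sub-network(s) receive an edge kept in
    one or two of them is chosen uniformly at random (an assumption)."""
    t_none = 1 - 3 * a_s + 3 * a_s * a_t - a_s * a_0
    t_one = 1 + 2 * a_s * a_0 - 3 * a_s * a_t
    t_two = 1 - a_s * a_0
    subs: List[Set[Edge]] = [set(), set(), set()]
    for e in edges:
        p = rng.random()
        k = 0 if p <= t_none else 1 if p <= t_one else 2 if p <= t_two else 3
        for idx in rng.sample(range(3), k):   # k distinct sub-networks get the edge
            subs[idx].add(e)
    return subs
```

Under these assumptions each sub-network keeps an edge with probability $\alpha_s$; in the three-sub-network case any two sub-networks share it with probability $\alpha_s\alpha_t$ and all three with probability $\alpha_s\alpha_0$. Setting $\alpha_0 = 1$ in the two-sub-network case empties the single-sub-network intervals, so both sub-networks come out identical.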

Experimental settings
With traditional metrics such as CMN and IDP, a sub-network can only be aligned using its own structure and partial alignment data. In contrast, the proposed method "collaboratively" leverages the structure and alignment information of every sub-network for better alignment without revealing any of it. In this paper, the original cross-social networks $G^X$ and $G^Y$ are regarded as an online layer and a common offline layer, respectively (the online layer is recorded by multiple OSNPs). 90% of the aligned user data (inter-layer links) are used as the training set, and the rest as the test set. The training set is divided into 10 groups, and the average results are reported. Each sub-network owns a portion of the training set of the original aligned user data. Unless otherwise stated, we set $\alpha_s = 0.5$, $\alpha_0 = 0.5$ to generate an online layer containing two sub-networks, and $\alpha_s = 0.5$, $\alpha_0 = 0.2$, $\alpha_t = 0.4$ to generate an online layer containing three sub-networks.

Evaluation metrics
AUC (area under the curve) measures the accuracy of inter-layer link prediction (user alignment) from an overall perspective [53]. Suppose that a missing inter-layer link and a nonexistent inter-layer link are compared independently f times. If the missing link has the higher score $f_1$ times and the two have the same score $f_2$ times, the AUC is
$$\mathrm{AUC} = \frac{f_1 + 0.5 f_2}{f}. \tag{7}$$
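The sampling-based AUC can be computed as in the Python sketch below, assuming the scores of missing (test) inter-layer links and of nonexistent inter-layer links are already available as lists; the function name and the seeded random.Random are illustrative choices.

```python
import random
from typing import Sequence


def auc(missing: Sequence[float], nonexistent: Sequence[float], f: int,
        rng: random.Random) -> float:
    """AUC = (f1 + 0.5 * f2) / f over f random (missing, nonexistent) comparisons."""
    f1 = f2 = 0
    for _ in range(f):
        a, b = rng.choice(missing), rng.choice(nonexistent)
        if a > b:
            f1 += 1          # missing inter-layer link scored higher
        elif a == b:
            f2 += 1          # tie counts as half
    return (f1 + 0.5 * f2) / f
```

When every missing link outscores every nonexistent one the estimate is 1.0, and when all scores are equal it is 0.5, the value of an uninformative ranking.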

Effects of overlapping level of training sets among sub-networks
The overlapping level of training sets between sub-networks is defined as the number of overlapping edges divided by the total number of training set edges. Tables 2 and 3 report the effects of the proportion of the online layer's training set owned by the two sub-networks and of the overlapping level of training sets between them on the AUC for the DBLP-ACM network, using the CMN and IDP metrics, respectively. When the overlapping level of training sets between the two sub-networks is unchanged, the AUC increases as the proportion of the training set owned by the sub-networks increases. HE-SNA outperforms alignment using only a single sub-network for both the CMN and IDP metrics, because it fuses information from the sub-networks. This shows that the algorithm effectively fuses data from different private participants to achieve better alignment performance. Conversely, if the proportion of the training set owned by the two sub-networks is unchanged, the AUC gradually decreases as the overlapping level of training sets between sub-networks increases. This is expected: as the overlap grows, a larger fraction of the aligned data is duplicated, so HE-SNA gains less additional information from the two sub-networks' aligned data, and the AUC decreases. Only the DBLP-ACM network is shown here; similar results are found for the other three pairs of aligned networks. Figure 4 compares the AUC obtained using only a single sub-network with that of the fusion case (HE-SNA) under different training set overlapping levels, when each sub-network owns 4/9 of the training set (CMN metric).
Figure 4 shows that HE-SNA performs much better than using only the information of a single sub-network, regardless of the training set overlapping level between the two sub-networks. This is reasonable, since the "fusion" step aggregates more alignment information than any single sub-network holds. For brevity, only the case in which each sub-network owns 4/9 of the training set is shown; similar results are obtained for other ratios.
Besides, we further discuss the effect of training set size on the alignment results when train1 ∪ train2 = train, where train1 and train2 denote the training sets of sub-network 1 and sub-network 2, respectively, and train denotes the training set of the original social network $G^X$. As shown in Fig. 5, for both the CMN and IDP metrics, as the overlapping level of training sets between sub-networks increases, the AUC of both the single sub-network and the fusion case increases on all four pairs of aligned networks, because each sub-network owns more training edges. However, the performance of using only a single sub-network remains much lower than that of HE-SNA, since the fusion case has more alignment information.
Fig. 4 The change in AUC with different training set overlapping levels between sub-networks using only a single sub-network versus the fusion case when the proportion of training sets owned by each sub-network is 4/9 (CMN metric)
Fig. 5 The change in AUC with different overlapping levels of training sets between sub-networks using only a single sub-network versus the fusion case when the training sets owned by two sub-networks satisfy train1 ∪ train2 = train (CMN metric above, IDP metric below)
Fig. 6 The change in AUC of two sub-networks and the fusion case when train1 = train2 = train with different proportions of divided training sets (CMN metric above, IDP metric below)
Fig. 7 The change in AUC of three sub-networks and the fusion case when train1 = train2 = train3 = train with different proportions of divided training sets (CMN metric above, IDP metric below)
From the above analysis, it is clear that when train1 ∪ train2 = train, HE-SNA yields markedly better alignment than using only a single sub-network while still protecting the privacy of each sub-network, since it fuses the information from all sub-networks. Does HE-SNA lose this advantage when every sub-network already owns the complete training set of the original social network? This is not the case. Figs 6 and 7 show the change in AUC for train1 = train2 = train (two sub-networks) and train1 = train2 = train3 = train (three sub-networks), respectively. As the proportion of the training set owned by the sub-networks increases, the AUC of both the single sub-network case and the fusion case increases, but HE-SNA achieves far better alignment than the single sub-network approach with both two and three sub-networks.
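The two training-set configurations discussed above can be reproduced with a hypothetical splitting routine. The sketch assumes that the overlapping level is the fraction of anchor pairs held by both sub-networks. The function name `split_training_set` and this definition are assumptions, not the paper's stated procedure. Any overlap value keeps train1 ∪ train2 = train, and an overlap of 1 gives train1 = train2 = train.

```python
import random


def split_training_set(train, overlap, seed=0):
    """Hypothetical split of the anchor set `train` into (train1, train2).

    Guarantees train1 | train2 == train and
    |train1 & train2| == round(overlap * |train|).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    pairs = sorted(train)            # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_shared = round(overlap * len(pairs))
    shared, rest = pairs[:n_shared], pairs[n_shared:]
    half = len(rest) // 2
    # Shared anchors go to both sub-networks; the rest is divided disjointly.
    train1 = set(shared) | set(rest[:half])
    train2 = set(shared) | set(rest[half:])
    return train1, train2


if __name__ == "__main__":
    train = {(i, i) for i in range(10)}  # toy anchor pairs
    t1, t2 = split_training_set(train, 0.4)
    print(len(t1), len(t2), len(t1 & t2))  # 7 7 4
```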

Parameter sensitivity analysis
To evaluate the HE-SNA method comprehensively, we investigate the effects of the parameters α_s and α_0 on model performance. Taking the Douban online-Douban offline network pair as an example, we set train:test = 9:1 and use the CMN metric. Sub-figures (a) and (b) of Fig. 8 show the results of experiments with two sub-networks when train1 = train2 = train, while sub-figures (c) and (d) of Fig. 8 show the results with three sub-networks when train1 = train2 = train3 = train. Fig. 8a, c indicate that the conclusion is similar for two and three sub-networks: as the sparsity parameter α_s increases, the AUC increases, because each platform holds a more complete sub-network structure. A larger α_s therefore benefits user matching both for the individual sub-networks and for the HE-SNA method.
As Fig. 8b, d demonstrate, as α_0 increases, the overlapping level between sub-networks becomes higher, i.e., the sub-networks become increasingly similar, so the advantage of the HE-SNA method (fusing the structures of all sub-networks) weakens and the AUC decreases. When α_0 = 1, all sub-networks have the same structure, so the AUC of the HE-SNA method and that of the individual sub-network methods are identical. We conclude that the lower the overlapping level between sub-networks, the greater the advantage of the HE-SNA method.
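The roles of α_s and α_0 can be made concrete with a hypothetical generator for the private sub-networks. It assumes that α_s is the fraction of the original edges kept by each sub-network and that α_0 is the fraction of those kept edges shared by all sub-networks. The paper's exact sampling model may differ, and the name `make_subnetworks` is illustrative. With α_0 = 1 every sub-network receives the same edge set, which matches the case where HE-SNA and the individual sub-networks coincide.

```python
import random


def make_subnetworks(edges, k, alpha_s, alpha_0, seed=0):
    """Hypothetical generator of k private sub-networks from one edge list.

    Each sub-network keeps m = round(alpha_s * |E|) edges; round(alpha_0 * m)
    of them come from a pool shared by all sub-networks, and the rest are
    sampled independently per sub-network from the remaining edges.
    Edges are assumed to be hashable, sortable tuples (e.g. (u, v) with u < v).
    """
    if not (0 <= alpha_s <= 1 and 0 <= alpha_0 <= 1):
        raise ValueError("alpha_s and alpha_0 must lie in [0, 1]")
    rng = random.Random(seed)
    edges = sorted(set(edges))
    m = round(alpha_s * len(edges))          # edges per sub-network
    m_shared = round(alpha_0 * m)            # edges common to all sub-networks
    shared = set(rng.sample(edges, m_shared))
    pool = [e for e in edges if e not in shared]
    subnets = []
    for _ in range(k):
        private = rng.sample(pool, m - m_shared)  # len(pool) >= m - m_shared
        subnets.append(shared | set(private))
    return subnets


if __name__ == "__main__":
    path = [(i, i + 1) for i in range(20)]   # toy original network
    subs = make_subnetworks(path, k=3, alpha_s=0.5, alpha_0=1.0)
    print(subs[0] == subs[1] == subs[2])      # True
```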

Computation cost analysis
To compare the running times of the original model (i.e., without the HE mechanism) and the HE-SNA method, we measure the extra running time incurred by HE-SNA. Suppose there are 3 sub-networks and the CMN metric is used to obtain the matching degree matrix of each sub-network. Table 4 presents the extra running times of HE-SNA when the HE mechanism is included. The extra running time introduced by HE-SNA is proportional to the size of the matching degree matrix. The encryption and fusion of the matching degree matrices, followed by the corresponding decryption, are implemented in PyCharm 2019 with the open-source phe library, and all running times are averaged over five runs on a Windows 10 system with a 2.60 GHz Intel processor and 8.00 GB of memory.
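The encrypt-fuse-decrypt pipeline relies on the additive homomorphism of the Paillier scheme, which is what the phe library implements. To stay self-contained, the sketch below uses a toy Paillier implementation with small fixed primes instead of phe. This is insecure and for illustration only. It also assumes non-negative integer matrix entries, such as CMN counts, and the role comments describe an assumed division of work. In phe, the corresponding calls are `paillier.generate_paillier_keypair()`, `public_key.encrypt(x)`, `+` on encrypted numbers, and `private_key.decrypt(c)`. Because every matrix entry is encrypted, added, and decrypted once, the extra time grows linearly with the number of entries, consistent with the proportionality reported above.

```python
import math
import random
import time

# Toy Paillier key with small fixed primes: illustration only, NOT secure.
P, Q = 1_000_003, 999_983
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid for g = N + 1 (modular inverse needs Python 3.8+)


def encrypt(m, rng):
    """Paillier encryption of an integer 0 <= m < N with generator g = N + 1."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:  # random r coprime to N (use `secrets` in real code)
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1, c2):
    """Homomorphic addition: Enc(m1) * Enc(m2) mod N^2 decrypts to m1 + m2."""
    return c1 * c2 % N2


def encrypt_matrix(matrix, rng):
    return [[encrypt(x, rng) for x in row] for row in matrix]


def fuse_encrypted(enc_matrices):
    """Element-wise homomorphic sum; works on ciphertexts only."""
    fused = enc_matrices[0]
    for enc in enc_matrices[1:]:
        fused = [[add_encrypted(a, b) for a, b in zip(ra, rb)]
                 for ra, rb in zip(fused, enc)]
    return fused


def decrypt_matrix(enc_matrix):
    return [[decrypt(c) for c in row] for row in enc_matrix]


if __name__ == "__main__":
    rng = random.Random(42)
    # Three hypothetical 2x2 matching-degree matrices, one per sub-network.
    mats = [[[1, 0], [2, 3]], [[0, 4], [1, 1]], [[2, 2], [0, 5]]]
    start = time.perf_counter()
    enc = [encrypt_matrix(m, rng) for m in mats]  # assumed: with the shared public key
    fused = fuse_encrypted(enc)                   # assumed: first server, ciphertexts only
    result = decrypt_matrix(fused)                # assumed: key-holding second server
    extra = time.perf_counter() - start
    print(result)  # [[3, 6], [3, 9]]
    print(f"extra time: {extra:.4f} s")
```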

Conclusions
Due to the importance of privacy protection, different OSNPs are reluctant to share information about the structure of their networks and the attributes of their users, which is a significant obstacle to aligning users across networks. Our work starts from privacy protection and designs the HE-SNA method, based on HE, to align users across the original networks. Experimental results show that, regardless of the matching metric used, our method effectively protects data privacy and aligns cross-network user identities more accurately than using information from a single network only. The proposed method therefore provides a new idea for the collaborative identification of identical user entities in multiple private networks. In future work, we will investigate how to better extract the structural features of the network and combine them with the attribute features of the nodes to improve the accuracy of HE-SNA; the application of the HE-SNA algorithm to other types of network data is also worth studying.