1 Introduction

With the explosion of big data, social networks are providing a plethora of information on user interactions [1, 2]. For example, Twitter's average monetizable daily active user count in the second quarter of 2022 was 237.8 million, a year-on-year increase of 16.6% [3]. Moreover, under the isolation imposed by COVID-19, social networks have played an even more important role in everyday interactions between people, particularly when it comes to social influence. The term 'social influence' usually refers to the process whereby a user's emotions, opinions, or behaviors are shaped by their environment, i.e., the process by which people alter their behavior under the influence of others [3]. With the globalization of online social networks, social influence analysis has spread to many domains, including marketing [4], behavior prediction [5], recommendation systems [6, 7], influence maximization [8], public opinion guidance [9], communities [10], and graph anomaly detection [11]. Social influence has undeniably become ubiquitous and complex in shaping our social decisions. There is therefore much interest in developing methods to understand, describe, and identify the mechanisms and evolution behind social influence.

The process of predicting influence on a user involves sampling local neighbors, building a local network from the samples, and then learning potential predictive signals from that network. Matsubara et al. [12] designed a dynamic model of social influence based on a differential equation drawn from classic susceptibility theory. The approach proposed by Li et al. [13] combines RNNs with representation learning to infer cascade size from features. Both of these methods aim to predict the statistical patterns of social influence over time, such as cascade size and global patterns. Qiu et al. [14] developed DeepInf, a well-known social influence prediction method for general social network scenes. The idea is to predict a user's behavior state from the behavior states and characteristics of their neighbors. However, because the COVID-19 network is sparse, neighbor information is very limited, so these methods are ineffective for network-based COVID-19 prediction.

In this paper, to overcome these issues, we extended DeepInf so that our proposed model uses personalized propagation to predict social influence at the user level. We built on DeepInf [14] by integrating the transition probability α from the PageRank domain into a GCN model. Specifically, our model first uses a graph neural network to learn the latent social representations of users, taking their local networks as input. We then replace the GAT/GCN network with PPNP (personalized propagation of neural predictions) [15], a model that propagates influence scores efficiently. Meanwhile, α adjusts the size of the neighborhood influence; given the prediction matrix H, the term αH offers greater flexibility, as shown in Figure 1. Finally, the transition probability α provides a way to achieve the ideal balance between maintaining locality (i.e., staying close to the root node to avoid over-smoothing) and leveraging the information obtained from large neighborhoods. In experiments, this propagation scheme has proven extremely efficient, with the added benefit that far more propagation steps (in fact, infinitely many) can be used without over-smoothing the output. In short, we extended DeepInf by leveraging two algorithms extended from a GCN, PPNP and APPNP (an approximation of PPNP) [15], and transplanting them from page ranking into social influence analysis. Our implementation gives rise to a deep learning-based personalized propagation algorithm called DeepPP. Given a user v, the algorithm analyzes the states of v's n sampled neighbors and predicts how v will behave in the future based on its current behavior. Each node (user) is either active or inactive, and v's current state is required to forecast its state at the end of the interval.

Figure 1

The neural network first predicts the influence of each node according to its own characteristics. Then, the personalized PageRank algorithm propagates influence adaptively. (a) shows personalized propagation. (b) shows the improved personalized propagation.

We examined four social networks in various domains, OAG, Digg, Twitter, and Sina Weibo, as well as two COVID-19 datasets (Hubei and Holland), to evaluate our algorithm's efficiency and effectiveness. We compared DeepPP with a conventional GNN* model [16], the advanced PPNP and APPNP [15], and the well-known models DeepInf [14] and DeepEmLAN [17]. The results of an extensive study demonstrate that DeepPP provides a better F1-measure than the current advanced baselines, and that it delivers higher precision on the COVID-19 datasets. This strong performance across various datasets shows that DeepPP can predict the social influence of COVID-19.

The following list summarizes our main contributions.

1. Inspired by DeepInf, we integrated the transition probability α of the PageRank domain into a GCN model, thereby extending DeepInf.

2. The resulting DeepPP algorithm repurposes PPNP and APPNP from page ranking to social influence analysis without any additional time complexity.

3. Comprehensive evaluations of model performance show DeepPP to be more accurate than the baseline methods.

The remainder of this paper is organized as follows. Section 2 reviews the status of research on social influence prediction. Section 3 gives an overview of DeepPP and outlines the deep learning-based personalized propagation process for modeling the social influence of COVID-19. Section 4 describes the experiments used to validate DeepPP. Section 5 offers concluding remarks and next steps.

2 Related work

Our literature review covers the following topics: traditional social influence analysis, deep learning-based social influence prediction, and graph representation learning. The review suggests that an accurate model for predicting social influence has yet to be established.

2.1 Methods of traditional social influence analysis

Social influence analysis is traditionally based on the study of interpersonal communications, where user features are typically extracted by hand, which can be tedious. Further, the analyses rely on domain expertise in sociology or cognitive science, so extending the results is difficult [14]. A study by Li et al. [18] distinguished two types of social influence analysis models: (1) macro models, which assume all users have equal power to influence; and (2) micro models, which explore individual levels of user influence. Of the micro-level influence models, independent cascade and linear threshold are the two most common. Nevertheless, both types of models assume that users do not change their state, i.e., the probability of influencing others or of being influenced [19], estimated using Bayes' theorem [20, 21]. As such, they are not particularly reflective of real social networks.

2.2 Methods of social influence prediction with deep learning

Many fields have benefited from deep learning, and social influence analysis is no exception, although its application to this field is relatively recent. To date, both micro- and macro-level methods of analysis have been developed. Micro-level models focus on user interactions. The key assumption is that user interactions, e.g., ratings, comments, and retweets, affect the behavior of others. One of the most advanced models is DeepInf [14], which predicts user-level influence end to end. It integrates a network embedding [22] with a GAT [23] and a GCN [24]. Both GATs and GCNs are better suited than traditional machine learning methods to non-Euclidean spatial data, aggregating the features of neighboring vertices onto central vertices [25]. In fact, experiments with DeepInf show the best predictive performance with multi-headed GATs, even when compared to a GCN [24]. By using dual GNNs instead of a single GNN in recommender systems, Wu et al. [26] designed a deep latent representation of multifaceted social influence.

The macro-level solutions focus on patterns of global social influence. For example, DeepCas [13], a method of analyzing macro-level social influence with RNNs, models a cascade in terms of all of its aspects and their associations with the final cascade size [22]. In this method, an end-to-end predictor visualizes all influence cascades as a two-dimensional colored diagram. Using DeepCas as inspiration, Cao et al. [27] developed DeepHawkes, which represents information cascades with an explainable generative Hawkes process model [28]. Their model outperformed traditional generative and feature-based models. DeepHawkes [27], like DeepInf, is data-driven, which means it can learn from previous cascades and take advantage of historical information. Gou et al. [29] further developed and extended a data-driven cascading method called LSTMICs, which uses long short-term memory (LSTM) to learn sequence features from cascading features and RNN functions. Using a Weibo and a Twitter dataset, they predicted outbreaks more accurately than previous methods. With Cas2vec [30], virus cascades can be accurately predicted without manually extracting the relevant information from the framework, which could be costly to obtain; time intervals, rather than the information contained in the events, are used to extract features.

Social influence analysis models are closely related to graph representation learning, and many studies have examined graph representation learning as part of graph mining, e.g., DeepWalk [31], LINE [32], Node2vec [33], Metapath2vec [34], NetMF [35], graph kernels [36], and the most advanced method, PSCN [37]. More recently, researchers have been exploring graph representation learning with semi-supervised information; examples include GCN [24], GraphSAGE [38], and GAT [23], the most advanced model.

3 Methodology

Analyzing social influence with deep learning techniques is a problem related to GNNs. Because GNNs have other types of applications in recommendation systems, such as collaborative filtering and predictive page ranking, drawing inspiration from other fields to improve our existing research is a highly useful exercise. Figure 2 illustrates the deep learning-based personalized propagation used to model the social influence associated with COVID-19.

Figure 2

Overview of the DeepPP model. The neighbors of node v are sampled with the goal of predicting the state of node v after a number of iterations

3.1 Influence propagating

We extended DeepInf by leveraging two algorithms, PPNP and APPNP, which Klicpera et al. [15] proposed by extending a GCN, in which each node has the same effect on its neighbors. In other words, in the independent cascade model recalled in Section 2, every edge has the same influence score and the same probability of propagation to its neighbors. The transition probability α adjusts information transmission so that each node has a different influence on its neighbors. In experiments, the PPNP and APPNP algorithms perform better than the GNN algorithm when the transition probability α is set between 0.05 and 0.2 [15], and also better than the GCN algorithm, which affects all neighbors equally. Consider a graph \(\mathcal {G}=(\mathcal {V}, \mathcal {E})\) with \(\lvert {\mathcal {V}}\rvert \) vertices and \(\lvert {\mathcal {E}}\rvert \) edges. Let A be the adjacency matrix of \(\mathcal {G}\), and let In be the identity matrix, which adds self-loops for \(\mathcal {V}\); define \(\hat {\mathbf {A}}=\mathbf {A}+\mathbf {I}_{n}\). The teleport vector ix preserves the local neighborhood of the node [39]. Essentially, the influence score I(x,y) (i.e., the influence of root node x on non-root node y) is equivalent to the y-th entry of the personalized PageRank vector πpr(ix). The recursive equation with a teleport vector is expressed with the transition probability α as:

$$ \pi_{pr}(\mathbf{i}_{x})=\alpha(\mathbf{I}_{n}-(1-\alpha)\hat{\mathbf{A}})^{-1}\mathbf{i}_{x}, \quad \alpha\in(0,1) $$
(1)

The influence score in (1) can be used to generate the prediction objective function.

$$ \mathcal{Z} = softmax(\alpha(\mathbf{I}_{n}-(1-\alpha)\hat{\mathbf{A}})^{-1}\mathbf{H}), \quad \mathbf{H}_{i,:}=\mathcal{F}_{\theta}(\mathbf{X}_{i,:}) $$
(2)

where X is the feature matrix, H is the prediction matrix, and \(\mathcal {F}_{\theta }\) represents a neural network that predicts \(\mathbf {H}\in \mathbb {R}^{n\times c}\). Together, (1) and (2) form the PPNP model. However, evaluating (2) directly requires inverting a dense matrix, which incurs O(n2) complexity. APPNP was developed to address this shortcoming. More specifically, it approximates the prediction via the power method, which has linear computational complexity:

$$ \mathcal{Z}^{(0)} = \mathbf{H} = \mathcal{F}_{\theta}(\mathbf{X}) $$
(3)
$$ \mathcal{Z}^{(k+1)}=(1-\alpha)\hat{\mathbf{A}}\mathcal{Z}^{(k)}+\alpha\mathbf{H} $$
(4)
$$ \mathcal{Z}^{(K)}=softmax((1-\alpha)\hat{\mathbf{A}}\mathcal{Z}^{(K-1)}+\alpha\mathbf{H}) $$
(5)

where \(\mathcal {Z}^{(k)}\) is the prediction matrix after k propagation steps, which effectively provides approximate predictions; the final step applies the softmax. The integer K defines the number of power iteration steps.
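To make the propagation scheme concrete, the following minimal NumPy sketch implements both the exact PPNP prediction in (2) and the APPNP power iteration in (3)-(5). The helper names are ours for illustration, and the input \(\hat {\mathbf {A}}\) is assumed to be the symmetrically normalized adjacency matrix with self-loops (cf. (7) below).

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax over the class dimension.
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ppnp_exact(A_hat, H, alpha=0.1):
    """Exact PPNP, eq. (2): Z = softmax(alpha * (I - (1 - alpha) * A_hat)^-1 * H)."""
    n = A_hat.shape[0]
    return softmax(alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_hat, H))

def appnp(A_hat, H, alpha=0.1, K=10):
    """APPNP power iteration, eqs. (3)-(5); avoids the dense matrix inverse."""
    Z = H.copy()                                   # Z^(0) = H, eq. (3)
    for _ in range(K):
        Z = (1 - alpha) * A_hat @ Z + alpha * H    # eq. (4)
    return softmax(Z)                              # final step, eq. (5)
```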

Our model is derived from a GCN and a graph attention network model as follows.

3.2 Network encoding with a GCN

A GCN is a semi-supervised learning model for analyzing graph-structured data [24]. Its convolutions are defined in the Fourier domain via the eigendecomposition of the graph Laplacian. Specifically, a GCN consists of multiple layers, with the layer-by-layer propagation rule given in (6).

$$ \mathcal{F}\left( {{\mathbf{H}^{\left( l \right)}},\mathbf{A}} \right) = \sigma \left( {\mathbf{A}{\mathbf{H}^{\left( l \right)}}{\mathbf{W}^{\left( l \right)}}} \right) $$
(6)

where H(0) is the feature matrix, W(l) is the layer-specific trainable weight matrix, and σ is a nonlinear activation. In (6), A denotes the adjacency matrix symmetrically normalized with the diagonal node degree matrix \(\hat{\mathbf{D}}\) of \(\hat{\mathbf{A}}\), which preserves the graph's self-loops. Putting this together, A is as follows:

$$ \mathbf{A}=\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}} $$
(7)
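As an illustration (variable names are ours), the normalization in (7) and one propagation step of (6) can be written as:

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization with self-loops, eq. (7)."""
    A_hat = A + np.eye(A.shape[0])                 # self-loops: A_hat = A + I_n
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # D_hat^{-1/2} as a vector
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H, A_norm, W):
    """One GCN propagation step, eq. (6), using ReLU as the activation sigma."""
    return np.maximum(A_norm @ H @ W, 0.0)
```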

3.3 Multi-head attention with a GAT

A GAT [40, 41] is another neural network model that processes graph-structured data, using self-attention techniques during propagation. In more detail, a GAT computes each node's state from the states of its neighbors, assigning each neighbor an attention factor via the attention coefficient αij (for edge \(i\rightarrow j\)):

$$ \alpha_{ij}= \frac{\exp \left( {\text{LeakyReLU}}\left( \vec{a}^{T}\left[ \mathbf{W}\vec{h}_{i} \| \mathbf{W}\vec{h}_{j} \right] \right) \right)}{\sum\limits_{k \in N_{i}} \exp \left( {\text{LeakyReLU}}\left( \vec{a}^{T}\left[ \mathbf{W}\vec{h}_{i} \| \mathbf{W}\vec{h}_{k} \right] \right) \right)} $$
(8)

where \({\mathbf {W} \in \mathbb {R}^{F\times F}}\) is the shared weight matrix applied in \(\left [ {{{\mathbf {W}}}{{\vec h}_{i}}\left \| {{{\mathbf {W}}}{{\vec h}_{j}}} \right .} \right ]\), \(\vec{a}\) is a learnable weight vector, T denotes transposition, and ∥ denotes the concatenation operation. Combining the normalized attention coefficients with a linear combination of the corresponding features gives the final output feature:

$$ {{\vec h}_{i}}^{\prime} = \sigma \left( {\sum\limits_{j \in {N_{i}}} {{\alpha_{ij}}{{\mathbf{W}}}{{\vec h}_{j}}} } \right) $$
(9)

Using the multi-head attention technique, K independent attention mechanisms of the form in (9) can be combined, which stabilizes the learning process; here their outputs are averaged:

$$ {{\vec h}_{i}}^{\prime} = \sigma \left( {\frac{1}{K}\sum\limits_{k = 1}^{K} {\sum\limits_{j \in {N_{i}}} {\alpha_{ij}^{k}{{{\mathbf{W}}}^{k}}{{\vec h}_{j}}} } } \right) $$
(10)
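The sketch below (our own single-graph, dense-matrix illustration, assuming A contains self-loops so every row has at least one neighbor) computes the attention coefficients of (8) and the averaged multi-head output of (10):

```python
import numpy as np

def gat_layer(H, A, Ws, a_vecs, negative_slope=0.2):
    """Multi-head GAT layer, eqs. (8)-(10), averaging K attention heads.

    H      : (n, F) input features
    A      : (n, n) adjacency with self-loops (1 where j is in N_i)
    Ws     : list of K weight matrices, each (F, F)
    a_vecs : list of K attention vectors, each (2 * F,)
    """
    outputs = []
    for W, a in zip(Ws, a_vecs):
        WH = H @ W                                    # projected features W h
        Fp = WH.shape[1]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]), the argument of eq. (8)
        e = WH @ a[:Fp, None] + (WH @ a[Fp:, None]).T
        e = np.where(e > 0, e, negative_slope * e)    # LeakyReLU
        e = np.where(A > 0, e, -np.inf)               # restrict to neighbors N_i
        alpha = np.exp(e - e.max(axis=1, keepdims=True))
        alpha /= alpha.sum(axis=1, keepdims=True)     # softmax of eq. (8)
        outputs.append(alpha @ WH)                    # aggregation of eq. (9)
    return np.maximum(np.mean(outputs, axis=0), 0.0)  # eq. (10), sigma = ReLU
```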

3.4 Modeling the social influence of COVID-19

Through deep learning, DeepInf can automatically identify hidden patterns at the user level and predict user influence. PPNP is a personalized propagation algorithm for neural predictions. By combining the PPNP and APPNP algorithms, we improve the DeepInf method to model the social influence of COVID-19. More specifically, we incorporate personalized PageRank into the DeepInf method: the GAT/GCN network is replaced with a personalized PageRank model, yielding a novel algorithm named DeepPP.

We use the transition probability α to adjust the size of the neighborhood influence. The enhanced propagation equation therefore combines the teleport-vector recursion with the prediction matrix H and the transition probability α, as shown in (11):

$$ \mathcal{Z}_{DeepPP}=softmax(\alpha(\mathbf{I}_{n}-(1-\alpha)\hat{\mathbf{A}})^{-1}\mathbf{H}+\alpha\mathbf{H}) $$
(11)

PPNP's strict linearity is relaxed by adding the αH term, whose flexibility is apparent in practical applications.
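Reusing the softmax helper from the earlier sketch, the DeepPP prediction in (11) differs from exact PPNP only by the extra αH term (again, the function name is ours):

```python
import numpy as np

def deep_pp(A_hat, H, alpha=0.1):
    """DeepPP prediction, eq. (11): PPNP propagation plus an extra alpha * H term."""
    n = A_hat.shape[0]
    Z = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_hat, H)
    return softmax(Z + alpha * H)
```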

To demonstrate the algorithm, Figure 3 shows two examples from the Digg dataset. Social influence prediction aims to predict the behavior state of a target node v from its neighboring nodes. User v represents the target node; solid nodes represent the active state '1', and hollow nodes represent the inactive state '0'. Figure 3(a) shows a social influence prediction with an active node, while Figure 3(b) shows a prediction with an inactive node. Our experiments show that DeepPP predicts the ground truth more accurately than DeepInf.

Figure 3

Two samples from the Digg dataset. (a) Social influence prediction with an active node. (b) Social influence prediction with an inactive node

4 Experiments

In our experiments, we compared the DeepPP model with the conventional GNN* model [16], PPNP and APPNP [15], and the most advanced available models, DeepInf [14] and DeepEmLAN [17]. To avoid confusion with GNNs in the broader sense, the particular method we used is named GNN*. GNN* extends a common framework that includes information diffusion, relaxation mechanisms, and a random walk model; its input can be a cyclic, directed, undirected, or mixed graph. DeepEmLAN seamlessly integrates different types of attributes and topologies into a single semantic space, while retaining the distinctions between them to the extent possible. The learning rate for all baselines was set to 0.1, with 1000 epochs of training on a mean squared error loss.

4.1 Datasets

To train the classification models, three categories of features were considered for the ego-user: (a) vertex features; (b) pretrained network embeddings (trained with DeepWalk [31]); and (c) hand-crafted features. They are listed in Table 1.

Table 1 List of features for ego-user

4.1.1 Four social networks in different domains

We conducted the experiment with four datasets from different fields. Table 2 presents the statistical information. \(\lvert \mathcal {V} \rvert \) and \(\lvert \mathcal {E} \rvert \) represent the number of nodes and edges in graph \(\mathcal {G}=(\mathcal {V}, \mathcal {E})\), respectively. There are N observable instances.

Table 2 Statistics for the datasets

OAG. This dataset links the Microsoft Academic Graph and AMiner [47]. We chose 20 major conferences in the areas of computer science and artificial intelligence, such as SIGCOMM, SIGMOD, AAAI, and NeurIPS, similar to the approach described in [48]. The social networks were defined as co-author networks, and social behavior was defined as citation behavior, i.e., a researcher citing a paper presented at one of the conferences. Specifically, we want to know how collaborators influence citation behavior.

Digg. Digg is a social news site based in the United States. It has two features, digging and burying, which users apply depending on whether they agree with a story. This dataset covers the stories that appeared on the front page of major newspapers in 2019.

Twitter. The Twitter dataset was built from propagations of the announcement of the discovery of the elusive Higgs boson on Twitter in July 2012. It is a friendship network in which social behaviors are mapped from retweets on Twitter.

Weibo [14]. Sina Weibo is the most widely used microblogging service in China, a social media platform based on user relationships. It currently has 523 million monthly active users and 229 million daily active users. This dataset contains target tracking networks and tweets from September 28 to October 29, 2012.

4.1.2 COVID-19 Datasets

Hubei (see Figure 4). This is a dataset of infected cases provided by the Hubei Provincial Health Committee [49]. COVID-19 first appeared in Wuhan, China, in December 2019. On January 21, 2020, the Hubei Provincial Health Committee reported the first case outside Wuhan. From February 15, 2020, the diagnostic policy was changed, resulting in a sharp increase in the number of recorded infection cases. Therefore, this dataset is limited to the period from January 21 to February 14, 2020.

Figure 4

Infection during COVID-19 in Hubei, China

Holland (see Figure 5). This dataset contains data on infection cases collected by the Dutch National Institute for Public Health and the Environment [50]. The first case, diagnosed on February 27, 2020, involved a patient who had traveled to Italy a week earlier; after that, the number of cases grew rapidly. Reported cases peaked at the end of March, followed by a downward trend in daily reported cases. Because reported cases in Holland increased more slowly than in Hubei, the overall infection period was longer and there are more data points. The gradually increasing number of infected cases is conducive to prediction accuracy.

Figure 5

Infections during COVID-19 in Holland

4.2 Evaluation

The following metrics were used to measure performance in the experiments: AUC, Precision, Recall, and F1-measure [51].

We first analyzed the overall performance of the methods. For each of the six datasets, we searched the threshold space and calculated the corresponding F1 value to find the most appropriate one, i.e., the best F1-measure (F1best). Figure 6 shows the F1best scores and the overall average across the datasets. On F1best, DeepPP was on average 2.01% higher than DeepEmLAN, 2.80% higher than APPNP, 3.78% higher than PPNP, 6.26% higher than DeepInf-GAT, and 8.73% higher than DeepInf-GCN. Hence, DeepPP yielded superior results to the baselines.
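The paper does not spell out the search procedure, but a straightforward sweep over classification thresholds, sketched below with scikit-learn (the helper name is ours), is one way to obtain F1best:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_f1(y_true, scores, thresholds=np.linspace(0.01, 0.99, 99)):
    """Sweep the decision threshold and keep the best F1-measure (F1best)."""
    f1s = [f1_score(y_true, (scores >= t).astype(int)) for t in thresholds]
    best = int(np.argmax(f1s))
    return f1s[best], thresholds[best]
```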

Figure 6

A comparative analysis of DeepPP and baseline methods in F1best across six datasets

To determine the effects of hyper-parameters, the transition probability α was adjusted over 0.2 ≤ α ≤ 1.0, in increments of 0.2. We ran mini-batches of 1024 samples over 1000 epochs and used dropout with a rate of 0.2. We compared predictions at α = 0.2, 0.4, 0.6, and 0.8. The results, detailed in Table 3, clearly indicate that DeepPP outperformed the other six methods in terms of AUC, Precision, Recall, and F1-measure. Of the two DeepInf variants, DeepInf-GAT had higher predictive performance than DeepInf-GCN. According to Table 3, DeepPP, DeepEmLAN, APPNP, and PPNP were competitive, while DeepInf-GAT was especially good with regard to Precision on the Weibo, Hubei, and Holland datasets.
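As a toy illustration of the α sweep (reusing the normalize_adjacency and appnp sketches from Section 3; the actual training details, e.g., mini-batching and dropout, are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)   # random 5-node graph
A = np.triu(A, 1); A = A + A.T                 # symmetric, zero diagonal
A_hat = normalize_adjacency(A)                 # adds self-loops, eq. (7)
H = rng.random((5, 2))                         # stand-in local predictions

for alpha in [0.2, 0.4, 0.6, 0.8]:             # the values compared in Table 3
    Z = appnp(A_hat, H, alpha=alpha, K=10)
    print(f"alpha={alpha}: prediction for node 0 -> {Z[0]}")
```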

Table 3 Comparison of different methods with respect to AUC, Precision, Recall, and F1-measure, at varying values of α

Several deep learning-based models have been proposed to analyze social influence, such as the well-known DeepInf. We designed a novel model to improve prediction, drawing on DeepInf, PPNP, and APPNP for inspiration. Compared to the baseline methods, DeepPP has more flexibility in practical applications, a more convenient way to adjust the parameters, and better performance with both large and small datasets.

4.3 Parameters analysis

In terms of gauging the influence of the other parameters, there are three categories to test: 1) the basic training parameters, which include the window size and the dimension of the latent variable z; 2) the temporal convolutional network (TCN) unit parameters, i.e., the filter size and the number of TCN levels; and 3) the score attention parameters. For all these experiments, we set α to 0.8.

First, we studied the effects of changing the window size, which directly affects the length of the time dependencies captured in the historical data. The larger the window, the more data dependencies can be captured. However, as the window size increases, so does the computing power required, which in turn affects detection speed. As the first row of Figures 7 and 8 shows, five window sizes were tested: 5, 20, 50, 100, and 300. OAG and Weibo reached their maximum F1-measure at a window size of 50, while Digg, Twitter, Hubei, and Holland reached their maximums at a window size of 100. From this, we determined that the optimal window size relates to the composition of the dataset. A small window yields better performance on datasets with weak time correlations, but cannot capture long-term dependencies in time-dependent datasets. Additionally, all six datasets showed performance degradation at a window size of 300, indicating that if the data sequences are too long, the generalization ability of the model decreases. An overly large window also leads to a rapid increase in the required computing power, a larger model, and slower training.
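For illustration, the sliding windows over the historical series might be built as follows (the helper name is ours, not from the paper):

```python
import numpy as np

def make_windows(series, window_size):
    """Slice a 1-D history into (window, next value) training pairs."""
    X = [series[i:i + window_size] for i in range(len(series) - window_size)]
    y = series[window_size:]
    return np.array(X), np.array(y)

# Window sizes tested in the paper: 5, 20, 50, 100, 300.
```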

Figure 7

The effect of parameters on OAG, Digg and Twitter

Figure 8

The effect of parameters on Weibo, Hubei and Holland

Next, we studied the effects of the second basic parameter, the dimension of the latent variable z. The second row of Figures 7 and 8 shows the results for five z dimensions. We see poor performance on the OAG dataset with a small z dimension. This is because the data are mapped to a small latent variable space, which results in a large amount of information loss at the encoder stage; in turn, the decoder is unable to recover, resulting in performance degradation. We also observed that changes in the z dimension had little effect on performance with the Hubei and Holland datasets. As z increased, the loss during training varied greatly; however, many iterations helped the models to stabilize.

The TCN filter size was next. Here, we fixed the expansion factor d, so only the filter size needed to be adjusted to change the receptive field size. The third row of Figures 7 and 8 shows the results for five filter sizes. As shown, the optimal filter size for the smaller Digg dataset was 7, while for the Hubei and Holland datasets the optimal size was 14. This indicates that the optimal filter size is determined by the size of the dataset. However, due to the fixed expansion coefficient, the model's performance is relatively unaffected by the choice of filter size.

In terms of TCN levels, we found that changes in the TCN level directly affected the scale of the DeepPP model. The fourth row of Figures 7 and 8 shows the results for four TCN levels. From a data perspective, OAG and Digg (group training) have a smaller data scale, so we saw better performance with a smaller TCN level. Conversely, for the large-scale Hubei and Holland datasets, performance was greatly reduced at a TCN level of 1, as the smaller models could not capture the time dependencies effectively. From a model point of view, performance was excellent at a TCN level of 8. At a TCN level of 14, although the Weibo dataset showed a slight improvement, the size of the model almost doubled, which is unworkable in practice.

Next, we looked at score attention, varying the parameter's value across 2, 5, 10, 25, and 50. The role of this mechanism is to improve detection accuracy for abnormal data that is close to normal data. The results, shown in the fifth row of Figures 7 and 8, indicate that varying this parameter has little effect, which warrants further attention in our ablation experiments.

Test Loss

Figure 9 shows a common trend: the final test loss decreases as α increases. In fact, the larger α is, the lower the test loss and the better the performance. From a comparison of the four models on different datasets, as shown in Figure 9, we see that DeepPP performed better than any of the other algorithms.

Figure 9

Test loss with α ∈ (0.2,1.0). (a) Test loss on OAG, (b) Test loss on Digg, (c) Test loss on Twitter, (d) Test loss on Weibo, (e) Test loss on COVID-19 in Hubei, (f) Test loss on COVID-19 in Holland

4.4 Spread analysis of COVID-19

To illustrate the advantages and disadvantages of our method, we chose COVID-19 datasets from two different regions, Hubei and Holland. Although these two areas do not completely represent the spread of COVID-19, they can reflect the spread of infectious diseases in general.

4.4.1 Hubei

Figure 10 shows how the prediction accuracy of the different algorithms changes over time, with dates displayed along the horizontal axis. From January 22, we used all the available information to make day-ahead COVID-19 predictions. For example, in Figure 10(a), the rightmost point uses the data from January 22 to February 13 to predict February 14.

Figure 10

COVID-19 day-ahead predictions from Hubei, China. Predictions are given over a 1-to-6-day interval (see subfigures (a) to (f))

Over time, the absolute percentage mean error (APME) tends to decrease as the amount of available data increases. A rapid increase in infection cases was followed by a more gradual trend, displaying a sub-exponential rise in daily infections. The APME decreases under sub-exponential growth since it is a measure of relative error. Also, as the prediction horizon extends, prediction accuracy decreases rapidly: as shown in Figure 10(e) and (f), it is not possible to accurately predict the number of cases five or six days ahead around February 1.
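The paper does not give an explicit formula for APME; assuming it is the mean of the absolute percentage errors, it can be computed as:

```python
import numpy as np

def apme(y_true, y_pred):
    """Absolute percentage mean error, assumed: mean(|pred - true| / true) * 100."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true) * 100.0)
```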

Figure 11

COVID-19 day-ahead predictions from Holland. Predictions are given over a 1-to-6-day interval (see subfigures (a) to (f))

DeepEmLAN performed better, but it did not produce accurate predictions around January 31. The time series in the leftmost part of Figure 10 is the shortest, so there was less data available to train DeepEmLAN. In such cases with short time series, the prediction accuracy of this pure machine learning algorithm was lower than that of the other methods.

4.4.2 Holland

Prediction accuracy on the Holland dataset is shown in Figure 11. The COVID-19 situation in Holland was essentially the same as in Hubei prior to April 1, 2020. Here, DeepPP proved to be the best method, but with large deviations in prediction accuracy. All compared algorithms performed roughly the same after April 1. Also, whether the network is initially static or dynamic appears to have little effect on prediction accuracy. The DeepPP algorithm is trained on more and more infection data over time. The separate DeepInf-GCN and DeepInf-GAT methods performed best over the whole cycle, while PPNP and DeepEmLAN performed worst.

The prediction accuracy of DeepPP and APPNP is comparable. One possible reason is that the transmission of COVID-19 was primarily driven by inter-provincial interaction; COVID-19 spread mainly within the provinces after the end of March. We then compared the spread predictions for COVID-19 across all seven algorithms. The errors listed in Table 4 were obtained by averaging all APME prediction errors over a 1-to-6-day prediction interval. Table 4 clearly illustrates that the prediction error of the DeepPP algorithm is smaller than that of the other algorithms, because DeepPP takes into account that cities interact with each other. The prediction errors of DeepPP in each city are equal to those of DeepEmLAN. In conclusion, the network-based method offers better prediction accuracy.

Table 4 Error comparison of different methods

4.4.3 Ablation experiments

In the ablation experiments, we compared multiple variants of DeepPP, using the DeepPP prototype as the control model: DeepPP-RNN replaces the TCN unit with an RNN; DeepPP-noPNF replaces the transformation process in the latent space with a Gaussian distribution N(0,1); DeepPP-noScoreAttention omits the score attention mechanism; and DeepPP-noPointAdjust omits the point-adjust method. For fairness, we also eliminated the peaks-over-threshold (POT) mechanism in our model and used F1best as the evaluation indicator.

As can be seen from Figure 12, DeepPP-RNN did not perform as well as DeepPP, because a simple RNN cannot capture the long-term dependencies in time series data. Moreover, DeepPP resulted in a much smaller model that took far less time to train than DeepPP-RNN, so training DeepPP is much easier. Regarding the impact of the latent space, PNF can capture complex latent variable patterns and helps build and generate latent space variables. Comparing DeepPP-noPNF and DeepPP-RNN in Figure 12, the noPNF variant had a smaller impact on the model than the RNN variant, except on the Digg dataset, where the RNN variant's impact was smaller. This is because the scale of the packet data in the Digg dataset is very small, so the RNN was still able to capture its time dependencies.

Figure 12

A comparative analysis of the ablation experiments in F1best on six datasets

In general, the impact of DeepPP-noScoreAttention and DeepPP-noPointAdjust on the model was smaller than that of DeepPP-RNN and DeepPP-noPNF. After the score attention mechanism is eliminated, the gap between DeepPP-noScoreAttention and DeepPP is very small. This is because the score attention mechanism targets abnormal data that is close to normal data; it affects the detection results, but its impact is much smaller than that of the overall model. DeepPP-noPointAdjust had the greatest impact on the Twitter dataset, because point adjustment can effectively detect continuous anomaly types, whose impact there is much greater than on datasets with scattered anomalies and massive data.

5 Conclusion

We presented a deep learning-based personalized propagation algorithm, referred to as DeepPP. Extending DeepInf, the algorithm integrates the transition probability of the PageRank domain with a GCN. The method can adjust the size of a neighbor's influence and has greater flexibility. A variety of social networks (OAG, Digg, Twitter, and Weibo) as well as two COVID-19 datasets (Hubei and Holland) were studied in extensive experiments. As the results demonstrate, DeepPP yielded a better F1-measure than the current advanced baselines and performed better on the COVID-19 datasets in terms of precision. The excellent performance of DeepPP on various datasets shows that it can be effectively applied to general real-world scenarios for predicting the social influence of COVID-19. However, although we used a transition probability to achieve greater flexibility in our modeling, we did not consider any user-specified constraints; exploring and incorporating these into the model is a worthy future research direction. Another exciting direction would be to use reinforcement learning to combine sampling and learning for modeling the social influence of COVID-19.