Event Detection and Multi-source Propagation for Online Social Network Management

Online social networks are a huge source of information and play an increasingly crucial role in people's daily lives. In online social network management, much information can be discovered through posts, which allow people to exchange and propagate real-life events. Multi-source event propagation spreads relevant posts on topics of interest from key users to other users of a microblogging network. However, the data handled in traditional microblogging network management are noisy, and few studies address the spontaneous transmission of events in microblogging networks or the cooperation and competition among multiple event sources. To this end, an event detection and multi-source propagation model is established. Specifically, to detect and propagate hot events efficiently and accurately, we use information from previous event detection and propagation to build experience sets for intelligent event propagation. A multi-source event propagation model based on individual interest is then established to describe how multi-source event information is detected and disseminated, and to capture the key role that users and information characteristics play in communication and network management. Experimental results show that the proposed intelligent multi-source event detection and propagation model learns from previous propagation to better discover and propagate hot events under users' changing interests. In addition, interaction among events broadens the influence scope of hot events. These findings help to explain how hot events form and spread in microblogging networks, and provide a theoretical basis for research on guiding strategies and for network management.


Introduction
In recent years, online social network management has become an important part of our daily lives [1][2][3][4][5]. As one form of online social network management, microblogging network management platforms are developing rapidly and attracting more and more users [6][7][8][9][10]. These platforms are known as one of the best tools for people to share and exchange opinions [11][12][13][14]. For example, many companies promote their goods and services on them, and people interested in football can immediately get information about their favorite players from relevant posts. Because these platforms serve as tools for sending posts, they also allow users to discover trending events [15][16][17][18][19].
On microblogging network management platforms [20,21], the spread of information is likened to fission. After a user releases a post, the platform automatically pushes it to the user's neighbours. These neighbours may forward the post, which is then pushed to their own neighbours. The user is therefore not only a consumer of information but also a producer: users forward other people's posts and also release new information, which can in turn be forwarded by neighbours and so reach more users. As a result, in a microblogging environment information spreads faster and local discussion is more likely to cause a group effect. At the same time, users of these platforms have different participation behaviours, different interests in topics and different levels of activity, and the content of posts affects their behaviour, which results in heterogeneity of topics. In addition, most topics quickly disappear from the list of discussed topics, while a few stand out among competing topics to become hot topics that attract a lot of attention.
At present, the dynamics of information dissemination are usually modelled by analogy with infectious disease dynamics, for example with the SIR model [22]. These models assume that there is only one communicator in the system at the initial time, and that the communicator passes the information to its neighbours through interaction with them. Over time, interest in the communicated information may fade, so that users lose enthusiasm and exit the topic discussion pool, and the system enters a stable state. However, users in a microblogging network may spontaneously publish new posts and become communicators themselves. Users may also be uninterested in the event information and decline to take part in the communication. The existing information propagation models [22][23][24][25] do not consider multi-source event detection and propagation, competing hot events or user interaction. Hence a multi-source event detection and propagation model is needed that describes the process of hot event information dissemination and the key role of users and event characteristics in the process of communication.
To this end, a multi-source event detection and propagation model, named the event detection and multi-source propagation (EDMP) model, is proposed. We first study the propagation process of a single hot event, modelling individual spontaneous communication behaviours. Then, the interaction between communication sources is analysed in the model. Finally, we study the process of simultaneous communication and establish a multi-source propagation model for competing events based on user interest, which describes the relationship between hot events.
The main contributions of this paper are as follows:
1. We propose an intelligent event propagation model with knowledge sets [8]. Specifically, this event propagation model does not require any prior knowledge from the microblogging network. The microblogging network management platform only assigns a set of key users, as initial users representing hot events, to the event propagation model. The model then uses this first set of initial users for learning and for generating experience sets. It compares the keywords of users' interests with the content of discovered hot events, so it needs proper keywords describing the topics of users' interests in order to form the topic keywords experience set. The model computes a prediction score from the information about users and events obtained in previous event propagation. This prediction score is used in the user ordering process to generate the target users learning experience set when the next event propagation starts. Finally, the next hot event propagation uses these experience sets to achieve more effective event propagation.
2. We establish a multi-source event detection and propagation model based on individual interest [7] to describe the process of multi-source event information dissemination, and to describe the key role of users and hot event characteristics in microblogging network management and communication. Specifically, we first study the propagation process of a single event, modelling individual spontaneous communication behaviours. At each step, an individual chooses a message to engage with from the collection of disseminated information while taking their own topics of interest into account. Then, because the communication sources in the model cooperate with one another, we study the process of simultaneous communication and establish a multi-source event propagation model, which describes the relationship between hot events based on cooperation and competition.
3. We apply our model to a real Twitter dataset to demonstrate the effectiveness of the proposed multi-source event detection and propagation model compared with some existing event detection and propagation models [7,14,17,26].
The remainder of this paper is structured as follows. We discuss related work on event detection and propagation in Twitter in Sect. 2. In Sect. 3, we introduce our intelligent event propagation model. In Sect. 4, we design our multi-source event propagation model. We report our experiments in Sect. 5. The last section concludes our study and outlines future work.

Related Work
Recently, event detection and propagation have drawn more and more attention from various fields of research, especially concerning influence maximization based on users' opinions, and many methods have been proposed to capture event propagation in social networks [13,14,22,23,24,25]. Event detection and propagation also have extensive applications, such as viral marketing [1], product promotion [2], friend recommendation [11] and rumor control [12]. Some researchers have focused on building effective models that explain the general process of event information dissemination. These models are useful for simulating the dissemination of event information in social networks [24,25,27]. However, they cannot be directly applied to the propagation of hot events because of the complexity and uncertainty of the processes involved.
The influence maximization problem was first introduced to social networks by Richardson. It has been proved to be NP-hard, and a greedy algorithm can reach an approximately optimal solution with an accuracy of (1 − 1/e). Subsequent work has produced excellent algorithms [28][29][30] that effectively improve the time efficiency of mining influential nodes. However, in those studies the influence of a given node on other nodes is the same; that is, the probability that a node activates another node is a constant. Likewise, information content and node preference are not taken into consideration, so the influence exerted by a node is fixed even for totally different event topics. This is not accurate in real life: an individual may have high influence among peers on the subject "economy" but be completely unknown in the area of "law". In simple terms, it is rare for any individual to be considered an expert in multiple fields. The influence of a person in a social network therefore depends on both the node and the topic, and the influence of a given node differs across topics [31][32][33]. However, a limitation of these works [31][32][33] is that they consider only the influence of the topic on user activation probability and do not take into account the popularity of users' interests, the links between posts or the diffusion power of users. Overall, this results in low algorithmic efficiency and an improper number of core users being mined. Little research combines topic popularity scoring, topic community detection and event propagation, a combination that can improve efficiency and enlarge the influence scope of key users of hot events.
Previous research has studied event propagation in various ways [34][35][36][37][38][39]. Richardson and Domingos [32] studied the information propagation problem and proposed a probabilistic method. Kempe et al. [27] formulated event propagation as an optimization problem and developed an algorithm for an event diffusion model. Other researchers have built on this work with many excellent algorithms [22,24,25] that effectively improve the time efficiency of event propagation. However, in those studies a user's propagation behaviour towards all other users is the same; that is, the probability that a user activates other users is a constant. None of these studies considers the importance of event content and user preferences. It has been shown that event propagation in a microblogging network is related to the relationship between the users and the topic of the event, and that the propagation ability of the same user differs across events [27,40,41].
To this end, Zhang et al. [42] proposed a two-stage algorithm that propagates hot events for a specific topic and improves the event propagation scope. Zhou et al. [43] calculated the user activation probability at the topic level from the user interest distribution, and then proposed a new event propagation algorithm that uses this probability to diffuse events quickly under a specific topic, which also improves the event propagation scope. These studies focus on the effect of event influence on a user's activation probability and do not consider the popularity of events, the links between posts or the diffusion power of users. This wastes users' influence and results in low efficiency of event propagation. Moreover, unlike our model, these works studied only single event propagation. In addition, few studies address the spontaneous transmission of events in microblogging networks, or the interaction and competition among event sources. To the best of our knowledge, the intelligent multi-source event propagation problem has not yet been well discussed.
Although previous research has proposed many methods for event propagation, our work differs in several respects. First, we propose a new event detection and propagation model based on key users [19] and users' interests [7]. Second, we express the problem of event propagation as a learning task and aim to identify the accurate characteristics of such events. Third, we investigate the relation between the extracted features of event propagation and user interest. Last, our dataset is extracted from Twitter, and we validate the effectiveness of our model against existing models [7,14,17,26].

Preliminary
Given a microblogging network G = (V, E), V = {v_1, v_2, …, v_n} is the set of users and E = {e_1, e_2, …, e_m} is the set of edges. The adjacency matrix A denotes the connection relationship among users, and the value of each element indicates whether the corresponding edge exists: if there is an edge between v_i and v_j, then A_ij = 1; if no edge exists between v_i and v_j, then A_ij = 0.
Generally, the adjacency matrix A can be used as the similarity matrix of the microblogging network to describe the similarity between users. However, besides the similarity between users that are directly connected, there are different degrees of similarity between users that are not directly connected. For example, two users that can reach one another in a finite number of steps share a certain similarity. Used as the similarity matrix, the adjacency matrix represents the similarity between directly connected users but cannot express the similarity between users that are not directly connected. It therefore loses the similarity information between many users and cannot reflect the complete local information of each user. This limited information affects the accuracy of community discovery.
Therefore, in order to describe the local information of each user more adequately, a method based on step number is proposed in this section. The similarity score between users is calculated from the adjacency matrix A of the network, and a new similarity matrix is obtained. The definitions of s-steps and the similarity matrix are given as follows.
Definition 1 (s-steps) Given a social network G = (V, E), for any users u and v in the vertex set V, if user u needs at least s steps to arrive at user v, that is, the length of the shortest path from user u to user v is s, then user u is said to arrive at user v through s steps.
The step number and an attenuation factor are used to calculate the similarity between two users that are not directly connected, which better reflects the community topology and improves the accuracy of community detection [15]. However, when the number of steps exceeds a certain threshold, two users that are not in the same community also obtain a certain similarity value, which blurs the boundary of the community structure. Therefore, a step threshold S is set, and similarity is calculated only between users that can reach each other within S steps. This enhances the topological information of the microblogging social network without affecting the division of community boundaries. In the experimental part, the step number threshold S and the attenuation factor σ are analyzed, and the influence of different values of S and σ on the result is studied.
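The step-limited similarity described above can be sketched as follows. This is a minimal illustration, not the paper's definition: the exact similarity formula is not given in the text, so the sketch assumes a geometric decay σ^(s−1) for users that are s steps apart (s ≤ S) and zero otherwise. The function name s_step_similarity and the toy network are hypothetical.

```python
from collections import deque


def s_step_similarity(adj, S=3, sigma=0.5):
    """Build a similarity matrix from an adjacency matrix.

    ASSUMPTION (not from the paper): users whose shortest-path length is s,
    with 1 <= s <= S, get similarity sigma ** (s - 1); all others get 0.
    """
    n = len(adj)
    sim = [[0.0] * n for _ in range(n)]
    for src in range(n):
        dist = {src: 0}          # shortest-path length from src (BFS)
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == S:     # do not look beyond the step threshold
                continue
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    sim[src][v] = sigma ** (dist[v] - 1)
                    queue.append(v)
    return sim


# Toy network: a path 0 - 1 - 2 - 3 - 4.
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
sim = s_step_similarity(path, S=3, sigma=0.5)
print(sim[0])  # user 4 is four steps from user 0, beyond S = 3
```

Directly connected users keep similarity 1, as in the adjacency matrix, while users beyond the threshold S stay at 0, which is the behaviour the threshold is meant to enforce.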

The Improved HITS Method
In the original HITS method, a link is used to represent the hyperlinks between web pages. While in our improved HITS method, a link represents an operational relationship between a user and a post such as publishing or commenting.
In this paper, the HITS algorithm is extended to exploit the inseparable connection between users and their corresponding posts for the purpose of distilling the influential users [7,17,19]. As a result, the proposed improved HITS method can effectively filter out random ordinary users, which helps to improve the efficiency and accuracy of the intelligent event propagation model.
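As an illustration of how HITS can run on user-post links rather than hyperlinks, the sketch below applies the standard HITS update to a bipartite graph in which users carry hub scores and posts carry authority scores. It is a plain HITS iteration under that assumption, not the authors' improved method, whose details are not given here; the function name user_post_hits and the sample links are hypothetical.

```python
import math


def user_post_hits(links, iterations=50):
    """Standard HITS on a bipartite user-post graph.

    links: list of (user, post) pairs, one per publish/comment relation.
    Returns (hub, authority): hub scores for users, authority scores for posts.
    """
    users = {u for u, _ in links}
    posts = {p for _, p in links}
    hub = {u: 1.0 for u in users}
    auth = {p: 1.0 for p in posts}
    for _ in range(iterations):
        # A post's authority is the sum of the hub scores of linked users.
        auth = {p: 0.0 for p in posts}
        for u, p in links:
            auth[p] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values()))
        auth = {p: a / norm for p, a in auth.items()}
        # A user's hub score is the sum of the authorities of linked posts.
        hub = {u: 0.0 for u in users}
        for u, p in links:
            hub[u] += auth[p]
        norm = math.sqrt(sum(h * h for h in hub.values()))
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


links = [
    ("alice", "p1"), ("alice", "p2"),
    ("bob", "p1"), ("carol", "p1"),
    ("dave", "p3"),
]
hub, auth = user_post_hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Users with low hub scores, such as the isolated user in this toy example, are the "random ordinary users" that a threshold on the score would filter out.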

Intelligent Event Propagation Process
Fig. 1 depicts the process of intelligent event propagation. It consists of three steps: first propagation, learning process and consecutive propagation. The learning process is especially important because the experience sets of the intelligent event propagation model are gained from it.
Journal of Network and Systems Management (2020) 28:1-20
First propagation is a step of propagating events without any prior information about how to choose the initial users. During this step, the event propagation model only has some keywords extracted from key posts describing an interesting topic from an event. The key users are chosen to be the candidate set of initial influential users for propagating the hot events.
Learning process is a step in which the event propagation model learns how to better find the relevant influential users. First, the initial influential users set is obtained by computing a hub score for each user with the HITS algorithm and keeping the users with high hub scores. Second, the topic keyword set is created by extracting keywords from users' interests [7] as well as from the key posts of users that point to hot events [17]. Finally, the target user prediction set is obtained by calculating the topic similarity between the content of all detected hot events and the content of all detected users' interests [7,9,10], and by employing those scores in the user prediction process. Together these sets form the experience sets of the intelligent event propagation model. Appropriate initial influential users help the model to reach as many influential users of hot events as possible at the beginning of hot event propagation. Proper topic keywords help the model to recognize, among the propagated users, the keywords related to a topic of users' interest. Suitable target user prediction helps the model to predict the relevancy of the content of users extracted from hot events.
Consecutive propagation is a step during which the event propagation model detects high-influence users based on these experience sets. By this stage, suitable initial influential users and high-quality topic keywords have been learned.
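The target user prediction step can be illustrated with a small sketch. The text does not specify how topic similarity is computed, so this example assumes cosine similarity between bag-of-keyword vectors; the function names and the sample keywords are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two keyword lists (bag-of-words counts)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_target_users(event_keywords, user_interests):
    """Order users by how similar their interest keywords are to the event.

    ASSUMPTION: the prediction score is the cosine similarity itself.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    scores = {
        user: cosine_similarity(event_keywords, keywords)
        for user, keywords in user_interests.items()
    }
    return sorted(scores, key=lambda user: (-scores[user], user))


event = ["basketball", "final", "score"]
interests = {
    "ann": ["basketball", "score"],
    "ben": ["music", "concert"],
    "cat": ["basketball", "music"],
}
print(rank_target_users(event, interests))
```

In a learning loop of the kind described above, the ordering returned here would be stored as the target user prediction set and reused when the next propagation starts.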

Topic Popularity Based Event Propagation
In the IC model, the activation probability is generated randomly. However, in the process of event propagation the activation probability of a node is related to the social relationships among nodes and to the topic, and a node has different activation probabilities for different topics. Therefore, a Topic Popularity-based Event Propagation model, named the TPEP model, is proposed. It calculates the node activation probability P^t_{u,v} for specific topics in order to simulate event propagation in social networks more realistically.
The activation probability P^t_{u,v} is influenced by the following factors. Firstly, it is closely related to the social connections between nodes: more frequent connections imply a more intimate relationship between nodes and hence a higher activation probability. Therefore, user intimacy can be used to represent the degree of intimacy between nodes.

Definition 2
User intimacy, C_{u,v}, denotes the frequency of connection between nodes u and v. It is the ratio of the number of connections between u and v to the number of connections between u and all of its contacts, as shown in formula (1).
C_{u,v} = R_{u,v} / Σ_i R_{u,V_i}    (1)
where R_{u,V_i} denotes the number of connections between nodes u and V_i, and R_{u,v} denotes the number of connections between nodes u and v.
In addition, P^t_{u,v} is also influenced by the popularity of the users' topics. The more popular the two users' topics are, the more easily and quickly information is propagated. Therefore, topic popularity affects the activation probability P^t_{u,v} between two users.

Definition 3
Topic popularity, TP^T_{u,v}, denotes the popularity degree of two users' topic. The topic popularity TP^T_{u,v} can be calculated as in formula (2).
where Authority^T_{u,v} denotes the authority of the key posts in topic T. The larger the share of the topic that the authority value of the key posts occupies, the more popular the topic is.
In summary, the activation probability P^t_{u,v} is influenced by the user intimacy C_{u,v} and the topic popularity TP^T_{u,v}, so the activation probability of user u to v for a specific topic t is calculated using formula (3).
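A small sketch may help make these definitions concrete. User intimacy follows the ratio described in Definition 2. How formula (3) combines intimacy and topic popularity is not recoverable from the text, so the sketch assumes a simple product and takes the topic popularity as a given number in [0, 1]; the function names and the sample counts are hypothetical.

```python
def user_intimacy(contacts, u, v):
    """Share of u's connections that go to v (Definition 2).

    contacts[u] maps each contact of u to the number of connections.
    """
    total = sum(contacts[u].values())
    return contacts[u].get(v, 0) / total if total else 0.0


def activation_probability(contacts, topic_popularity, u, v):
    """Topic-specific probability that u activates v.

    ASSUMPTION: intimacy and topic popularity are combined by a product;
    the paper's formula (3) may combine them differently.
    """
    return user_intimacy(contacts, u, v) * topic_popularity


# u connected 6 times with v, 3 times with w and once with x.
contacts = {"u": {"v": 6, "w": 3, "x": 1}}
print(user_intimacy(contacts, "u", "v"))
print(activation_probability(contacts, 0.5, "u", "v"))
```

With a product, the probability stays in [0, 1] whenever both factors do, and it drops to zero for users who never connect, regardless of how popular the topic is.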
The propagation process of the TPEP model is the same as that of the IC model: each user has only one chance to activate each of its neighbouring users, and activation attempts are independent of each other. The difference is that in TPEP a user's activation probability differs across topics, which is more in line with information propagation in microblogging networks.
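The cascade process described above can be sketched as an independent-cascade simulation with per-edge probabilities. The sketch assumes the topic-specific probabilities have already been computed and are supplied in a dictionary; the function name simulate_cascade and the toy network are hypothetical.

```python
import random


def simulate_cascade(neighbors, prob, seeds, rng):
    """One independent-cascade run with per-edge activation probabilities.

    neighbors: dict mapping a user to the list of users it can activate.
    prob: dict mapping (u, v) to the probability that u activates v
          (assumed here to be the topic-specific probability for one topic).
    Each newly activated user gets exactly one chance per neighbour.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors.get(u, []):
                if v not in active and rng.random() < prob[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


neighbors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
prob = {("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "d"): 0.5, ("c", "d"): 0.5}
reached = simulate_cascade(neighbors, prob, ["a"], random.Random(0))
print(len(reached))  # influence scope of this single run
```

Because a single run is random, the influence scope of a seed set is normally estimated by averaging the size of the activated set over many runs.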
In the first stage, we only choose the initial influential spreaders and do not consider the information propagation characteristics of the microblogging network. The second stage therefore uses the spreaders from the first stage to spread information with the TPEP model proposed in this paper, and then iteratively mines the top-k spreaders with the biggest topic influence increment as the remaining influential nodes. The topic influence increment of a spreader u is the influence scope of the spreader set after adding u minus the influence scope before adding u; at each iteration the spreader that maximizes this increment is selected. The calculation method is shown in formula (4).
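The second stage can be sketched as a greedy loop over marginal gains. In this illustration the influence scope is supplied as a function; in the setting described above it would be estimated by simulating the TPEP model, whereas the toy example below uses a deterministic coverage count so that the result is easy to follow. The function name greedy_spreaders and the reach sets are hypothetical.

```python
def greedy_spreaders(candidates, influence_scope, k, initial=()):
    """Add k spreaders, each time taking the largest influence increment.

    influence_scope: function mapping a list of spreaders to a number.
    initial: spreaders already chosen in the first stage.
    """
    selected = list(initial)
    remaining = set(candidates) - set(selected)
    for _ in range(k):
        if not remaining:
            break
        base = influence_scope(selected)
        # Increment = scope after adding u minus scope before adding u.
        best = max(
            sorted(remaining),
            key=lambda u: influence_scope(selected + [u]) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy influence scope: number of distinct users reached by the spreader set.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(spreaders):
    return len(set().union(*(reach[u] for u in spreaders)))


print(greedy_spreaders(reach, coverage, k=2))
```

Sorting the remaining candidates before taking the maximum only makes tie-breaking deterministic; it does not change which increments are compared.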

Experiments
In this section, we detail the experiments that show the effectiveness of our proposed EDMP model. We take typical event detection and propagation models as our baselines, namely IC (Independent Cascade) [14], BEE (Bursty Event dEtection) [26], EVE (Efficient eVent dEtection) [17] and HEE (Hot Event Evolution) [7].

Dataset
Our datasets are collected from Twitter (http://twitter.com/) via the Twitter API [20]. The collected dataset is composed of 1,500,000 posts and 36,845 users.

Baseline Approaches
The efficiency and effectiveness of the proposed EDMP model are validated by evaluating it against the IC, BEE, EVE and HEE models, which are classic event detection and propagation algorithms.

Parameter Experiment
The effects of the step number threshold S and the attenuation factor σ on the experimental results are analyzed in this section. The data of 1000 users are randomly selected from the database for the experiments, and the F-measure score mentioned above is used as the evaluation index.
In each experiment, the value of one parameter is fixed, and the influence of changing the other parameter on the F-measure is analyzed to determine the final parameter values.
(1) Step number threshold S. For this dataset, the attenuation factor is set to σ = 0.5, and the effect of the step number threshold S on the F-measure is analyzed.
As shown in Fig. 2, as the step number threshold S increases, the F-measure first increases and then decreases. The experimental results show that considering the similarity of user pairs which are not directly connected but are reachable within a certain number of steps effectively captures the local information structure of each user. However, if the threshold is too large, users that are far apart and not in the same community also obtain a certain similarity value, which hinders the identification of community boundaries and reduces the accuracy of community detection. For small datasets, a small step number threshold of 3 is selected, and for big datasets, a slightly larger threshold of 8 is selected to achieve the optimal result. The threshold selected in this paper is 3.
(2) Attenuation factor σ. For this dataset, the step number threshold is set to S = 0.5, and the effect of the attenuation factor σ on the F-measure is analyzed.
As shown in Fig. 3, as the attenuation factor increases, the F-measure overall first increases and then decreases. This is due to the fact that the attenuation factor controls how quickly similarity attenuates as the hop count increases. For small datasets, a slight attenuation factor σ = 0.5 is selected to avoid the

Evaluation
Precision is an important metric that can be used to measure the efficiency of our proposed model. It is defined as Precision = k / K, where k represents the number of posts related to the real-life event among the top K posts under a topic. As mentioned above, a scoring method based on the HITS algorithm is proposed to select high-quality posts, high-influence users and high-popularity topics from the social media data streams. A threshold A is then defined, and posts whose authority score is greater than A are regarded as high-quality.
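The metric and the threshold filter can be written down in a few lines. This is a minimal sketch that reads the precision as k / K, with k the number of event-related posts among the top K; the function names, post identifiers and scores are hypothetical.

```python
def precision_at_k(ranked_posts, relevant_posts, K):
    """Fraction of the top K ranked posts that relate to the real-life event."""
    top = ranked_posts[:K]
    k = sum(1 for post in top if post in relevant_posts)
    return k / K


def high_quality_posts(authority, threshold):
    """Keep the posts whose authority score is greater than the threshold A."""
    return [post for post, score in authority.items() if score > threshold]


ranked = ["p1", "p2", "p3", "p4"]
relevant = {"p1", "p3"}
print(precision_at_k(ranked, relevant, K=4))

authority = {"p1": 0.02, "p2": 0.00005, "p3": 0.0003}
print(high_quality_posts(authority, threshold=0.0001))
```

Dividing by K rather than by the number of posts actually returned keeps the score comparable across topics that yield fewer than K posts.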
Three experiments are conducted with different values to find a suitable threshold A (Table 1). Hot events are detected more accurately and efficiently when A = 0.0001. Therefore, the subsequent comparison experiments are all conducted with A = 0.0001.
We present the propagation results as a two-dimensional graph in Fig. 7, where the x-axis is the number of propagated users and the y-axis is the precision, obtained as follows.
In our experiments, we set the top 10 popular events to be our multi-source events set for showing the performance of our proposed EDMP model. At the same time, we will focus on the top 10 users for each propagation process and calculate their influence scope.
1. Filtering the hot events based on topic decision model: Fig. 4 lets us determine the proper number of hot events from the number of key posts, which also plays a key role in the spread of influence within a specific user interest community. As can be seen from Tables 4 and 5, our proposed EDMP model efficiently and effectively detects the top k (k is set to 10 in Table 4) high-quality posts according to their authority value. When posts have equal authority values, they are sorted according to the minimum distance of the key posts.
2. The initial starting users for the first propagation: Table 6 shows the degree and hub value of users for each topic, which distinguish the importance of users under each popular topic. By setting different numbers of initial influential users, we can also find the number of influential users for each popular topic from Table 6. As Fig. 5 shows, the influence scope grows with the number of initial influential users, reaches 82 when that number is 10, and remains the same thereafter. The top 10 initial influential spreaders and the popular topics they belong to are shown in Table 7; these spreaders play a key role in the spread of influence for specific users' interests.
Fig. 5 The influence scope of IC model
3. The contrast of final influence scope results about initial users' discovery: To verify the effectiveness of the influence scope of the proposed EDMP model, all four algorithms are run on a PC with the same configuration. The experiment is repeated 5 times to compute the average value, and the influence scopes of the users discovered by the four models are then compared. The experimental results are shown in Table 8 and Fig. 6. We can see that the proposed EDMP model outperforms the other three IC-based models. This is because the proposed EDMP model considers the impact of topic popularity and selects a sufficient number of users with high topic diffusion power as the influential users, so that their influence scope covers most of the topic areas. Besides, the proposed EDMP model builds three kinds of knowledge sets, i.e. starting users, topic keywords and target users' prediction. These knowledge sets are outputs of the learning process of the intelligent event propagation model. Proper initial users help the model to select as many influential users as possible at the beginning of the event propagation process. Suitable topic keywords help the model to recognize, from the gathered users, the keywords related to a topic with considerable users' interest. Good target user prediction helps the model to predict the relevancy of the content of users' key posts extracted from the hot events. By contrast, the BEE + IC and EVE + IC models do not consider the topic diffusion power of the users or the popularity of topics, so the number of users they select is not adequate under a specific event. Meanwhile, the HEE + IC model does not take into account the learning ability of consecutive propagation. Thus the influential spreaders discovered in this paper are the most adequate set compared with the BEE + IC, EVE + IC and HEE + IC models. This is because the activation probability of the IC model is not stable and the IC model propagates only one event, whereas our EDMP model can improve its initial users through the three knowledge sets.
Fig. 6 The influence scope of propagation for EDMP model
4. Learnable Ability and Precision Analysis of Multi-source Events propagation: We first set the initial topic set of events to 'Basketball', 'Music', 'Economy' and 'Emotion' to describe the multi-source events. We then start the event propagation by selecting the top 10 users as the initial set of starting users. The first event propagation process is used to build the three experience sets. Each consecutive event propagation process then uses the experience sets built and learned from the previous event propagation process. Finally, Fig. 7 shows the learnable capability of the EDMP model for the first, the second and the third propagation process.
When we investigated the interests of users found in the topic keywords experience set, we found that the EDMP model can incrementally learn new interests of users from the previous event propagation process, which extracts the set of users' topics of interest, such as 'Basketball', 'Music', 'Economy' and 'Emotion'. That is, it could use 'Basketball', 'Music' and 'Economy' as the set of users' topics of interest in the second propagation process, and 'Basketball' and 'Music' in the third propagation process. This is because the proposed EDMP model builds three kinds of experience sets, i.e. starting users, topic keywords and target users' prediction. These experience sets make up the learning experience of the intelligent event propagation model. Proper starting users help the model to identify as many relevant users as possible at the beginning of the propagation process. Appropriate topic keywords help the model to recognize, from the gathered users, the keywords related to a topic of users' interest. Suitable target user prediction helps the model to predict the relevancy of the content of users extracted from a hot event.
Fig. 7 The learnable ability results in improvement of the precision of users propagated during the consecutive propagation process

Conclusion and Future Work
In this paper, we present a novel approach to building an intelligent event propagation model for microblogging network management. The model is capable of learning from event propagation experience and adapts itself to propagate better through relevant users and key posts during the consecutive propagation process. Specifically, to obtain efficient and accurate results in the next event propagation, we derive information from the previous event propagation process to build experience sets: starting users, topic keywords and target users' prediction. These experience sets allow the intelligent event propagation model to produce better results in the next propagation. We study the propagation process of a single hot event, modelling individual spontaneous communication behaviours. Then, the interactive relationship among the communication sources is analysed in the model. Finally, we study the process of simultaneous communication and establish a multi-source event competition propagation model based on user interest, which describes the relationship between hot events.
In addition, competition between events shortens their survival time, while cooperation broadens the influence scope of hot events. This helps to explain the formation and dissemination of hot events in microblogging networks and provides a theoretical basis for research on guiding strategies for online social network management. Future research will address how to predict the links of target users during event propagation and how to predict the evolution of users' behaviour in the hot event propagation process.