Predicting the Evolution of Narratives in Social Media
The emergence of global networking capabilities (e.g., social media) has provided newfound mechanisms and avenues for information to be generated, disseminated, shaped, and consumed. The spread and evolution of online information forms a unique narrative ecosystem that is facilitated by cyberspace but operates at the nexus of three dimensions: the social network, the contextual, and the spatial. Current approaches to predicting patterns of information spread across social media focus primarily on the social network dimension of the problem. The novel challenge formulated in this work is to blend the social, spatial, and contextual dimensions of online narratives in order to support high-fidelity simulations that are contextually informed by past events and that support the multi-granular, reconfigurable, and dynamic prediction of the dissemination of a new narrative.
Online narratives thus operate along three dimensions:
- the social network dimension, defined by the connections formed between individual users, which serve as potential dissemination routes;
- the spatial dimension, reflecting communication in the physical world; and
- the contextual dimension, capturing the particular interests and opinions of these networks on diverse topics, which provides the context needed to interpret how communities respond to events.
To develop more powerful models for predicting the spread and evolution of narratives, we need to blend the social, spatial, and contextual dimensions of online narratives so as to support high-fidelity simulations that are contextually informed by past events and that accommodate the multi-granular, reconfigurable, and dynamic nature of these networks. An overview of this vision is shown in Fig. 1. First, Fig. 1(a) illustrates the raw data, i.e., occurrences of a given narrative in space and time. For such a narrative, a flow model can be constructed by reducing the space of individuals to those relevant for a specific topic, as shown in Fig. 1(b). Doing this for a large number of historical narratives yields a library of narrative dissemination models. This library can be used to predict the dissemination of a new narrative by searching for similar historical narratives and using their observed dissemination to predict the new narrative's future spread.
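The retrieval step can be illustrated with a minimal analog-forecasting sketch. It assumes that each historical narrative is reduced to a hypothetical sequence of post counts per time step (the spatial dimension is collapsed for brevity), finds the k historical narratives whose early counts are closest to what has been observed so far for the new narrative, and averages their subsequent counts. The function name `forecast`, the Euclidean distance, and the toy data are illustrative assumptions, not the method of this work.

```python
from math import dist


def forecast(observed: list[float], library: dict[str, list[float]],
             k: int = 2, horizon: int = 3) -> list[float]:
    """Analog forecast: average the continuations of the k most similar past narratives.

    The observed prefix of the new narrative is compared (Euclidean distance)
    with the same-length prefix of every historical narrative that is long
    enough to also cover the forecast horizon.
    """
    n = len(observed)
    candidates = [s for s in library.values() if len(s) >= n + horizon]
    if not candidates:
        raise ValueError("no historical narrative is long enough")
    nearest = sorted(candidates, key=lambda s: dist(observed, s[:n]))[:k]
    # Average, step by step, what happened next in the nearest historical narratives.
    return [sum(s[n + h] for s in nearest) / len(nearest) for h in range(horizon)]


# Usage with hypothetical post counts per time step.
library = {
    "concert": [1, 4, 9, 6, 3, 1, 0],
    "film": [2, 5, 8, 5, 2, 1, 0],
    "election": [1, 1, 2, 3, 5, 8, 13],
}
prediction = forecast([1, 4, 9], library, k=2, horizon=3)  # [5.5, 2.5, 1.0]
```

The two entertainment-like narratives are retrieved as analogs, so the forecast follows their rise-and-decay shape rather than the steadily growing "election" series.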
Traditionally, two types of models have been established to model the diffusion and dissemination of information in networks: differential equation models  and agent-based models . An overview of these models can be found in . A weakness of differential equation models is that they aggregate individual agents into a relatively small number of compartments or populations. Within each population, people are assumed to be homogeneous and well mixed, and transitions among compartments are modeled by their expected values, losing important information about individual influencers and gatekeepers. In contrast, agent-based models capture heterogeneity between individuals, which allows them to exploit the network structure as well as individual attributes in the diffusion model to improve prediction accuracy. Yet such models suffer from high computational complexity, which limits their scalability to hundreds of thousands of agents [3, 5]. Several models have been proposed that incorporate social and spatial information to detect current events [7, 9, 11]; however, such work often lacks the temporal component necessary to follow a narrative over time. Therefore, the ability to predict future information propagation patterns from past data remains a substantial scientific challenge.
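To make this contrast concrete, the sketch below pairs a textbook SIR-style compartment model, integrated with forward Euler, with the independent cascade model, a standard agent-based diffusion model on a graph. Neither is claimed to be one of the cited models; the parameters `beta`, `gamma`, `p` and the toy graph are hypothetical. The compartment model tracks only three population shares, so its cost does not depend on the number of users, whereas the cascade visits individual users and edges, which is what lets it exploit network structure and also what makes it expensive at scale.

```python
import random


def compartment_model(beta: float, gamma: float, s0: float, steps: int,
                      dt: float = 0.1) -> tuple[float, float, float]:
    """SIR-style compartment model of a narrative, integrated with forward Euler.

    Only three population shares are tracked: ignorant (i), spreading (s) and
    retired (r). Everyone inside a compartment is interchangeable, and flows
    between compartments are expected values.
    """
    i, s, r = 1.0 - s0, s0, 0.0
    for _ in range(steps):
        new_spreaders = beta * i * s * dt  # ignorants who adopt the narrative
        retired = gamma * s * dt           # spreaders who lose interest
        i, s, r = i - new_spreaders, s + new_spreaders - retired, r + retired
    return i, s, r


def independent_cascade(graph: dict[str, list[str]], seeds: list[str],
                        p: float, rng: random.Random) -> set[str]:
    """Agent-based alternative: every user is a node with its own neighbors.

    Each newly activated user gets a single chance to pass the narrative to
    each inactive neighbor, succeeding with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Usage: a population-level curve versus one stochastic run on a tiny network.
i, s, r = compartment_model(beta=0.5, gamma=0.1, s0=0.01, steps=500)
graph = {"hub": ["a", "b", "c"], "a": ["d"], "b": [], "c": ["d"], "d": []}
reached = independent_cascade(graph, ["hub"], p=0.5, rng=random.Random(7))
```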
3 Proposed Direction
In our work we aim to combine the high efficiency of differential equation models with the high modeling power of agent-based models by aggregating individuals into compartments/groups only where necessary. Towards such a hybrid model, we can reduce the problem complexity in two ways:
Reduction of the topic space: Different narratives share similar diffusion patterns in space and time, as we were able to show in preliminary work . For instance, different topics related to entertainment disseminate in a similar way, whereas topics related to politics exhibit different dissemination patterns. Such clusters of similar narratives can be organized hierarchically; for example, a broad topic may be health, with subtopics such as infectious diseases or chronic diseases. These subgroups can be generated by analyzing historical data (such as Twitter data). The resulting library of narrative groups represents abstractions of the information dissemination process and can be fine-tuned both in terms of its thematic resolution (moving from broader to more specific categories) and in terms of its network resolution (moving from coarser clusters of nodes to finer ones, as described next). This allows us to balance computational performance against fidelity as desired for future simulations.
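One way to build such a hierarchy from historical data is agglomerative clustering of narratives by their space-time dissemination profiles. The following sketch assumes that each narrative is summarized as hypothetical post counts per (region, time bin) cell, normalizes them to shares, uses the L1 distance between profiles, and merges clusters by average linkage; these choices are assumptions, not the method of the preliminary work. Cutting the merge history at different numbers of groups `k` corresponds to moving between broader and more specific narrative groups.

```python
from itertools import combinations

Cell = tuple[str, int]       # (region, time bin)
Profile = dict[Cell, float]  # share of a narrative's posts in each cell


def normalize(counts: dict[Cell, int]) -> Profile:
    """Turn raw post counts into shares so that large and small narratives compare."""
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}


def l1(a: Profile, b: Profile) -> float:
    """L1 distance between two space-time profiles (0 = identical, 2 = disjoint)."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())


def agglomerate(profiles: dict[str, Profile]) -> list[tuple[frozenset, frozenset, float]]:
    """Average-linkage agglomerative clustering; returns the merge history."""
    def linkage(x: frozenset, y: frozenset) -> float:
        return sum(l1(profiles[a], profiles[b]) for a in x for b in y) / (len(x) * len(y))

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((x, y, linkage(x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return merges


def cut(names: list[str], merges, k: int) -> list[frozenset]:
    """Replay the merges until k groups remain; k sets the thematic resolution."""
    groups = [frozenset([n]) for n in names]
    for x, y, _ in merges:
        if len(groups) <= k:
            break
        groups = [g for g in groups if g not in (x, y)] + [x | y]
    return groups


# Usage with hypothetical counts per (region, time bin).
raw = {
    "concert": {("NYC", 0): 8, ("NYC", 1): 2},
    "festival": {("NYC", 0): 7, ("NYC", 1): 3},
    "flu": {("DC", 0): 1, ("DC", 1): 4, ("NYC", 2): 5},
    "measles": {("DC", 0): 2, ("DC", 1): 3, ("NYC", 2): 5},
}
profiles = {name: normalize(counts) for name, counts in raw.items()}
merges = agglomerate(profiles)
coarse = cut(list(profiles), merges, k=2)  # e.g. entertainment vs. health
fine = cut(list(profiles), merges, k=4)    # every narrative on its own
```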
Reduction of the agent space: For a single specific topic, many users may hold similar opinions and can be aggregated into a population without significant loss of information. For instance, one user might be a vocal influencer and gatekeeper for the dissemination of topics related to a specific sport while being oblivious to topics related to politics. Thus, for politics-related topics, we may not need to model this user as an individual agent. This observation allows us to further reduce the space of users that need to be modeled: individuals who are oblivious to the given narrative can be grouped and modeled at an aggregate level, while individuals whom the model identifies as vocal trend-setters are modeled in full detail. We can model information dissemination conditioned on a specific topic archetype and apply an attributed graph clustering algorithm  to find communities of users who share a similar opinion towards the topic. This layer of abstraction, illustrated by the transition from Fig. 1(a) to Fig. 1(b), yields a set of abstracted information dissemination models for each narrative group.
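Once communities are known, the split into individually modeled and aggregated users can be sketched as a simple threshold rule. In this sketch the per-user activity values, the community labels (standing in for the output of an attributed graph clustering algorithm), the `threshold` parameter, and the helper names are all hypothetical. It shows how the same user can be an individual agent for one narrative group and part of a population for another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Agent:
    name: str        # user id, or "pop:<community>" for an aggregated population
    size: int        # 1 for an individual, number of members for a population
    activity: float  # (mean) activity on the narrative group, e.g. posts per day


def reduce_agents(activity: dict[str, float], community: dict[str, str],
                  threshold: float) -> list[Agent]:
    """Keep vocal users as individual agents; pool quiet users per community.

    activity: hypothetical per-user activity for ONE narrative group.
    community: hypothetical community label per user, assumed to come from an
        attributed graph clustering run on that narrative group.
    """
    agents: list[Agent] = []
    pooled: dict[str, list[float]] = defaultdict(list)
    for user, a in activity.items():
        if a >= threshold:
            agents.append(Agent(user, 1, a))   # trend-setter: full detail
        else:
            pooled[community[user]].append(a)  # oblivious: aggregate level
    for comm, values in sorted(pooled.items()):
        agents.append(Agent(f"pop:{comm}", len(values), sum(values) / len(values)))
    return agents


# Usage: the same users, reduced separately for two narrative groups.
community = {"ana": "c1", "ben": "c1", "cai": "c2", "dee": "c2"}
sports = reduce_agents({"ana": 12.0, "ben": 0.5, "cai": 0.0, "dee": 1.0}, community, threshold=5.0)
politics = reduce_agents({"ana": 0.0, "ben": 0.5, "cai": 9.0, "dee": 1.0}, community, threshold=5.0)
```

Here "ana" is simulated individually for sports but folded into population `pop:c1` for politics, so each narrative group gets its own, smaller agent set.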
Simulation and Prediction of Narrative Dissemination: Once we have a library of abstracted information dissemination models, these will be used to ground an agent-based simulation in which an agent is either a person or a whole population, as dictated by the dissemination model. To start a simulation, a new narrative is injected into the system. The first step of this grounding is to identify the narrative group of the new narrative. This identification task can be formulated as a supervised classification problem that maps the new narrative to the most fitting narrative group in the narrative group library. Depending on the information available about the new narrative, this classification may be more or less detailed, allowing us to descend more or less deeply into the narrative group hierarchy and thus yielding a more or less detailed dissemination model. Intuitively, the longer we observe a new narrative, the more confident our model fitting will become.
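The coarse-to-fine identification of a narrative group can be sketched as nearest-centroid descent through the hierarchy, stopping as soon as the evidence no longer separates the candidate subgroups. The feature vectors, the `Group` structure, the toy hierarchy, and the `margin` rule below are illustrative assumptions rather than the supervised classifier envisioned here; any classifier that reports a confidence could replace the distance ratio.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Group:
    name: str
    centroid: tuple[float, ...]  # hypothetical mean feature vector of the group's narratives
    children: list["Group"] = field(default_factory=list)


def classify(features: tuple[float, ...], root: Group, margin: float = 0.8) -> list[str]:
    """Nearest-centroid descent through the narrative-group hierarchy.

    At each level the closest child is chosen, but we only descend if it is
    clearly closer than the runner-up (best <= margin * second best).
    Otherwise we stop and keep the coarser, less specific group.
    """
    path = [root.name]
    node = root
    while node.children:
        ranked = sorted(node.children, key=lambda g: dist(features, g.centroid))
        if len(ranked) > 1:
            best = dist(features, ranked[0].centroid)
            second = dist(features, ranked[1].centroid)
            if best > margin * second:
                break  # ambiguous: not enough evidence for a finer group yet
        node = ranked[0]
        path.append(node.name)
    return path


# Usage with a hypothetical two-level hierarchy and 2-D features.
hierarchy = Group("all", (0.5, 0.5), [
    Group("entertainment", (0.9, 0.1)),
    Group("health", (0.2, 0.8), [
        Group("infectious diseases", (0.1, 0.95)),
        Group("chronic diseases", (0.3, 0.65)),
    ]),
])
clear_case = classify((0.12, 0.93), hierarchy)  # reaches "infectious diseases"
early_case = classify((0.2, 0.8), hierarchy)    # stops at "health"
```

As more of a narrative is observed, its feature vector would ideally move closer to one subgroup's centroid, letting the descent reach a deeper, more detailed dissemination model.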
The proposed framework would result in models whose power extends beyond the mere structure of the underlying social network. By learning the dissemination of past narratives, the model also captures latent forms of spreading and evolving a narrative: while we cannot directly observe two individuals exchanging ideas in person, we can observe its consequence, namely that both individuals frequently share the same ideas. This enables us to implicitly capture such forms of information dissemination and allows us to learn features of individuals who, for a given narrative group, are more likely to pass on the ideas of others for further dissemination. In doing so, we will have the potential to predict the dissemination of new narratives as they emerge.
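Two simple estimators illustrate what such implicit signals could look like. The first flags pairs of users who share many of the same historical narratives (Jaccard similarity of their shared sets) without an observed edge in the social network, as candidates for unobserved channels such as offline contact; the second estimates each user's propensity to pass a narrative on after exposure, smoothed with a Beta prior. All names, thresholds, and data are hypothetical, and these estimators are stand-ins, not the learning method of the proposed framework.

```python
from itertools import combinations


def latent_links(shared: dict[str, set[str]], edges: set[frozenset[str]],
                 min_jaccard: float = 0.5) -> dict[tuple[str, str], float]:
    """Flag user pairs that often share the same narratives without an observed
    social-network edge between them: candidate unobserved (e.g. offline) channels.
    """
    links = {}
    for u, v in combinations(sorted(shared), 2):
        if frozenset((u, v)) in edges:
            continue  # co-sharing is already explained by the observed network
        union = shared[u] | shared[v]
        if union:
            jaccard = len(shared[u] & shared[v]) / len(union)
            if jaccard >= min_jaccard:
                links[(u, v)] = jaccard
    return links


def pass_on_rate(exposures: dict[str, int], reshares: dict[str, int],
                 prior: float = 1.0) -> dict[str, float]:
    """Per-user probability of passing a narrative on after exposure, estimated
    as the Beta(prior, prior) posterior mean (k + prior) / (n + 2 * prior)."""
    return {u: (reshares.get(u, 0) + prior) / (n + 2 * prior) for u, n in exposures.items()}


# Usage on hypothetical historical data for one narrative group.
shared = {"ana": {"n1", "n2", "n3"}, "ben": {"n1", "n2", "n3"}, "cai": {"n4"}}
links = latent_links(shared, edges=set())              # ana and ben are not connected online
rates = pass_on_rate({"ana": 8, "ben": 8}, {"ana": 6})  # ana: 0.7, ben: 0.1
```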
References
[1] Bakshy, E., Rosenn, I., Marlow, C., Adamic, L.: The role of social networks in information diffusion. In: WWW, pp. 519–528. ACM (2012)
[2] Cha, M., Mislove, A., Gummadi, K.P.: A measurement-driven analysis of information propagation in the Flickr social network. In: WWW, pp. 721–730. ACM (2009)
[4] Mahajan, V., Muller, E., Wind, Y.: New-Product Diffusion Models, vol. 11. Springer, New York (2000)
[7] Sakaki, T., Okazaki, M., Matsuo, Y.: Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the 19th International Conference on World Wide Web, pp. 851–860. ACM (2010)
[8] Schmid, K.A., Frey, C., Peng, F., Weiler, M., Züfle, A., Chen, L., Renz, M.: TrendTracker: modelling the motion of trends in space and time. In: SSTDM@ICDM Workshop, pp. 1145–1152 (2016)
[10] Xu, Z., Ke, Y., Wang, Y., Cheng, H., Cheng, J.: A model-based approach to attributed graph clustering. In: SIGMOD, pp. 505–516. ACM (2012)