Encyclopedia of Social Network Analysis and Mining

Living Edition
Editors: Reda Alhajj, Jon Rokne

Social Provenance

  • Zhuo Feng
  • Pritam Gundecha
  • Huan Liu
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_388-1



Information Provenance

Sources of a piece of information.

Social Computing

An area of computer science that is concerned with the intersection of social behavior and computational systems (Social Computing).

Social Media

A group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content (Kaplan and Haenlein 2010).

Data Mining

The computational process of discovering patterns in large datasets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems (Data Mining).

Social Network

A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors (Social Network).


Misinformation

False or incorrect information that is spread intentionally or unintentionally (without realizing it is untrue) (Misinformation).

Disinformation

Intentionally false or misleading information that is spread in a calculated way to deceive target audiences (Disinformation).

Provenance Paths

Paths of information propagation from sources to terminals.


Social Provenance

An information propagation network can be represented as a directed graph G(V, E), where V is the node set and E is the edge set. Each node in the graph represents an entity that publishes a piece of information on social media. The entity may be an individual user or a webpage. A directed edge between two nodes represents the direction of information flow. For a given piece of information propagating through social media, social provenance informs a user about the sources of that information. Sources are the nodes that first publish the messages concerned.
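The representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the `publish_time` mapping, and the helper names `build_graph` and `first_publishers` are hypothetical and do not come from the entry. In practice publication times are often incomplete, which is part of what makes the problem hard.

```python
from collections import defaultdict


def build_graph(edges):
    """Build G(V, E) from (sender, receiver) pairs; edges follow information flow."""
    nodes = set()
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for sender, receiver in edges:
        nodes.update((sender, receiver))
        successors[sender].add(receiver)
        predecessors[receiver].add(sender)
    return nodes, successors, predecessors


def first_publishers(publish_time):
    """Return the nodes with the earliest known publication time."""
    earliest = min(publish_time.values())
    return {node for node, t in publish_time.items() if t == earliest}


# Hypothetical toy data: S1 publishes first, and the message reaches D via A and C.
edges = [("S1", "A"), ("A", "C"), ("C", "D")]
publish_time = {"S1": 0, "A": 1, "C": 2, "D": 3}

nodes, successors, predecessors = build_graph(edges)
print(sorted(nodes))
print(first_publishers(publish_time))
```

Keeping both successor and predecessor maps is a design choice: provenance questions are usually asked from the recipient's side, so walking edges backward needs to be cheap.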

Figure 1 shows an information propagation graph indicating the flow of information I = {I1, I2, I3}, all of which concern the same event. S1, S2, and S3 are the source nodes, or originators, of I1, I2, and I3, respectively. The information is transmitted through different nodes in social media or by their recipients. These nodes propagate information; some may retransmit it with modifications. Each edge is labeled to indicate where the information comes from; e.g., “a” on edge “A-C” means that it is from “A.” The social provenance problem is to help a recipient (say, node D) determine the possible sources in social media of a given piece of information. A provenance path delineates how information spreads from a source to a recipient, including the intermediaries responsible for retransmitting it. If the provenance paths are known, the sources of information can be determined. More often than not, however, the provenance paths of a given piece of information are unknown.
Fig. 1 Information propagation in social media
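When the propagation graph is fully known, provenance paths can be enumerated by walking edges backward from the recipient. The following is a minimal sketch under that assumption. The function name `provenance_paths` and the toy graph are hypothetical, and the toy graph is not the graph of Fig. 1.

```python
def provenance_paths(predecessors, sources, recipient):
    """Enumerate simple paths from any source to the recipient.

    predecessors maps each node to the nodes it received the information from.
    """
    paths = []

    def walk(node, suffix):
        # suffix is the path from `node` down to the recipient
        if node in sources:
            paths.append(list(suffix))
        for sender in predecessors.get(node, ()):
            if sender not in suffix:  # skip nodes already on the path (no cycles)
                walk(sender, [sender] + suffix)

    walk(recipient, [recipient])
    return paths


# Hypothetical toy graph: D hears from C, which heard from A and B.
predecessors = {"A": ["S1"], "B": ["S2"], "C": ["A", "B"], "D": ["C"]}
sources = {"S1", "S2", "S3"}

paths = provenance_paths(predecessors, sources, "D")
for path in sorted(paths):
    print(" -> ".join(path))
```

The first node of each returned path is a possible source for the recipient. In this toy graph S3 never reaches D, so it does not appear.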

Provenance has been studied in the data management field, where it represents the creator of the data and how the data has been modified and transferred. Provenance information is used to determine the authenticity and trustworthiness of information, and it is key to solving the data conflict problem (Moreau 2009). Unlike in social media, data propagation can be captured within data management systems. Social provenance was introduced in the book by Barbier et al. (2013) and has received some attention in recent years (Gundecha et al. 2013a; Ranganath et al. 2013; Feng et al. 2013; Gundecha et al. 2013b; Wu et al. 2016). Shah and Zaman (2011) proposed a centrality-based method to determine the single information source among all known recipients on an undirected network. The method assumes that information spreads on the network according to the susceptible-infected (SI) model. Because it requires knowledge of all recipients, it is not practical for social provenance. In addition, the source it computes is biased toward higher-degree nodes.
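To make the centrality-based idea concrete, the sketch below computes what is usually called rumor centrality on a tree. It assumes, without confirmation from this entry, that this is the measure meant by the reference to Shah and Zaman (2011), and it assumes the infected network is a tree. Under those assumptions the score of a candidate source v in a tree of n nodes is n! divided by the product, over all nodes u, of the size of the subtree rooted at u when the tree is rooted at v. The function name and the toy star graph are hypothetical.

```python
from math import factorial


def rumor_centrality(tree, root):
    """Score `root` as a candidate source in an undirected tree.

    tree maps each node to a list of its neighbors.
    """
    sizes = {}
    product = 1
    # Iterative post-order traversal: a node is finished after all its children.
    stack = [(root, None, False)]
    while stack:
        node, parent, children_done = stack.pop()
        if children_done:
            sizes[node] = 1 + sum(sizes[c] for c in tree[node] if c != parent)
            product *= sizes[node]
        else:
            stack.append((node, parent, True))
            for child in tree[node]:
                if child != parent:
                    stack.append((child, node, False))
    return factorial(len(tree)) // product


# Hypothetical star-shaped tree: one hub connected to three leaves.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
scores = {node: rumor_centrality(star, node) for node in star}
print(scores)
```

In this toy star the hub scores higher than any leaf, which gives some intuition for the bias toward higher-degree nodes mentioned above. Note also that the score needs the full set of infected nodes, which is the practical limitation the text points out.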

Barbier, in his dissertation (Barbier 2012), proposed a method to collect metadata about received information. Such metadata is referred to as provenance attributes. Provenance attributes can play a vital role in obtaining social provenance. As shown in the dissertation (Barbier 2012), some attribute values are easier to obtain than others, and some may be more valuable to a recipient than others. For example, a political statement published by a political candidate might be assessed with some bias if the recipient knows information about the candidate, such as political party affiliation or special interest associations. An interesting example of the value of provenance attributes would be to reveal the political affiliation and special interests of an unfamiliar social media user propagating political statements, which may help in understanding latent motivations for propagating a statement in social media. Fake news and its propagation on social media sites were widely reported during the recent US election (http://www.nytimes.com/2016/11/18/technology/fake-news-on-facebook-in-foreign-elections-thats-not-new.html?_r=0, http://www.vox.com/new-money/2016/11/16/13659840/facebook-fake-news-chart). Provenance attributes of the nodes involved in the propagation of news articles, including political affiliations, special interest group associations, education, occupation, and demographic attributes, would have helped recipients quickly distinguish fake news from real news. Barbier et al. (2013) review the current research on social provenance and explore research opportunities to address pressing needs. Several papers (Gundecha et al. 2013a, b; Ranganath et al. 2013; Feng et al. 2013) show how data mining can enable a social media user to make informed judgments about statements published in social media. The chapter by Wu et al. (2016) proposes a few benchmark datasets and evaluation metrics to study the social information problem further.

Key Research Issues in Social Provenance

Social media can help in solving the problem of social provenance due to its unique features: user-generated content (e.g., tweets, blog posts, news articles, etc.), users’ profiles, user interactions (e.g., links between friends, hyperlinks on the blog or news articles), and spatial or temporal information. These features can help reconstruct an information propagation network of a given message, and the network is essential for social provenance.

The social provenance problem answers which nodes are the possible sources of some particular information, say a text message. The provenance path problem seeks to identify the paths that allow us to trace back possible sources. Solving the social provenance problem entails solving the provenance path problem. We present some key research issues in this burgeoning area below:
  (a) What are the characteristics of sources such that we can identify a source when we encounter one? It is a challenging task because source nodes are not necessarily those without incoming links in social media networks.

  (b) How can we use different parts of social media data for inferring provenance paths? Content, user profiles, and interaction patterns can play complementary roles in backtracking information propagation. As a popular source can lead to a shallow cascade (Leskovec et al. 2009), the study of node centrality measures can be of help.

  (c) How can we infer missing links in reconstructing a provenance path with partial information? By the nature of social media, most information is informal and partial. Links can expand the network (i.e., new nodes can be added), and data associated with a node provides more information, though still partial.

  (d) How can we limit the search space in the vast land of social media? It is incumbent to develop a scalable solution for the social provenance problem.

  (e) What are effective and objective ways of verifying and comparing different approaches to social provenance and provenance path problems? Lack of ground truth constitutes one of the foremost difficulties.
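For the last issue, one simple way to compare approaches, when a benchmark with known sources is available, is to score each approach's predicted source set against the ground truth. The sketch below uses set-based precision and recall. This is an assumed choice of metric for illustration only, not one prescribed by the entry, and the node names are hypothetical.

```python
def precision_recall(predicted_sources, true_sources):
    """Compare a predicted source set with ground-truth sources."""
    predicted, actual = set(predicted_sources), set(true_sources)
    hits = len(predicted & actual)
    # An empty prediction or empty ground truth is scored as 0.0 by convention here.
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: one of two predicted nodes is a true source,
# and one of two true sources was found.
precision, recall = precision_recall({"S1", "A"}, {"S1", "S2"})
print(precision, recall)
```

Such scores are only as meaningful as the ground truth behind them, which is exactly the difficulty noted in issue (e).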


Illustrative Examples and Impact

One important application of social provenance is finding the rumormongers or misinformation centers in social media (Wu et al. 2016). As several recent news reports have noted, misinformation has helped unnecessary fears and conspiracy theories spread through social media. One such example is related to the Ebola outbreak (http://time.com/3479254/ebola-social-media/). When potential cases were reported in Miami and Washington, DC, some tweets made it sound as if Ebola were rampant, and some users kept tweeting even after the government issued a statement to dispel the rumor. The “Assam Exodus” is another example that illustrates the importance of social provenance. Assam is a large state in the North-East of India, where a series of riots broke out in July and August 2012. Following the riots, virulent messages and misinformation spread to other parts of India via social media. Bulk text messages (short message service, SMS) and social media sites were used extensively to spread information aimed at inciting parts of the Indian population against the North-East Indian population. For example, a Wall Street journalist reported that a Twitter user had presented a gory video clip of riots in Indonesia as footage of the Assam riots (Twitter 2012). Violent messages inciting hatred and vengeance against the North-East Indian population were also spread on Facebook (Facebook). The misinformation and virulent messages caused deep fear among the North-East Indian population, which ultimately led to their exodus from several major metropolitan cities across India, including Bangalore, Mumbai, Hyderabad, Chennai, and Pune (Wikipedia 2012). In all of these cases, social provenance might have helped to find the rumormongers or misinformation sources early and to stop the viral spread of misinformation.

Knowing the social provenance of a piece of information published in social media – how the piece of information was modified as it was propagated through social media and how an owner of the piece of information is connected to the transmission of the statement – provides additional context to the piece of information. A social media user can use this context to help assess how much value, trust, and validity should be placed on the information.

In early 2010, it was rumored that the Chief Justice of the US Supreme Court was going to retire for medical reasons. In fact, the Justice had no plans to retire. The statement originated in a Georgetown University Law School class and was meant only as a teaching point. However, before the law professor revealed the falsehood, students in the class had transmitted the statement over the Internet, and it was subsequently published on a news blog (http://www.npr.org/templates/story/story.php?storyId=124371570, http://nymag.com/daily/intelligencer/2010/03/heres_how_the_rumor_that_john.html). Had the social provenance information been available, recipients might not have considered the statement credible. In another case, a US Department of Agriculture employee was erroneously fired after information about her was published out of context in social media (https://en.wikipedia.org/wiki/Firing_of_Shirley_Sherrod). Had social provenance information been available, sought out, or examined, it might have prevented an injustice to the employee and embarrassment for the Department of Agriculture. Fake news and its impact on the recent US election have been widely reported (http://www.nytimes.com/2016/11/18/technology/fake-news-on-facebook-in-foreign-elections-thats-not-new.html?_r=0, http://www.vox.com/new-money/2016/11/16/13659840/facebook-fake-news-chart). Social provenance, if available, would have informed users of its credibility.

The social provenance problem presents an unprecedented challenge, and research progress on it can pave the way for many equally challenging and important issues such as source trustworthiness, information reliability, and user credibility.



  1. Barbier G (2012) Finding provenance data in social media. Doctoral dissertation
  2. Barbier G, Feng Z, Gundecha P, Liu H (2013) Provenance data in social media. Synth Lect Data Min Knowl Discov 4(1):1–84
  3. Feng Z, Gundecha P, Liu H (2013) Recovering information recipients in social media via provenance. Short paper, the IEEE/ACM international conference on advances in social networks analysis and mining
  4. Gundecha P, Feng Z, Liu H (2013a) Seeking provenance of information in social media. Short paper, the 22nd ACM international conference on information and knowledge management
  5. Gundecha P, Ranganath S, Feng Z, Liu H (2013b) A tool for collecting provenance data in social media. Demonstration paper, the 19th ACM SIGKDD international conference on knowledge discovery and data mining
  6. Kaplan AM, Haenlein M (2010) Users of the world, unite! The challenges and opportunities of social media. Bus Horiz 53(1):59–68
  7. Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 497–506
  8. Moreau L (2009) The foundations for provenance on the web. Found Trends Web Sci 2:99–241
  9. Ranganath S, Gundecha P, Liu H (2013) A tool for assisting provenance search in social media. Demonstration paper, the 22nd ACM international conference on information and knowledge management
  10. Shah D, Zaman T (2011) Rumors in a network: who’s the culprit? IEEE Trans Inf Theory 57:5163–5181
  11. Twitter (2012) https://twitter.com/dhume01/status/236321660184178688. Accessed 17 Dec 2012
  12. Wu L, Morstatter F, Hu X, Liu H (2016) Mining misinformation in social media. In: Big data in complex and social networks. CRC Press, pp 123–152

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. AI+R, Microsoft, Sunnyvale, USA
  2. IBM Research, Almaden, San Jose, USA
  3. Data Mining and Machine Learning Lab, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, USA

Section editors and affiliations

  • Jaideep Srivastava (1)
  • Abdullah Uz Tansel (2)
  1. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
  2. Baruch College, CUNY, New York, USA