Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API

  • Conference paper
Social Computing, Behavioral-Cultural Modeling and Prediction (SBP 2014)

Abstract

We compare samples of tweets from the Twitter Streaming API constructed from different connections that tracked the same popular keywords at the same time. We find that, on average, over 96% of the tweets seen in one sample are seen in all others. Tweets found only in a subset of samples do not differ significantly from tweets found in all samples in terms of user popularity or tweet structure. We conclude that the tweets missing from some samples are likely the result of a technical artifact rather than of any systematic bias.

Practically, our results show that an infinite number of Streaming API samples are necessary to collect “most” of the tweets containing a popular keyword, and that findings from one sample from the Streaming API are likely to hold for all samples that could have been taken. Methodologically, our approach is extendible to other types of social media data beyond Twitter.
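The overlap measure described above can be illustrated with a minimal sketch. It assumes each simultaneous sample has already been reduced to a set of tweet IDs; the function names (`overlap_with_all`, `mean_overlap`), the connection labels, and the toy IDs are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def overlap_with_all(samples: Dict[str, Set[int]]) -> Dict[str, float]:
    """For each sample, return the fraction of its tweet IDs seen in every sample."""
    if not samples:
        raise ValueError("at least one sample is required")
    # Tweet IDs present in all samples (hypothetical inputs, not the paper's data).
    common = set.intersection(*samples.values())
    return {
        name: (len(common) / len(ids)) if ids else 0.0
        for name, ids in samples.items()
    }


def mean_overlap(samples: Dict[str, Set[int]]) -> float:
    """Average the per-sample fractions across all samples."""
    fractions = overlap_with_all(samples)
    return sum(fractions.values()) / len(fractions)


# Toy example: three connections tracking the same keyword at the same time.
samples = {
    "conn_a": {1, 2, 3, 4, 5},
    "conn_b": {1, 2, 3, 4, 6},
    "conn_c": {1, 2, 3, 4},
}

if __name__ == "__main__":
    for name, frac in overlap_with_all(samples).items():
        print(f"{name}: {frac:.2%} of tweets appear in all samples")
    print(f"mean: {mean_overlap(samples):.2%}")
```

Tweets outside the common set (IDs 5 and 6 in the toy data) correspond to the "subset-only" tweets that the abstract compares against the rest; the comparison of user popularity and tweet structure is not sketched here.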

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Joseph, K., Landwehr, P.M., Carley, K.M. (2014). Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API. In: Kennedy, W.G., Agarwal, N., Yang, S.J. (eds) Social Computing, Behavioral-Cultural Modeling and Prediction. SBP 2014. Lecture Notes in Computer Science, vol 8393. Springer, Cham. https://doi.org/10.1007/978-3-319-05579-4_10

  • DOI: https://doi.org/10.1007/978-3-319-05579-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05578-7

  • Online ISBN: 978-3-319-05579-4

  • eBook Packages: Computer Science, Computer Science (R0)
