Abstract
Measurements of large systems typically rely on sampling to keep the measurement effort practical. For example, YouTube’s video popularity has been measured by crawling related videos, by crawling videos belonging to certain categories, or by using a list, such as the most recent videos, as the data source. In this paper, we demonstrate that all of these methods yield a biased sample when compared to a random sample. We demonstrate the bias by comparing the differently sampled data sets on several commonly used metrics, such as video popularity, age, length, and category. The results show that the sampling methods produce significantly different values for these metrics and can therefore lead to very different conclusions about the system under study. The goal of the paper is not to provide yet another set of numbers for YouTube; instead, we seek to emphasize the importance of using correct measurement methodologies and of understanding the inherent weaknesses of each methodology.
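To make the kind of bias described above concrete, here is a minimal simulation sketch in Python. It is not the authors' methodology and uses no real YouTube data: the Pareto-distributed view counts, the modelling of crawling as popularity-proportional sampling, and the function names (`make_population`, `uniform_sample`, `popularity_weighted_sample`, `compare`) are all hypothetical assumptions chosen for illustration.

```python
import random
import statistics


def make_population(n, seed=0):
    """Hypothetical catalogue: n view counts drawn from a heavy-tailed Pareto distribution."""
    rng = random.Random(seed)
    # paretovariate(1.2) returns values >= 1, so every view count is at least 1000
    return [int(1000 * rng.paretovariate(1.2)) for _ in range(n)]


def uniform_sample(population, k, rng):
    """Random sample: every video is equally likely to be picked (without replacement)."""
    return rng.sample(population, k)


def popularity_weighted_sample(population, k, rng):
    """Crude, assumed stand-in for crawling related videos: a video is reached
    with probability proportional to its view count (with replacement)."""
    return rng.choices(population, weights=population, k=k)


def compare(n=100_000, k=2_000, seed=1):
    """Median view count of the full population and of each kind of sample."""
    population = make_population(n, seed)
    rng = random.Random(seed + 1)
    return {
        "population": statistics.median(population),
        "uniform": statistics.median(uniform_sample(population, k, rng)),
        "weighted": statistics.median(popularity_weighted_sample(population, k, rng)),
    }


if __name__ == "__main__":
    for name, median_views in compare().items():
        print(f"{name:>10}: median views = {median_views:,.0f}")
```

Under these assumptions, the uniform sample's median should land close to the population median, while the popularity-weighted sample's median comes out several times higher; exact figures depend on the seed, so none are quoted here.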
Copyright information
© 2015 IFIP International Federation for Information Processing
About this paper
Cite this paper
Karkulahti, O., Kangasharju, J. (2015). Youtube Revisited: On the Importance of Correct Measurement Methodology. In: Steiner, M., Barlet-Ros, P., Bonaventure, O. (eds) Traffic Monitoring and Analysis. TMA 2015. Lecture Notes in Computer Science, vol 9053. Springer, Cham. https://doi.org/10.1007/978-3-319-17172-2_2
DOI: https://doi.org/10.1007/978-3-319-17172-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17171-5
Online ISBN: 978-3-319-17172-2
eBook Packages: Computer Science, Computer Science (R0)