Finding Informative Comments for Video Viewing

  • Original Research
  • Published in: SN Computer Science

Abstract

Among the information-sharing methods on the Web, video is growing in importance and will continue to shape the future Web environment. Services such as YouTube, Vimeo, and LiveLeak are information-sharing platforms that let users upload user-generated content (UGC) to the Web. While or after watching an informative video on these services, users tend to seek related information. The best way to satisfy this kind of information need is to find and read the comments on the video. However, existing services only support sorting comments by recency (newest first) or by rating (highest LIKES score). Consequently, the search for related information is limited unless users read all the comments. We therefore propose a novel method for finding informative comments by considering the original content and the comments' relevance to it. The method combines three steps: measuring informativeness priority, which we define as the level of information provided by online users; classifying the intention of the information posted online; and clustering to eliminate duplicate themes. The first step calculates the extent to which the comments cover all the topics in the original content. The second step classifies the intention of the information posted in the comments. The third step picks the most informative comments by applying clustering methods with rules to eliminate duplicate themes. Experiments on 20 sampled videos with 1000 comments, together with an analysis of 1861 TED talk videos and 380,619 comments, show that the proposed methods find more informative comments than existing methods such as sorting by high LIKES score.
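The idea of scoring a comment by how much of the original content it covers can be illustrated with a minimal sketch. This is not the authors' measure, which the abstract does not specify. It assumes plain term overlap as a stand-in for topic coverage, and the stopword list, function names, and sample texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "this", "that", "for", "on", "so", "was"}

def tokenize(text):
    """Lowercase the text and return its word tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def coverage_score(comment, transcript):
    """Share of the transcript's term occurrences whose term also appears
    in the comment, from 0.0 (no overlap) to 1.0 (every term covered)."""
    source_terms = Counter(tokenize(transcript))
    total = sum(source_terms.values())
    if total == 0:
        return 0.0
    comment_terms = set(tokenize(comment))
    covered = sum(n for term, n in source_terms.items() if term in comment_terms)
    return covered / total

def rank_by_coverage(comments, transcript):
    """Return the comments ordered from highest to lowest coverage score."""
    return sorted(comments, key=lambda c: coverage_score(c, transcript), reverse=True)

transcript = ("Solar panels convert sunlight into electricity. "
              "Solar energy storage uses batteries.")
comments = ["Great video!", "Batteries make solar storage practical"]
ranked = rank_by_coverage(comments, transcript)
```

In this toy input the comment about batteries and storage ranks above "Great video!", because it shares terms with the transcript, regardless of how many likes either comment has. A realistic system would likely replace raw term overlap with a topic model or weighted terms, and would add the intent classification and duplicate-theme clustering steps that the abstract describes.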

Notes

  1. View, organize, or delete comments - YouTube Help. Available: https://support.google.com/youtube/answer/6000976?hl=en (Date last accessed on 4 Oct 2019).

  2. Weka 3: Data Mining Software in Java, Expectation-Maximization API. Available: http://weka.sourceforge.net/doc.dev/weka/clusterers/EM.html (Date last accessed on 4 Oct 2019).

  3. Language Detection Library for Java. Available: https://code.google.com/archive/p/language-detection/ (Date last accessed on 4 Oct 2019).

  4. Software—The Stanford Natural Language Processing Group. Available: http://nlp.stanford.edu/software/index.shtml (Date last accessed on 4 Oct 2019).

  5. Emoticon Analysis. Available: http://www.datagenetics.com/blog/october52012/index.html (Date last accessed on 4 Oct 2019).

  6. N-grams: based on 520 million word COCA corpus. Available: http://www.ngrams.info/ (Date last accessed on 4 Oct 2019).

  7. Weka 3: Data Mining Software in Java. Available: http://www.cs.waikato.ac.nz/ml/weka (Date last accessed on 4 Oct 2019).

Author information

Corresponding author

Correspondence to Aviv Segev.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Choi, S., Segev, A. Finding Informative Comments for Video Viewing. SN COMPUT. SCI. 1, 47 (2020). https://doi.org/10.1007/s42979-019-0048-2

  • DOI: https://doi.org/10.1007/s42979-019-0048-2
