Algorithms and criteria for diversification of news article comments

Abstract

In this paper, we introduce an approach for diversifying user comments on news articles. We claim that, although content diversity suffices for the keyword search setting, as shown by existing work on search result diversification, it is not enough when it comes to diversifying comments on news articles. Thus, in our proposed framework, we define comment-specific diversification criteria in order to extract the respective diversification dimensions in the form of feature vectors. These criteria involve content similarity, the sentiment expressed within comments, named entities, the quality of comments, and combinations of these. Then, we apply diversification to the comments, utilizing the extracted feature vectors. The outcome of this process is a subset of the initial set that contains heterogeneous comments, representing different aspects of the news article, different sentiments expressed, different writing quality, etc. We perform an experimental analysis showing that the diversity criteria we introduce result in distinctively diverse subsets of comments, as opposed to the baseline of diversifying comments only with respect to their content. We also present a prototype system that implements our diversification framework on news article comments.
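To make the selection step concrete, the following is a minimal sketch of how a diverse subset could be picked once each comment has been turned into a feature vector. The abstract does not say which algorithm is used, so the greedy max-min dispersion heuristic here is an assumption, as are the function names (`cosine_distance`, `greedy_diverse_subset`), the choice of cosine distance, and the toy three-dimensional vectors.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def greedy_diverse_subset(vectors, k, distance=cosine_distance):
    """Pick k indices whose vectors are spread out (greedy max-min heuristic).

    This is an illustrative heuristic, not necessarily the paper's algorithm.
    """
    n = len(vectors)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    if k == 1:
        return [0]

    # Seed with the most distant pair of comments.
    best_pair, best_dist = (0, 1), -1.0
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(vectors[i], vectors[j])
            if d > best_dist:
                best_pair, best_dist = (i, j), d
    selected = list(best_pair)

    # For every comment, track its distance to the closest selected comment.
    min_dist = [min(distance(v, vectors[s]) for s in selected) for v in vectors]

    while len(selected) < k:
        # Add the comment that is farthest from everything chosen so far.
        candidate = max(
            (i for i in range(n) if i not in selected),
            key=lambda i: min_dist[i],
        )
        selected.append(candidate)
        for i in range(n):
            min_dist[i] = min(min_dist[i], distance(vectors[i], vectors[candidate]))
    return selected


# Hypothetical feature vectors: one row per comment.
comments = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1],  # nearly identical to the first comment
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
chosen = greedy_diverse_subset(comments, k=3)
print(chosen)
```

With these toy vectors the near-duplicate second comment is left out. Switching the diversification criterion (content, sentiment, named entities, quality, or a combination) would amount to changing how the feature vectors are built, while the selection step stays the same.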

Notes

  1. http://news.yahoo.com/

  2. http://www.readabilityformulas.com/flesch-reading-ease-readability-formula.php

  3. http://sentistrength.wlv.ac.uk/

  4. http://nlp.stanford.edu/software/CRF-NER.shtml

  5. http://developer.nytimes.com/docs/read/article_search_api

  6. http://developer.nytimes.com/docs/community_api

  7. www.opencalais.com

  8. http://www.alchemyapi.com/api/concept/

  9. http://www.socialresearchmethods.net/kb/stat_t.php

  10. http://www.arcomem.eu/

Acknowledgments

This research is conducted as part of the EU project ARCOMEM (see Note 10), FP7-ICT-270239.

Author information

Correspondence to Giorgos Giannopoulos.

Additional information

This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund.

Appendix

Information nuggets of evaluated articles

Table 10 Evaluation articles abstracts and corresponding nuggets
Table 11 Evaluation articles abstracts and corresponding nuggets
Table 12 Evaluation articles abstracts and corresponding nuggets

About this article

Cite this article

Giannopoulos, G., Koniaris, M., Weber, I. et al. Algorithms and criteria for diversification of news article comments. J Intell Inf Syst 44, 1–47 (2015). https://doi.org/10.1007/s10844-014-0328-1

Keywords

  • News Article
  • Positive Sentiment
  • Candidate Comment
  • Diversification Process
  • Sentiment Class