Improving opinionated blog retrieval effectiveness with quality measures and temporal features

Abstract

The widespread adoption of blog communities by a significant portion of Web users has made knowledge extraction from blogs a particularly important research field. One of the most interesting related problems is opinionated retrieval, that is, the retrieval of blog entries that contain opinions about a topic. A considerable amount of work has aimed to improve the effectiveness of opinion retrieval systems. The primary objective of these systems is to retrieve blog posts that are both relevant to a given query and contain opinions, and to generate a ranked list of the retrieved documents according to their relevance and opinion scores. Although a wide variety of effective opinion retrieval methods have been proposed, to the best of our knowledge none of them takes into account the importance of the retrieved opinions. In this work we introduce a ranking model that combines existing retrieval strategies with query-independent information to enhance the ranking of opinionated documents. More specifically, our model accounts for the influence of the blogger who authored an opinion, the reputation of the blog site that published a specific blog post, and the impact of the post itself. Furthermore, we extend current proximity-based opinion scoring strategies by considering the physical locations of the query and opinion terms within a document. Extensive experiments with the TREC Blogs08 dataset demonstrate that applying our methods improves retrieval precision by a significant margin.
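The abstract does not give the scoring formula, so the following is only an illustrative sketch of how query-dependent scores (relevance, opinion) might be combined with query-independent quality signals (blogger influence, blog reputation, post impact). The linear interpolation, the weights `alpha` and `beta`, the equal quality weights, and all names (`Post`, `quality`, `combined_score`, `rank`) are assumptions made for this example and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Post:
    doc_id: str
    relevance: float          # query-dependent topical relevance, assumed in [0, 1]
    opinion: float            # query-dependent opinion score, assumed in [0, 1]
    blogger_influence: float  # query-independent, assumed in [0, 1]
    blog_reputation: float    # query-independent, assumed in [0, 1]
    post_impact: float        # query-independent, assumed in [0, 1]


def quality(post, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of the three query-independent signals (hypothetical)."""
    w_inf, w_rep, w_imp = weights
    return (w_inf * post.blogger_influence
            + w_rep * post.blog_reputation
            + w_imp * post.post_impact)


def combined_score(post, alpha=0.5, beta=0.3):
    """Linear interpolation of relevance, opinion, and quality.

    alpha and beta are illustrative weights; the remainder goes to quality.
    """
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    gamma = 1 - alpha - beta
    return alpha * post.relevance + beta * post.opinion + gamma * quality(post)


def rank(posts, alpha=0.5, beta=0.3):
    """Return posts sorted by descending combined score."""
    return sorted(posts, key=lambda p: combined_score(p, alpha, beta), reverse=True)


posts = [
    Post("a", relevance=0.8, opinion=0.6,
         blogger_influence=0.1, blog_reputation=0.1, post_impact=0.1),
    Post("b", relevance=0.8, opinion=0.6,
         blogger_influence=0.9, blog_reputation=0.9, post_impact=0.9),
]
ranked_ids = [p.doc_id for p in rank(posts)]
```

In this toy setup the two posts have identical relevance and opinion scores, so only the quality term separates them. A real system would likely normalize each signal and tune the weights on held-out relevance judgments.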

Author information

Corresponding author

Correspondence to Leonidas Akritidis.

About this article

Cite this article

Akritidis, L., Bozanis, P. Improving opinionated blog retrieval effectiveness with quality measures and temporal features. World Wide Web 17, 777–798 (2014). https://doi.org/10.1007/s11280-013-0237-1

  • DOI: https://doi.org/10.1007/s11280-013-0237-1

Keywords

  • Information retrieval
  • Opinionated retrieval
  • Search
  • Blog
  • Post
  • Blogger
  • Influence
  • Impact
  • Ranking