
A Uniform Theoretic Approach to Opinion and Information Retrieval

Chapter in: Intelligent Information Access

Abstract

In this paper, we introduce a supervised method for generating a dictionary of weighted opinion-bearing terms from a collection of opinionated documents. We also describe how such a dictionary is used within an algorithm for opinion retrieval, that is, the problem of identifying the documents in a collection that express an opinion about a given query topic. We report several experiments on the TREC Blog collection, in which we study and evaluate different combinations of DFR (Divergence from Randomness) probabilistic models for assigning weights to the terms in the dictionary and to documents. The results show the stability of the method and its practical utility. Moreover, we investigate the composition of the generated lexicons, focusing mainly on the presence of stop-words. Quite surprisingly, the best-performing dictionaries are dominated by stop-words. Finally, we study the effectiveness of the same approach for generating dictionaries of polarity-bearing terms and provide preliminary results.

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Amati, G., Amodeo, G., Bianchi, M., Gaibisso, C., Gambosi, G. (2010). A Uniform Theoretic Approach to Opinion and Information Retrieval. In: Armano, G., de Gemmis, M., Semeraro, G., Vargiu, E. (eds) Intelligent Information Access. Studies in Computational Intelligence, vol 301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14000-6_5

  • DOI: https://doi.org/10.1007/978-3-642-14000-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13999-4

  • Online ISBN: 978-3-642-14000-6

  • eBook Packages: Engineering, Engineering (R0)
