Abstract
In this paper, we introduce a supervised method for generating a dictionary of weighted opinion-bearing terms from a collection of opinionated documents. We also describe how such a dictionary is used within an algorithm for opinion retrieval, that is, for identifying the documents in a collection that express some opinion about a given query topic. We report several experiments on the TREC Blog collection, which study and evaluate different combinations of DFR (Divergence from Randomness) probabilistic models for assigning weights to terms in the dictionary and to documents. The results show the stability of the method and its practical utility. We also investigate the composition of the generated lexicons, focusing mainly on the presence of stop-words: quite surprisingly, the best-performing dictionaries show a predominant presence of stop-words. Finally, we study the effectiveness of the same approach for generating dictionaries of polarity-bearing terms and provide preliminary results.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Amati, G., Amodeo, G., Bianchi, M., Gaibisso, C., Gambosi, G. (2010). A Uniform Theoretic Approach to Opinion and Information Retrieval. In: Armano, G., de Gemmis, M., Semeraro, G., Vargiu, E. (eds) Intelligent Information Access. Studies in Computational Intelligence, vol 301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14000-6_5
DOI: https://doi.org/10.1007/978-3-642-14000-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13999-4
Online ISBN: 978-3-642-14000-6
eBook Packages: Engineering, Engineering (R0)