World Wide Web, Volume 16, Issue 5–6, pp 763–791

Can predicate-argument structures be used for contextual opinion retrieval from blogs?

  • Sylvester O. Orimaye
  • Saadat M. Alhashmi
  • Eu-Gene Siew

Abstract

We present the results of our investigation into the use of predicate-argument structures for contextual opinion retrieval. Using predicate-argument structures for opinion retrieval is a novel approach that exploits the grammatical derivation of sentences to capture contextual and subjective relevance. Unlike keyword-based opinion retrieval approaches, we do not rely on the frequency of particular keywords; instead, our solution is based on the frequency of contextually relevant and subjective sentences. We use a linear relevance model that leverages semantic similarities among the predicate-argument structures of sentences, and this paper presents the evaluation results of that model. The model linearly combines a popular relevance model, our proposed transformed terms similarity model, and the absolute value of a sentence subjectivity scoring scheme. The predicate-argument structures are derived from the grammatical derivations of natural-language query topics and of well-formed sentences in blog documents. The derived structures are then compared semantically to compute an opinion relevance score. Our scoring technique uses the highest frequency of semantically related predicate-argument structures, enriched with the total subjectivity score of the sentences. Evaluation and experimental results show that predicate-argument structures can indeed be used for contextual opinion retrieval: they improve opinion retrieval performance by 15% over the popular TREC baselines.
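The abstract describes the scoring only in prose. The sketch below is a minimal, hypothetical illustration of a score with the same overall shape: a baseline relevance score, plus the number of sentences whose predicate-argument structure is semantically related to the query's, plus the absolute value of those sentences' total subjectivity, combined linearly. It assumes the predicate-argument structures have already been extracted by a parser. The `PAS` and `Sentence` types, the Jaccard term overlap (substituted here for the paper's transformed terms similarity model, which the abstract does not define), the 0.5 similarity threshold, the choice to sum subjectivity over the related sentences only, and the weights `alpha`, `beta`, `gamma` are all illustrative assumptions, not the authors' method or values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PAS:
    """Hypothetical predicate-argument structure: one predicate plus its arguments."""
    predicate: str
    arguments: tuple[str, ...]

    def terms(self) -> set[str]:
        # Lower-cased set of predicate and argument terms used for comparison.
        return {self.predicate.lower(), *(arg.lower() for arg in self.arguments)}


@dataclass(frozen=True)
class Sentence:
    pas: PAS
    subjectivity: float  # assumed signed score: > 0 positive, < 0 negative opinion


def pas_similarity(a: PAS, b: PAS) -> float:
    """Stand-in similarity (Jaccard term overlap), NOT the paper's transformed terms model."""
    union = a.terms() | b.terms()
    return len(a.terms() & b.terms()) / len(union) if union else 0.0


def related_sentences(query: PAS, sentences: Sequence[Sentence],
                      threshold: float = 0.5) -> list[Sentence]:
    """Sentences whose PAS is at least `threshold` similar to the query PAS."""
    return [s for s in sentences if pas_similarity(query, s.pas) >= threshold]


def opinion_score(relevance: float, query: PAS, sentences: Sequence[Sentence],
                  alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2,
                  threshold: float = 0.5) -> float:
    """Linear combination of a baseline relevance score, the count of
    semantically related sentences, and |total subjectivity| of those sentences.
    The weights and threshold are illustrative placeholders."""
    related = related_sentences(query, sentences, threshold)
    frequency = len(related)
    subjectivity = abs(sum(s.subjectivity for s in related))
    return alpha * relevance + beta * frequency + gamma * subjectivity


# Usage: a toy query topic and a three-sentence "blog post".
query = PAS("like", ("people", "iphone"))
doc = [
    Sentence(PAS("like", ("users", "iphone")), 0.8),     # similarity 2/4 = 0.5, related
    Sentence(PAS("hate", ("critics", "iphone")), -0.6),  # similarity 1/5 = 0.2, not related
    Sentence(PAS("like", ("people", "iphone")), 0.4),    # identical, similarity 1.0
]
score = opinion_score(relevance=2.0, query=query, sentences=doc)
```

Here two sentences pass the threshold, so the score is 0.5 × 2.0 + 0.3 × 2 + 0.2 × |0.8 + 0.4| ≈ 1.84. Taking the absolute value means strongly negative and strongly positive opinions both raise the score, which matches the abstract's use of the absolute subjectivity value.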

Keywords

contextual opinion retrieval · predicate-argument structures · sentence-level · linear relevance model

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Sylvester O. Orimaye (1)
  • Saadat M. Alhashmi (1)
  • Eu-Gene Siew (1)
  1. Faculty of Information Technology, Monash University, Bandar Sunway, Malaysia