Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System

  • Ekaterina Pronoza
  • Elena Yagunova
  • Svetlana Volskaya
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8791)


In this paper, a corpus-based information extraction and opinion mining method is proposed. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system.

Our method is based on thorough corpus analysis and automatic selection of machine learning models and feature sets. We also pay special attention to the verification of statistical significance.
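The abstract does not say which significance test is used. As a minimal sketch, assuming several candidate models are scored on the same cross-validation folds, one common choice is the Friedman test with the Iman-Davenport correction. The function names and the scores below are hypothetical and serve only as an illustration.

```python
def average_ranks(scores):
    """scores[i][j] is the score of model j on fold i (higher is better).

    Returns the mean rank of each model; rank 1 is the best,
    and tied models share the mean of the ranks they span.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]


def friedman_iman_davenport(scores):
    """Return (mean ranks, Friedman chi-square, Iman-Davenport F statistic)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )
    denom = n * (k - 1) - chi2
    # The denominator is zero when every fold ranks the models identically.
    f_stat = float("inf") if denom == 0 else (n - 1) * chi2 / denom
    return ranks, chi2, f_stat


# Hypothetical F1 scores: 4 folds (rows) x 3 candidate models (columns).
toy_scores = [
    [0.80, 0.75, 0.70],
    [0.82, 0.78, 0.71],
    [0.79, 0.81, 0.72],
    [0.85, 0.80, 0.80],
]
ranks, chi2, f_stat = friedman_iman_davenport(toy_scores)
```

The F statistic is compared against the F distribution with k - 1 and (k - 1)(n - 1) degrees of freedom, where k is the number of models and n the number of folds. The critical value lookup is left out here because the Python standard library has no F distribution.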

According to the results of the research, Naive Bayes models perform well at classifying sentiment with respect to a restaurant aspect, while Logistic Regression is good at deciding on the relevance of a user’s review.
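To make the aspect-level sentiment step concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, applied to tokenized sentences about a single aspect (service). It is not the authors' implementation: the class name, the English toy sentences, and the labels are assumptions chosen for illustration, and the paper's actual feature sets are not described in the abstract.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over token lists, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[w] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical training sentences about the "service" aspect.
train_docs = [
    ["waiter", "friendly", "fast"],
    ["service", "slow", "rude"],
    ["staff", "polite", "fast"],
    ["waiter", "rude", "slow"],
]
train_labels = ["positive", "negative", "positive", "negative"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["friendly", "polite", "staff"])
```

A relevance classifier could be set up the same way with two labels (relevant, irrelevant), with Logistic Regression swapped in as the model.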

The proposed approach can be used in similar domains, such as hotel reviews, where the data are colloquial, unstructured texts (in contrast with domains such as technical products or books), and for other languages with rich morphology and free word order.


Keywords: Information extraction; Opinion mining; Restaurant recommendation system; Machine learning



The authors acknowledge Saint-Petersburg State University for research grant 30.38.305.2014.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Saint-Petersburg State University, Saint-Petersburg, Russia
