Customer Reviews Analysis Based on Information Extraction Approaches

  • Haiqing Zhang
  • Aicha Sekhari
  • Florendia Fourli-Kartsouni
  • Yacine Ouzrout
  • Abdelaziz Bouras
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 467)


The existing information extraction approaches are analyzed and then categorized into several groups according to their relative strengths, their level of intelligence, and their capability to solve complex problems. Two practical approaches are presented to clarify how information extraction solutions can be used to obtain valuable information from large numbers of reviews. The first approach supports the front-end services in the EASY-IMP project: customer preferences and the customers' main interests are determined with a TF-IDF approach. Roughly 100,000 pages have been analyzed, and customer preference is studied on the basis of the most relevant keywords. However, the TF-IDF approach is limited in its capability to provide personalized information, because it can only obtain restricted information through weight calculation. To extract more effective, customized information, an opinion mining algorithm is proposed. By jointly discovering the main opinion mining elements, the proposed algorithm aims to obtain sufficient information extraction results while reducing the complexity and running time of information extraction. The analysis of reviews shows that the proposed algorithm can identify the main elements effectively and simultaneously.
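The TF-IDF weighting behind the first approach can be illustrated with a minimal sketch. It assumes a plain tf × idf scheme (relative term frequency multiplied by log(N / df)) and a toy stop-word list. The paper's exact weighting variant and preprocessing are not given here, the EASY-IMP data is not used, and the function names and sample reviews are hypothetical.

```python
import math
import re
from collections import Counter

# Toy stop-word list (an assumption); real pipelines use a fuller list.
STOP_WORDS = {"a", "an", "and", "but", "is", "the"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        if not tokens:  # empty review: no terms to weight
            weights.append({})
            continue
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def top_keywords(weights, k=3):
    """Highest-weighted terms; ties are broken alphabetically for stable output."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


reviews = [
    "The battery life is great and the battery charges fast",
    "The screen is bright but the screen scratches easily",
    "Great price and fast delivery",
]
for w in tf_idf(reviews):
    print(top_keywords(w))
# ['battery', 'charges', 'life']
# ['screen', 'bright', 'easily']
# ['delivery', 'price', 'fast']
```

Terms that occur in every review receive an idf of zero. The weights show which features a review discusses most distinctively, but not whether the customer's opinion of them is positive or negative, which is the limitation of the weight-based approach noted above.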


Keywords: Information extraction · TF-IDF · Opinion mining · Dependency relations · Part-of-speech
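The keywords name dependency relations and part-of-speech tags as the basis of the opinion mining step. The following is a minimal, hypothetical sketch of how such information can identify several opinion mining elements in one pass: the aspect (product feature), the opinion word, and its polarity. It is not the authors' algorithm. The two extraction rules (amod and nsubj), the negation handling, the toy polarity lexicon, and all names are assumptions, and the sentences are hand-annotated with Penn Treebank tags and Stanford-style dependency labels rather than produced by a parser.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int   # 1-based position in the sentence
    word: str
    pos: str   # Penn Treebank tag, e.g. NN, JJ
    head: int  # index of the governing token, 0 for the root
    rel: str   # Stanford-style dependency label


# Hypothetical toy polarity lexicon; a real system would use a full opinion lexicon.
LEXICON = {"bright": 1, "great": 1, "fast": 1, "slow": -1, "poor": -1}


def sent(*rows) -> List[Token]:
    """Build a sentence from (word, pos, head, rel) rows, numbering tokens from 1."""
    return [Token(i + 1, *row) for i, row in enumerate(rows)]


def aspect_phrase(tok: Token, sentence: List[Token]) -> str:
    """Prepend noun-compound modifiers ("nn" / "compound") to the aspect noun."""
    mods = [t.word for t in sentence if t.head == tok.idx and t.rel in ("nn", "compound")]
    return " ".join(mods + [tok.word])


def extract(sentence: List[Token]) -> List[Tuple[str, str, int]]:
    """Return (aspect, opinion word, polarity) triples found by two dependency rules."""
    by_idx = {t.idx: t for t in sentence}
    # Opinion words that carry a negation modifier get their polarity flipped.
    negated = {t.head for t in sentence if t.rel == "neg"}
    triples = []
    for t in sentence:
        head = by_idx.get(t.head)
        if head is None:  # the root has no governor
            continue
        if t.rel == "amod" and head.pos.startswith("NN") and t.pos.startswith("JJ"):
            aspect, opinion = head, t  # Rule 1: amod(noun, adjective), "bright screen"
        elif t.rel == "nsubj" and head.pos.startswith("JJ") and t.pos.startswith("NN"):
            aspect, opinion = t, head  # Rule 2: nsubj(adjective, noun), "life is great"
        else:
            continue
        polarity = LEXICON.get(opinion.word.lower(), 0)
        if opinion.idx in negated:
            polarity = -polarity
        triples.append((aspect_phrase(aspect, sentence), opinion.word, polarity))
    return triples


# Hand-annotated parses (assumed, not parser output).
BATTERY = sent(("The", "DT", 3, "det"), ("battery", "NN", 3, "nn"), ("life", "NN", 5, "nsubj"),
               ("is", "VBZ", 5, "cop"), ("great", "JJ", 0, "root"))
SCREEN = sent(("It", "PRP", 2, "nsubj"), ("has", "VBZ", 0, "root"), ("a", "DT", 5, "det"),
              ("bright", "JJ", 5, "amod"), ("screen", "NN", 2, "dobj"))
DELIVERY = sent(("The", "DT", 2, "det"), ("delivery", "NN", 5, "nsubj"), ("was", "VBD", 5, "cop"),
                ("not", "RB", 5, "neg"), ("fast", "JJ", 0, "root"))

for s in (BATTERY, SCREEN, DELIVERY):
    print(extract(s))
# [('battery life', 'great', 1)]
# [('screen', 'bright', 1)]
# [('delivery', 'fast', -1)]
```

In practice the tags and dependencies would come from a part-of-speech tagger and a dependency parser; the two rules shown cover only two common syntactic patterns, so a realistic rule set would be considerably larger.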



Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Haiqing Zhang (1, corresponding author)
  • Aicha Sekhari (1)
  • Florendia Fourli-Kartsouni (2)
  • Yacine Ouzrout (1)
  • Abdelaziz Bouras (3)
  1. DISP Laboratory, University Lumière Lyon 2, Bron Cedex, France
  2. Hypercliq, Athens, Greece
  3. Computer Science Department, Qatar University, ictQATAR, Doha, Qatar
