Information Systems Frontiers, Volume 15, Issue 3, pp 331–349

Storing and analysing voice of the market data in the corporate data warehouse

  • Lisette García-Moya
  • Shahad Kudama
  • María José Aramburu
  • Rafael Berlanga


Web opinion feeds have become one of the most popular information sources that users consult before buying products or contracting services. Negative opinions about a product can have a strong impact on its sales figures. As a consequence, companies are increasingly concerned with how to integrate opinion data into their business intelligence models so that they can predict sales figures or define new strategic goals. After analysing the requirements of this new application, this paper proposes a multidimensional data model that integrates sentiment data extracted from opinion posts into a traditional corporate data warehouse. It then presents a new sentiment data extraction method that applies semantic annotation to facilitate the integration of both types of data. This method uses Wikipedia as its main knowledge resource, together with some well-known lexicons of opinion words and other corporate data and metadata stores that describe the company's products, such as technical specifications and user manuals. The resulting information system allows users to perform new analysis tasks with the traditional OLAP-based data warehouse operators. We have developed a case study over a set of real opinions about digital devices offered by a wholesale dealer. Using this case study, we evaluate the quality of the extracted sentiment data and provide some query examples that illustrate the potential uses of the integrated model.


Keywords: Sentiment analysis, Data warehouses, OLAP, Text processing



This work has been partially funded by the “Ministerio de Economía y Competitividad” under contract number TIN2011-24147, and by the Fundació Caixa Castelló project P1-1B2010-49.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Lisette García-Moya (1)
  • Shahad Kudama (1)
  • María José Aramburu (1, email author)
  • Rafael Berlanga (1)

  1. Temporal Knowledge Bases Group, Universitat Jaume I, Castellón, Spain
