A framework for enriching Data Warehouse analysis with Question Answering systems

Journal of Intelligent Information Systems

Abstract

Business Intelligence (BI) applications allow users to query, understand, and analyze existing data within their organizations in order to acquire useful knowledge and thus make better strategic decisions. The core of a BI application is a Data Warehouse (DW), which integrates several heterogeneous structured data sources into a common data repository. However, there is broad agreement that the next generation of BI applications should consider data not only from internal sources but also from external sources (e.g. Big Data, blogs, social networks), where up-to-date information about competitors may prove crucial for making the right decisions. This external data is usually obtained through traditional Web search engines, which requires significant effort from users to analyze the returned information and incorporate it into the BI application. In this paper, we propose integrating the internal structured data of the DW with external unstructured data obtained through Question Answering (QA) techniques. The integration is achieved seamlessly by presenting the data returned by the DW and the QA systems in dashboards that allow the user to handle both types of data. Moreover, the QA results are stored persistently in a new DW repository in order to facilitate comparing the results obtained for different questions, or for the same question on different dates.


Notes

  1. Question Answering (QA) systems represent the potential future of Web search engines because they return specific answers as well as documents. QA combines Information Retrieval (IR) and Information Extraction (IE) techniques.

  2. Information Retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. This activity has been popularized by Web search engines such as Google.

  3. Information Extraction is the task of automatically extracting specific structured information from unstructured and/or semi-structured machine-readable documents. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the extracted information (e.g. the names of products and their prices).

  4. http://www.clef-initiative.eu// (visited on 24th of March, 2013).

  5. http://nlp.lsi.upc.edu/freeling/ (visited on 24th of March, 2013).

  6. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/ (visited on 24th of March, 2013).

  7. http://www.wordnet-online.com (visited on 24th of March, 2013).

  8. Each passage is formed by a number of consecutive sentences in the document. In this case, the IR-n system (our passage retrieval tool) returns the most relevant passage formed by eight consecutive sentences.

  9. MRR (Mean Reciprocal Rank) is based on the inverse of the rank of the first correct answer. For example, MRR = 1 if the first returned document contains the answer to the query, MRR = 1/2 if the first returned document that contains a correct answer is in the second position, and so on.
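
The measure described in note 9 can be illustrated with a minimal Python sketch. The function names and the boolean-list input format are assumptions made for this illustration and do not come from the article; returning 0 when no returned document is correct is a common convention rather than something the note states. The note gives the per-query value, and averaging that value over a set of queries yields the mean.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first correct result for one query.

    `relevance[i]` is True when the document at position i + 1 contains
    a correct answer. Returns 0.0 when no document is correct (assumed
    convention, not stated in the note).
    """
    for rank, is_correct in enumerate(relevance, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all queries (0.0 if there are none)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(run) for run in runs) / len(runs)
```

For instance, `reciprocal_rank([True, False])` evaluates to 1.0 and `reciprocal_rank([False, True])` to 0.5, matching the two cases given in the note.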


Acknowledgments

This paper has been partially supported by the MESOLAP (TIN2010-14860), GEODAS-BI (TIN2012-37493-C03-03), LEGOLANGUAGE (TIN2012-31224) and DIIM2.0 (PROMETEOII/2014/001) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).

Author information

Corresponding author

Correspondence to Jesús Peral.

About this article

Cite this article

Ferrández, A., Maté, A., Peral, J. et al. A framework for enriching Data Warehouse analysis with Question Answering systems. J Intell Inf Syst 46, 61–82 (2016). https://doi.org/10.1007/s10844-014-0351-2

  • DOI: https://doi.org/10.1007/s10844-014-0351-2
