Abstract
The integration of multidimensional data and machine learning seems natural in the area of business intelligence. On-Line Analytical Processing (OLAP) tools are common in this area, where data are usually represented in multidimensional datamarts, and data mining tools have been integrated into some of these OLAP tools. However, efforts towards a full integration of data mining and OLAP tools have not been as common as originally expected. Nowadays, this integration is mostly carried out in source code, implementing solutions that perform both (i) all the operations on multidimensional data and (ii) the data mining algorithms that extract knowledge from these data. Hence, there is now an important distinction between implementation-based developments, where the entire solution is implemented in source code, and OLAP-tool-based developments, where (at least) the operations on multidimensional data are performed using an OLAP tool. This work analyses these two alternatives in terms of cost-effectiveness, performing an experimental analysis on a multidimensional problem and discussing when each approach seems to outperform the other.
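To make the abstract's distinction more concrete, here is a minimal sketch, under stated assumptions, of an implementation-based development: both (i) the multidimensional operation (a roll-up, i.e. summing a measure over the dimensions being dropped) and (ii) the data mining step are written directly in code. The fact table, its attribute names (year, region, product, sales) and the helpers `roll_up` and `fit_line` are hypothetical illustrations. The paper itself works with R and Pentaho, and ordinary least squares here merely stands in for whichever data mining algorithm is actually used.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy fact table: dimension attributes plus one numeric measure.
FACTS: List[Dict[str, object]] = [
    {"year": 2012, "region": "North", "product": "A", "sales": 10.0},
    {"year": 2012, "region": "South", "product": "B", "sales": 14.0},
    {"year": 2013, "region": "North", "product": "A", "sales": 13.0},
    {"year": 2013, "region": "South", "product": "B", "sales": 17.0},
    {"year": 2014, "region": "North", "product": "B", "sales": 16.0},
    {"year": 2014, "region": "South", "product": "A", "sales": 20.0},
]


def roll_up(facts: Sequence[Dict[str, object]], keep: Sequence[str],
            measure: str) -> Dict[Tuple, float]:
    """Roll-up done in code: sum `measure` over every dimension not in `keep`.

    Works for any number of dimensions (cube or hypercube alike).
    """
    cube: Dict[Tuple, float] = defaultdict(float)
    for row in facts:
        cube[tuple(row[d] for d in keep)] += float(row[measure])
    return dict(cube)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x (stand-in for a mining algorithm)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# (i) Multidimensional operation: roll the cube up to the year level.
by_year = roll_up(FACTS, keep=["year"], measure="sales")
years = sorted(k[0] for k in by_year)
totals = [by_year[(y,)] for y in years]

# (ii) Data mining step: fit a linear trend and predict the following year.
a, b = fit_line(years, totals)
print(by_year)                 # {(2012,): 24.0, (2013,): 30.0, (2014,): 36.0}
print(round(a + b * 2015, 2))  # 42.0
```

In an OLAP-tool-based development, the `roll_up` step would instead be delegated to the OLAP tool (for instance, a cube served by Pentaho), and only the mining step would remain in source code.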
Notes
- 2. We use the term data cube regardless of the number of dimensions, although the term hypercube is often used when more than three dimensions are involved in the hierarchy.
- 5. See https://github.com/overcoil/X4R/issues/ for a complete list of issues with this package.
- 9. Concretely, we used version biserver-ce-5.2.0.0-209 of this software, which is not the latest version but is compatible. Visit http://community.pentaho.com/ for details.
- 10. This is only necessary if, as often happens, attribute names are written with underscores.
Acknowledgements
We thank the anonymous reviewers for their comments, which have helped to improve this paper. This work was supported by the Spanish MINECO under grant TIN 2013-45732-C4-1-P and by Generalitat Valenciana PROMETEOII2015/013. This research has been developed within the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain (PCIN-2013-037).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Martínez-Usó, A., Hernández-Orallo, J., Ramírez-Quintana, M.J., Plumed, F.M. (2015). Pentaho + R: An Integral View for Multidimensional Prediction Models. In: Puerta, J., et al. Advances in Artificial Intelligence. CAEPIA 2015. Lecture Notes in Computer Science, vol 9422. Springer, Cham. https://doi.org/10.1007/978-3-319-24598-0_21
DOI: https://doi.org/10.1007/978-3-319-24598-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24597-3
Online ISBN: 978-3-319-24598-0
eBook Packages: Computer Science, Computer Science (R0)