Pentaho + R: An Integral View for Multidimensional Prediction Models

  • Conference paper
Advances in Artificial Intelligence (CAEPIA 2015)

Abstract

The integration of multidimensional data and machine learning seems natural in business intelligence. On-Line Analytical Processing (OLAP) tools are common in this area, where data are usually represented in multidimensional datamarts, and some of these tools integrate data mining functionality. However, efforts towards a full integration of data mining and OLAP tools have not been as common as originally expected. Nowadays, this integration is mostly carried out in source code, by implementing solutions that perform (i) all the operations on multidimensional data and (ii) the data mining algorithms that extract knowledge from these data. Hence, there is now an important distinction between implementation-based developments, where the entire solution is implemented in source code, and OLAP-tool-based developments, where (at least) the operations on multidimensional data are performed using an OLAP tool. This work analyses these two alternatives in terms of cost-effectiveness, performing an experimental analysis on a multidimensional problem and discussing when each approach seems to outperform the other.
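To make the implementation-based alternative concrete, the following is a minimal sketch, in Python rather than the R used by the paper, of what "performing all the operations on multidimensional data in source code" might look like. It is not the authors' implementation. The fact table `FACTS`, the helpers `roll_up` and `predict_next`, and the mean-based baseline predictor are all hypothetical and chosen only for illustration; in an OLAP-tool-based development the roll-up step would instead be delegated to the OLAP tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact table: each row is one cell of a small data cube
# with dimensions (city, store, month) and a single measure (sales).
FACTS = [
    {"city": "Valencia", "store": "V1", "month": 1, "sales": 10.0},
    {"city": "Valencia", "store": "V2", "month": 1, "sales": 20.0},
    {"city": "Valencia", "store": "V1", "month": 2, "sales": 14.0},
    {"city": "Valencia", "store": "V2", "month": 2, "sales": 26.0},
    {"city": "Madrid", "store": "M1", "month": 1, "sales": 30.0},
    {"city": "Madrid", "store": "M1", "month": 2, "sales": 50.0},
]


def roll_up(facts, dims, measure="sales"):
    """Sum the measure over every dimension that is not listed in dims."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def predict_next(facts, level, measure="sales"):
    """Baseline predictor (an assumption, not the paper's model):
    the mean of the per-month totals observed at the given level."""
    by_month = roll_up(facts, [level, "month"], measure)
    history = defaultdict(list)
    for (member, _month), total in sorted(by_month.items()):
        history[member].append(total)
    return {member: mean(values) for member, values in history.items()}


# Roll up from store level to city level, then predict at city level.
city_month = roll_up(FACTS, ["city", "month"])
city_forecast = predict_next(FACTS, "city")
```

Even in this toy setting, the aggregation logic has to be written, tested and maintained by hand, which is the kind of cost the comparison between the two approaches has to weigh.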

Notes

  1. http://www.pentaho.com/.

  2. We use the term data cube regardless of the number of dimensions, although the term hypercube is often used when more than three dimensions are involved in the hierarchy.

  3. http://www-01.ibm.com/software/de/analytics/spss.

  4. http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html.

  5. See https://github.com/overcoil/X4R/issues/ for a complete list of issues with this package.

  6. http://www.knime.org.

  7. https://rapidminer.com/.

  8. http://www.kdnuggets.com/software/suites.html.

  9. Concretely, we used the biserver-ce-5.2.0.0-209 version of this software, which is not the latest version but is compatible. Visit http://community.pentaho.com/ for details.

  10. This is only necessary if, as often happens, your attribute names are written with underscores.

Acknowledgements

We thank the anonymous reviewers for their comments, which have helped to improve this paper. This work was supported by the Spanish MINECO under grant TIN 2013-45732-C4-1-P and by Generalitat Valenciana PROMETEOII2015/013. This research has been developed within the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain (PCIN-2013-037).

Author information

Corresponding author

Correspondence to Adolfo Martínez-Usó.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Martínez-Usó, A., Hernández-Orallo, J., Ramírez-Quintana, M.J., Plumed, F.M. (2015). Pentaho + R: An Integral View for Multidimensional Prediction Models. In: Puerta, J., et al. Advances in Artificial Intelligence. CAEPIA 2015. Lecture Notes in Computer Science, vol 9422. Springer, Cham. https://doi.org/10.1007/978-3-319-24598-0_21

  • DOI: https://doi.org/10.1007/978-3-319-24598-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24597-3

  • Online ISBN: 978-3-319-24598-0

  • eBook Packages: Computer Science, Computer Science (R0)
