Improving the Predictive Power of Business Performance Measurement Systems by Constructed Data Quality Features? Five Cases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9165)

Abstract

Predictive power is an important objective for current business performance measurement systems. It depends on metrics design, on the collection and preprocessing of data, and on predictive modeling. A promising but less studied preprocessing activity is to construct additional features that can be interpreted as expressing the quality of the data, so that predictive models receive not only data points but also their quality characteristics. The research problem addressed in this study is: can we improve the predictive power of business performance measurement systems by constructing additional data quality features? Unsupervised, supervised and domain knowledge approaches were used to operationalize eight features based on elementary data quality dimensions. In case studies on five corporate datasets (Toyota Material Handling Finland, Innolink Group, 3StepIt, Papua Merchandising and Lempesti), constructed data quality features performed better than minimally processed datasets in 29/38 tests and equally well in 9/38 tests. A comparison with a competing method, preprocessing combinations, on the first two datasets showed that constructed features had slightly lower prediction performance but were clearly better in execution time and ease of use. Additionally, constructed data quality features helped to visually explore high-dimensional data quality patterns. Further research is needed to expand the range of constructed features and to map the findings systematically to data quality concepts and practices.
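The idea of constructing data quality features can be sketched in a few lines. The abstract does not list the eight features used in the study, so the two below (a per-row share of missing values and a per-row maximum absolute z-score) are assumptions chosen only for illustration, and the function name `add_quality_features` is hypothetical. The sketch shows an unsupervised construction: each row keeps its original values and gains two extra columns that describe its quality.

```python
from statistics import mean, pstdev


def add_quality_features(rows):
    """Append two illustrative data quality features to each row.

    rows: list of equal-length lists of floats; None marks a missing value.
    Returns new rows: original values + [missing_fraction, max_abs_z].
    These two features are assumed examples, not the paper's feature set.
    """
    n_cols = len(rows[0])

    # Column mean and standard deviation from observed values only.
    stats = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mu = mean(observed) if observed else 0.0
        sigma = pstdev(observed) if len(observed) > 1 else 0.0
        stats.append((mu, sigma))

    out = []
    for r in rows:
        # Completeness-style feature: share of missing cells in the row.
        missing_fraction = sum(v is None for v in r) / n_cols
        # Outlier-style feature: largest absolute z-score among observed
        # cells; constant columns (sigma == 0) are skipped.
        zs = [
            abs(v - mu) / sigma
            for v, (mu, sigma) in zip(r, stats)
            if v is not None and sigma > 0
        ]
        max_abs_z = max(zs) if zs else 0.0
        out.append(list(r) + [missing_fraction, max_abs_z])
    return out


# Toy data: the second row has a missing cell, the last row an extreme value.
example = [[1.0, 10.0], [2.0, None], [3.0, 10.0], [100.0, 10.0]]
augmented = add_quality_features(example)
```

The augmented rows could then be passed to any predictive model in place of the minimally processed data. A missing cell is kept as `None` here, so a real pipeline would still need imputation or a model that tolerates missing values.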

Acknowledgements

Professor emeritus Pertti Järvinen, Professor Martti Juhola and Dr. Kati Iltanen, University of Tampere, Finland; after-sales director Jarmo Laamanen, Toyota Material Handling Finland; managing director Marko Kukkola, Innolink Group; sales director Mika Karjalainen, 3StepIt; managing director Olli Vaaranen, Papua Merchandising; and managing director Sirpa Kauppila, Lempesti.

Author information

Corresponding author

Correspondence to Markus Vattulainen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vattulainen, M. (2015). Improving the Predictive Power of Business Performance Measurement Systems by Constructed Data Quality Features? Five Cases. In: Perner, P. (ed.) Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2015. Lecture Notes in Computer Science, vol 9165. Springer, Cham. https://doi.org/10.1007/978-3-319-20910-4_1

  • DOI: https://doi.org/10.1007/978-3-319-20910-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20909-8

  • Online ISBN: 978-3-319-20910-4

  • eBook Packages: Computer Science, Computer Science (R0)
