The data quality improvement plan: deciding on choice and sequence of data quality improvements

Kleindienst, Dominikus

doi:10.1007/s12525-017-0245-6

Research Paper
Published: 14 January 2017

Volume 27, pages 387–398, (2017)
Electronic Markets

Dominikus Kleindienst ORCID: orcid.org/0000-0002-4289-5397¹

Abstract

With the rapid growth in the amount of data generated worldwide, ensuring adequate data quality (DQ) is increasingly becoming a challenge for companies: data are required to be, among other things, timely, complete, consistent, valid, and accessible. Given this multidimensionality, DQ improvements (DQIs) need to be purposefully chosen and, as there can be path dependencies, arranged in an optimal sequence. This research therefore contributes to performing the complex multidimensional task of ensuring adequate DQ in an economically reasonable manner by providing a formal decision model for identifying an optimal data quality improvement plan (DQIP). This DQIP comprises both an economically reasonable selection and an execution sequence of DQIs based on existing interrelationships between different DQ dimensions. Furthermore, a comprehensive Monte Carlo simulation provides insights into the implications of putting the decision model into operation. For practitioners, the decision model enables efficient allocation of resources to DQIs. The model also gives advice on how to sequence DQIs and draws attention to the complex problem context of DQ in order to support valid managerial decisions.

Notes

Weights, $ {w}_{d_i} $, are not varied, as this would not provide adequate informative value.
This proportionality also holds for a ±1% or ±5% variation.

Acknowledgments

Supportive inputs and helpful comments by Dr. Quirin Görz on an earlier version of this paper are gratefully acknowledged.

Author information

Authors and Affiliations

FIM Research Center, Universitätsstraße 12, 86159, Augsburg, Germany
Dominikus Kleindienst

Corresponding author

Correspondence to Dominikus Kleindienst.

Additional information

Responsible Editor: Hans-Dieter Zimmermann

Appendix

Supplementing the chapter “Development of the Decision Model”, a more detailed derivation of the objective function is given in the following.

The decision to include a DQI $v_j^p$ in a DQIP is described formally by the binary decision variable $x_{v_j^p}$, which equals 1 if DQI $v_j^p$ is part of the DQIP and 0 otherwise. The resulting DQ level $Q_{d_i,v_j}^{p}$ of DQ dimension $d_i$ after applying DQI $v_j^p$ is then calculated as follows:

$$ Q_{d_i,v_j}^{p} = Q_{d_i,v_j}^{p-1} + \begin{cases} \left(1 - Q_{d_i,v_j}^{p-1}\right)\cdot I_{d_i,v_j^p}\cdot x_{v_j^p} & \text{if } I_{d_i,v_j^p} \ge 0 \\ Q_{d_i,v_j}^{p-1}\cdot I_{d_i,v_j^p}\cdot x_{v_j^p} & \text{if } I_{d_i,v_j^p} < 0 \end{cases} $$
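
The update rule above can be sketched in a few lines of Python. This is only an illustration: the function name `apply_dqi` and its parameter names are hypothetical, and the sketch assumes that DQ levels lie in [0, 1] and impacts in [-1, 1], which the source does not state explicitly in this appendix.

```python
def apply_dqi(q_prev: float, impact: float, selected: int = 1) -> float:
    """Return the DQ level of one dimension after one DQI.

    q_prev   -- DQ level before the DQI (assumed to lie in [0, 1])
    impact   -- impact I of the DQI on this dimension (assumed in [-1, 1])
    selected -- binary decision variable x (1 = DQI is part of the plan)
    """
    if impact >= 0:
        # A non-negative impact closes a share of the remaining gap to 1.
        return q_prev + (1.0 - q_prev) * impact * selected
    # A negative impact removes a share of the current level.
    return q_prev + q_prev * impact * selected


improved = apply_dqi(0.5, 0.5)             # half of the gap to 1 is closed
worsened = apply_dqi(0.5, -0.5)            # half of the current level is lost
skipped = apply_dqi(0.5, 0.5, selected=0)  # x = 0 leaves the level unchanged
print(improved, worsened, skipped)  # 0.75 0.25 0.5
```

Because a positive impact scales the remaining gap and a negative impact scales the current level, the result stays within [0, 1] under the stated assumptions.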

After rearranging and, for mathematical reasons, introducing the substitute variables $f_{d_i,v_j^p}$ and $s_{d_i,v_j^p}$, the resulting DQ level $Q_{d_i,v_j}^{p}$ can also be written as

$$ Q_{d_i,v_j}^{p} = Q_{d_i,v_j}^{p-1}\cdot f_{d_i,v_j^p} + s_{d_i,v_j^p}, \quad \text{with } f_{d_i,v_j^p} = \begin{cases} 1 - I_{d_i,v_j^p}\cdot x_{v_j^p} & \text{if } I_{d_i,v_j^p} \ge 0 \\ 1 + I_{d_i,v_j^p}\cdot x_{v_j^p} & \text{if } I_{d_i,v_j^p} < 0 \end{cases} \text{ and } s_{d_i,v_j^p} = \begin{cases} I_{d_i,v_j^p}\cdot x_{v_j^p} & \text{if } I_{d_i,v_j^p} \ge 0 \\ 0 & \text{if } I_{d_i,v_j^p} < 0 \end{cases} $$
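
As a quick plausibility check, the substituted form can be compared numerically with the piecewise update rule. The sketch below is illustrative only; the helper names `substitutes` and `direct_update` are hypothetical and the test values are arbitrary.

```python
from typing import Tuple


def substitutes(impact: float, selected: int) -> Tuple[float, float]:
    """Return the substitute variables (f, s) for one DQI and one dimension."""
    if impact >= 0:
        return 1.0 - impact * selected, impact * selected
    return 1.0 + impact * selected, 0.0


def direct_update(q_prev: float, impact: float, selected: int) -> float:
    """Piecewise update rule, written without the substitute variables."""
    if impact >= 0:
        return q_prev + (1.0 - q_prev) * impact * selected
    return q_prev + q_prev * impact * selected


# Both forms should agree for positive, negative, and zero impacts,
# whether or not the DQI is selected.
q_prev = 0.6
for impact in (0.3, -0.3, 0.0):
    for selected in (0, 1):
        f, s = substitutes(impact, selected)
        assert abs(q_prev * f + s - direct_update(q_prev, impact, selected)) < 1e-12
```

The affine form is convenient because applying several DQIs in a row becomes a simple chain of multiply-and-add steps.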

Based on this, the DQ level $Q_{d_i}^{m}$ of dimension $d_i$ after applying a complete DQIP of $m$ DQIs $v_j^p$ is calculated as follows:

$$ Q_{d_i}^{m} = \left(\left(\left(\dots\left(\left(Q_{d_i}^{p=0}\cdot f_{d_i,v_j^{p=1}} + s_{d_i,v_j^{p=1}}\right)\cdot f_{d_i,v_j^{p=2}} + s_{d_i,v_j^{p=2}}\right)\cdot\dots\right)\cdot f_{d_i,v_j^{p=m-1}} + s_{d_i,v_j^{p=m-1}}\right)\cdot f_{d_i,v_j^{p=m}} + s_{d_i,v_j^{p=m}}\right) $$

The binary variables $x_{v_j^p}$, which are part of $f_{d_i,v_j^p}$ and $s_{d_i,v_j^p}$, neutralize the effect of DQI $v_j^p$ by taking the value 0 if that DQI is not part of the DQIP. Although term (3) contains all $m$ DQIs $v_j^p$, this neutralization makes a DQI selection possible. Knowing the DQ level $Q_{d_i}^{m}$ for each DQ dimension $d_i$, the overall DQ level can be calculated. As there can be context-dependent differences between the DQ dimensions, an allocation of weights to the different DQ dimensions is reasonable. Therefore, we use a weight $w_{d_i}$ (with $\sum_{i=1}^{n} w_{d_i} = 1$) to weight the DQ dimensions in our decision model:

$$ {\sum}_{i=1}^n{w}_{d_i}\cdot {Q}_{d_i}^m $$
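
The nested term and the weighted sum can be sketched together as a simple loop over the plan. The example is a minimal illustration, not the author's implementation: the names `step`, `run_plan`, and `overall_dq` are hypothetical, and the two DQIs with their impacts, initial levels, and weights are made-up values chosen to show that the execution order changes the result.

```python
from typing import List, Sequence, Tuple


def step(q: float, impact: float, selected: int) -> float:
    """Update one dimension's DQ level for one DQI (x = selected)."""
    if impact >= 0:
        return q + (1.0 - q) * impact * selected
    return q + q * impact * selected


def run_plan(q0: Sequence[float],
             plan: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Apply DQIs in order; each entry is (impacts per dimension, x)."""
    levels = list(q0)
    for impacts, selected in plan:
        levels = [step(q, i, selected) for q, i in zip(levels, impacts)]
    return levels


def overall_dq(levels: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted overall DQ level; the weights are assumed to sum to 1."""
    return sum(w * q for w, q in zip(weights, levels))


q0 = [0.5, 0.5]        # initial DQ levels of two dimensions (made up)
weights = [0.5, 0.5]   # equal weights (made up)
dqi_a = [0.5, 0.0]     # improves dimension 1 only
dqi_b = [-0.5, 0.5]    # harms dimension 1, improves dimension 2

a_then_b = run_plan(q0, [(dqi_a, 1), (dqi_b, 1)])
b_then_a = run_plan(q0, [(dqi_b, 1), (dqi_a, 1)])
only_b = run_plan(q0, [(dqi_a, 0), (dqi_b, 1)])  # x = 0 neutralizes DQI A

print(a_then_b, overall_dq(a_then_b, weights))  # [0.375, 0.75] 0.5625
print(b_then_a, overall_dq(b_then_a, weights))  # [0.625, 0.75] 0.6875
print(only_b, overall_dq(only_b, weights))      # [0.25, 0.75] 0.5
```

In this made-up case, applying the harmful DQI first and the repairing DQI second yields a higher overall DQ level than the reverse order, which is the kind of path dependency the sequencing decision has to account for.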

In contrast to former approaches, which have been criticized in the literature for not considering interrelationships when aggregating DQ dimensions to an overall DQ level, this decision model implicitly considers interrelationships between the DQ dimensions $d_i$ through the impacts $I_{d_i,v_j^p}$.

Since all DQIs are applied to the same dataset, which has a fixed size, and a DQIP is realized within one period (cf. A.3), the costs $c_{v_j}$ for applying DQI $v_j$ are fixed. As a result, the overall DQIP costs are

$$ {\sum}_{j=1}^m{c}_{v_j}\cdot {x}_{v_j}. $$

According to assumption A.8, the costs of a DQIP must not exceed budget B; thus, the budget constraint for the decision model is

$$ {\sum}_{j=1}^m{c}_{v_j}\cdot {x}_{v_j}\le \mathrm{B} $$

In order to allocate a given budget in an economically reasonable manner to an available set of DQIs, all possible DQIPs need to be compared to each other. As a comparison criterion, we calculate the respective effectiveness $E^{p=m}$ of a DQIP in the way described in the chapter “Development of the Decision Model”. The objective function maximizes the effectiveness of a DQIP in the following manner:

$$ \text{Maximize } E^{p=m} = \frac{\sum_{i=1}^{n} w_{d_i}\cdot\left(Q_{d_i}^{m} - Q_{d_i}^{p=0}\right)}{1 - \sum_{i=1}^{n} w_{d_i}\cdot Q_{d_i}^{p=0}} \quad \text{subject to} \quad \sum_{j=1}^{m} c_{v_j}\cdot x_{v_j} \le B $$
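
For a small set of DQIs, the comparison of all possible DQIPs can be sketched as an exhaustive search. The code below is an assumption-laden illustration rather than the solution procedure of the paper: it enumerates every ordered selection of DQIs (leaving a DQI out of the sequence plays the role of $x_{v_j} = 0$), discards those that exceed the budget, and keeps the most effective one. The function names, the DQIs "A", "B", "C", and all numbers are hypothetical, and the sketch assumes the initial weighted DQ level is below 1 so that the denominator is positive.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def step(q: float, impact: float) -> float:
    """Update one dimension's DQ level for one applied DQI."""
    return q + (1.0 - q) * impact if impact >= 0 else q + q * impact


def effectiveness(plan: Sequence[str], q0: Sequence[float],
                  weights: Sequence[float],
                  impacts: Dict[str, List[float]]) -> float:
    """Weighted DQ gain of a plan relative to the maximum possible gain."""
    levels = list(q0)
    for dqi in plan:
        levels = [step(q, i) for q, i in zip(levels, impacts[dqi])]
    gained = sum(w * (q - start) for w, q, start in zip(weights, levels, q0))
    headroom = 1.0 - sum(w * q for w, q in zip(weights, q0))
    return gained / headroom


def best_plan(q0: Sequence[float], weights: Sequence[float],
              impacts: Dict[str, List[float]], costs: Dict[str, float],
              budget: float) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively search all ordered selections that fit the budget."""
    best, best_e = (), 0.0  # the empty plan has effectiveness 0
    names = list(impacts)
    for size in range(1, len(names) + 1):
        for plan in permutations(names, size):
            if sum(costs[d] for d in plan) > budget:
                continue  # violates the budget constraint
            e = effectiveness(plan, q0, weights, impacts)
            if e > best_e:
                best, best_e = plan, e
    return best, best_e


# Made-up example with two DQ dimensions and three candidate DQIs.
impacts = {"A": [0.5, 0.0], "B": [-0.5, 0.5], "C": [0.25, 0.25]}
costs = {"A": 1.0, "B": 1.0, "C": 2.0}
result = best_plan([0.5, 0.5], [0.5, 0.5], impacts, costs, budget=2.0)
print(result)  # (('B', 'A'), 0.375)
```

The number of ordered selections grows factorially with the number of DQIs, so this brute-force approach is only practical for small instances.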

About this article

Cite this article

Kleindienst, D. The data quality improvement plan: deciding on choice and sequence of data quality improvements. Electron Markets 27, 387–398 (2017). https://doi.org/10.1007/s12525-017-0245-6

Received: 18 December 2015
Accepted: 03 January 2017
Published: 14 January 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s12525-017-0245-6

JEL Classification

M29
