Abstract

Biobanks collect and store items of biological material and provide these resources for medical research together with data associated with these items. In this paper, we contribute to the fundamentals necessary for establishing data quality management for biobanks. We analyse the properties of biobanks that are most important for an adequate data quality management system. We provide a comprehensive description of the concept of quality for biobank data. To this end, we categorize the quality of biobank data into data item quality and metadata quality, and provide a detailed treatment of common data item quality characteristics, in particular completeness, accuracy, reliability, consistency, timeliness, precision, and provenance. These definitions of data item quality characteristics are a necessary basis for data quality representation and management. Precise definitions of these characteristics are also required for integrating data items derived from different sources, which is frequently needed for larger medical studies.

This work has been supported by the Austrian Bundesministerium für Bildung, Wissenschaft und Forschung within the project BBMRI.AT (GZ 10.470/0010-V/3c/2018).

Author information

Corresponding author

Correspondence to Johann Eder.

Copyright information

© 2021 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Shekhovtsov, V.A., Eder, J. (2021). Data Item Quality for Biobanks. In: Hameurlain, A., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems L. Lecture Notes in Computer Science, vol 12930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64553-6_5

  • DOI: https://doi.org/10.1007/978-3-662-64553-6_5

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-64552-9

  • Online ISBN: 978-3-662-64553-6

  • eBook Packages: Computer Science, Computer Science (R0)
