Data Item Quality for Biobanks

Part of the Lecture Notes in Computer Science book series (TLDKS, volume 12930)

Abstract

Biobanks collect and store items of biological material and provide these resources, together with the data associated with them, for medical research. In this paper, we contribute to the fundamentals necessary for establishing data quality management for biobanks. We analyse the properties of biobanks that are most important for an adequate data quality management system and provide a comprehensive description of the concept of quality for biobank data. To this end, we categorize the quality of biobank data into data item quality and metadata quality, and we give a detailed treatment of common data item quality characteristics: completeness, accuracy, reliability, consistency, timeliness, precision, and provenance. These definitions are a necessary basis for data quality representation and management. Precise definitions of these characteristics are also required for integrating data items derived from different sources, which is frequently needed for larger medical studies.

This work has been supported by the Austrian Bundesministerium für Bildung, Wissenschaft und Forschung within the project BBMRI.AT (GZ 10.470/0010-V/3c/2018).

Author information

Corresponding author

Correspondence to Johann Eder.

Copyright information

© 2021 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Shekhovtsov, V.A., Eder, J. (2021). Data Item Quality for Biobanks. In: Hameurlain, A., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems L. Lecture Notes in Computer Science, vol 12930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64553-6_5

  • DOI: https://doi.org/10.1007/978-3-662-64553-6_5

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-64552-9

  • Online ISBN: 978-3-662-64553-6

  • eBook Packages: Computer Science, Computer Science (R0)