Abstract
Biobanks collect and store items of biological material and provide these resources, together with the data associated with them, for medical research. In this paper, we contribute to the fundamentals necessary for establishing data quality management for biobanks. We analyse the properties of biobanks that are most important for an adequate data quality management system, and we provide a comprehensive description of the concept of quality for biobank data. To this end, we categorize the quality of biobank data into data item quality and metadata quality, and we provide a detailed treatment of common data item quality characteristics, in particular completeness, accuracy, reliability, consistency, timeliness, precision, and provenance. These definitions of data item quality characteristics are a necessary basis for data quality representation and management. Precise definitions of these characteristics are also required for integrating data items derived from different sources, which is frequently needed for larger medical studies.
This work has been supported by the Austrian Bundesministerium für Bildung, Wissenschaft und Forschung within the project BBMRI.AT (GZ 10.470/0010-V/3c/2018).
© 2021 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Shekhovtsov, V.A., Eder, J. (2021). Data Item Quality for Biobanks. In: Hameurlain, A., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems L. Lecture Notes in Computer Science, vol 12930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64553-6_5
DOI: https://doi.org/10.1007/978-3-662-64553-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-64552-9
Online ISBN: 978-3-662-64553-6
eBook Packages: Computer Science, Computer Science (R0)