Crowd-Sourced Annotation of ECG Signals Using Contextual Information
For medical applications, the ground truth is ascertained through manual labels by clinical experts. However, significant inter-observer variability and various human biases limit accuracy. A probabilistic framework addresses these issues by comparing aggregated human and automated labels to provide a reliable ground truth, with no prior knowledge of the individual performance. As an alternative to median or mean voting strategies, novel contextual features (signal quality and physiology) were introduced to allow the Probabilistic Label Aggregator (PLA) to weight an algorithm or human based on its performance. As a proof of concept, the PLA was applied to QT interval (pro-arrhythmic indicator) estimation from the electrocardiogram using labels from 20 humans and 48 algorithms crowd-sourced from the 2006 PhysioNet/Computing in Cardiology Challenge database. For automatic annotations, the root mean square error of the PLA was 13.97 ± 0.46 ms, significantly outperforming the best Challenge entry (16.36 ms) as well as mean and median voting strategies (17.67 ± 0.56 ms and 14.44 ± 0.52 ms respectively with p < 0.05). When selecting three annotators, the PLA improved the annotation accuracy over median aggregation by 10.7% for human annotators and 14.4% for automated algorithms. The PLA could therefore provide an improved “gold standard” for medical annotation tasks even when ground truth is not available.
Keywords: Probabilistic analysis, Crowd-sourcing, Unsupervised learning, ECG, QT estimation, Signal quality
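The mean and median voting baselines that the abstract compares against can be illustrated with a minimal sketch. This is not the PLA itself, whose weighting by signal quality and physiology is not specified in this section. The function names (`rmse`, `aggregate`), the synthetic QT values, and the noise and bias settings are all assumptions made for illustration and do not come from the Challenge data.

```python
import math
import random
import statistics


def rmse(estimates, reference):
    """Root mean square error between two equal-length sequences (ms)."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared) / len(squared))


def aggregate(labels, how="median"):
    """Combine per-record annotations into one estimate per record.

    labels: list of records, each a list of QT estimates (ms), one per annotator.
    how: "mean" or "median" voting.
    """
    combine = statistics.mean if how == "mean" else statistics.median
    return [combine(record) for record in labels]


# Synthetic data (assumption): 200 records, six unbiased annotators with
# 10 ms noise, plus one hypothetical annotator who overestimates by about 80 ms.
rng = random.Random(0)
true_qt = [rng.uniform(350.0, 450.0) for _ in range(200)]
labels = []
for qt in true_qt:
    record = [qt + rng.gauss(0.0, 10.0) for _ in range(6)]
    record.append(qt + rng.gauss(80.0, 10.0))
    labels.append(record)

mean_rmse = rmse(aggregate(labels, "mean"), true_qt)
median_rmse = rmse(aggregate(labels, "median"), true_qt)
print(f"mean voting RMSE:   {mean_rmse:.2f} ms")
print(f"median voting RMSE: {median_rmse:.2f} ms")
```

In this toy setting the median is less affected by the single biased annotator than the mean. Neither baseline adapts its weights to each annotator's reliability, which is the gap the abstract says the PLA is designed to close.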
TZ and AJ acknowledge the support of the RCUK Digital Economy Programme grant number EP/G036861/1 (Oxford Centre for Doctoral Training in Healthcare Innovation). TZ also acknowledges the support of China Mobile Research Institute. JB is supported by the UK EPSRC, the Balliol French Anderson Scholarship Fund, and MindChild Medical Inc. (North Andover, MA).