Abstract
Data readiness analysis comprises methods that profile data and flag quality issues to determine the AI readiness of a given dataset. Such methods are increasingly used to understand, inspect, and correct anomalies in data so that their impact on downstream machine learning is limited. This often requires a human in the loop to validate the findings and apply remedial actions. In this paper, we describe a tool that assists data workers in this task by providing rich explanations of the results obtained through data readiness analysis. The aim is to allow interactive visual inspection and debugging of data issues, both to enhance interpretability and to facilitate informed remediation by humans in the loop.
The first two authors have contributed equally to this paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Afzal, S., Chaudhary, A., Gupta, N., Patel, H., Spina, C., Wang, D. (2021). Data-Debugging Through Interactive Visual Explanations. In: Gupta, M., Ramakrishnan, G. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12705. Springer, Cham. https://doi.org/10.1007/978-3-030-75015-2_14
DOI: https://doi.org/10.1007/978-3-030-75015-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75014-5
Online ISBN: 978-3-030-75015-2
eBook Packages: Computer Science (R0)