
ML-ModelExplorer: An Explorative Model-Agnostic Approach to Evaluate and Compare Multi-class Classifiers

Part of the Lecture Notes in Computer Science book series (LNISA, volume 12279)


A major challenge during the development of Machine Learning systems is the large number of models resulting from testing different model types, parameters, or feature subsets. The common approach of selecting the best model using one overall metric does not necessarily find the most suitable model for a given application, since it ignores the different effects of class confusions. Expert knowledge is key to evaluating, understanding, and comparing model candidates and hence to controlling the training process. This paper addresses the research question of how experts can be supported in evaluating, selecting, and reasoning about Machine Learning models. The proposed ML-ModelExplorer is an explorative, interactive, and model-agnostic approach utilising confusion matrices. It enables Machine Learning and domain experts to conduct a thorough and efficient evaluation of multiple models by taking overall metrics, per-class errors, and individual class confusions into account. The approach is evaluated in a user study, and a real-world case study from football (soccer) data analytics is presented.
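The three levels of evaluation named in the abstract (overall metrics, per-class errors, and individual class confusions) can all be read off a confusion matrix. The following is a minimal sketch of that idea, not the tool's implementation: the function names, the two model names, the class labels, and the matrix values are all hypothetical and chosen only for illustration. It assumes rows hold the true class, columns hold the predicted class, and every class has at least one sample.

```python
from typing import Dict, List, Tuple

# Assumption: rows = true class, columns = predicted class.
Matrix = List[List[int]]


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of all samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_error(cm: Matrix) -> List[float]:
    """For each true class, the share of its samples that were misclassified."""
    return [1.0 - row[i] / sum(row) for i, row in enumerate(cm)]


def top_confusions(
    cm: Matrix, labels: List[str], k: int = 3
) -> List[Tuple[str, str, int]]:
    """The k largest off-diagonal cells as (true label, predicted label, count)."""
    cells = [
        (labels[i], labels[j], cm[i][j])
        for i in range(len(cm))
        for j in range(len(cm))
        if i != j
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]


# Hypothetical confusion matrices for two made-up models on three made-up classes.
labels = ["A", "B", "C"]
models: Dict[str, Matrix] = {
    "model_1": [[50, 0, 0], [0, 30, 20], [0, 0, 50]],
    "model_2": [[43, 4, 3], [3, 44, 3], [4, 3, 43]],
}

if __name__ == "__main__":
    for name, cm in models.items():
        print(name)
        print("  overall accuracy:", round(overall_accuracy(cm), 3))
        print("  per-class error: ", [round(e, 3) for e in per_class_error(cm)])
        print("  top confusion:   ", top_confusions(cm, labels, k=1))
```

In this made-up data both models classify 130 of 150 samples correctly, so a single overall metric cannot separate them. The per-class view shows that model_1 concentrates all of its errors in one confusion (true B predicted as C), while model_2 spreads its errors across all classes. Which of the two is preferable depends on how costly that particular confusion is in the application.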

ML-ModelExplorer and a tutorial video are available online for use with one's own data sets:


  • Multi-class classification
  • Model selection
  • Feature selection
  • Human-centered machine learning
  • Visual analytics

  • DOI: 10.1007/978-3-030-57321-8_16
  • Chapter length: 20 pages


  1. Domain experts are assumed to have a basic understanding of classification problems, i.e. to understand class errors and class confusions.

  2. ML-ModelExplorer online:

  3. ML-ModelExplorer video:



Author information


Corresponding author

Correspondence to Andreas Theissler.

Copyright information

© 2020 IFIP International Federation for Information Processing

About this paper

Cite this paper

Theissler, A., Vollert, S., Benz, P., Meerhoff, L.A., Fernandes, M. (2020). ML-ModelExplorer: An Explorative Model-Agnostic Approach to Evaluate and Compare Multi-class Classifiers. In: Holzinger, A., Kieseberg, P., Tjoa, A., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2020. Lecture Notes in Computer Science, vol 12279. Springer, Cham.

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57320-1

  • Online ISBN: 978-3-030-57321-8

  • eBook Packages: Computer Science, Computer Science (R0)