
Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization

A chapter in: Artificial Intelligence and Visualization: Advancing Visual Knowledge Discovery

Part of the book series: Studies in Computational Intelligence (SCI, volume 1126)

Abstract

Building accurate and explainable/interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on developing numeric coding schemes for non-numeric attributes that enable ML algorithms to build accurate and explainable models, on methods for lossless visualization of n-D non-numeric categorical data with visual rule discovery in these visualizations, and on accurate and explainable ML models for categorical data. The study proposes a classification of mixed data types and analyzes their important role in Machine Learning. It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data, with visual exploration of mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments. This work is one of the steps toward full-scope ML algorithms for mixed data supported by lossless visualization of n-D data in General Line Coordinates beyond Parallel Coordinates.


References

  1. Ali N, Neagu D, Trundle P (2019) Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets. SN Appl Sci 1:1–5

  2. Kovalerchuk B, Grishin V (2019) Adjustable general line coordinates for visual knowledge discovery in n-D data. Inform Visualiz 18(1):3–32

3. Kovalerchuk B (2018) Visual knowledge discovery and machine learning. Springer

  4. Rosario GE, Rundensteiner EA, Brown DC, Ward MO, Huang S (2004) Mapping nominal values to numbers for effective visualization. Inf Vis 3(2):80–95

  5. Kovalerchuk B, Delizy F (2004) Visual data mining using monotone Boolean functions. In: Visual and spatial analysis. Springer, pp 387–406

  6. Friendly M (2000) Visualizing categorical data: data, stories, and pictures. In: Proceedings of the 25th annual SAS users group intern. Conference. https://www.datavis.ca/papers/sugi/vcdstory/vcdstory.pdf

  7. Shahid ML, Molchanov V, Mir J, Shaukat F, Linsen L (2020) Interactive visual analytics tool for multidimensional quantitative and categorical data analysis. Inf Vis 19(3):234–246

  8. Roy B (2020) All about categorical variable encoding. https://towardsdatascience.com/all-about-categorical-variable-encoding-305f3361fd02

  9. Peng S, Hu Q, Chen Y, Dang J (2015) Improved support vector machine algorithm for heterogeneous data. Pattern Recogn 48(6):2072–83

  10. Potdar K, Pardawala TS, Pai CD (2017) A comparative study of categorical variable encoding techniques for neural network classifiers. Int J Comp Appl 175(4):7–9

  11. Vityaev EE, Kovalerchuk BY (2008) Relational methodology for data mining and knowledge discovery. Intell Data Anal 12(2):189–210

  12. Lipton Z (2018) The mythos of model interpretability. Commun ACM 61:36–43

  13. Letham B, Rudin C, McCormick TH, Madigan D (2015) Interpretable classifiers using rules and bayesian analysis: building a better stroke prediction model. Ann Appl Stat 9(3):1350–1371

  14. Fryer D, Strümke I, Nguyen H (2021) Shapley values for feature selection: the good, the bad, and the axioms. IEEE Access. 8(9):144352–144360

  15. Watson DS (2022) Conceptual challenges for interpretable machine learning. Synthese 200:65

  16. Watson DS (2021) Rational shapley values. arXiv preprint arXiv:2106.10191

  17. Adilova L, Kamp M, Andrienko G, Andrienko N (2023) Re-interpreting rules interpretability. J Data Sci Anal 5:1–21. https://www.researchsquare.com/article/rs-1525944/latest.pdf

  18. Kovalerchuk B, Ahmad MA, Teredesai A (2021) Survey of explainable machine learning with visual and granular methods beyond quasi-explanations. In: Pedrycz W, Chen SM (eds) Interpretable artificial intelligence: a perspective of granular computing. Springer, pp 217–267

  19. Kovalerchuk B, Triantaphyllou E, Deshpande AS, Vityaev E (1996) Interactive learning of monotone Boolean functions. Inf Sci 94(1–4):87–118

  20. Kovalerchuk B, Triantaphyllou E, Ruiz J (1996) Monotonicity and logical analysis of data: a mechanism for evaluation of mammographic and clinical data. In: Computer applications to assist radiology. Carlsbad, CA, Symposia Foundation, pp 191–196

  21. Kovalerchuk B, McCoy E (2022) Explainable mixed data representation and lossless visualization toolkit for knowledge discovery. In: 26th International conference information visualization. IEEE, pp 314–321. arXiv:2206.06476

  22. Krantz DH, Luce RD, Suppes P, Tversky A (1971) Foundations of measurement, vol 1. Academic Press

  23. Kovalerchuk B (1975) On cyclical scales. Comput Syst 61:51–59

  24. Ji S, Pan S, Cambria E, Marttinen P, Philip SY (2021) A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans Neural Netw Learn Syst 33(2):494–514

  25. Cheng V, Li CH, Kwok JT, Li CK (2004) Dissimilarity learning for nominal data. Pattern Recogn 37(7):1471–1477

  26. Stanfill C, Waltz D (1986) Toward memory-based reasoning. Comm ACM 29(12):1213–1228

  27. Dua D, Graff C (2019) UCI machine learning repository. University of California, Irvine, CA. https://archive.ics.uci.edu/ml/datasets/Mushroom

  28. Kovalerchuk B, Vityaev E (2000) Data mining in finance: advances in relational and hybrid methods. Kluwer

  29. Kovalerchuk B, Hayes D (2021) Discovering explainable machine learning models in parallel coordinates. In: 2021 25th International conference information visualisation (IV). IEEE, pp 181–188

  30. Duch W, Setiono R, Zurada JM (2004) Computational intelligence methods for rule-based data understanding. Proc IEEE 92(5):771–805

  31. Duch W, Adamczak R, Grabczewski K (2001) A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Trans Neural Networks 12(2):277–306

  32. VisCanvas 2.0. CWU-VKD-LAB, GitHub. https://github.com/CWU-VKD-LAB

  33. Bendix F, Kosara R, Hauser H (2005) Parallel sets: visual analysis of categorical data. In: Symposium on information visualization. IEEE, pp 133–140


Appendix

1.1 Experiment 1 with Algorithm SRG0 for Discovering Rules on Mushroom Data with Sequential Triples

In this experiment we tested the SRG0 algorithm for discovering rules on the Mushroom data [27] to solve a two-class classification problem (poisonous vs. edible). We found rules with 100% precision for the target class C1, poisonous mushrooms. One of these rules is a rule reported in the literature [30]. Below we present the discovered rules R1–R7 using the notation x = (x1, x2, …, xn) for a case. If any of these rules is true, then x belongs to class C1.

Rules R1–R7

R1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x9 = 6) ∨ (x9 = 3)] ⇒ x ∈ C1

R3: [(x19 = 2) & (x20 = 8) & (x21 ≠ 2) & (x22 ≠ 2)] ⇒ x ∈ C1

R4: [(x15 = 3) ∨ (x15 = 2) ∨ (x15 = 9)] ⇒ x ∈ C1

R5: [(x19 ≠ 2) & (x20 ≠ 6) & (x21 = 5) & (x22 = 1)] ⇒ x ∈ C1

R6: [(x19 = 6) & (x20 = 5) & (x21 ≠ 1) & (x22 ≠ 6)] ⇒ x ∈ C1

R7: [(x20 = 8) & (x21 = 2) & (x22 ≠ 6)] ⇒ x ∈ C1

Table 3 presents the characteristics of the 7 discovered rules, R1–R7, which cover 100% of the mushroom data for the “poisonous” class with 100% precision. The total number of cases in the poisonous class is Ntotal = 3916. The rules were discovered with the attributes broken up into six sequential triples and one group of 4 attributes: x19, x20, x21, x22. The six triples are (x1, x2, x3), (x4, x5, x6), (x7, x8, x9), (x10, x11, x12), (x13, x14, x15), (x16, x17, x18).

Table 3 Characteristics of discovered rules R1–R7
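To make the coverage and precision figures concrete, a rule such as R1 can be checked directly against the coded data. Below is a minimal sketch, not the authors' implementation; it assumes the mushroom data are loaded into a pandas DataFrame with integer-coded attribute columns "x1"–"x22" and a "class" column where 1 marks the poisonous class C1 (these names are assumptions).

```python
import pandas as pd

def rule_r1(row) -> bool:
    """R1: x belongs to C1 if x5 is in {3, 4, 5, 6, 8, 9}."""
    return row["x5"] in {3, 4, 5, 6, 8, 9}

def coverage_and_precision(df: pd.DataFrame, rule, target_class=1):
    """Coverage: share of target-class cases the rule fires on.
    Precision: share of cases the rule fires on that are in the target class."""
    fired = df[df.apply(rule, axis=1)]                     # cases where the rule fires
    true_pos = (fired["class"] == target_class).sum()      # correctly covered cases
    coverage = true_pos / (df["class"] == target_class).sum()
    precision = true_pos / len(fired)
    return coverage, precision
```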

1.2 Experiment 2 with SRG1 Algorithm for Discovering Rules on Mushroom Data with Rule Overlap Minimization

The analysis of the rules presented in the previous section shows that rule R1 covers 96.94% of the cases with target value “poisonous”, which means that the other 6 rules add only 3.04% of these cases (118 cases) to the coverage. For this reason, to get a better understanding of these rules, we calculated the overlap between cases classified by different rules; see Tables 4 and 5. In these tables, the overlap OL is the total number of cases in the intersection of rules Ri and Rj, OL(Ri, Rj) = |Cases(Ri) ∩ Cases(Rj)|, and the overlap percentage is OL(Ri, Rj) divided by the total number of cases covered by both rules together, OL(Ri, Rj)/|Cases(Ri) ∪ Cases(Rj)|. Table 4 shows the relations of rule R1 with the other rules, and Table 5 the relations of the remaining rules with each other.

Table 4 Overlap between dominant rule R1 and other rules
Table 5 Overlap percentage and number of cases for rules R2–R7
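The overlap and overlap percentage defined above reduce to set intersection and union over the case sets covered by each rule. A minimal sketch follows; the case-index ranges are hypothetical, chosen only so that the result reproduces the R1/R2 statistics discussed next.

```python
def overlap(cases_i: set, cases_j: set):
    """OL(Ri, Rj) = |Cases(Ri) ∩ Cases(Rj)|; the overlap percentage is
    OL(Ri, Rj) / |Cases(Ri) ∪ Cases(Rj)|, as defined in the text."""
    inter = len(cases_i & cases_j)
    union = len(cases_i | cases_j)
    return inter, inter / union if union else 0.0

# Hypothetical case-index sets sized to match the reported R1/R2 numbers:
cases_r1 = set(range(0, 3796))      # R1 covers 3796 cases
cases_r2 = set(range(2068, 3820))   # R2 covers 1752 cases, 24 of them new
ol, ol_pct = overlap(cases_r1, cases_r2)  # 1728 cases, ~45.24%
```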

For instance, the analysis of the relations between rules R1 and R2 shows that together they cover 3820 cases of poisonous mushrooms and overlap in 1728 cases (45.24% of 3820 cases). Rule R1 covers 3796 cases, thus rule R2 adds only 3820 − 3796 = 24 new cases to R1, which is 20.34% of the cases not covered by rule R1. R6 does not overlap with rule R1; it adds 72 new cases to R1, which is 60% of the 120 cases not covered by rule R1.

Thus, the pairs R1 & R2 and R1 & R6 differ in an important characteristic. The heavy overlap of R1 and R2 (45.24%) increases the confidence in the classification of the overlapped cases. The non-overlap of R1 and R6 increases coverage, expanding the number of predicted cases more than adding R2 to R1 does.

For rules R2–R7, Table 5 shows the overlap percentage for each pair of rules, the total number of distinct cases covered by the two rules, and the number of cases in their overlap. Many of the rules overlap with each other, but only rules R2 and R3 overlap heavily: their overlap is 64% of the total cases they predict together, which means that these two rules are closely related. All other pairs either do not overlap (7 pairs) or overlap in no more than 72 cases, which is less than 10% of the total number of cases in each pair.

Table 6 shows the relations between rules CR1–CR3 from [30]. Rules CR1–CR3 use 5 attributes (x5, x8, x12, x20, x21), while our rules R1–R7 use 7 attributes (x5, x9, x15, x19, x20, x21, x22).

Table 6 Relations between rules CR1–CR3

Rule R1 is the same as rule CR1 from [30]. Tables 4 and 5 show that rules R2–R7 are more general than rules CR2 and CR3 discovered in [30]. Rule CR2 covers 72 cases and rule CR3 covers 912 cases, while our rule R7, the one with the smallest coverage, covers 52 cases. Rules R2–R7 cover 2188 cases in total, while rules CR2 and CR3 together cover only 984 cases.

1.3 Experiment 3 with SRG2 Algorithm on Mushroom Data with Complementary Rules Generation

1.3.1 Rule Generation

Below we present the rules generated at Step 3, first for the poisonous class and then the complementary rules for the edible class, on the Mushroom data with the most frequent attributes selected from all 22 mushroom attributes. In this experiment the following attribute groups are used: Group A1: x9, x5, x7, x11; Group A2: x13, x14, x15, x6; Group A3: x1, x2, x4, x21, x22.

By design, the SRG algorithm runs at several levels of thresholds that can be set by the user. We experimented with 3 levels of precision, 75%, 85% and 95%, with a fixed coverage level of 0.5%. This means that rules with lower precision or coverage are filtered out and not selected. For coverage, with 3916 cases in the poisonous class, this means that rules covering fewer than 20 cases are filtered out as overfitting rules.
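These thresholds act as a simple filter over candidate rules. A sketch, assuming each candidate rule carries precomputed precision and covered-case counts (the dictionary keys are assumptions):

```python
def select_rules(rules, min_precision=0.95, min_coverage_frac=0.005, n_target=3916):
    """Keep rules meeting the precision threshold and covering at least
    min_coverage_frac of the target class (0.5% of 3916 is about 20 cases)."""
    min_cases = round(min_coverage_frac * n_target)
    return [r for r in rules
            if r["precision"] >= min_precision and r["covered"] >= min_cases]
```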

The thresholds 75%, 85%, and 95% bound only the lower margin of rule quality, not the upper level. Therefore, we also computed the actual precision and coverage reached for the poisonous class at each level.

Table 7 shows that the rules at all levels missed only 4 cases from the poisonous class, giving 99.89% coverage, and none of the poisonous cases was misclassified as edible, giving 100% precision. All classification errors came from classifying some edible cases as poisonous, ranging from 800 cases at the 75% threshold to 192 cases at the 95% threshold.

Table 7 Results of sequential rule generation for poisonous class in Experiment 3: Rules R1–R13

We conducted further analysis of the 13 rules selected at level 3 with the 95% threshold; see Table 8. This table shows that all 192 misclassified cases belong to rule R1, which has 98.47% coverage and 95.2% precision. All other rules have 100% precision and smaller coverage than the dominant rule R1.

Table 8 Characteristics of discovered rules R1–R13 for poisonous class in Experiment 3

Rules R1–R13 (Poisonous)

R1: [(x5 ≠ 7) & (x7 ≠ 2) & (x9 ≠ 7) & (x11 ≠ 2)] ⇒ x ∈ C1

R2: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R3: [(x9 = 3) ∨ (x9 = 6)] ⇒ x ∈ C1

R4: [(x6 ≠ 1) & (x13 ≠ 4) & (x14 ≠ 8) & (x15 ≠ 1)] ⇒ x ∈ C1

R5: [(x1 ≠ 6) & (x2 ≠ 3) & (x4 ≠ 1) & (x21 ≠ 6) & (x22 = 7)] ⇒ x ∈ C1

R6: [(x9 = 5) & (x11 = 1)] ⇒ x ∈ C1

R7: [(x15 = 3) ∨ (x15 = 2) ∨ (x15 = 9)] ⇒ x ∈ C1

R8: [(x1 = 5) & (x4 = 2) & (x21 = 5) & (x22 ≠ 2)] ⇒ x ∈ C1

R9: [(x21 = 5) & (x22 = 1)] ⇒ x ∈ C1

R10: [(x6 = 3) & (x13 = 2) & (x14 ≠ 1) & (x15 ≠ 8)] ⇒ x ∈ C1

R11: [(x2 = 3) & (x21 = 2) & (x22 ≠ 6)] ⇒ x ∈ C1

R12: [(x1 = 1) & (x21 = 5) & (x22 ≠ 2)] ⇒ x ∈ C1

R13: [(x1 ≠ 6) & (x2 ≠ 1) & (x4 ≠ 2) & (x21 = 5) & (x22 = 3)] ⇒ x ∈ C1

Rules R14 and R15, generated for the edible class C2, are presented below:

R14: [(x5 = 1) & (x9 ≠ 1)] ⇒ x ∈ C2

R15: [(x5 = 2) & (x9 ≠ 1)] ⇒ x ∈ C2

Table 9 shows the analysis of rules R14 and R15 for the edible class C2. Rules R14 and R15 together cover and correctly predict all 192 cases misclassified by rule R1. While R14 and R15 each cover 336 cases, these cases are different: the rules have 0% overlap and combined cover 672 cases of the edible class C2.

Table 9 Characteristics of discovered rules R14, R15 for the edible class in Experiment 3

1.3.2 Combining Rules for Two Classes

Now we have rules R1–R13 for the target class and rules R14–R15 for the non-target class, and we can accomplish Step 3 of combining them. The only rule for the target class that misclassified some cases is R1, so we need to improve only this rule. It is done by creating a new rule RN that combines rule R1 with R14 and R15 as follows:

$$R_N(x) = R_1(x) \,\&\, \neg(R_{14}(x) \vee R_{15}(x))$$

which results in

$$R_N: [(x_5 \ne 7) \,\&\, (x_7 \ne 2) \,\&\, (x_9 \ne 7) \,\&\, (x_{11} \ne 2)] \,\&\, \neg([(x_5 = 1) \,\&\, (x_9 \ne 1)] \vee [(x_5 = 2) \,\&\, (x_9 \ne 1)]) \Rightarrow x \in C_1$$

after substituting the actual rules R1, R14 and R15 into the formula. Rule RN is false, RN(x) = 0, for all 192 cases x misclassified by rule R1 as poisonous (for which R1(x) = 1), because for those x rule R14 or R15 is true. If rules R14 and R15 each independently covered all 192 cases misclassified by R1, then RN could be defined more simply in either of two ways, using just one of these rules:

$$R_N(x) = R_1(x) \,\&\, \neg R_{14}(x), \qquad R_N(x) = R_1(x) \,\&\, \neg R_{15}(x)$$
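In code, this combination is just Boolean composition of the rule predicates. A minimal sketch, with a case x represented as a dictionary of coded attribute values (this representation is an assumption, not the authors' toolkit):

```python
def r1(x):   # dominant poisonous-class rule R1
    return x["x5"] != 7 and x["x7"] != 2 and x["x9"] != 7 and x["x11"] != 2

def r14(x):  # edible-class rule R14
    return x["x5"] == 1 and x["x9"] != 1

def r15(x):  # edible-class rule R15
    return x["x5"] == 2 and x["x9"] != 1

def r_n(x):
    """RN(x) = R1(x) & not(R14(x) or R15(x)), as in the formula above."""
    return r1(x) and not (r14(x) or r15(x))
```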

1.4 Experiment 4 with SRG3 on Mushroom Data with 30 Random Groups

To evaluate this algorithm, we performed a test in which 30 groups of 3 attributes were randomly generated from the total of 22 mushroom attributes, as shown in Table 10. The algorithm generated all possible rules for these groups on the mushroom data. Then, with all these rules, the rule combination and selection processes were run and the final rules were selected, as shown in Tables 11 and 12. Below are the randomly generated groups and the result of this test.

Table 10 30 Randomly generated groups using 22 attributes
Table 11 Sequential rule generation with 30 randomly generated attribute groupings of 3
Table 12 Characteristics of discovered rules R1-R7 for poisonous class in Experiment 4

Rules R1–R7 (Poisonous)

R1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x12 = 3) & (x20 ≠ 6) & (x7 ≠ 2)] ⇒ x ∈ C1

R3: [(x9 = 3) ∨ (x9 = 6)] ⇒ x ∈ C1

R4: [(x19 ≠ 6) & (x3 = 10)] ⇒ x ∈ C1

R5: [(x3 = 2) & (x11 = 1)] ⇒ x ∈ C1

R6: [(x12 ≠ 1) & (x20 = 5) & (x7 = 1)] ⇒ x ∈ C1

R7: [(x16 = 1) & (x21 = 2) & (x11 ≠ 7)] ⇒ x ∈ C1

At all levels, the tests left no unclassified cases of the poisonous class and no cases misclassified by any rule, giving 100% coverage of the poisonous class and 100% accuracy.

The analysis of the tables above shows that with 30 random groups the SRG algorithm was able to achieve 100% coverage and 100% precision with only 7 rules in the level 3 test. This result is also positive because the algorithm was able to pick the better of two small-coverage rules (52 cases vs. 42 cases in the alternative rule).

1.5 Experiment 5 with SRG3 on Mushroom Data with 13 Most Frequent Attributes from 30 Groups

Here we tested the sequential rule generation algorithm SRG3 with the 13 most frequent attributes used in the 30 random triples test above, where 7 rules reached 100% precision and 100% coverage. The 13 most frequent attributes were broken up into three different groups to ensure that the test would finish in reasonable time (Tables 13 and 14).

Rules R1–R7 (Poisonous):

R1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x9 = 3) ∨ (x9 = 6)] ⇒ x ∈ C1

R3: [(x9 = 5) & (x11 = 1)] ⇒ x ∈ C1

R4: [(x16 = 1) & (x19 ≠ 2) & (x20 = 5) & (x21 = 5)] ⇒ x ∈ C1

R5: [(x11 = 2) & (x12 ≠ 4)] ⇒ x ∈ C1

R6: [(x16 = 1) & (x19 ≠ 2) & (x20 = 8) & (x21 = 2)] ⇒ x ∈ C1

R7: [(x3 = 10) & (x5 = 7)] ⇒ x ∈ C1

Table 13 The 13 attributes used in the 30 random triples test
Table 14 Characteristics of discovered rules R1-R7 for poisonous class in Experiment 5

Here the SRG3 algorithm did not reduce the number of selected rules below 7.

1.6 Experiment 6 with SRG3 Algorithm on Mushroom Data and Tenfold Cross Validation with Generated Rules

Here we tested the ability of the SRG3 algorithm to generate beneficial rules in tenfold cross validation with the sequential attribute groups G1: {x1, x2, x3}, G2: {x4, x5, x6}, …, G7: {x19, x20, x21, x22}. The tenfold cross validation test was run with these groups to ensure that performance on the training data can be confirmed on the validation data. Table 15 shows the results of this test with a 95% precision threshold for rule generation. In all 10 tests, all rules provided 100% precision and 100% coverage of the target class.

Table 15 Tenfold cross validation results for sequential triple attribute groups

Table 15 shows that the tenfold cross validation achieved 100% accuracy in every test/fold. It generated and selected the four rules that were previously generated using all the data and the given attribute groups. This confirms the ability of the SRG algorithm to train and generate rules that hold for newly added cases.
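A tenfold evaluation of this kind can be organized as sketched below; generate_rules and evaluate_rules are hypothetical stand-ins for the SRG training and validation steps, and X, y are assumed to be NumPy arrays of coded attributes and class labels.

```python
from sklearn.model_selection import StratifiedKFold

def tenfold_evaluate(X, y, generate_rules, evaluate_rules):
    """Discover rules on 9 folds and validate them on the held-out fold, 10 times."""
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    results = []
    for train_idx, test_idx in skf.split(X, y):
        rules = generate_rules(X[train_idx], y[train_idx])   # SRG-style rule discovery
        results.append(evaluate_rules(rules, X[test_idx], y[test_idx]))
    return results
```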

1.7 Experiment 7 with SRG3 Algorithm on Mushroom Data and Tenfold Cross Validation with 30 Randomly Generated Triples

Here we tested the ability of the SRG3 algorithm to generate beneficial rules using tenfold cross validation with 30 random triples of attributes as defined below. We ran this test four times to validate the accuracy of the results; this validation is necessary due to the variability of the randomly generated triples.

1.7.1 Run 1

See Tables 16 and 17.

Table 16 30 Randomly generated triples of attributes for Run 1
Table 17 Tenfold cross validation results for the 30 randomly generated triples (95% precision threshold) for run 1

In all 10 tests all rules provided 100% precision, 100% coverage of the target class and 100% accuracy.

Rules generated using 10-Fold Cross Validation:

R1 (= CR1): [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x4 ≠ 2) & (x20 = 5)] ⇒ x ∈ C1

R3: [(x12 = 3) & (x16 = 1) & (x19 ≠ 6)] ⇒ x ∈ C1

R4: [(x4 ≠ 2) & (x7 = 2) & (x10 = 1)] ⇒ x ∈ C1

R5: [(x19 ≠ 6) & (x3 = 10)] ⇒ x ∈ C1

Complexity = 16/3520 = 0.004545.

1.7.2 Run 2

See Tables 18 and 19.

Table 18 30 randomly generated triples for run 2
Table 19 Tenfold cross validation results for the 30 randomly generated triples (95% precision threshold) for run 2

In all 10 tests, the precision of all rules is 100% and the target class coverage by the rules is 100%.

Rules generated using 10-Fold Cross Validation:

R1 (= CR1): [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x4 ≠ 2) & (x20 = 5)] ⇒ x ∈ C1

R3: [(x12 = 3) & (x18 ≠ 3)] ⇒ x ∈ C1

R4: [(x8 ≠ 1) & (x14 ≠ 8)] ⇒ x ∈ C1

R5: [(x22 = 2) & (x4 ≠ 2)] ⇒ x ∈ C1

Complexity = 14/3533 = 0.00396.

1.7.3 Run 3

See Tables 20 and 21.

Table 20 30 randomly generated triples for run 3
Table 21 Tenfold cross validation results for the 30 randomly generated triples (95% precision threshold) for run 3

In all 10 tests all rules provided 100% precision and 100% coverage of the target class.

Rules generated using 10-Fold Cross Validation:

R1 (= CR1): [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x5 ≠ 2) & (x20 = 5)] ⇒ x ∈ C1

R3: [(x13 = 2) & (x16 = 1) & (x19 ≠ 6)] ⇒ x ∈ C1

R4: [(x15 = 8) & (x22 = 2)] ⇒ x ∈ C1

Complexity = 13/3525 = 0.003688.

1.7.4 Run 4

See Tables 22 and 23.

Table 22 30 randomly generated triples for run 4
Table 23 Tenfold cross validation results for the 30 randomly generated triples (95% precision threshold) for run 4

In all 10 tests all rules provided 100% precision and 100% coverage of the target class.

Rules generated using 10-Fold Cross Validation:

R1 (= CR1): [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x1 ≠ 6) & (x16 = 1) & (x20 = 5)] ⇒ x ∈ C1

R3: [(x13 = 2) & (x16 = 1) & (x19 ≠ 6)] ⇒ x ∈ C1

R4: [(x15 = 8) & (x22 = 2)] ⇒ x ∈ C1

Complexity = 14/3531 = 0.00396 (Table 24).

Table 24 Summary of runs

In all runs, all rules provided 100% precision, 100% coverage of the target class, and 100% accuracy. While the tests are accurate and precise, the generated rules show a moderate amount of variation in complexity and in the attributes used. The variation is due to the random group generation and the tenfold data partition.


1.8 Experiment 8 with Algorithm SRG4 Based on Expert Selected Groups

The results of this test are shown in Tables 28 and 29. The groups used are: Group 1 (Cap): {x1, x2, x3}; Group 2 (Odor): {x5}; Group 3 (Gill): {x6, x7, x8, x9}; Group 4 (Stalk): {x10, x11, x13, x15}; Group 5 (Veil): {x17, x18, x19}; Group 6 (Spore): {x20}; Group 7 (Distribution): {x21, x22}. The analysis of Tables 28 and 29 shows that, using the groups created by the biologist, the SRG algorithm reached a good result of 99.81% coverage of the target class with 100% precision using just 3 rules. Although this is a good result, it did not cover the whole of class C1. Full coverage can be achieved by lowering the coverage threshold below 0.5%. This adjustment would allow more small-coverage rules to be generated, and in turn would allow the combination phase to combine these smaller-coverage rules with larger-coverage rules to produce general, high-coverage rules that cover more class C1 cases.

Table 28 Sequential rule generation test using expert groups from a biologist (95% precision threshold) for Experiment 8
Table 29 Characteristics of discovered rules R1–R3 for poisonous class (95% precision threshold) for Experiment 8

R1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x20 = 5)] ⇒ x ∈ C1

R3: [(x11 ≠ 1) & (x13 ≠ 4) & (x15 ≠ 8)] ⇒ x ∈ C1

1.9 Experiment 9 with SRG5 and 7 Successful Attributes

Here we tested the SRG algorithm with the 7 attributes used in our previous 100% precision test, in the hope that restricting the search to these attributes would allow the rule generation process to generate and select fewer rules while keeping 100% precision and full coverage of the target class.

The 7 attributes were broken up into two groups so that the test would finish in reasonable time: Group 1: {x5, x9, x15}; Group 2: {x19, x20, x21, x22}. The results are shown in Tables 30 and 31.

Table 30 Sequential rule generation test using successful attributes for Experiment 9
Table 31 Characteristics of discovered rules R1-R7 for poisonous class for Experiment 9

Rules R1–R7 (Poisonous):

R1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

R2: [(x9 = 3) ∨ (x9 = 6)] ⇒ x ∈ C1

R3: [(x19 = 2) & (x20 = 8) & (x21 ≠ 2) & (x22 ≠ 2)] ⇒ x ∈ C1

R4: [(x15 = 3) ∨ (x15 = 2) ∨ (x15 = 9)] ⇒ x ∈ C1

R5: [(x19 ≠ 2) & (x20 ≠ 6) & (x21 = 5) & (x22 = 1)] ⇒ x ∈ C1

R6: [(x19 = 6) & (x20 = 5) & (x21 ≠ 1) & (x22 ≠ 6)] ⇒ x ∈ C1

R7: [(x20 = 8) & (x21 = 2) & (x22 ≠ 6)] ⇒ x ∈ C1

At all levels, all rules provided 100% precision and 100% coverage of the target class.

These tables allow us to conclude that the sequential rule generation algorithm, using the attribute groups created from the 7 attributes, was unable to reduce the number of rules needed to cover all of class C1 below 7. This result is likely due to the SRG2 process used; other existing or new versions of the SRG algorithm may be more successful in the future.

1.10 Experiment 10 with SRG5 and Comparison of Rule Complexity

In this experiment we used the SRG5 algorithm, based on the SRG1 algorithm, with attributes that have been successful for the mushroom data in [1, 30]. We compared the complexity of the rules generated in this process with the rules from [1, 30], which we denote as CR rules.

The formulas for computing the complexity of rules and sets of rules are given in Sect. 4.2. Several sets of rules were generated in [1, 30]; we use only the final rules from [1, 30], listed below in our notation.

CR Rules [1, 30]

CR1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1,

Complexity 6/3796 = 0.0016.

CR2: [(x20 = 5)] ⇒ x ∈ C1, Complexity 1/72 = 0.014.

CR3: [(x8 = 2) & (x12 = 3)] ∨ [(x8 = 2) & (x12 = 2)] ∨ [(x8 = 2) & (x21 = 2)] ⇒ x ∈ C1.

Complexity 6/912 = 0.0066.

Complexity of the set of rules: (6 + 1 + 6)/(3796 + 72 + 912) = 13/4780 = 0.0027.

The attribute groups that we used in the SRG algorithm are G1 = {x5}, G2 = {x20}, G3 = {x8, x12, x21}, which directly correspond to the CR rules above.

Our Rules

R1: [(x5 = 3) ∨ (x5 = 4) ∨ (x5 = 5) ∨ (x5 = 6) ∨ (x5 = 8) ∨ (x5 = 9)] ⇒ x ∈ C1

Complexity 6/3796 = 0.0016.

R2: [(x20 = 5)] ⇒ x ∈ C1, Complexity 1/72 = 0.014

R3: [(x12 = 3) & (x21 = 5)] ⇒ x ∈ C1, Complexity 2/1544 = 0.0013

R4: [(x8 ≠ 1) & (x21 = 2)] ⇒ x ∈ C1, Complexity 2/16 = 0.125

Complexity of a set of rules = (6 + 1 + 2 + 2)/(3796 + 72 + 1544 + 16) = 11/5428 = 0.002.
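These complexity figures follow the clauses-per-covered-case ratio used throughout this section (cf. Sect. 4.2); a short sketch reproducing the arithmetic:

```python
def rule_complexity(n_clauses: int, n_covered: int) -> float:
    """Complexity of a single rule: number of clauses over cases covered."""
    return n_clauses / n_covered

def rule_set_complexity(rules):
    """Complexity of a rule set: total clauses over total cases covered.
    Each rule is a (clauses, covered_cases) pair."""
    return sum(c for c, _ in rules) / sum(n for _, n in rules)

our_rules = [(6, 3796), (1, 72), (2, 1544), (2, 16)]
cr_rules = [(6, 3796), (1, 72), (6, 912)]
print(rule_set_complexity(our_rules))  # 11/5428 ~ 0.002
print(rule_set_complexity(cr_rules))   # 13/4780 ~ 0.0027
```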

The SRG1 algorithm generated simpler rules (11 clauses vs. 13 clauses) with the same precision as in [1, 30], using the attribute groups derived from the CR rules. This result shows that the SRG1 algorithm can generate rules that are less complex than the CR rules, and it suggests that better rules are possible with more testing and preprocessing of attribute groups.

1.11 Toolkit

The toolkit includes the Data Type Editor integrated with the visualization system VisCanvas 2.0 [29, 32] for multidimensional data visualization based on adjustable parallel coordinates. The Data Type Editor supports saving data in the explainable measurement coding format for pattern discovery and data visualization. Figure 16 illustrates setting up and applying a coding scheme that converts letter grades for 4 classes X1–X4 as follows: A to 4, B to 3, C to 2, and so on.

Fig. 16 Example of applying a coding scheme for class grades

The toolkit supports the Nominal, Ordinal, Interval, and Ratio data measurement types. Attributes of the absolute measurement type are encoded as the ratio type, because the absolute type differs from the ratio type only by the presence of a fixed measurement unit.

The Data Type Editor provides helpful descriptions and examples in case the user is not familiar with these data types. The user interface allows a user to assign a data type to each attribute and to group the values of each attribute. The scheme loader allows a user to assign the data measurement type: nominal, ordinal, interval, or ratio.

A typical example of mixed data is the mushroom data [27], which contain 8124 instances and 22 attributes. These attributes include nominal data such as habitat (grass, leaves, meadows, paths, urban, waste, woods); ordinal data such as gill size (broad, narrow) and gill spacing (crowded, close, distant); and absolute data such as the number of rings (0, 1, 2), which the scheme loader treats as ratio data.
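Such a scheme can be written down explicitly as a per-attribute mapping from raw values to numeric codes tagged with a measurement type. The sketch below is illustrative only; the dictionary layout and names are assumptions, not the toolkit's internal format.

```python
# Illustrative coding scheme for three mushroom attributes of different types.
CODING_SCHEME = {
    "habitat":     {"type": "nominal",   # no order among values
                    "codes": {"grass": 1, "leaves": 2, "meadows": 3, "paths": 4,
                              "urban": 5, "waste": 6, "woods": 7}},
    "gill_size":   {"type": "ordinal",   # ordered: narrow < broad
                    "codes": {"narrow": 1, "broad": 2}},
    "ring_number": {"type": "ratio",     # absolute count treated as ratio data
                    "codes": {0: 0, 1: 1, 2: 2}},
}

def encode(attribute: str, value):
    """Replace a raw attribute value by its numeric code from the scheme."""
    return CODING_SCHEME[attribute]["codes"][value]
```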

The colors of different parts of the mushroom, such as cap color, represent an interesting data type. A color can be treated as: (1) a nominal value (red, blue, green, and so on); (2) three numeric attributes such as R, G, B; (3) some scalar function of R, G, B, such as R + G + B; or (4) a single numeric ratio attribute based on wavelength. The first option does not capture the similarity between colors, the second expands the number of attributes, the third corrupts similarity relations between some colors, and the last is the most physically meaningful.

Since each color covers a wavelength interval, grouping wavelength values according to colors is a natural way to encode the colors. These groups are ordered and can be encoded by integers starting from 0. In general, grouping attribute values decreases the search space size and the run time of the algorithms.
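One possible realization of this wavelength-based grouping is sketched below; the wavelengths and interval boundaries are illustrative assumptions, not values from the toolkit.

```python
# Approximate dominant wavelengths (nm) and ordered group boundaries.
COLOR_WAVELENGTH_NM = {"violet": 400, "blue": 470, "green": 530,
                       "yellow": 580, "orange": 610, "red": 660}
GROUP_EDGES = [450, 500, 565, 590, 625]  # boundaries between color groups

def color_code(color: str) -> int:
    """Map a color name to the index of its ordered wavelength group (0, 1, 2, ...)."""
    wavelength = COLOR_WAVELENGTH_NM[color]
    return sum(wavelength >= edge for edge in GROUP_EDGES)
```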

Manual coding is time-consuming and tedious for tasks with many attributes and multiple values per attribute. The toolkit speeds up this process: the editor has “All Ordinal” and “All Nominal” options that initially assign the Nominal or Ordinal type and integer codes from 1 to n to all attributes, with the ability to edit this assignment later.

Grouping. Figure 17 illustrates grouping and binary coding keys for a nominal attribute. Figure 18 illustrates setting up groups for numeric interval and ratio attributes by creating intervals, where the user sets the starting value of a group and the length of its interval; a sketch of this mapping follows below. The original values of an attribute may not correctly represent its data type for the task, so the user can choose to keep the existing values or to generate new ones.
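The interval grouping can be sketched as follows, with a group defined by a user-chosen starting value and interval length as in Fig. 18 (the convention for values below the start is an assumption):

```python
def interval_group(value: float, start: float, length: float) -> int:
    """Assign a numeric value to a group index, given the starting value of
    group 1 and the interval length; values below the start go to group 0."""
    if value < start:
        return 0
    return int((value - start) // length) + 1

# Example: groups of length 10 starting at 60 (60-69.99 -> 1, 70-79.99 -> 2, ...)
print(interval_group(75.0, start=60.0, length=10.0))  # 2
```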

Hierarchy of attribute groups. When there are hundreds of attributes, a hierarchy of attributes allows us to deal with them efficiently. The system supports a user in constructing a hierarchy and picking the level at which attributes will be visualized.

Fig. 17 Grouping keys and assigning binary codes for a nominal attribute

Fig. 18 Setting up groups for the numeric interval and ratio attributes

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Kovalerchuk, B., McCoy, E. (2024). Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization. In: Kovalerchuk, B., Nazemi, K., Andonie, R., Datia, N., Bannissi, E. (eds) Artificial Intelligence and Visualization: Advancing Visual Knowledge Discovery. Studies in Computational Intelligence, vol 1126. Springer, Cham. https://doi.org/10.1007/978-3-031-46549-9_3