Weighting and Pruning of Decision Rules by Attributes and Attribute Rankings

Stańczyk, Urszula

doi:10.1007/978-3-319-47217-1_12

Part of the book series: Communications in Computer and Information Science (CCIS, volume 659)

Included in the following conference series:

International Symposium on Computer and Information Sciences

Abstract

Pruning is a popular post-processing mechanism used in the search for optimal solutions when there is insufficient domain knowledge to either limit learning data or govern induction in order to infer only the most interesting or important decision rules. Filtering of generated rules can be driven by various parameters, for example explicit rule characteristics. The paper presents research on pruning rule sets by two approaches involving attribute rankings. The first relies on selecting rules that refer to the highest ranking attributes. It is compared with weighting of rules by calculated quality measures that depend on weights derived from attribute rankings, which results in a rule ranking.

1 Introduction

Rule classifiers express patterns discovered in data in learning processes through conditions on attributes included in the premises and pointing to specific classes [5]. A variety of available approaches to induction enable construction of classifiers with minimal numbers of constituent rules, with all rules that can be inferred from the training samples, or with subsets of interesting elements [3].

To limit the number of considered rules [9], three options are available: pre-processing, which reduces data rather than rules through selection of features or instances; in-processing, which relies on induction of only those rules that satisfy given requirements; or post-processing, which implements pruning mechanisms and rejects some unsatisfactory rules. The paper focuses on the last approach.

One of the most straightforward ways to prune rules and rule sets involves exploiting direct parameters of rules, such as their support, length [11], or strength [1]. Specific condition attributes can also be taken into account, indicating the rules to be selected by appearing in their premises [12]. Such a process can lead to improved performance or structure, and in the presented research it is compared to weighting of rules by calculated quality measures, also based on attributes [13]. Both procedures actively use rankings of the considered characteristic features [7].

The paper is organised as follows. Section 2 briefly describes some elements of background, that is feature weighting and ranking, and aims of pruning of rules and rule sets. Section 3 explains the proposed research framework, details experimental setup, and gives test results. Section 4 concludes the paper.

2 Background

The research described in this paper incorporates characteristic feature weights and rankings into the problem of pruning of decision rules and rule sets.

2.1 Feature Ranking

The roles of specific features exploited in a classification task can vary to a high degree in significance and relevance. The importance of individual attributes can be discovered by an approach leading to their ranking, that is, by assigning values of a score function that puts them in a specific order [7].

Rankings of characteristic features can be obtained through application of statistical measures, machine learning approaches, or systematic procedures [12]. Some of these assign calculated weights to all variables, while others return only the positions in a ranking, reflecting the discovered order of relevance.

Information Gain coefficient (InfoGain, IG) is defined by employing the concept of entropy from information theory for attributes and classes:

$$\begin{aligned} InfoGain(Cl,a_f)=H(Cl)-H(Cl|a_f), \end{aligned} \tag{1}$$

where $H(Cl)$ denotes the entropy of the decision attribute $Cl$ and $H(Cl|a_f)$ the conditional entropy, that is, the class entropy when the values of attribute $a_f$ are observed.
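Equation (1) can be illustrated with a minimal sketch. It assumes that attribute values are already discrete (the text does not state how continuous frequencies were handled when the ranking was built), and the function names `entropy` and `info_gain` are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels, H(Cl)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """InfoGain(Cl, a_f) = H(Cl) - H(Cl | a_f) for a discrete attribute.

    Assumption: `values` holds one discrete attribute value per sample.
    """
    total = len(labels)
    # Group the class labels by the observed attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy inside each group, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

An attribute whose values separate the two classes perfectly receives a gain equal to the class entropy, while an attribute whose values are distributed identically in both classes receives zero.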

An attribute relevance measure can be based on rule length [11], with special attention given to the shortest rules that often possess good generalisation properties:

$$\begin{aligned} MREVM(a)=Nr(a,MinL):Nr(a,MinL+1), \end{aligned} \tag{2}$$

where Nr(a, L) denotes the number of rules with length L in which attribute a appears, and MinL is the length of the shortest rule containing a. The attribute ranking constructed in this way is wrapped around the specific inducer, but around the structure of its rules rather than its performance, since other parameters of rules are disregarded.
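The two counts that enter Eq. (2) can be obtained as in the sketch below. It is an illustration under assumptions: each rule is modelled as a set of the attribute names in its premise, so rule length equals the number of attributes used, and the function name `mrevm_counts` is hypothetical. The sketch returns the counts as a pair and does not decide how the ratio written with a colon in Eq. (2) is turned into an ordering of attributes.

```python
def mrevm_counts(rules, attribute):
    """Return (MinL, Nr(a, MinL), Nr(a, MinL + 1)) for one attribute.

    Assumption: every rule is a set of attribute names, and its length is
    the number of attributes in that set.
    Returns None when the attribute appears in no rule.
    """
    # Lengths of all rules whose premise refers to the attribute.
    lengths = [len(rule) for rule in rules if attribute in rule]
    if not lengths:
        return None
    min_len = min(lengths)
    return min_len, lengths.count(min_len), lengths.count(min_len + 1)
```

For the rules `[{"a", "b"}, {"a"}, {"a", "c"}]`, attribute `a` appears in one rule of length 1 and two rules of length 2, so the function returns `(1, 1, 2)`.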

2.2 Pruning of Decision Rules

To limit the number of rules three approaches can be considered [8]:

- pre-processing: the input data is reduced before the learning stage starts by rejecting some examples or cutting down on characteristic features. With less data to infer from, it follows that fewer rules are induced.
- at the algorithm construction stage: by implementation of specific procedures only some rules meeting requirements are found instead of all possible.
- post-processing: the set of inferred rules is analysed and some of its elements discarded while others selected.

When lower numbers of rules are found the learning stage can be shorter, yet solutions are not necessarily the best. If higher numbers of rules are generated, more thorough and in-depth analysis is enabled, yet even for rule sets with small cardinalities some measures of quality or interestingness can be employed [6].

Rule quality can be weighted by conditional attributes [13]:

$$\begin{aligned} QM(r_i)=\prod \limits _{j=1}^{K_{r_i}}w(a_j), \end{aligned} \tag{3}$$

where $K_{r_i}$ denotes the number of conditions included in rule $r_i$ and $w(a_j)$ the weight of attribute $a_j$ taken from a ranking. It is assumed that $w(a_j)\in (0,1]$.
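The measure of Eq. (3) reduces to a product over the conditions of a rule, as in the following sketch. The representation of a rule as a list of attribute names (one per condition) and the name `rule_quality` are assumptions made for illustration only.

```python
from math import prod


def rule_quality(rule_attributes, weights):
    """QM(r_i): product of the weights of the attributes in the rule's conditions.

    rule_attributes: attribute names, one entry per condition (assumed form).
    weights: mapping from attribute name to its weight w(a) in (0, 1].
    """
    return prod(weights[a] for a in rule_attributes)
```

Because every weight lies in (0, 1], adding a condition to a rule can never increase QM, so the measure favours short rules and rules built on highly ranked attributes. With weights 1 and 1/2 for two attributes, a rule with conditions on both obtains QM = 0.5.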

3 Experimental Setup and Obtained Results

The research was carried out within the following general framework:

1. Initial preparation of learning and testing datasets
2. Obtaining rankings of attributes
3. Induction of decision algorithms
4. Pruning of decision rules in two approaches:
   - Selecting rules referring to specific attributes in the ranking
   - Calculating measures for all rules while exploiting weights assigned to positions in the attribute rankings, which led to weighting of rules and to their rankings, from which rules were in turn selected
5. Comparison and analysis of obtained test results

Steps of these procedures are described in the following subsections.

3.1 Input Datasets

Stylometric analysis of texts was selected as the application domain for the research. Stylometry enables authorship attribution based on the linguistic characteristic features employed. Typically these refer to lexical and syntactic markers, giving frequencies of occurrence for selected function words and punctuation marks that reflect individual habits of sentence and paragraph formation.

Learning and testing samples corresponded to parts of longer works by two pairs of writers, female and male, giving binary classification with balanced data.

Because attribute values specified usage frequencies of textual descriptors, they were small fractions. Data mining therefore required either a technique that can deal efficiently with continuous numbers or some discretization strategy [2]. Since discretization always causes some loss of information regardless of the selected method, it was not attempted.

3.2 Rankings of Attributes

In the research presented, two attribute rankings were tested. The first relied on statistical properties detected in the input datasets and was completely independent of the classifier used later for prediction. The other was wrapped around characteristics of induced rules, observing how often each variable occurs in the shortest rules, which are usually of higher quality as they are better at generalisation and description of detected patterns than those with many conditions. Orderings of variables for both rankings and both datasets are given in Table 1.

Table 1. Rankings of condition attributes

InfoGain returns a specific score for each feature, while MREVM gives a ratio. To unify the numbers used as attribute weights, the weights were assigned in an arbitrary manner: they are listed in the column denoted w(a) and equal 1/i, where i is the position in the ranking. Thus the distances between weights decrease while going down the ranking. It is assumed that each variable has a nonzero weight.
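The assignment w(a) = 1/i can be sketched as follows. The function name `rank_weights` is hypothetical, and the attribute names in the usage note are borrowed from the example in Sect. 3.4 purely for illustration, without claiming that this is their actual order in Table 1.

```python
def rank_weights(ranking):
    """Map each attribute to 1/i, where i is its 1-based position in the ranking.

    ranking: attribute names ordered from the highest to the lowest ranked.
    """
    return {attribute: 1 / i for i, attribute in enumerate(ranking, start=1)}
```

For a ranking of four attributes the weights are 1, 1/2, 1/3 and 1/4: the gap between the first two positions is 1/2, between the next two 1/6, and then 1/12, so distances shrink down the ranking while every weight stays positive.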

3.3 DRSA Rule Classifiers

The rules were induced with the help of 4eMka Software (developed at the Poznań University of Technology, Poland), which implements the Dominance-Based Rough Set Approach (DRSA). By substituting the original indiscernibility relation [4] of classical rough sets with dominance, DRSA observes ordinal properties in datasets and enables both nominal and ordinal classification [10].

Classification systems with all rules on examples were taken as the reference points. For female writers the algorithm consisted of 62383 rules; a constraint on minimal rule support of at least 66 reduced it to 17 decision rules giving the maximal classification accuracy of 86.67 %. For male writers the algorithm contained 46191 rules, limited to 80 by support of at least 41, and it correctly recognised 76.67 % of testing samples. In all cases ambiguous decisions were treated as incorrect, without any further processing.
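The two conventions used for the reference points, a constraint on minimal rule support and counting ambiguous decisions as errors, can be sketched as below. This is not the 4eMka implementation: the dictionary form of a rule, the use of `None` for an ambiguous decision, and both function names are assumptions introduced for illustration.

```python
def filter_by_support(rules, min_support):
    """Keep only the rules whose support is at least min_support.

    Assumption: each rule is a dict with a numeric "support" entry.
    """
    return [rule for rule in rules if rule["support"] >= min_support]


def accuracy(predictions, truths):
    """Classification accuracy in percent.

    Assumption: an ambiguous decision is encoded as None; it is counted as
    incorrect, with no further processing.
    """
    correct = sum(
        1 for predicted, actual in zip(predictions, truths)
        if predicted is not None and predicted == actual
    )
    return 100.0 * correct / len(truths)
```

With this convention a classifier that answers correctly on two of four samples and returns an ambiguous decision on a third scores 50 %, the same as if the ambiguous answer had been wrong.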

3.4 Pruning of Rule Sets by Attributes

Selection of decision rules following attribute rankings was executed as follows: at the i-th step only the rules with conditions on the i highest ranking features were taken into account. The rules could refer to all or some proper subsets of the variables considered, and those with at least one condition on any lower ranking attribute were discarded. Thus at the first step only rules with single conditions on the highest ranking variable were selected, while at the last, 25th step all features and all rules were included. For example, at the 5th step for the female writer dataset and the InfoGain ranking, only rules referring to any combination of the attributes not, colon, semicolon, comma, and hyphen were selected. The detailed results for both datasets and both rankings are listed in Table 2.
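The selection at the i-th step amounts to a subset test, as in this minimal sketch. It assumes that each rule is represented by the set of attribute names in its premise; the name `prune_by_attributes` is hypothetical.

```python
def prune_by_attributes(rules, ranking, step):
    """Return the rules whose conditions use only the `step` top ranked attributes.

    rules: iterable of sets of attribute names (assumed representation).
    ranking: attribute names ordered from the highest to the lowest ranked.
    """
    allowed = set(ranking[:step])
    # A rule is kept only if none of its conditions refers to a lower ranked attribute.
    return [rule for rule in rules if set(rule) <= allowed]
```

At step 1 only single-condition rules on the top attribute pass the test, and when `step` equals the number of attributes every rule is recalled, which matches the first and last steps described above.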

Table 2. Characteristics of decision algorithms with pruning of rules referring to specific conditional attributes: N indicates the number of considered attributes, (a) number of recalled rules, (b) maximal classification accuracy [%], (c) minimal support required of rules, (d) number of rules satisfying condition on support

It can be observed that with each variable added to the studied set the number of recalled rules rose significantly, but classification accuracy equal to or even higher than the reference points was detected quite early in processing: for InfoGain and the female dataset after selection of just the four highest ranking attributes, and for male writers and MREVM with just the three most important features.

3.5 Pruning of Rule Sets Through Rule Rankings

Calculation of QM measure for rules can be understood as translating feature rankings into rule rankings. Depending on cardinalities of subsets of rules selected at each step, the total number of executed steps can significantly vary. The minimum is obviously one, while the maximum can even equal the total number of rules in the analysed set, if with each step only a single rule is added.
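One possible way to turn rule scores into selection steps is sketched below. The text does not specify how the subsets recalled at each step were chosen, so the grouping used here, where all rules sharing the same QM value are recalled together and groups are taken from the highest value down, is an assumption, as is the name `recall_steps`.

```python
from itertools import groupby


def recall_steps(rule_scores):
    """Yield cumulative lists of rule identifiers, one list per distinct QM value.

    rule_scores: list of (rule_id, qm) pairs with precomputed QM values.
    Assumption: rules with equal QM are recalled in the same step.
    """
    # Stable sort, best score first; ties keep their original order.
    ordered = sorted(rule_scores, key=lambda pair: pair[1], reverse=True)
    recalled = []
    for _, group in groupby(ordered, key=lambda pair: pair[1]):
        recalled.extend(rule_id for rule_id, _ in group)
        yield list(recalled)
```

Under this assumption the number of steps equals the number of distinct QM values: one step if all rules share a single value, and as many steps as there are rules if all values differ.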

On the other hand, once the core sets of rules are retrieved, that is, those corresponding to the decision algorithms limited by constraints on minimal rule support and giving the best results for the complete algorithms, there is little point in continuing. Thus the results presented in Table 3 stop when only fractions of the whole rule sets are recalled: for female writers just a few hundred rules, and for male writers close to ten thousand (still less than a quarter of the original algorithm).

Table 3. Characteristics of decision algorithms with pruning of rules while weighting them by measures based on rankings of conditional attributes: N indicates the weighting step, (a) number of recalled rules, (b) maximal classification accuracy [%], (c) minimal support required of rules, (d) number of rules satisfying condition on support

3.6 Summary of the Best Results

Of the two tested and compared approaches to rule filtering, selection governed by attributes included while following their rankings made it possible to reject more rules from the reference algorithms, even over 35 % and 48 % for the female and male datasets respectively, with prediction at the reference level. For male writers recognition could be increased (at maximum by over 4 %) while either keeping or lowering the constraints on minimal support required of rules.

When rules were weighted, ranked, and then selected, the quality of prediction was enhanced at maximum by over 3 % for both datasets, and over 29 % and 18 % of rules could be pruned for the female and male writers datasets respectively.

For the female dataset, both approaches to rule pruning gave better results when exploiting the InfoGain attribute ranking, while for the male dataset the same can be stated for the MREVM ranking.

4 Conclusions

The paper presents research on selection of decision rules while following rankings of considered conditional attributes and exploiting weights assigned to them, which constitute alternatives to the popular approaches to rule filtering. Two ways to prune rules were compared. The first relied on selection of the rules with conditions only on the highest ranking attributes, while those referring to lower ranking features were rejected. Within the second methodology, the weights of attributes from their rankings formed a base from which the defined quality measures were calculated for all rules, and their values led to rule rankings. Next, the highest ranking rules were selected. For both described approaches two attribute rankings were tested, and the test results show several possibilities of constructing optimised rule classifiers, either with increased recognition, decreased lengths of decision algorithms, or both.

References

1. Amin, T., Chikalov, I., Moshkov, M., Zielosko, B.: Relationships between length and coverage of decision rules. Fundamenta Informaticae 129, 1–13 (2014)
2. Baron, G.: On approaches to discretization of datasets used for evaluation of decision systems. In: Czarnowski, I., Caballero, A., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies, vol. 56, pp. 149–159. Springer, Switzerland (2016)
3. Bayardo Jr., R., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154 (1999)
4. Cyran, K.A., Stanczyk, U.: Indiscernibility relation for continuous attributes: application in image recognition. In: Kryszkiewicz, M., Peters, J.F., Rybiński, H., Skowron, A. (eds.) RSEISP 2007. LNCS (LNAI), vol. 4585, pp. 726–735. Springer, Heidelberg (2007)
5. Fürnkranz, J., Gamberger, D., Lavrač, N.: Foundations of Rule Learning. Springer, Heidelberg (2012)
6. Gruca, A., Sikora, M.: Rule based functional description of genes – estimation of the multicriteria rule interestingness measure by the UTA method. Biocybernetics Biomed. Eng. 33, 222–234 (2013)
7. Mansoori, E.: Using statistical measures for feature ranking. Int. J. Pattern Recog. Artif. Intell. 27(1), 1350003–1350014 (2013)
8. Sikora, M.: Induction and pruning of classification rules for prediction of microseismic hazards in coal mines. Expert Syst. Appl. 38(2), 6748–6758 (2013)
9. Sikora, M., Wróbel, Ł.: Data-driven adaptive selection of rules quality measures for improving the rules induction algorithm. In: Kuznetsov, S.O., Ślęzak, D., Hepting, D.H., Mirkin, B.G. (eds.) RSFDGrC 2011. LNCS (LNAI), vol. 6743, pp. 278–285. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21881-1_44
10. Słowiński, R., Greco, S., Matarazzo, B.: Dominance-based rough set approach to reasoning about ordinal data. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds.) RSEISP 2007. LNCS (LNAI), vol. 4585, pp. 5–11. Springer, Heidelberg (2007). doi:10.1007/978-3-540-73451-2_2
11. Stańczyk, U.: Decision rule length as a basis for evaluation of attribute relevance. J. Intell. Fuzzy Syst. 24(3), 429–445 (2013)
12. Stańczyk, U.: Selection of decision rules based on attribute ranking. J. Intell. Fuzzy Syst. 29(2), 899–915 (2015)
13. Stańczyk, U.: Measuring quality of decision rules through ranking of conditional attributes. In: Czarnowski, I., Caballero, A., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies, vol. 56, pp. 269–279. Springer, Switzerland (2016)

Acknowledgments

The research presented was performed at the Silesian University of Technology, Gliwice, Poland, within the project BK/RAu2/2016.

Author information

Authors and Affiliations

Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Urszula Stańczyk

Corresponding author

Correspondence to Urszula Stańczyk.

Editor information

Editors and Affiliations

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
Tadeusz Czachórski
Department of Electrical and Electronic Engineering, Imperial College, London, United Kingdom
Erol Gelenbe
Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
Krzysztof Grochla
University of Houston, Houston, Texas, USA
Ricardo Lent

Rights and permissions

Open Access This chapter is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, a link is provided to the Creative Commons license and any changes made are indicated.

The images or other third party material in this chapter are included in the work’s Creative Commons license, unless indicated otherwise in the credit line; if such material is not included in the work’s Creative Commons license and the respective action is not permitted by statutory regulation, users will need to obtain permission from the license holder to duplicate, adapt or reproduce the material.

About this paper

Cite this paper

Stańczyk, U. (2016). Weighting and Pruning of Decision Rules by Attributes and Attribute Rankings. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds) Computer and Information Sciences. ISCIS 2016. Communications in Computer and Information Science, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-47217-1_12

DOI: https://doi.org/10.1007/978-3-319-47217-1_12
Published: 24 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47216-4
Online ISBN: 978-3-319-47217-1
eBook Packages: Computer Science (R0)