Abstract
Rule-based knowledge bases are constantly increasing in volume, so the knowledge stored as a set of rules is becoming progressively more complex, and when the rules are not organized into any structure, the system is inefficient. In the author’s opinion, modifying both the knowledge base structure and the inference algorithms improves the efficiency of the inference process. Partitioning the rules significantly reduces the percentage of the knowledge base analysed during the inference process. The form of a group’s representative plays an important role in the efficiency of the inference process. The good performance of this approach is shown through an extensive experimental study carried out on a collection of real knowledge bases.
Notes
- 1.
Rules have been extensively used in knowledge representation and reasoning. This representation is very space-efficient: only a relatively small number of facts need to be stored in the KB, and the rest can be derived by the inference rules.
- 2.
- 3.
- 4.
The user can create KBs using a special creator or by importing a KB from a given data source. The KB format supports rule sets generated automatically on the basis of rough set theory as well as rules given a priori by a domain expert. A KB can be stored in one of the following file formats: XML, RSES, TXT. Attributes of any type can be defined: nominal, discrete or continuous. There are no limits on the number of rules, attributes or facts, or on the length of a rule.
- 5.
It allows the rules to be partitioned by an algorithm with time complexity no higher than O(nk), where \(n=|R|\) and \(k=|PR|\). Simple strategies create the final partition PR in a single pass over the rule set R, according to the value of the \(mc(r_i,R_j)\) function described above. Complex strategies have a higher time complexity than any simple partition strategy.
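A simple strategy of this kind can be sketched as a single pass over R in which each rule is compared with the groups built so far. This is only an illustration under stated assumptions: \(mc\) is not defined in this note, so `mc` below is a hypothetical stand-in (the number of attribute-value pairs a rule shares with a group), and the rule encoding and the `min_match` parameter are likewise invented for the example. Each rule is compared with at most k groups, which gives O(nk) evaluations of `mc`.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]          # (attribute, value)
Rule = FrozenSet[Pair]          # a rule reduced to its set of pairs (assumption)


def mc(rule: Rule, group: List[Rule]) -> int:
    """Hypothetical match count: pairs shared by the rule and the group."""
    group_pairs = set().union(*group)
    return len(rule & group_pairs)


def partition(rules: List[Rule], min_match: int = 1) -> List[List[Rule]]:
    """Single pass over the rules; each rule joins its best-matching group."""
    groups: List[List[Rule]] = []
    for rule in rules:
        best, best_score = None, 0
        for group in groups:               # at most k groups are inspected
            score = mc(rule, group)
            if score > best_score:
                best, best_score = group, score
        if best is not None and best_score >= min_match:
            best.append(rule)
        else:
            groups.append([rule])          # no group matches: open a new one
    return groups
```

In this sketch a rule that shares no pair with any existing group always starts a new group, so the number of groups depends on the order in which the rules are visited.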
- 6.
Let us assume that a threshold value \(T\) exists, with \(0 \le T \le 1\).
- 7.
Thus, a conjunction of pairs \((a_1,v_{1i}) \wedge (a_2,v_{2i}) \wedge \ldots \wedge (a_s,v_{si})\) may be both the conditional part of a rule and the representative of a group of rules.
- 8.
When more than one rule matches the working memory (these rules form the conflict set) and only one has to be selected, the following strategies can be used: random, textual order, recency, specificity and refractoriness [1]. In this research, the author uses the textual order, recency and modified specificity strategies. Textual order fires the first matching rule. Recency fires the rule that uses the data added most recently to the working memory. Specificity (complexity) fires the rule with the most conditions: rules with a greater number of conditions or fewer variables are more specific and should be applied earlier, because they use more data and can handle special cases or exceptions to general rules.
- 9.
\(sim(F,Profile(R_j)) = \frac{|F \cap Profile(R_j)|}{|F \cup Profile(R_j)|}\). The value of \(sim(F,Profile(R_j))\) is equal to 0 if no fact \(f_i\) (or hypothesis to prove) is included in the representative of group \(R_j\). It is equal to 1 if all facts (and/or the hypothesis) are included in \(Profile(R_j)\) of group \(R_j\).
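The measure above is the Jaccard coefficient of the fact set and the group profile. A minimal sketch follows, assuming that facts and profile elements are both encoded as (attribute, value) tuples; that encoding and the handling of two empty sets are assumptions made for the example.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (attribute, value); the encoding is an assumption


def sim(facts: Set[Pair], profile: Set[Pair]) -> float:
    """Jaccard coefficient |F ∩ Profile| / |F ∪ Profile|."""
    union = facts | profile
    if not union:               # both sets empty: treated as no similarity
        return 0.0
    return len(facts & profile) / len(union)


facts = {("a", "1")}
profile = {("a", "1"), ("b", "2")}
print(sim(facts, profile))      # 0.5
```

With the formula exactly as written, the value reaches 1 only when the two sets coincide; a fact set that is a proper subset of the profile scores below 1, as the example shows.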
- 10.
The inputs are PR, the groups of rules with their representatives, and F, the set of facts. The output is F, the set of facts including any new facts obtained through inference. The algorithm uses a temporary variable R, the set of rules that results from the previous selection.
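The interface described here (PR and F in, an extended F out, with R as the temporary rule selection) can be illustrated with a small forward-chaining loop. The body of the algorithm is not given in this note, so everything below is an assumption: groups are ranked by the Jaccard similarity of their profile to the current facts, R holds the applicable rules of the group being examined, and the first rule in R is fired. The `Rule` and `Group` types are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

Pair = Tuple[str, str]


class Rule(NamedTuple):
    premises: FrozenSet[Pair]
    conclusion: Pair


class Group(NamedTuple):
    profile: FrozenSet[Pair]    # representative of the group
    rules: List[Rule]


def jaccard(a, b) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer(PR: List[Group], F: Set[Pair]) -> Set[Pair]:
    F = set(F)                  # work on a copy of the caller's facts
    changed = True
    while changed:
        changed = False
        # Visit groups from most to least similar to the current facts.
        ranked = sorted(PR, key=lambda g: jaccard(F, g.profile), reverse=True)
        for group in ranked:
            # R: rules of this group that can fire and add something new.
            R = [r for r in group.rules
                 if r.premises <= F and r.conclusion not in F]
            if R:
                F.add(R[0].conclusion)   # fire the first applicable rule
                changed = True
                break                    # facts changed: re-rank the groups
    return F
```

The loop stops when no group contains a rule that would add a new fact. Falling back to less similar groups is a simplification of this sketch; a stricter variant could examine only the best-ranked groups.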
- 11.
In this paper, the following strategies are used: FR (first rule), which fires the first rule in the conflict set (so it corresponds to the textual order strategy); LR (last rule), which selects the rule most recently added to the conflict set (so it works similarly to the recency strategy); SR (shortest rule), which selects the rule with the smallest number of conditions; and LOR (longest rule), which chooses the rule with the greatest number of conditions (so it corresponds to the specificity strategy).
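The four strategies can be sketched as selection functions over a conflict set. The example assumes that the conflict set is a non-empty list ordered by the time at which rules were added, and that ties are broken in favour of the earliest such rule; the `Rule` type is hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Rule(NamedTuple):
    name: str
    conditions: Tuple[str, ...]


def first_rule(conflict_set: List[Rule]) -> Rule:
    return conflict_set[0]                  # FR: textual order


def last_rule(conflict_set: List[Rule]) -> Rule:
    return conflict_set[-1]                 # LR: most recently added


def shortest_rule(conflict_set: List[Rule]) -> Rule:
    return min(conflict_set, key=lambda r: len(r.conditions))   # SR


def longest_rule(conflict_set: List[Rule]) -> Rule:
    return max(conflict_set, key=lambda r: len(r.conditions))   # LOR


STRATEGIES: Dict[str, Callable[[List[Rule]], Rule]] = {
    "FR": first_rule,
    "LR": last_rule,
    "SR": shortest_rule,
    "LOR": longest_rule,
}

conflict_set = [
    Rule("r1", ("a", "b")),
    Rule("r2", ("a",)),
    Rule("r3", ("a", "b", "c")),
]
for label, select in STRATEGIES.items():
    print(label, select(conflict_set).name)
# FR r1, LR r3, SR r2, LOR r3
```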
- 12.
The mAHC approach to rule partitioning is used with four different similarity thresholds: \(k=0\), \(k=0.25\), \(k=0.5\), \(k=1.0\).
- 13.
In Fig. 2 they are denoted as \(AHC\_general\), \(mAHC\_general\), \(AHC\_specialized\), \(mAHC\_specialized\), \(AHC\_weighted\) or \(mAHC\_weighted\).
References
Akerkar, R., Sajja, P.: Knowledge-Based Systems. Jones & Bartlett Learning, Sudbury (2010)
Forgy, C.L.: Rete: A fast algorithm for the many pattern many object pattern match problem. Artif. Intell. 19, 17–37 (1981)
Hanson, E., Hasan, M.S.: Gator: An optimized discrimination network for active database rule condition testing. Tech. rep. (1993)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, New Jersey (1988)
Latkowski, R., Mikołajczyk, M.: Data decomposition and decision rule joining for classification of data with missing values. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 254–263. Springer, Heidelberg (2004)
Markov, Z., Larose, D.T.: Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. Wiley, Hoboken (2007)
Michalski, R.S., Larson, J.B.: Selection of Most Representative Training Examples and Incremental Generation of vl Hypotheses. The Underlying Methodology and the Description of the Programs esel and aq11. University of Illinois, Department of Computer Science, Urbana (1978)
Miranker, D.P.: TREAT: A better match algorithm for AI production systems. Technical report, Department of Computer Sciences, University of Texas at Austin (1987)
Nalepa, G.J., Ligęza, A., Kaczor, K.: Overview of knowledge formalization with XTT2 rules. In: Bassiliades, N., Governatori, G., Paschke, A. (eds.) RuleML 2011 - Europe. LNCS, vol. 6826, pp. 329–336. Springer, Heidelberg (2011)
Nowak-Brzezińska, A., Simiński, R.: Knowledge mining approach for optimization of inference processes in rule knowledge bases. In: Herrero, P., Panetto, H., Meersman, R., Dillon, T. (eds.) OTM-WS 2012. LNCS, vol. 7567, pp. 534–537. Springer, Heidelberg (2012)
Pedrycz, W., Cholewa, W.: Expert Systems. Silesian University of Technology, Section of Scientific Publications, Poland (1987). [in Polish]
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978). http://dx.doi.org/10.1016/0005-1098(78)90005-5
Sarker, B.R.: The resemblance coefficients in group technology: A survey and comparative study of relational metrics. Comput. Ind. Eng. 30(1), 103–116 (1996)
Acknowledgement
This work is a part of the project “Exploration of rule knowledge bases” funded by the Polish National Science Centre (NCN: 2011/03/D/ST6/03027).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Nowak-Brzezińska, A. (2016). Mining Rule-Based Knowledge Bases. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_6
DOI: https://doi.org/10.1007/978-3-319-34099-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science (R0)