Mining Rule-Based Knowledge Bases

Nowak-Brzezińska, Agnieszka

doi:10.1007/978-3-319-34099-9_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)

Abstract

Rule-based knowledge bases are constantly increasing in volume, so the knowledge stored as a set of rules is becoming progressively more complex, and when the rules are not organized into any structure, the system is inefficient. In the author's opinion, modifying both the knowledge base structure and the inference algorithms improves the efficiency of the inference process. Partitioning the rules significantly reduces the percentage of the knowledge base analysed during the inference process. The form of a group's representative plays an important role in the efficiency of the inference process. The good performance of this approach is shown through an extensive experimental study carried out on a collection of real knowledge bases.

Notes

1.
Rules have been extensively used in knowledge representation and reasoning. This representation is very space efficient: only a relatively small number of facts need to be stored in the KB, and the rest can be derived by the inference rules.
2.
The idea is new but it is based on the author’s previous research, where the idea of clustering rules as well as creating the so-called decision units was introduced [10, 11].
3.
http://kbexplorer.ii.us.edu.pl/.
4.
The user can create KBs using a special creator or by importing a KB from a given data source. The format of the KBs enables working with a rule set generated automatically on the basis of rough set theory, as well as with rules given a priori by the domain expert. The KB can have one of the following file formats: XML, RSES, TXT. It is possible to define attributes of any type: nominal, discrete or continuous. There are no limits on the number of rules, attributes or facts, or on the length of a rule.
5.
It allows the rules to be partitioned by an algorithm with time complexity no higher than \(O(nk)\), where \(n=|R|\) and \(k=|PR|\). Simple strategies create the final partition PR in a single pass over the rule set R, according to the value of the \(mc(r_i,R_j)\) function described above. For complex strategies, the time complexity is higher than for any simple partition strategy.
6.
Let us assume that a threshold value \(0 \le T \le 1\) exists.
7.
Thus, a conjunction of pairs \((a_1,v_{1i}) \wedge (a_2,v_{2i}) \wedge \ldots \wedge (a_s,v_{si})\) may be both the conditional part of a rule and the representative of some group of rules.
8.
When more than one rule matches the working memory (the matching rules form the so-called conflict set) and only one has to be selected, the following strategies can be used: random, textual order, recency, specificity and refractoriness [1]. In this research, the author uses the textual order, recency and modified specificity strategies. Textual order fires the first matching rule. Recency fires the rule that uses the data added most recently to the working memory. Specificity (complexity) fires the rule with the most conditions attached: rules with a greater number of conditions or fewer variables are more specific and should be applied earlier, because they use more data and can handle special cases or exceptions to general rules.
9.
\(sim(F,R_j) = \frac{|F \cap Profile(R_j)|}{|F \cup Profile(R_j)|}\). The value of \(sim(F,R_j)\) is equal to 0 if there is no fact \(f_i\) (or hypothesis to prove) that is included in the representative of group \(R_j\). It is equal to 1 if all facts (and/or the hypothesis) are included in \(Profile(R_j)\) of group \(R_j\).
10.
The inputs are PR, the groups of rules with their representatives, and F, the set of facts. The output is F, the set of facts, including any new facts obtained through the inference. The algorithm uses a temporary variable R, the set of rules resulting from the previous selection.
11.
In this paper, the following strategies are used: FR (first rule), which fires the first rule in the conflict set (so it corresponds to the textual order strategy); LR (last rule), which selects the rule most recently added to the conflict set (so it works similarly to the recency strategy); SR (shortest rule), which selects the rule with the smallest number of conditions; and LOR (longest rule), which chooses the rule with the greatest number of conditions (so it corresponds to the specificity strategy).
12.
The mAHC approach to rule partitioning, with four different similarity thresholds: \(k=0\), \(k=0.25\), \(k=0.5\), \(k=1.0\).
13.
In Fig. 2 they are denoted as \(AHC\_general\), \(mAHC\_general\), \(AHC\_specialized\), \(mAHC\_specialized\), \(AHC\_weighted\) or \(mAHC\_weighted\).
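
Notes 9 to 11 can be illustrated with a minimal Python sketch. It is not the author's implementation: the `Rule` and `Group` classes, the `infer` function and the way groups are ranked are all assumptions made for illustration. Only the \(sim(F,R_j)\) formula and the names of the FR, LR, SR and LOR strategies come from the notes above. LR is approximated here by taking the last rule of a conflict set built in textual order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a set of (attribute, value) conditions and one conclusion."""
    conditions: frozenset
    conclusion: tuple


@dataclass
class Group:
    """Hypothetical rule group R_j with its representative Profile(R_j)."""
    profile: frozenset
    rules: list


def sim(facts, profile):
    """Similarity from note 9: |F ∩ Profile| / |F ∪ Profile|."""
    union = facts | profile
    return len(facts & profile) / len(union) if union else 0.0


# Conflict-set strategies named in note 11 (cs is a non-empty list of rules).
STRATEGIES = {
    "FR": lambda cs: cs[0],                                     # first rule
    "LR": lambda cs: cs[-1],                                    # last rule added
    "SR": lambda cs: min(cs, key=lambda r: len(r.conditions)),  # shortest rule
    "LOR": lambda cs: max(cs, key=lambda r: len(r.conditions)), # longest rule
}


def infer(groups, facts, strategy="FR"):
    """Forward-chaining sketch: search only groups whose profile overlaps the facts."""
    facts = set(facts)
    select = STRATEGIES[strategy]
    while True:
        # Assumption: groups are tried in order of decreasing similarity.
        ranked = sorted(groups, key=lambda g: sim(facts, g.profile), reverse=True)
        fired = False
        for group in ranked:
            if sim(facts, group.profile) == 0:
                break  # remaining groups share nothing with the facts
            conflict_set = [
                r for r in group.rules
                if r.conditions <= facts and r.conclusion not in facts
            ]
            if conflict_set:
                facts.add(select(conflict_set).conclusion)
                fired = True
                break
        if not fired:
            return facts


r1 = Rule(frozenset({("a", "1")}), ("b", "1"))
r2 = Rule(frozenset({("a", "1"), ("b", "1")}), ("c", "1"))
r3 = Rule(frozenset({("d", "1")}), ("e", "1"))
groups = [
    Group(frozenset({("a", "1"), ("b", "1")}), [r1, r2]),
    Group(frozenset({("d", "1")}), [r3]),
]
result = infer(groups, {("a", "1")}, strategy="FR")
```

In this toy run the second group is never searched, because its profile shares nothing with the facts. That is the kind of reduction in the analysed part of the knowledge base that the abstract refers to.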

References

1. Akerkar, R., Sajja, P.: Knowledge-Based Systems. Jones & Bartlett Learning, Sudbury (2010)
2. Forgy, C.L.: Rete: A fast algorithm for the many pattern many object pattern match problem. Artif. Intell. 19, 17–37 (1981)
3. Hanson, E., Hasan, M.S.: Gator: An optimized discrimination network for active database rule condition testing. Tech. rep. (1993)
4. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, New Jersey (1988)
5. Latkowski, R., Mikołajczyk, M.: Data decomposition and decision rule joining for classification of data with missing values. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 254–263. Springer, Heidelberg (2004)
6. Markov, Z., Larose, D.T.: Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. Wiley, Hoboken (2007)
7. Michalski, R.S., Larson, J.B.: Selection of Most Representative Training Examples and Incremental Generation of vl Hypotheses. The Underlying Methodology and the Description of the Programs esel and aq11. University of Illinois, Department of Computer Science, Urbana (1978)
8. Miranker, D.P.: Treat: A better match algorithm for AI production systems. Department of Computer Sciences, University of Texas at Austin, Technical report (1987)
9. Nalepa, G.J., Ligęza, A., Kaczor, K.: Overview of knowledge formalization with XTT2 rules. In: Bassiliades, N., Governatori, G., Paschke, A. (eds.) RuleML 2011 - Europe. LNCS, vol. 6826, pp. 329–336. Springer, Heidelberg (2011)
10. Nowak-Brzezińska, A., Simiński, R.: Knowledge mining approach for optimization of inference processes in rule knowledge bases. In: Herrero, P., Panetto, H., Meersman, R., Dillon, T. (eds.) OTM-WS 2012. LNCS, vol. 7567, pp. 534–537. Springer, Heidelberg (2012)
11. Pedrycz, W., Cholewa, W.: Expert Systems. Silesian University of Technology, Section of Scientific Publications, Poland (1987). [in Polish]
12. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978). http://dx.doi.org/10.1016/0005-1098(78)90005-5
13. Sarker, B.R.: The resemblance coefficients in group technology: A survey and comparative study of relational metrics. Comput. Ind. Eng. 30(1), 103–116 (1996)

Acknowledgement

This work is a part of the project “Exploration of rule knowledge bases” funded by the Polish National Science Centre (NCN: 2011/03/D/ST6/03027).

Author information

Authors and Affiliations

Department of Computer Science, Institute of Computer Science, Silesian University, Sosnowiec, Poland
Agnieszka Nowak-Brzezińska

Corresponding author

Correspondence to Agnieszka Nowak-Brzezińska.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa

About this paper

Cite this paper

Nowak-Brzezińska, A. (2016). Mining Rule-Based Knowledge Bases. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_6

DOI: https://doi.org/10.1007/978-3-319-34099-9_6
Published: 28 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)
