Abstract

Rule-based knowledge bases are constantly growing in volume, so the knowledge stored as a set of rules is becoming progressively more complex, and when the rules are not organized into any structure, the system is inefficient. In the author’s opinion, modifying both the knowledge base structure and the inference algorithms improves the efficiency of the inference process. Partitioning the rules significantly reduces the percentage of the knowledge base analysed during the inference process. The form of each group’s representative plays an important role in the efficiency of the inference process. The good performance of this approach is shown through an extensive experimental study carried out on a collection of real knowledge bases.

Notes

  1.

    Rules have been extensively used in knowledge representation and reasoning. This representation is very space efficient: only a relatively small number of facts needs to be stored in the KB and the rest can be derived by the inference rules.

  2.

    The idea is new but it is based on the author’s previous research, where the idea of clustering rules as well as creating the so-called decision units was introduced [10, 11].

  3.

    http://kbexplorer.ii.us.edu.pl/.

  4.

    The user can create KBs using a special creator or by importing a KB from a given data source. The format of the KBs enables working with a rule set generated automatically based on rough set theory as well as with rules given a priori by the domain expert. The KB can have one of the following file formats: XML, RSES, TXT. It is possible to define attributes of any type: nominal, discrete or continuous. There are no limits on the number of rules, attributes or facts, or on the length of a rule.

  5.

    It allows the rules to be partitioned by an algorithm with time complexity no higher than \(O(nk)\), where \(n=|R|\) and \(k=|PR|\). Simple strategies create the final partition PR in a single pass over the rule set R according to the value of the \(mc(r_i,R_j)\) function described above. For complex strategies, the time complexity is higher than for any simple partition strategy.

  6.

    Let us assume that a threshold value \(0 \le T \le 1\) exists.

  7.

    Thus, a conjunction of pairs \((a_1,v_{1i}) (a_2,v_{2i})\ldots (a_s,v_{si})\) may be both a conditional part of one rule and a representative of some group of rules.

  8.

    When more than one rule matches the working memory (the matching rules form the conflict set) and only one has to be selected, the following strategies can be used: random, textual order, recency, specificity and refractoriness [1]. In the research, the author uses the textual order, the recency and the modified specificity strategies. The textual order strategy fires the first matching rule. The recency strategy fires the rule which uses the data added most recently to the working memory. The specificity (complexity) strategy fires the rule with the most conditions attached: rules with a greater number of conditions or fewer variables are more specific and should be applied earlier, because they use more data and can handle special cases or exceptions to general rules.

  9.

    \(sim(F,R_j) = \frac{|F \cap Profile(R_j)|}{|F \cup Profile(R_j)|}\). The value of \(sim(F,R_j)\) is equal to 0 if there is no fact \(f_i\) (or hypothesis to prove) which is included in the representative of group \(R_j\). It is equal to 1 if all facts (and/or the hypothesis) are included in \(Profile(R_j)\) of group \(R_j\).

  10.

    The inputs are PR, the groups of rules with their representatives, and F, the set of facts. The output is F, the set of facts, including any new facts obtained through the inference. The algorithm uses a temporary variable R, which is the set of rules resulting from the previous selection.

  11.

    In this paper, the following strategies are used: FR (first rule), which fires the first rule in the conflict set (so it corresponds to the textual order strategy); LR (last rule), which selects the rule most recently added to the conflict set (so it works similarly to the recency strategy); SR (shortest rule), which selects the rule with the smallest number of conditions; and LOR (longest rule), which chooses the rule with the greatest number of conditions (so it corresponds to the specificity strategy).

  12.

    The mAHC approach to rule partitioning, using four different similarity thresholds: \(k=0, k=0.25, k=0.5, k=1.0\).

  13.

    In Fig. 2 they are denoted as \(AHC\_general\), \(mAHC\_general\), \(AHC\_specialized\), \(mAHC\_specialized\), \(AHC\_weighted\) or \(mAHC\_weighted\).
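
The similarity measure from note 9 and the conflict-set strategies from note 11 can be illustrated with a minimal Python sketch. It is not the author's implementation: the encodings of facts and rules, and the function names `sim`, `most_relevant_group` and `select_rule`, are hypothetical and chosen only for this example. It also assumes that the conflict set is a list kept in the order in which rules were added, so that LR can simply take the last element.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encodings, not taken from the paper.
Fact = Tuple[str, str]                # an (attribute, value) pair
Rule = Tuple[FrozenSet[Fact], Fact]   # (conditions, conclusion)


def sim(facts: FrozenSet[Fact], profile: FrozenSet[Fact]) -> float:
    """Compute |F ∩ Profile(R_j)| / |F ∪ Profile(R_j)| as written in note 9."""
    union = facts | profile
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(facts & profile) / len(union)


def most_relevant_group(facts: FrozenSet[Fact],
                        profiles: List[FrozenSet[Fact]]) -> int:
    """Return the index of the group whose representative is most similar to F."""
    return max(range(len(profiles)), key=lambda j: sim(facts, profiles[j]))


def select_rule(conflict_set: List[Rule], strategy: str) -> Rule:
    """Pick one rule from a non-empty conflict set using FR, LR, SR or LOR."""
    if not conflict_set:
        raise ValueError("conflict set is empty")
    if strategy == "FR":    # first rule (textual order)
        return conflict_set[0]
    if strategy == "LR":    # last rule, assuming list order is insertion order
        return conflict_set[-1]
    if strategy == "SR":    # shortest rule: fewest conditions
        return min(conflict_set, key=lambda rule: len(rule[0]))
    if strategy == "LOR":   # longest rule: most conditions
        return max(conflict_set, key=lambda rule: len(rule[0]))
    raise ValueError(f"unknown strategy: {strategy}")


# Usage: two group representatives and a conflict set of two rules.
facts = frozenset({("a", "1"), ("b", "2")})
profiles = [frozenset({("a", "1"), ("c", "3")}),
            frozenset({("a", "1"), ("b", "2")})]
best = most_relevant_group(facts, profiles)

r1: Rule = (frozenset({("a", "1")}), ("d", "x"))
r2: Rule = (frozenset({("a", "1"), ("b", "2")}), ("d", "y"))
chosen = select_rule([r1, r2], "LOR")
```

In this example the second representative shares both facts with F, so it is selected, and LOR picks `r2` because it has more conditions. Ties in SR and LOR are resolved in favour of the earlier rule, which is a design choice of this sketch rather than something stated in the notes.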

References

  1. Akerkar, R., Sajja, P.: Knowledge-Based Systems. Jones & Bartlett Learning, Sudbury (2010)

  2. Forgy, C.L.: Rete: A fast algorithm for the many pattern many object pattern match problem. Artif. Intell. 19, 17–37 (1981)

  3. Hanson, E., Hasan, M.S.: Gator: An optimized discrimination network for active database rule condition testing. Tech. rep. (1993)

  4. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, New Jersey (1988)

  5. Latkowski, R., Mikołajczyk, M.: Data decomposition and decision rule joining for classification of data with missing values. In: Tsumoto, S., Słowiński, R., Komorowski, J., Grzymała-Busse, J.W. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 254–263. Springer, Heidelberg (2004)

  6. Markov, Z., Larose, D.T.: Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. Wiley, Hoboken (2007)

  7. Michalski, R.S., Larson, J.B.: Selection of Most Representative Training Examples and Incremental Generation of vl Hypotheses. The Underlying Methodology and the Description of the Programs esel and aq11. University of Illinois, Department of Computer Science, Urbana (1978)

  8. Miranker, D.P.: Treat: A better match algorithm for AI production systems. Technical report, Department of Computer Sciences, University of Texas at Austin (1987)

  9. Nalepa, G.J., Ligęza, A., Kaczor, K.: Overview of knowledge formalization with XTT2 rules. In: Bassiliades, N., Governatori, G., Paschke, A. (eds.) RuleML 2011 - Europe. LNCS, vol. 6826, pp. 329–336. Springer, Heidelberg (2011)

  10. Nowak-Brzezińska, A., Simiński, R.: Knowledge mining approach for optimization of inference processes in rule knowledge bases. In: Herrero, P., Panetto, H., Meersman, R., Dillon, T. (eds.) OTM-WS 2012. LNCS, vol. 7567, pp. 534–537. Springer, Heidelberg (2012)

  11. Pedrycz, W., Cholewa, W.: Expert Systems. Silesian University of Technology, Section of Scientific Publications, Poland (1987). [in Polish]

  12. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978). http://dx.doi.org/10.1016/0005-1098(78)90005-5

  13. Sarker, B.R.: The resemblance coefficients in group technology: A survey and comparative study of relational metrics. Comput. Ind. Eng. 30(1), 103–116 (1996)

Acknowledgement

This work is a part of the project “Exploration of rule knowledge bases” funded by the Polish National Science Centre (NCN: 2011/03/D/ST6/03027).

Author information

Corresponding author

Correspondence to Agnieszka Nowak-Brzezińska.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nowak-Brzezińska, A. (2016). Mining Rule-Based Knowledge Bases. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_6

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
