Characteristic Sets for Inferring the Unions of the Tree Pattern Languages by the Most Fitting Hypotheses

  • Yen Kaow Ng
  • Takeshi Shinohara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4201)


A tree pattern p is a first-order term in formal logic, and the language of p is the set of all the tree patterns obtainable by replacing each variable in p with a tree pattern containing no variables. We consider the inductive inference of the unions of these languages from positive examples using strategies that guarantee some forms of minimality during the learning process. By a result in our earlier work, the existence of a characteristic set for each language in a class \({\mathcal L }\) (within \({\mathcal L }\)) implies that \({\mathcal L }\) can be identified in the limit by a learner that simply conjectures a hypothesis containing the examples, that is minimal in the number of elements of up to an appropriate size. Furthermore, if there is a size ℓ such that each candidate hypothesis has a characteristic set (within the languages in \({\mathcal L }\) that intersects non-emptily with the examples) that consists only of elements of up to size ℓ, then the hypotheses containing the least number of elements of up to size ℓ are at the same time minimal with respect to inclusion. In this paper we show how to determine such a size ℓ for the unions of the tree pattern languages, and hence allowing us to learn the class using hypotheses that fulfill the two mentioned notions of minimality.


Tree Pattern Inductive Inference Variable Node Positive Data Pattern Language 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Angluin, D.: Inductive inference of formal languages from positive data. Information and Control 45, 117–135 (1980)zbMATHCrossRefMathSciNetGoogle Scholar
  2. 2.
    Angluin, D.: Inference of reversible languages. J. of the ACM 29, 741–765 (1982)zbMATHCrossRefMathSciNetGoogle Scholar
  3. 3.
    Arimura, H., Ishizaka, H., Shinohara, T.: Learning unions of tree patterns using queries. Theoretical Computer Science 185(1), 47–62 (1997)zbMATHCrossRefMathSciNetGoogle Scholar
  4. 4.
    Arimura, H., Shinohara, T., Otsuki, S.: A polynomial time algorithm for finding finite unions of tree pattern languages. In: Brewka, G., Jantke, K.P., Schmitt, P.H. (eds.) NIL 1991. LNCS (LNAI), vol. 659, pp. 118–131. Springer, Heidelberg (1993)CrossRefGoogle Scholar
  5. 5.
    Arimura, H., Shinohara, T., Otsuki, S.: Polynomial time inference of unions of two tree pattern languages. IEICE Transactions on Information and Systems E75-D(7), 426–434 (1992)Google Scholar
  6. 6.
    Arimura, H., Shinohara, T., Otsuki, S.: Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. In: Enjalbert, P., Mayr, E.W., Wagner, K.W. (eds.) STACS 1994. LNCS, vol. 775, pp. 649–660. Springer, Heidelberg (1994)Google Scholar
  7. 7.
    Arimura, H., Shinohara, T., Otsuki, S., Ishizaka, H.: A generalization of the least general generalization. Machine Intelligence 13 13, 59–85 (1994)Google Scholar
  8. 8.
    Chan, C., Garofalakis, M., Rastogi, R.: RE-tree: an efficient index structure for regular expressions. The VLDB Journal 12(2), 102–119 (2003)CrossRefGoogle Scholar
  9. 9.
    Gold, E.M.: Language identification in the limit. Information and Control 10, 447–474 (1967)zbMATHCrossRefGoogle Scholar
  10. 10.
    W3C XML Core Working Group. Extensible Markup Language (XML) 1.0, 3rd edn. W3C Recommendation (2004)Google Scholar
  11. 11.
    Kobayashi, S., Yokomori, T.: Identifiability of subspaces and homomorphic images of zero-reversible languages. In: Proc. of Algorithmic Learning Theory, 8th Int. Conf., ALT 1997. LNCS (LNAI), vol. 1316, pp. 48–61. Springer, Heidelberg (1997)Google Scholar
  12. 12.
    Ng, Y.K., Ono, H., Shinohara, T.: Measuring over-generalization in the minimal multiple generalizations of biosequences. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds.) DS 2005. LNCS (LNAI), vol. 3735, pp. 176–188. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  13. 13.
    Ng, Y.K., Shinohara, T.: Inferring unions of the pattern languages by the most fitting covers. In: Jain, S., Simon, H.U., Tomita, E. (eds.) ALT 2005. LNCS (LNAI), vol. 3734, pp. 269–282. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  14. 14.
    Ng, Y.K., Shinohara, T.: Finding consensus patterns in very scarce biosequence samples from their minimal multiple generalizations. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 540–545. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  15. 15.
    Plotkin, G.D.: A note on inductive generalization. Machine Intelligence 5, 153–163 (1970)MathSciNetGoogle Scholar
  16. 16.
    Sato, M.: Inductive inference of formal languages. Bulletin of Informatics and Cybernetics 27(1), 85–106 (1995)zbMATHMathSciNetGoogle Scholar
  17. 17.
    Shinohara, T.: Polynomial time inference of extended regular pattern languages. In: Goto, E., Nakajima, R., Yonezawa, A., Nakata, I., Furukawa, K. (eds.) RIMS 1982. LNCS, vol. 147, pp. 115–127. Springer, Heidelberg (1983)Google Scholar
  18. 18.
    Wright, K.: Identification of unions of languages drawn from an identifiable class. In: Proc. of the Second Annual Workshop on Computational Learning Theory, pp. 328–333. Morgan Kaufmann, San Francisco (1989)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yen Kaow Ng
    • 1
  • Takeshi Shinohara
    • 2
  1. 1.Kyushu Institute of Technology, Graduate School of Computer Science and SystemsIizukaJapan
  2. 2.Department of Artificial IntelligenceKyushu Institute of TechnologyIizukaJapan

Personalised recommendations