Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples

  • A. Lemay
  • J. Niehren
  • R. Gilleron
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4201)


We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (n-NSTTs). These are tree automata that capture the class of monadic second-order definable n-ary queries. We show that polynomially bounded n-ary queries defined by n-NSTTs can be learned in polynomial time and from polynomial data. An application in Web information extraction yields encouraging results.
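To make the terminology concrete, the following minimal Python sketch shows what an n-ary node selection query and a completely annotated example look like for n = 2. It is only an illustration under stated assumptions, not the learning algorithm or the n-NSTT representation of the paper: the tree encoding, the helper names `nodes` and `subtree`, the query `name_price_pairs`, and the sample document are all hypothetical.

```python
from itertools import product

# Assumed encoding: a tree is a (label, children) pair. A node is addressed
# by its path, a tuple of child indices from the root (the root is ()).

def nodes(tree, path=()):
    """Yield (path, label) for every node, parents before children."""
    label, children = tree
    yield path, label
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))

def subtree(tree, path):
    """Return the subtree rooted at the node addressed by path."""
    for i in path:
        tree = tree[1][i]
    return tree

def name_price_pairs(tree):
    """Hypothetical binary query: all (name, price) node pairs that are
    children of the same 'item' node."""
    answers = set()
    for path, label in nodes(tree):
        if label == "item":
            children = subtree(tree, path)[1]
            names = [path + (i,) for i, c in enumerate(children) if c[0] == "name"]
            prices = [path + (i,) for i, c in enumerate(children) if c[0] == "price"]
            answers.update(product(names, prices))
    return answers

# Hypothetical document with two items.
doc = ("list", [
    ("item", [("name", []), ("price", [])]),
    ("item", [("name", []), ("price", [])]),
])

# A completely annotated example pairs a tree with ALL answer tuples
# of the target query on that tree, not just some of them.
annotated_example = (doc, name_price_pairs(doc))
```

In this setting a learner receives pairs such as `annotated_example` and has to produce a representation of the query that generalizes to unseen trees; the hand-written function above merely stands in for the unknown target query.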


Keywords: Polynomial Time, Tree Automaton, Tree Language, Tree Transducer, Deterministic Automaton
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • A. Lemay (1)
  • J. Niehren (2)
  • R. Gilleron (1)
  1. Mostrare project of INRIA Futurs, LIFL, University of Lille 3, Lille, France
  2. Mostrare project of INRIA Futurs, LIFL, INRIA Futurs, Lille, France
