A semiparametric generative model for efficient structured-output supervised learning

Annals of Mathematics and Artificial Intelligence

Abstract

We present a semiparametric generative model for supervised learning with structured outputs. The main algorithmic idea is to replace the parameters of an underlying generative model (such as a stochastic grammar) with input-dependent predictions obtained by (kernel) logistic regression. This method avoids the computational burden of comparing target and predicted structures during the training phase, but requires a vector of sufficient statistics for each training example as an additional input. The resulting training algorithm is asymptotically more efficient than structured-output SVM as the size of the output structure grows. At the same time, because the parameters of a joint distribution are computed as a function of the full input structure, typical expressiveness limitations of related conditional models (such as maximum entropy Markov models) can potentially be avoided. Empirical results on artificial and real data (in the domains of natural language parsing and RNA secondary structure prediction) show that the method works well in practice and scales with the size of the output structures.
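The central idea of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that do not come from the paper: the generative model is reduced to a single group of K competing rules (for example, the expansions of one nonterminal), the sufficient statistics are the counts of how often each rule is used in the target structure, and plain (non-kernel) multinomial logistic regression trained by gradient ascent stands in for kernel logistic regression. The function names `fit_rule_probabilities` and `predict_rule_probabilities` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_rule_probabilities(X, counts, lr=0.5, epochs=500, l2=1e-3):
    """Fit W so that softmax(X @ W) gives input-dependent rule probabilities.

    X      : (n, d) input features (assumed to include a bias column).
    counts : (n, K) sufficient statistics, i.e. how many times each of the
             K rules is used in the target structure of each example.

    The objective is the count-weighted multinomial log-likelihood
    sum_i sum_k counts[i, k] * log theta_k(x_i), normalized by the total
    count, minus an L2 penalty. No structure is decoded during training.
    """
    d = X.shape[1]
    K = counts.shape[1]
    W = np.zeros((d, K))
    totals = counts.sum(axis=1, keepdims=True)  # rule uses per example
    norm = totals.sum()                         # total rule uses overall
    for _ in range(epochs):
        theta = softmax(X @ W)
        # Gradient of the normalized log-likelihood minus the L2 penalty.
        grad = X.T @ (counts - totals * theta) / norm - l2 * W
        W += lr * grad
    return W


def predict_rule_probabilities(X, W):
    """Return the (n, K) rule probabilities predicted for each input."""
    return softmax(X @ W)


if __name__ == "__main__":
    # Synthetic data: rule probabilities depend on a 2-d input plus bias.
    rng = np.random.default_rng(0)
    X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])
    W_true = np.array([[2.0, -1.0, 0.0],
                       [0.0, 1.5, -1.5],
                       [0.0, 0.0, 0.0]])
    theta_true = softmax(X @ W_true)
    counts = np.vstack([rng.multinomial(20, p) for p in theta_true]).astype(float)

    W = fit_rule_probabilities(X, counts)
    theta = predict_rule_probabilities(X, W)
    print("mean absolute error vs. true probabilities:",
          np.abs(theta - theta_true).mean())
```

In this sketch the cost of each training step depends only on the number of examples, features, and rules, because the target structure enters solely through its count vector. At prediction time, the predicted probabilities would be plugged into the generative model and a standard decoder (such as a chart parser for a stochastic grammar) would be run once per input; that decoding step is outside the scope of the example.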

Author information

Corresponding author

Correspondence to Marco Lippi.

About this article

Cite this article

Costa, F., Passerini, A., Lippi, M. et al. A semiparametric generative model for efficient structured-output supervised learning. Ann Math Artif Intell 54, 207–222 (2008). https://doi.org/10.1007/s10472-009-9137-6

  • DOI: https://doi.org/10.1007/s10472-009-9137-6
