Abstract
This work is inspired by the so-called reranking tasks in natural language processing. In this paper, we first study recently proposed ranking, reranking, and ordinal regression algorithms in the context of ranks and margins. We then propose a general framework for ranking and reranking, and introduce a series of variants of the perceptron algorithm for ranking and reranking within this framework. Compared to the approach of using pairwise objects as training samples, the new algorithms reduce data complexity and training time. We apply the new perceptron algorithms to parse reranking and machine translation reranking tasks, and study the performance of reranking under various definitions of the margin.
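The core idea can be illustrated with a minimal sketch of a perceptron update for reranking, in the spirit of the algorithms discussed here rather than the paper's exact variants: each training instance is a list of candidate feature vectors plus the index of the best (gold) candidate, and the model is updated whenever its top-scored candidate is not the gold one. Margins, averaging, and the paper's specific variants are omitted; all names below are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def rerank_perceptron(instances, dim, epochs=10):
    """Train a linear reranker.

    instances: list of (candidates, gold_idx) pairs, where candidates
    is a list of feature vectors and gold_idx marks the best candidate.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for cands, gold in instances:
            scores = [dot(w, c) for c in cands]
            pred = max(range(len(cands)), key=scores.__getitem__)
            if pred != gold:
                # Promote the gold candidate, demote the current top choice.
                w = [wi + cands[gold][i] - cands[pred][i]
                     for i, wi in enumerate(w)]
    return w

# Toy example: two features; the gold candidate is separable from the rest.
instances = [
    ([[1.0, 0.0], [0.0, 1.0]], 0),
    ([[0.2, 0.9], [0.9, 0.1]], 1),
]
w = rerank_perceptron(instances, dim=2)
```

Note that, unlike training on all pairwise comparisons of candidates, each mistake triggers a single update involving only the gold candidate and the current top-scored one, which is one way the per-instance cost stays low.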
Editors: Dan Roth and Pascale Fung
Cite this article
Shen, L., Joshi, A.K. Ranking and Reranking with Perceptron. Mach Learn 60, 73–96 (2005). https://doi.org/10.1007/s10994-005-0918-9