Abstract
This paper discusses an ongoing project that aims to improve the quality and efficiency of a rule-based parser by adding a statistical component. The proposed technique relies on bigrams of (word+category) pairs, selected from the homographs contained in our lexical database and computed over a large, previously tagged section of the Hansard corpus. The parser uses the bigram table to rank and prune the set of alternatives. To evaluate the gains obtained by the hybrid system, we conducted two manual evaluations: one over a small subset of the Hansard corpus, the other over a corpus of about 50 articles taken from the magazine The Economist. In both cases, we compare the analyses obtained by the parser with and without the statistical component, focusing on a single important source of mistakes: the confusion between nominal and verbal readings of ambiguous words such as announce, sets, costs, labour, etc.
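The ranking-and-pruning idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy tagged corpus, the choice to keep a bigram when at least one of its words is a homograph, and the raw-count scoring are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical helpers; none of these names come from the paper.

def build_bigram_table(tagged_sentences, homographs):
    """Count adjacent (word, category) pairs in a tagged corpus.

    Assumption: a bigram is kept when at least one of its two words
    is listed as a homograph.
    """
    table = Counter()
    for sentence in tagged_sentences:
        for left, right in zip(sentence, sentence[1:]):
            if left[0] in homographs or right[0] in homographs:
                table[(left, right)] += 1
    return table

def score(reading, table):
    """Sum the corpus counts of every adjacent pair in one candidate reading."""
    return sum(table[pair] for pair in zip(reading, reading[1:]))

def rank_and_prune(readings, table, keep=1):
    """Order candidate readings by bigram score and keep the best `keep`."""
    ranked = sorted(readings, key=lambda r: score(r, table), reverse=True)
    return ranked[:keep]

# Toy tagged corpus standing in for the tagged Hansard section.
corpus = [
    [("the", "DET"), ("costs", "NOUN"), ("are", "VERB"), ("high", "ADJ")],
    [("it", "PRON"), ("costs", "VERB"), ("money", "NOUN")],
    [("the", "DET"), ("costs", "NOUN"), ("rose", "VERB")],
]
table = build_bigram_table(corpus, homographs={"costs"})

# Two competing readings of "the costs fell": nominal vs. verbal "costs".
nominal = [("the", "DET"), ("costs", "NOUN"), ("fell", "VERB")]
verbal = [("the", "DET"), ("costs", "VERB"), ("fell", "VERB")]
best = rank_and_prune([verbal, nominal], table, keep=1)
print(best[0][1])
```

In this toy setting the nominal reading of "costs" is preferred because the pair (the/DET, costs/NOUN) was observed in the corpus, whereas (the/DET, costs/VERB) was not. A real system would presumably use normalized or smoothed scores rather than raw counts.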
Thanks to Meghdad Farahmand and Yves Scherrer for useful comments and contributions to this paper.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wehrli, E., Nerima, L. (2014). When Rules Meet Bigrams. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_21
DOI: https://doi.org/10.1007/978-3-642-54906-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science (R0)