Abstract
This article presents a new annotation scheme for syntactic complexity in text. Compared with existing syntactic annotation schemes, it is easy to apply, reliable, and able to encode a wide range of phenomena. It is based on the notion that the syntactic complexity of sentences is explicitly indicated by signs such as conjunctions, complementisers and punctuation marks. The article describes the scheme developed to annotate these signs and evaluates three corpora, containing texts from three genres, that were annotated using it. Inter-annotator agreement calculated on the three corpora reaches at least “substantial agreement”, and the results motivate directions for future work.
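The agreement figure mentioned above can be illustrated with a minimal sketch. It assumes that Cohen's kappa is the statistic involved (the abstract does not name it) and that each sign receives exactly one class label from each of two annotators. The label names `COORD` and `SUBORD` and the toy annotations are hypothetical and are not taken from the scheme or the corpora.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label
    # proportions, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels assigned to eight signs (not data from the paper).
annotator_1 = ["COORD", "COORD", "SUBORD", "COORD",
               "SUBORD", "SUBORD", "COORD", "SUBORD"]
annotator_2 = ["COORD", "COORD", "SUBORD", "SUBORD",
               "SUBORD", "SUBORD", "COORD", "SUBORD"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.75"
```

On the commonly used interpretation scale for kappa, values from 0.61 to 0.80 are labelled “substantial agreement”, so the toy value of 0.75 would fall in that band.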
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Evans, R., Orăsan, C. (2013). Annotating Signs of Syntactic Complexity to Support Sentence Simplification. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_13
DOI: https://doi.org/10.1007/978-3-642-40585-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science (R0)