Annotating Signs of Syntactic Complexity to Support Sentence Simplification

Evans, Richard; Orăsan, Constantin

doi:10.1007/978-3-642-40585-3_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This article presents a new annotation scheme for syntactic complexity in text. Compared with existing syntactic annotation schemes, it has the advantages of being easy to apply and reliable, and of being able to encode a wide range of phenomena. It is based on the notion that the syntactic complexity of sentences is explicitly indicated by signs such as conjunctions, complementisers and punctuation marks. The article describes the scheme developed to annotate these signs and evaluates it on three corpora, containing texts from three genres, that were annotated with it. Inter-annotator agreement calculated on the three corpora reaches at least “substantial agreement”, and the results motivate directions for future work.
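To make the two technical ideas in the abstract concrete, the following Python sketch first locates candidate signs of syntactic complexity in a sentence and then measures how far two annotators agree when labelling such signs. It is a minimal illustration under stated assumptions, not the authors' annotation tool or scheme. The small inventory of sign forms (SIGN_WORDS, SIGN_PUNCT), the helper names find_signs, cohen_kappa and interpret, and the coarse labels COORD, SUBORD and OTHER are all hypothetical. The sketch also assumes pairwise Cohen's kappa as the agreement measure, which the abstract does not specify. The bands in interpret follow the commonly used Landis and Koch scale, under which a kappa of 0.61 to 0.80 counts as "substantial".

```python
import re
from collections import Counter

# Hypothetical, illustrative inventory of sign forms: a few conjunctions,
# complementisers/relative pronouns and punctuation marks. NOT the paper's scheme.
SIGN_WORDS = {"and", "but", "or", "that", "which", "who", "when", "where", "while"}
SIGN_PUNCT = {",", ";", ":"}


def find_signs(sentence):
    """Return (token_index, token) pairs for candidate signs in a sentence."""
    # Simple tokeniser: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return [
        (i, tok)
        for i, tok in enumerate(tokens)
        if tok.lower() in SIGN_WORDS or tok in SIGN_PUNCT
    ]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Map kappa to the Landis-and-Koch-style bands (e.g. as in Viera and Garrett)."""
    if kappa < 0:
        return "less than chance"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    sentence = "The drug was withdrawn, but doctors who prescribed it say that it worked."
    print(find_signs(sentence))  # [(4, ','), (5, 'but'), (7, 'who'), (11, 'that')]

    # Invented labels from two hypothetical annotators for ten signs.
    a = ["COORD", "COORD", "SUBORD", "SUBORD", "COORD",
         "OTHER", "SUBORD", "COORD", "OTHER", "SUBORD"]
    b = ["COORD", "COORD", "SUBORD", "COORD", "COORD",
         "OTHER", "SUBORD", "COORD", "SUBORD", "SUBORD"]
    k = cohen_kappa(a, b)
    print(round(k, 3), interpret(k))  # 0.677 substantial
```

In this invented example the annotators agree on 8 of 10 signs (observed agreement 0.80), but chance agreement is 0.38, so kappa is (0.80 − 0.38) / (1 − 0.38) ≈ 0.677. This illustrates why agreement on sign annotation is reported as a chance-corrected statistic rather than as raw percentage agreement.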

Author information

Authors and Affiliations

Research Group in Computational Linguistics, University of Wolverhampton, United Kingdom
Richard Evans & Constantin Orăsan

Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

About this paper

Cite this paper

Evans, R., Orăsan, C. (2013). Annotating Signs of Syntactic Complexity to Support Sentence Simplification. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_13

DOI: https://doi.org/10.1007/978-3-642-40585-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
