Syntactic Annotation

Dash, Niladri Sekhar

doi:10.1007/978-981-16-2960-0_10

Niladri Sekhar Dash

Abstract

In this chapter, we discuss some of the basic challenges involved in analyzing sentences and designing a scheme for syntactic annotation. We define the basic concept of syntactic annotation and comment on its nature, method, and function in a language. Next, we focus on the goals and purposes behind developing a syntactic annotation tool for a language. Guidelines and instructions for developing such tools already exist for some advanced languages, and we do not revisit those issues and strategies here. Rather, we focus on the theoretical and practical importance of syntactic annotation in extracting syntactic information from a sentence. During syntactic annotation, we supply a sentence of a natural language to a machine as input and instruct the machine to identify phrases and mark their grammatical-cum-syntactic roles in the sentence. This implies that the machine has to learn how phrases are formed and organized, so that it understands how a sentence is to be analyzed and interpreted from the perspective of the syntactic function and semantic information of words and phrases. It also needs to learn how the syntactic-cum-semantic roles of various syntactic units are controlled by their lexical associations and morphological functions when information embedded within a sentence is retrieved. We address all these issues in this chapter and present some ideas and processes that are normally used in syntactic annotation.
In the course of formulating the basic ideas, we refer to the rules of context-free grammars and show how the outputs generated from a syntactically annotated corpus can be used for describing the grammar of a language better, teaching grammatical forms with better information and analysis, understanding how the human brain applies syntactic rules to form sentences, designing syntactic rules to train a computer, and developing language-related applications with proper syntactic information.
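
The step described above, in which a sentence is supplied to a machine that then identifies its phrases and labels them with context-free rules, can be illustrated with a minimal sketch. It is not the annotation scheme of this chapter: the grammar, the lexicon, the bracket notation, and the function name `annotate` are all hypothetical choices made for illustration, and the procedure is a plain CKY-style chart parse over a toy grammar in Chomsky normal form.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy grammar (an assumption for illustration, not from the chapter).
# Each binary rule maps a pair of child labels to the parent phrase label.
BINARY_RULES: Dict[Tuple[str, str], str] = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}

# Hypothetical lexicon: one part-of-speech label per known word.
LEXICON: Dict[str, str] = {
    "the": "Det",
    "a": "Det",
    "dog": "N",
    "ball": "N",
    "chased": "V",
}


def annotate(sentence: str) -> Optional[str]:
    """Return a labelled bracketing of the sentence, or None if no parse is found."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return None

    # chart[(i, j)] maps a label to one bracketed analysis of words[i:j].
    chart: Dict[Tuple[int, int], Dict[str, str]] = {}

    # Fill the one-word spans from the lexicon.
    for i, word in enumerate(words):
        tag = LEXICON.get(word)
        if tag is None:
            return None  # unknown word: this toy lexicon cannot label it
        chart[(i, i + 1)] = {tag: f"[{tag} {word}]"}

    # Combine adjacent spans bottom-up, shortest spans first.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            cell: Dict[str, str] = {}
            for k in range(i + 1, j):
                for left_label, left_tree in chart[(i, k)].items():
                    for right_label, right_tree in chart[(k, j)].items():
                        parent = BINARY_RULES.get((left_label, right_label))
                        if parent is not None and parent not in cell:
                            cell[parent] = f"[{parent} {left_tree} {right_tree}]"
            chart[(i, j)] = cell

    # The sentence is annotated only if the whole span is labelled S.
    return chart[(0, n)].get("S")


if __name__ == "__main__":
    print(annotate("The dog chased a ball"))
```

Under these assumptions, the input "The dog chased a ball" is bracketed into a noun phrase and a verb phrase under a sentence node, while a sentence containing a word outside the toy lexicon returns `None`. A realistic annotation tool would need a far larger grammar, a way to handle ambiguity, and the lexical and morphological information mentioned above.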

Author information

Authors and Affiliations

Linguistic Research Unit, Indian Statistical Institute, Kolkata, West Bengal, India
Dr. Niladri Sekhar Dash

About this chapter

Cite this chapter

Dash, N.S. (2021). Syntactic Annotation. In: Language Corpora Annotation and Processing. Springer, Singapore. https://doi.org/10.1007/978-981-16-2960-0_10

DOI: https://doi.org/10.1007/978-981-16-2960-0_10
Published: 08 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2959-4
Online ISBN: 978-981-16-2960-0
eBook Packages: Education (R0)
