Syntactic Annotation

Language Corpora Annotation and Processing

Abstract

In this chapter, we discuss some of the basic challenges involved in analyzing sentences and designing a scheme for syntactic annotation. We first define the basic concept of syntactic annotation and comment on its nature, method, and function in a language. Next, we focus on the goals and purposes behind developing a syntactic annotation tool for a language. Guidelines and instructions for developing such tools already exist for some advanced languages, and we do not revisit those issues and strategies here. Rather, we focus on the theoretical and practical importance of syntactic annotation in extracting syntactic information from a sentence. During syntactic annotation, we supply a natural-language sentence to a machine as input and instruct the machine to identify its phrases and mark their grammatical and syntactic roles in the sentence. This implies that the machine has to learn how phrases are formed and organized so that it can analyze and interpret a sentence in terms of the syntactic functions and semantic information of its words and phrases. It also needs to learn how the syntactic and semantic roles of the various syntactic units are governed by their lexical associations and morphological functions when retrieving the information embedded within a sentence. We address all these issues in this chapter and present some of the ideas and processes that are normally used in syntactic annotation. In the course of formulating the basic ideas, we refer to the rules of context-free grammars and show how the outputs generated from a syntactically annotated corpus can be used to describe the grammar of a language more accurately, to teach its grammatical forms with better information and analysis, to understand how the human brain applies syntactic rules to form sentences, to design syntactic rules for training a computer, and to develop language-related applications with proper syntactic information.
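
The process described in the abstract, in which a machine receives a sentence, identifies its phrases, and marks their roles using context-free rules, can be illustrated with a minimal sketch. The grammar, the lexicon, the tag names, and the function name `annotate` below are all hypothetical choices made for illustration and are not taken from the chapter; the parsing strategy shown is the standard CKY chart algorithm over a toy grammar in Chomsky normal form.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy grammar (an assumption, not the chapter's scheme).
# Each binary rule maps a pair of child labels to a parent phrase label.
BINARY_RULES: Dict[Tuple[str, str], str] = {
    ("NP", "VP"): "S",    # sentence  -> noun phrase + verb phrase
    ("DT", "NN"): "NP",   # noun phrase -> determiner + noun
    ("VBD", "NP"): "VP",  # verb phrase -> past-tense verb + noun phrase
}

# Hypothetical lexicon assigning one part-of-speech tag per word.
LEXICON: Dict[str, str] = {
    "the": "DT", "a": "DT",
    "girl": "NN", "book": "NN",
    "read": "VBD",
}


def annotate(sentence: str) -> Optional[str]:
    """Return a bracketed phrase-structure annotation, or None if the
    toy grammar cannot analyze the sentence."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)] maps a label to the bracketed analysis of words[i:j].
    chart: Dict[Tuple[int, int], Dict[str, str]] = {}

    # Step 1: tag each word using the lexicon.
    for i, word in enumerate(words):
        tag = LEXICON.get(word)
        if tag is None:
            return None  # unknown word: no annotation possible
        chart[(i, i + 1)] = {tag: f"({tag} {word})"}

    # Step 2: combine adjacent spans bottom-up (CKY).
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            cell: Dict[str, str] = {}
            for k in range(i + 1, j):
                for l_label, l_tree in chart.get((i, k), {}).items():
                    for r_label, r_tree in chart.get((k, j), {}).items():
                        parent = BINARY_RULES.get((l_label, r_label))
                        if parent and parent not in cell:
                            cell[parent] = f"({parent} {l_tree} {r_tree})"
            chart[(i, j)] = cell

    # Step 3: the sentence is annotated only if the whole span is an S.
    return chart.get((0, n), {}).get("S")


print(annotate("The girl read a book"))
# (S (NP (DT the) (NN girl)) (VP (VBD read) (NP (DT a) (NN book))))
```

The bracketed output resembles the labeled bracketing used in phrase-structure treebanks, but a realistic annotation tool would need a far larger grammar, handling of ambiguity, and language-specific morphological information.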

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Dash, N.S. (2021). Syntactic Annotation. In: Language Corpora Annotation and Processing. Springer, Singapore. https://doi.org/10.1007/978-981-16-2960-0_10

  • DOI: https://doi.org/10.1007/978-981-16-2960-0_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-2959-4

  • Online ISBN: 978-981-16-2960-0

  • eBook Packages: Education (R0)
