
Using SGML as a Basis for Data-Intensive Natural Language Processing

Computers and the Humanities

Abstract

This paper describes the LT NSL system (McKelvie et al., 1996), an architecture for writing corpus processing tools. The system is then compared with two other systems that address similar issues: the GATE system (Cunningham et al., 1995) and the IMS Corpus Workbench (Christ, 1994). In particular, we address the advantages and disadvantages of an SGML approach compared with a non-SGML database approach.
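The abstract contrasts two ways of holding corpus annotation: inline in SGML markup, processed as a stream, or in a separate database. The Python sketch below illustrates that contrast only. It is not the LT NSL API, and none of that library's calls are reproduced here. It assumes XML (a restricted form of SGML) as a stand-in so that the standard library's `xml.etree.ElementTree.iterparse` can read it, and the element and attribute names (`TEXT`, `S`, `W`, `C`) are hypothetical. `stream_words` reads the marked-up text element by element, in the spirit of a pipeline of SGML tools; `build_columns` converts the same markup into parallel columns indexed by corpus position, roughly the kind of positional representation a non-SGML corpus database works with.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical corpus fragment. XML stands in for normalised SGML here;
# S = sentence, W = word, C = part-of-speech attribute (illustrative names only).
SAMPLE = """<TEXT>
<S><W C="DT">The</W><W C="NN">map</W><W C="VBZ">shows</W><W C="DT">a</W><W C="NN">route</W></S>
<S><W C="PRP">It</W><W C="VBZ">bends</W></S>
</TEXT>"""


def stream_words(source):
    """Markup-as-stream approach: yield (word, tag) pairs as each W element
    closes, without building a separate index of the corpus."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "W":
            yield elem.text, elem.get("C")
            elem.clear()  # release content of elements already processed


def build_columns(source):
    """Database-style approach: copy the markup into parallel columns
    (word, tag) indexed by corpus position, plus (start, end) sentence spans."""
    words, tags, sentences = [], [], []
    start = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "S":
            start = len(words)  # position of the sentence's first word
        elif event == "end" and elem.tag == "W":
            words.append(elem.text)
            tags.append(elem.get("C"))
        elif event == "end" and elem.tag == "S":
            sentences.append((start, len(words)))
    return words, tags, sentences


# The same query ("all nouns") answered both ways.
nouns_stream = [w for w, c in stream_words(io.BytesIO(SAMPLE.encode("utf-8"))) if c == "NN"]
words, tags, sentences = build_columns(io.BytesIO(SAMPLE.encode("utf-8")))
nouns_columns = [words[i] for i, c in enumerate(tags) if c == "NN"]

print(nouns_stream)   # ['map', 'route']
print(nouns_columns)  # ['map', 'route']
print(sentences)      # [(0, 5), (5, 7)]
```

In this sketch both queries return the same answer; the difference is that the streaming version keeps the markup as the only representation, while the column version is a derived copy that would have to be rebuilt whenever the marked-up text changes.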


References

  • Abiteboul, S., D. Quass, J. McHugh, J. Widom and J.L. Wiener, "The Lorel Query Language for Semistructured Data". Journal on Digital Libraries, 1(1) (1997).

  • Anderson, A. H., M. Bader, E. G. Bard, E. H. Boyle, G. M. Doherty, S. C. Garrod, S. D. Isard, J. C. Kowtko, J. M. McAllister, J. Miller, C. F. Sotillo, H. S. Thompson and R. Weinert. "The HCRC Map Task Corpus". Language and Speech, 34(4) (1991), 351–366.

  • Ballim, A. and H. Thompson. "MULTEXT Task 1.2 Milestone B Report". Technical Report, available from Laboratoire Parole et Langage, Université de Provence, Aix-en-Provence, France, 1995.

  • Bray, T. and S. DeRose, eds. "Extensible Markup Language (XML) Version 1.0". WD-xml-link-970406, World Wide Web Consortium. See also http://www.w3.org/pub/WWW/TR/, 1997.

  • Bray, T. and C. M. Sperberg-McQueen (eds). "Extensible Markup Language (XML) version 1.0". World Wide Web Consortium Working Draft WD-xml-961114. Available at http://www.w3.org/pub/WWW/TR/WD-xml-961114.html, 1996.

  • Brew, C. and D. McKelvie. "Word-pair extraction for lexicography". In Proceedings of NeMLaP'96. Ankara, Turkey, 1996, pp. 45–55.

  • Burnage, G. and D. Dunlop. "Encoding the British National Corpus". In 13th International Conference on English Language Research on Computerised Corpora. Nijmegen. Available at http://www.sil.org/sgml/bnc-encoding2.html. See also http://info.ox.ac.uk/bnc/, 1992.

  • Carletta, J., H. Fraser-Krauss and S. Garrod. "An Empirical Study of Innovation in Manufacturing Teams: a Preliminary Report". In Proceedings of the International Workshop on Communication Modelling (LAP-96). Ed. J. L. G. Dietz, Springer-Verlag, Electronic Workshops in Computing Series, 1996.

  • Christ, O. "A modular and flexible architecture for an integrated corpus query system". In Proceedings of COMPLEX '94: 3rd Conference on Computational Lexicography and Text Research (Budapest, July 7–10, 1994), Budapest, Hungary. CMP-LG archive id 9408005, 1994.

  • Christophides, V., S. Abiteboul, S. Cluet and M. Scholl. "From Structured Documents to Novel Query Facilities", SIGMOD 94, 1994.

  • Clark, J. "SP: An SGML System Conforming to International Standard ISO 8879-Standard Generalized Markup Language". Available from http://www.jclark.com/sp/index.htm, 1996.

  • Cunningham, H., K. Humphreys, R. J. Gaizauskas, and Y. Wilks. "Software Infrastructure for Natural Language Processing". In 5th Conference on Applied Natural Language Processing, Washington, April 1997.

  • Cunningham, H., Y. Wilks and R. J. Gaizauskas. "New Methods, Current Trends and Software Infrastructure for NLP". In Proceedings of the Second Conference on New Methods in Language Processing. Ankara, Turkey, March 1996, pp. 283–298.

  • Cunningham, H., R. Gaizauskas and Y. Wilks. "A General Architecture for Text Engineering (GATE)-A New Approach to Language Engineering R&D". Technical Report, Dept of Computer Science, University of Sheffield. Available from http://www.dcs.shef.ac.uk/research/groups/nlp/gate/, 1995.

  • Goldfarb, C. F. "The SGML Handbook". Clarendon Press, 1990.

  • Grishman, R. "TIPSTER Phase II Architecture Design Document Version 1.52". Technical Report, Dept. of Computer Science, New York University. Available at http://www.cs.nyu.edu/tipster, 1995.

  • Ide, N. et al. "MULTEXT Task 1.5 Milestone B Report". Technical Report, available from Laboratoire Parole et Langage, Université de Provence, Aix-en-Provence, France, 1995.

  • Le Maitre, J., E. Murisasco and M. Rolbert. "SgmlQL, a language for querying SGML documents". In Proceedings of the 4th European Conference on Information Systems (ECIS'96). Lisbon, 1996, pp. 75–89. Information available from http://www.lpl.univ-aix.fr/projects/multext/MtSgmlQL/.

  • McKelvie, D., H. Thompson and S. Finch. "The Normalised SGML Library LT NSL version 1.4.6". Technical Report, Language Technology Group, University of Edinburgh. Available at http://www.ltg.ed.ac.uk/software/nsl, 1996.

  • McKelvie, D., C. Brew and H.S. Thompson. "Using SGML as a Basis for Data-Intensive NLP". In Proc. ANLP'97. Washington, April 1997.

  • Mikheev, A. and S. Finch. "Towards a Workbench for Acquisition of Domain Knowledge from Natural Language". In Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics (EACL'95). Dublin, Ireland, 1995.

  • Mikheev, A. and S. Finch. "A Workbench for Finding Structure in Texts". In Proc. ANLP'97. Washington, April 1997.

  • Mikheev, A. and D. McKelvie. "Indexing SGML files using LT NSL". Technical Report, Language Technology Group, University of Edinburgh, 1997.

  • Pito, R. "Tgrep Manual Page". Available from http://www.ldc.upenn.edu/ldc/online/treebank/man/cat1/tgrep.1, 1994.

  • van Rossum, G. "Python Tutorial". Available from http://www.python.org/, 1995.

  • Sperberg-McQueen, C. M. and L. Burnard, eds. "Guidelines for Electronic Text Encoding and Interchange". Text Encoding Initiative, Oxford, 1994.

  • Tobin, R. and D. McKelvie. "The Python Interface to the Normalised SGML Library (PythonNSL)". Technical Report, Language Technology Group, University of Edinburgh, 1996.

Cite this article

McKelvie, D., Brew, C. & Thompson, H. Using SGML as a Basis for Data-Intensive Natural Language Processing. Computers and the Humanities 31, 367–388 (1997). https://doi.org/10.1023/A:1001053128638