Computers and the Humanities, Volume 36, Issue 2, pp 223–254

GATE, a General Architecture for Text Engineering

  • Hamish Cunningham
Article

Abstract

This paper presents the design, implementation and evaluation of GATE, a General Architecture for Text Engineering. GATE lies at the intersection of human language computation and software engineering, and constitutes an infrastructural system supporting research and development of language processing software.

Keywords: GATE, infrastructure, language engineering, software architecture



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Hamish Cunningham
  1. Department of Computer Science and Institute for Language, Speech and Hearing, University of Sheffield, UK
