Towards a New Generation of Language Resources in the Semantic Web Vision

Part of the Text, Speech and Language Technology book series (TLTB, volume 36)

In this contribution I touch on issues related to language resources (LRs) and semantics, to dynamic resources acquired automatically, and to how a new generation of LRs compliant with the Semantic Web (SW) vision can be achieved. I point to the potential of, and the need for, cross-fertilisation between the two communities of Human Language Technology (HLT) and SW/ontologies. Many of these issues relate to Yorick’s work on preferences, lexicons and semantic annotation, and, more recently, to his ideas on the relation between HLT and SW.

Large-scale LRs are unanimously recognised as the necessary infrastructure underlying language technology (LT) (Varile and Zampolli (eds.) 1997). Discussing a few major European initiatives for building harmonised LRs, I highlight how computational lexicons and textual corpora should be considered complementary views on the lexical space, with a view to modelling a new type of resource that is both a lexicon and a corpus at once. A “complete” computational lexicon should incorporate and represent our “knowledge of the world”. I claim that it is theoretically impossible to achieve completeness within any “static” lexicon. Moreover, choices on the syntagmatic axis are pervasive in language. A sound language infrastructure must therefore encompass both “static” lexicons, such as the traditional ones, and “dynamic” systems able to enrich the lexicon with information acquired on-line from large corpora, thus capturing the “actually realised” potentialities, the wide range of variation, and the flexibility inherent in language as it is used. These are the challenges for semantic tagging, which is at the core of the SW vision of giving meaning, in a manner understandable by machines, to the content of Web documents.

Looking further into the future, the need for ever more “knowledge-intensive”, large-scale LRs for effective content processing requires a change of paradigm and the design of a new generation of LRs based on open content interoperability standards. The SW notion may help determine the shape of the LRs of the future, consistent with the vision of an open, distributed space of sharable knowledge available on the Web for processing.

Realising the necessary world-wide linguistic infrastructure requires attention not only to a range of technical aspects but also, and perhaps most critically, to a number of organisational ones. An essential step towards an integrated basis is to enhance interchange and cooperation among the many communities that now act separately, such as LR and LT developers, terminology, Semantic Web and ontology experts, content providers, and linguists. This is one of the challenges for the coming years, if we are to have a usable and useful “language” scenario in the global network.


Keywords (added by machine, not by the authors): Natural Language Processing, Computational Linguistics, Language Resource, Lexical Information, Lexical Resource




  1. Amsler, R.A. 1981. A taxonomy for English nouns and verbs. In Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics. Stanford University, Stanford, California, USA, pp. 133–138.
  2. Barnbrook, G. 2002. Defining Language: A Local Grammar of Definition Sentences. Studies in Corpus Linguistics 11. John Benjamins.
  3. Bertagna, F., Lenci, A., Monachini, M. and Calzolari, N. 2004. “Content Interoperability of Lexical Resources: Open Issues and MILE Perspectives”. In Proceedings of LREC2004, pp. 131–134.
  4. Binnenpoorte, D., De Vriend, F., Sturm, J., Daelemans, W., Strik, H. and Cucchiarini, C. 2002. “A Field Survey for Establishing Priorities in the Development of HLT Resources for Dutch”. In LREC 2002 Proceedings. Las Palmas, pp. 1862–1866.
  5. Byrd, R.J., Calzolari, N., Chodorow, M.S., Klavans, J.L., Neff, M.S. and Rizk, O.A. 1987. “Tools and Methods for Computational Lexicology”. In Computational Linguistics. ACL Journal, 13(3–4):219–240.
  6. Boguraev, B. and Briscoe, T. (eds.) 1989. Computational Lexicography for Natural Language Processing. Longman.
  7. Boguraev, B., Briscoe, E.J., Calzolari, N., Cater, A., Meijs, W. and Zampolli, A. 1988. Acquisition of Lexical Knowledge for Natural Language Processing Systems (ACQUILEX). Proposal for ESPRIT Basic Research Actions No. 3030. Cambridge, UK, p. 34.
  8. Briscoe, T., McCarthy, D., Carroll, J., Allegrini, P., Calzolari, N., Federici, S., Montemagni, S., Pirrelli, V., Abney, S., Beil, F., Carroll, G., Light, M., Prescher, D., Riezler, S. and Rooth, M. 1999. Acquisition System for Syntactic and Semantic Type and Selection. SPARKLE Deliverable 7.2. Pisa, p. 72.
  9. Calzolari, N. 1977. “An Empirical Approach to Circularity in Dictionary Definitions”. In Cahiers de Lexicologie, XXXI(2):118–128.
  10. Calzolari, N. 1982. “Towards the Organization of Lexical Definitions on a Database Structure”. In Jàn Horecky (ed.), COLING ’82. (North-Holland Linguistic Series, 47). North-Holland, Amsterdam, pp. 61–64.
  11. Calzolari, N. 1991. “Lexical Databases and Textual Corpora: Perspectives of Integration for a Lexical Knowledge Base”. In Zernik, U. (ed.), Lexical Acquisition: Exploiting on-line Resources to Build a Lexicon. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 191–208.
  12. Calzolari, N. 1998. “An Overview of Written Language Resources in Europe: A few Reflection, Facts, and a Vision”. In Rubio, A., Gallardo, N., Castro, R., Tejada, A. (eds.), Proceedings of the First International Conference on Language Resources and Evaluation (LREC). Granada, Vol. I, pp. 217–224.
  13. Calzolari, N. 2002. “Computational Lexicons: Towards a New paradigm of an Open Lexical Infrastructure?”. In Willée, G., Schröder, B., Schmitz, H.C. (eds.), Computerlinguistik. Was geht, was kommt?. Computational Linguistics. Achievements and Perspectives. Gardez! Verlag, Sankt Augustin, pp. 41–47.
  14. Calzolari, N. 2003. “Corpus-based Lexicon Building: An Overview Across Projects, Problems, Approaches”. In Zampolli, A., Calzolari, N., Cignoni, L. (eds.), Computational Linguistics in Pisa – Linguistica Computazionale a Pisa. Special Issue of Linguistica Computazionale. IEPI, Pisa, pp. 79–116.
  15. Calzolari, N. 2004. “Computational Lexicons and Corpora: Complementary Components in Human Language Technology”. In van Sterkenburg, P. (ed.), Linguistics Today – Facing a Greater Challenge. John Benjamins, Amsterdam, pp. 89–107.
  16. Calzolari, N. 2005. “Language Resources: priorities and challenges”. In Symposium on Natural Processing and Image Recognition. National Institute of Information and Communication (NICT), Kyoto University, pp. 9–12.
  17. Calzolari, N. 2006. “Technical and Strategic Issues on language resources for a Research Infrastructure”. In Furui, S. (ed.), Proceedings of the International Symposium on Large-scale Knowledge Resources (LKR2006). Tokyo Institute of Technology, pp. 53–58.
  18. Calzolari, N., Bertagna, F., Lenci, A., Monachini, M. (eds.) 2003. Standards and Best Practice for Multilingual Computational Lexicons. MILE (the Multilingual ISLE Lexical Entry). ISLE CLWG Deliverable D2.2&D2.3. Pisa, p. 194.
  19. Calzolari, N., Lenci, A., Bertagna, F. and Zampolli, A. 2002. Broadening the Scope of the EAGLES/ISLE Lexical Standardization Initiative. In Calzolari, N., Choi, K., Lenci, A., Tokunaga, T. (eds.), Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization. Taipei, Taiwan, pp. 9–16.
  20. Calzolari, N. and Briscoe, T. 1995. “ACQUILEX I and II. Acquisition of Lexical Knowledge from Machine-Readable Dictionaries and Text Corpora”. In Cahiers de Lexicologie, 67(2):95–114.
  21. Calzolari, N., Choukri, K., Gavrilidou, M., Maegaard, B., Baroni, P., Fersøe, H., Lenci, A., Mapelli, V., Monachini, M. and Piperidis, S. 2004. “ENABLER Thematic Network of National Projects: Technical, Strategic and Political Issues of LRs”. In LREC 2004 Proceedings. Lisbon, pp. 937–940.
  22. Calzolari, N., Federici, S., Montemagni, S. and Peters, C. 1995. “Extracting, Representing and Using Syntactic-Semantic Information from the Cobuild Student’s Dictionary”. In Sinclair, J., Hoelter, M., Peters, C. (eds.), The Languages of Definition: The Formalization of Dictionary Definitions for Natural Language processing. Studies in Machine Translation and Natural Language Processing, European Communities, Brussels Luxembourg, pp. 59–148.
  23. Calzolari, N., Hagman, J., Marinai, E., Montemagni, S., Spanu, A. and Zampolli, A. 1993. “Encoding Lexicographic Definitions as Typed Feature Structures”. In Beckmann, F., Heyer, G. (eds.), Theorie und Praxis des Lexikons. Foundations of Communication and Cognition. Walter de Gruyter, Berlin, pp. 274–315.
  24. Calzolari, N. and Moretti, L. 1976. “A Method for a Normalization and a Possible Algorithmic Treatment of Definitions in the Italian Dictionary”. In Proceedings of the 6th International conference on Computational Linguistics (COLING’76). Ottawa, No. 32, p. 13.
  25. Calzolari, N. and Zampolli, A. 1999. “Harmonised Large-scale Syntactic/Semantic Lexicons: A European Multilingual Infrastructure”. In MT Summit Proceedings. Singapore.
  26. Calzolari, N. and Zampolli, A. 2003. “The EAGLES/ISLE Initiative for Setting Standards: The Computational Lexicon Working Group for Multilingual Lexicons”. In Cole, C., Craig, H. (eds.), Computing Arts: Digital Resources for Research in the Humanities. University of Sydney, pp. 45–73.
  27. Federici, S., Montemagni, S., Pirrelli, V. and Calzolari, N. 1998. “Analogy-based Extraction of Lexical Knowledge from Corpora: the SPARKLE Experience”. In Rubio, A., Gallardo, N., Castro, R., Tejada, A. (eds.), Proceedings of the First International Conference on Language Resources and Evaluation. Granada, Vol. I, pp. 75–82.
  28. Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. MIT Press.
  29. Hanks, P. 2004. “The Syntagmatics of Metaphor”. In International Journal of Lexicography, 17(3).
  30. Hanks, P. and Pustejovsky, J. 2005. “A Pattern Dictionary for Natural Language Processing”. In Revue française de linguistique appliquée, 10(2).
  31. Krauwer, S. 1998. “ELSNET and ELRA: A Common Past and a Common Future”. In ELRA Newsletter, 3(2).
  32. Lenat, D. and Guha, R.V. 1990. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley.
  33. Lenci, A., Bel, N., Busa, F., Calzolari, N., Gola, E., Monachini, M., Ogonowsky, A., Peters, I., Peters, W., Ruimy, N., Villegas, M. and Zampolli, A. 2000. “SIMPLE: A General Framework for the Development of Multilingual Lexicons”. International Journal of Lexicography, 13(4):249–263.
  34. Lenci, A., Montemagni, S., Pirrelli, V. and Soria, C. 1999. “FAME: A Functional Annotation Meta-scheme for Multi-modal and Multi-lingual Parsing Evaluation”. In Proceedings of the ACL-IALL Workshop “Computer-mediated Language Assessment and Evaluation in Natural Language Processing”. Maryland, pp. 45–52.
  35. Mapelli, V., Choukri, K. 2003. “Report on a (Minimal) Set of LRs to Be Made Available for as Many Languages as Possible, and Map of the Actual Gaps”. ENABLER Deliverable D5.1, Paris, p. 22.
  36. Nakamura, J. and Nagao, M. 1988. “Extraction of Semantic Information from an Ordinary English Dictionary and its Evaluation”. In COLING 1988, pp. 459–464.
  37. Ostler, N., Zampolli, A. (eds.) 1994. Literary and Linguistic Computing. Special issue. OUP.
  38. Ruimy, N., Corazzari, O., Gola, E., Spanu, A., Calzolari, N. and Zampolli, A. 1998. “The European LE-PAROLE Project: the Italian Syntactic Lexicon”. In Proceedings of LREC1998, pp. 241–248.
  39. Ruimy, N., Monachini, M., Gola, E., Calzolari, N., Ulivieri, M., Del Fiorentino, M.C., Ulivieri, M. and Rossi, S. 2003. “A Computational Semantic Lexicon of Italian: SIMPLE”. In Zampolli, A., Calzolari, N., Cignoni, L. (eds.), Computational Linguistics in Pisa. Linguistica Computazionale. Vol. XVI–XVII. IEPI, Pisa, pp. 821–864.
  40. Varile, G.B. and Zampolli, A. (eds.) 1992. “Synopsis of American, European and Japanese Projects”. In Linguistica Computazionale, VIII, Giardini Editore, Pisa.
  41. Varile, G.B. and Zampolli, A. (eds.) 1997. Survey of the State of the Art in Human Language Technology. Sponsored by the Commission of the European Union and the National Science Foundation of the USA, Giardini Editori, Pisa and Cambridge University Press.
  42. Vossen, P. 1998. “Introduction to EuroWordNet”. Computers and the Humanities, 32:73–89.
  43. Walker, D., Zampolli, A. and Calzolari, N. (eds.) 1995. Automating the Lexicon: Research and Practice in a Multilingual Environment. Clarendon Press, OUP, Oxford, p. 413.
  44. Wilks, Y. 1975a. “An Intelligent Analyser and Understander of English”. In Communications of the ACM, 18(5):264–274.
  45. Wilks, Y. 1975b. “A Preferential, Pattern-Seeking, Semantics for Natural Language Inference”. In Artificial Intelligence, 6:53–74.
  46. Wilks, Y. 1979. “Making Preferences More Active”. In Artificial Intelligence, 11:197–223.
  47. Wilks, Y. 1995. “ECRAN: Extraction of Content: our Research at Near-market”. EU LRE Project.
  48. Wilks, Y. 1997. “Senses and texts”. In Computers and the Humanities, 31(2):77–90.
  49. Wilks, Y. 2005. “The Semantic Web as the apotheosis of annotation, but what are its semantics?”. AAAI Proceedings.
  50. Wilks, Y., Fass, D., Guo, C.-M., MacDonald, J.E., Plate, T. and Slator, B. 1989. “A Tractable Machine Dictionary as a Resource for Computational Semantics”. In Boguraev, B., Briscoe, T. (eds.), Computational Lexicography for Natural Language Processing. London: Longman, and as CRL Memoranda in Computer and Cognitive Science, MCCS-87-105.
  51. Wilks, Y. and Nirenburg, S. 1993. “Towards Automated Knowledge Acquisition”. In Proceedings of the Conference on Very large Knowledge Bases. Electronic Dictionary Research Institute, Tokyo.
  52. Zampolli, A. 1997. “The PAROLE Project in the General Context of the European Actions for Language Resources”. In Marcinkeviciene, R., Volz, N. (eds.), TELRI, Second European Seminar: Language Applications for a Multilingual Europe. Kaunas, Lithuania, pp. 185–210.
  53. Zampolli, A. 1998. “Introduction of the General Chairman”. In Rubio, A., Gallardo, N., Castro, R., Tejada, A. (eds.), Proceedings of the First International Conference on Language Resources and Evaluation (LREC). Granada, Vol. I, pp. xv–xxv.
  54. Zampolli, A. 2003. “Standards for Language Data Processing: An Historical Overview”. In Fiormonte, D. (ed.), Informatica Umanistica dalla Ricerca all’Insegnamento. Atti del Convegno Computer, Literature and Philology. Bulzoni, pp. 65–84.
  55. Zampolli, A., Calzolari, N. and Cignoni, L. (eds.) 2003. Computational Linguistics in Pisa – Linguistica Computazionale a Pisa. Linguistica Computazionale, Special Issue, Vol. XVI–XVII, Vol. XVIII–XIX. IEPI, Pisa-Roma.
  56. Zampolli, A., Calzolari, N. and Palmer, M. (eds.) 1994. Current Issues in Computational Linguistics: in Honour of Don Walker. Linguistica Computazionale, Vol. IX–X, Giardini Editori, Pisa and Kluwer Academic Publisher, Norwell, MA, p. 595.
  57. Zampolli, A. et al. 2000. ENABLER Technical Annex, Pisa.

Copyright information

© Springer 2007

Authors and Affiliations

  1. Istituto di Linguistica Computazionale del CNR, Pisa, Italy
