Dataset Alignment and Lexicalization to Support Multilingual Analysis of Legal Documents

  • Armando Stellato
  • Manuel Fiorelli
  • Andrea Turbati
  • Tiziano Lorenzetti
  • Peter Schmitz
  • Enrico Francesconi
  • Najeh Hajlaoui
  • Brahim Batouche
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10791)


The European Union is a complex, multilingual, multicultural and yet united environment, requiring solid integration policies and actions targeted at simplifying cross-language and cross-cultural knowledge access. The legal domain is a typical case in which the linguistic and the conceptual aspects interweave into a knowledge barrier that is hard to break. In the context of the ISA² funded project “Public Multilingual Knowledge Infrastructure” (PMKI), we are addressing semantic interoperability at both the conceptual and the lexical level by developing a set of coordinated instruments for advanced lexicalization of RDF resources (be they ontologies, thesauri or datasets in general) and for alignment of their content. In this paper, we describe the objectives of the project and the concrete actions, specifically in the legal domain, that will create a platform for multilingual cross-jurisdiction accessibility to legal content in the EU.


Keywords: Legal Domain; Eurovoc; Lexical Markup Framework; Multilingual Tools; Lexical Enrichment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Armando Stellato (1)
  • Manuel Fiorelli (1)
  • Andrea Turbati (1)
  • Tiziano Lorenzetti (1)
  • Peter Schmitz (2)
  • Enrico Francesconi (2, 3)
  • Najeh Hajlaoui (2)
  • Brahim Batouche (2)

  1. ART Group, Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy
  2. Publications Office of the European Union, Luxembourg City, Luxembourg
  3. Institute of Legal Information Theory and Techniques (ITTIG), Consiglio Nazionale delle Ricerche (CNR), Florence, Italy
