Legislative Document Content Extraction Based on Semantic Web Technologies

A Use Case About Processing the History of the Law

  • Conference paper
The Semantic Web (ESWC 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11503))

Abstract

This paper describes the system architecture for generating the History of the Law, developed for the Chilean National Library of Congress (BCN). The production system uses Semantic Web technologies, Akoma-Ntoso, and tools that automate the markup of plain text into XML, enriching and linking documents. These semantically annotated documents make it possible to develop specialized political and legislative services and to extract knowledge for a Legal Knowledge Base for public use. We show the strategies used to implement the automatic markup tools and describe the knowledge graph generated from the semantic documents. Finally, we compare document processing times using semantic technologies against manual work and present the lessons learnt in this process, laying a foundation for replicating a technological model that enables useful services in diverse contexts.
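To make the idea of automatic plain-text-to-XML markup concrete, the following is a minimal sketch and not the BCN production tool. It assumes a simple regular-expression rule for detecting article headings in Spanish legal text and wraps each article in elements from the Akoma Ntoso 3.0 namespace. The function name `mark_articles`, the heading pattern, and the `art_N` identifier scheme are illustrative assumptions, and the output is a fragment that omits the metadata a schema-valid document would need.

```python
import re
import xml.etree.ElementTree as ET

# Akoma Ntoso 3.0 namespace published by OASIS.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"

# Assumed heading pattern, e.g. "Artículo 1.- Texto" or "Articulo 2º. Texto".
ARTICLE_RE = re.compile(r"^Art[ií]culo\s+(\d+)[.\-º°]*\s*(.*)$", re.IGNORECASE)


def q(tag: str) -> str:
    """Return the namespace-qualified name of an Akoma Ntoso element."""
    return f"{{{AKN_NS}}}{tag}"


def mark_articles(plain_text: str) -> ET.Element:
    """Wrap each detected article of a plain-text law in <article> markup.

    Hypothetical helper: lines before the first heading are ignored, and
    lines after a heading are appended to that article's paragraph.
    """
    ET.register_namespace("", AKN_NS)
    root = ET.Element(q("akomaNtoso"))
    body = ET.SubElement(ET.SubElement(root, q("act")), q("body"))
    paragraph = None
    for raw_line in plain_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = ARTICLE_RE.match(line)
        if match:
            number, rest = match.group(1), match.group(2)
            article = ET.SubElement(body, q("article"), eId=f"art_{number}")
            ET.SubElement(article, q("num")).text = f"Artículo {number}"
            content = ET.SubElement(article, q("content"))
            paragraph = ET.SubElement(content, q("p"))
            paragraph.text = rest
        elif paragraph is not None:
            # Continuation line: join it to the current article's text.
            paragraph.text = f"{paragraph.text} {line}".strip()
    return root


if __name__ == "__main__":
    sample = "Artículo 1.- Primera norma.\nArtículo 2.- Segunda norma."
    print(ET.tostring(mark_articles(sample), encoding="unicode"))
```

A rule-based detector like this only recovers document structure; entity recognition and linking to a knowledge graph, which the abstract also mentions, would be separate steps.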

Notes

  1. http://docs.oasis-open.org/legaldocml/ns/akn/3.0.
  2. https://code.google.com/archive/p/weso-desh/.
  3. http://purl.org/vocab/frbr/core.
  4. http://www.geonames.org/ontology.
  5. http://www.w3.org/2003/01/geo/wgs84_pos.
  6. http://datos.bcn.cl/sparql.
  7. https://www.leychile.cl.
  8. https://datos.bcn.cl/es/ontologias.
  9. A test tool of the Automatic XML Marker can be found at http://bcn.cl/28n7h.
  10. https://nlp.stanford.edu/software/CRF-NER.shtml.
  11. https://spacy.io/usage/linguistic-features#section-named-entities.
  12. https://opennlp.apache.org.
  13. http://lime.cirsfid.unibo.it.
  14. https://xcential.com/legispro-xml-tech/.
  15. https://at4am.eu.
  16. https://github.com/bungeni-org.
  17. http://www.ittig.cnr.it/lab/xmlegeseditor.
  18. https://ec.europa.eu/isa2/solutions/leos.
  19. https://www.bcn.cl/historiadelaley.
  20. https://www.bcn.cl/laborparlamentaria.
  21. https://www.bcn.cl/presupuesto.

Acknowledgements

We wish to thank David Vilches, Eridan Otto, and Christian Sifaqui for their contribution to the development of the HL project, which was funded by the Library of Congress of Chile. The described research activities were partially funded by the Spanish Ministry of Economy and Competitiveness (Society challenges: TIN2017-88877-R).

Author information

Correspondence to Francisco Cifuentes-Silva or Jose Emilio Labra Gayo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Cifuentes-Silva, F., Labra Gayo, J.E. (2019). Legislative Document Content Extraction Based on Semantic Web Technologies. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_36

  • DOI: https://doi.org/10.1007/978-3-030-21348-0_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21347-3

  • Online ISBN: 978-3-030-21348-0

  • eBook Packages: Computer Science, Computer Science (R0)
