Data Extraction Using NLP Techniques and Its Transformation to Linked Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8856)

Abstract

We present a system that extracts a knowledge base from raw, unstructured texts. The knowledge base is designed as a set of entities and their relations and is represented in an ontological framework. The extraction pipeline processes input texts with linguistically aware tools and extracts entities and relations from their syntactic representation. The extracted data is then represented according to the Linked Data principles. The system is designed to be both domain- and language-independent, and it provides users with data that supports more intelligent search than full-text search. We present our first case study, on processing Czech legal texts.
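
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of its two final stages: reading entity-relation triples off a dependency tree and serializing them as RDF N-Triples. It is an illustration under stated assumptions, not the authors' implementation. The `Token` structure, the `subj`/`obj`/`root` labels, the single subject-verb-object rule, and the `http://example.org/` namespace are all hypothetical; the paper's actual tools, extraction rules, and ontology are not shown on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import quote

# Hypothetical namespace; the paper's real ontology URIs are not given here.
BASE = "http://example.org/"


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # word form
    head: int     # idx of the governing token, 0 for the root
    deprel: str   # dependency label (assumed label set: root, subj, obj)


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Apply one illustrative rule: a root verb with a subject and an object
    child yields a (subject, relation, object) triple."""
    triples = []
    for verb in tokens:
        if verb.deprel != "root":
            continue
        children = [t for t in tokens if t.head == verb.idx]
        subj = next((t for t in children if t.deprel == "subj"), None)
        obj = next((t for t in children if t.deprel == "obj"), None)
        if subj is not None and obj is not None:
            triples.append((subj.form, verb.form, obj.form))
    return triples


def to_ntriples(triples: List[Tuple[str, str, str]]) -> List[str]:
    """Serialize triples as N-Triples lines, minting a URI for every term."""
    return [
        f"<{BASE}entity/{quote(s)}> <{BASE}relation/{quote(p)}> "
        f"<{BASE}entity/{quote(o)}> ."
        for s, p, o in triples
    ]


# A hand-built dependency tree for the sentence "Creditor submits claim".
sentence = [
    Token(1, "Creditor", 2, "subj"),
    Token(2, "submits", 0, "root"),
    Token(3, "claim", 2, "obj"),
]

for line in to_ntriples(extract_triples(sentence)):
    print(line)
```

In this sketch the dependency tree is built by hand; in a real pipeline it would come from a parser, and the extraction step would use a richer set of tree queries than the single rule shown here. Minting a URI for every extracted term is the simplest way to make the output linkable, at the cost of leaving entity disambiguation unaddressed.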

Keywords

  • Natural Language Processing
  • Resource Description Framework
  • Dependency Tree
  • Legal Text
  • Relation Extraction

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • Chapter length: 12 pages

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kríž, V., Hladká, B., Nečaský, M., Knap, T. (2014). Data Extraction Using NLP Techniques and Its Transformation to Linked Data. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Human-Inspired Computing and Its Applications. MICAI 2014. Lecture Notes in Computer Science, vol 8856. Springer, Cham. https://doi.org/10.1007/978-3-319-13647-9_13

  • DOI: https://doi.org/10.1007/978-3-319-13647-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13646-2

  • Online ISBN: 978-3-319-13647-9

  • eBook Packages: Computer Science, Computer Science (R0)