A Logic-Based Tool for Semantic Information Extraction

  • Conference paper
Logics in Artificial Intelligence (JELIA 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4160)

Abstract

Recognizing and extracting meaningful information from unstructured Web documents, taking their semantics into account, is an important problem in information and knowledge management. This paper describes HıLεX, a system implementing a novel logic-based approach to information extraction from unstructured documents. The approach adopted in HıLεX is founded on a new two-dimensional representation of documents and heavily exploits DLP+, an extension of disjunctive logic programming for ontology representation and reasoning that has recently been implemented on top of the DLV system. Unlike previous systems, which are mainly syntactic, HıLεX combines semantic and syntactic knowledge to achieve more powerful information extraction. Ontologies, which represent the semantics of the domain of the information to be extracted, are encoded in DLP+, while the extraction patterns are encoded as regular expressions in an ad hoc two-dimensional grammar. These regular expressions are internally translated into DLP+ rules, whose execution yields the actual extraction of information from the input document. HıLεX supports semantic information extraction from both HTML pages and flat text documents. Its usefulness has already been confirmed in practice: the system has been successfully employed in two advanced applications in the e-health and e-finance domains.
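To make the idea of combining semantic and syntactic knowledge over a two-dimensional document more concrete, here is a toy sketch in Python. It is not HıLεX's grammar, its DLP+ encoding, or its translation of patterns into rules; all names (`ONTOLOGY`, `annotate`, `find_matches`) and the sample data are hypothetical. Assumptions: the ontology is reduced to a plain dictionary from surface terms to concepts, the document is a grid of tokens (rows and columns), and a "pattern" is a fixed sequence of concepts read either horizontally or vertically, rather than a full regular expression.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "ontology": surface terms mapped to domain concepts.
ONTOLOGY: Dict[str, str] = {
    "aspirin": "Drug",
    "ibuprofen": "Drug",
    "headache": "Symptom",
    "fever": "Symptom",
    "500mg": "Dosage",
    "200mg": "Dosage",
}

Grid = List[List[str]]
Match = List[Tuple[int, int, str]]  # (row, column, token) for each matched cell


def annotate(grid: Grid, ontology: Dict[str, str]) -> List[List[Optional[str]]]:
    """Semantic step: label each cell with its ontology concept, or None if unknown."""
    return [[ontology.get(cell.lower()) for cell in row] for row in grid]


def find_matches(grid: Grid, pattern: List[str], direction: str) -> List[Match]:
    """Syntactic step: find runs of cells whose concepts equal `pattern`,
    read left-to-right ("horizontal") or top-to-bottom ("vertical")."""
    dr, dc = {"horizontal": (0, 1), "vertical": (1, 0)}[direction]
    labels = annotate(grid, ONTOLOGY)
    matches: List[Match] = []
    for r in range(len(grid)):
        for c in range(len(grid[r])):
            cells: Match = []
            for k, concept in enumerate(pattern):
                rr, cc = r + k * dr, c + k * dc
                # Stop if we leave the grid or the concept does not fit.
                if rr >= len(grid) or cc >= len(grid[rr]) or labels[rr][cc] != concept:
                    break
                cells.append((rr, cc, grid[rr][cc]))
            else:
                # The loop finished without a break: every concept matched.
                matches.append(cells)
    return matches


# Usage: a small table-like document (header row carries no concepts).
document: Grid = [
    ["Drug", "Dose", "Indication"],
    ["Aspirin", "500mg", "headache"],
    ["Ibuprofen", "200mg", "fever"],
]
print(find_matches(document, ["Drug", "Dosage", "Symptom"], "horizontal"))
print(find_matches(document, ["Drug", "Drug"], "vertical"))
```

The horizontal pattern picks out each row as a (drug, dosage, symptom) record, while the vertical pattern finds a column of drugs, something a purely linear, one-dimensional pattern over the text stream would not express directly.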


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruffolo, M., Manna, M., Gallucci, L., Leone, N., Saccà, D. (2006). A Logic-Based Tool for Semantic Information Extraction. In: Fisher, M., van der Hoek, W., Konev, B., Lisitsa, A. (eds) Logics in Artificial Intelligence. JELIA 2006. Lecture Notes in Computer Science, vol 4160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11853886_48

  • DOI: https://doi.org/10.1007/11853886_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39625-3

  • Online ISBN: 978-3-540-39627-7

  • eBook Packages: Computer ScienceComputer Science (R0)