Improving Open Information Extraction for Semantic Web Tasks

  • Chapter
Transactions on Computational Collective Intelligence XXI

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9630)

Open Information Extraction (OIE) aims to automatically identify all the possible assertions within a sentence. The result of this task is usually a set of triples (subject, predicate, object). In this paper, we first present what OIE is and how it can be improved when working within a given domain of knowledge. Using a corpus of sentences from the building engineering and construction domain, we obtain an improvement of more than 18%. Next, we show how OIE can serve as the basis of a high-level Semantic Web task. Here we apply OIE to the formalisation of natural language definitions. We test this formalisation task on a corpus of sentences defining concepts found in the pizza ontology. At this stage, 70.27% of our 37-sentence corpus is fully rewritten in OWL DL.
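To make the triple format concrete, here is a minimal sketch of how OIE output could be represented. The sentence, the hand-written triples, and the names `Triple` and `fully_rewritten_share` are assumptions made for illustration; they are not taken from the chapter or from any OIE tool. The count of 26 sentences is likewise an inference from the reported 70.27% of 37, not a figure stated in the abstract.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One OIE assertion in (subject, predicate, object) form."""
    subject: str
    predicate: str
    obj: str


# Hypothetical definition sentence and hand-written extractions (illustration only).
sentence = "A margherita is a pizza that has tomato topping and mozzarella topping."
triples = [
    Triple("A margherita", "is", "a pizza"),
    Triple("A margherita", "has", "tomato topping"),
    Triple("A margherita", "has", "mozzarella topping"),
]


def fully_rewritten_share(rewritten: int, total: int) -> float:
    """Percentage of corpus sentences fully rewritten, rounded to two decimals."""
    return round(100 * rewritten / total, 2)


# 26 is inferred from the reported percentage; it is an assumption, not a quoted figure.
share = fully_rewritten_share(26, 37)
```

A single sentence can yield several triples that share a subject, which is why the result is modelled as a list rather than a single tuple.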

  2. An exhaustive list of labels for phrases is available in the Penn Treebank [6].

  4. With a large ontology, such comparison must take advantage of an index for the sake of scalability.

  6. \(r_i\) is the subsumption or the set of elements of a more complex restriction (URI of the restriction property, OWL keywords for the type of the restriction, etc.) as explained in the introduction of Sect. 4.3.

  7. Only for better understanding. The choice of or would not have changed anything.

  9. For example,, etc.

  10. Concepts’ tokens are usually surrounded by adjectives, adverbs, prepositions, etc.


  1. OWL Web Ontology Language Guide, February 2004.

  2. Americans with Disabilities Act (ADA): 2010 ADA Standards for Accessible Design, September 2010.

  3. Bast, H., Haussmann, E.: Open information extraction via contextual sentence decomposition. In: 2013 IEEE Seventh International Conference on Semantic Computing (ICSC), pp. 154–159. IEEE Computer Society (2013)

  4. Bast, H., Haussmann, E.: More informative open information extraction via simple inference. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 585–590. Springer, Heidelberg (2014)

  5. Berg, J.: Aristotle’s theory of definition. In: ATTI del Convegno Internazionale di Storia della Logica, pp. 19–30 (1982)

  6. Bies, A., Ferguson, M., Katz, K., MacIntyre, R., Tredinnick, V., Kim, G., Marcinkiewicz, M.A., Schasberger, B.: Bracketing guidelines for treebank II Style Penn Treebank project. University of Pennsylvania 97 (1995)

  7. Bühmann, L., Fleischhacker, D., Lehmann, J., Melo, A., Völker, J.: Inductive lexical learning of class expressions. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds.) EKAW 2014. LNCS, vol. 8876, pp. 42–53. Springer, Heidelberg (2014)

  8. Building Safety Unit Tasmania Fire Service: Fire Safety in Buildings, obligations of owners and occupiers, August 2002.

  9. California Energy Commission: 2008 Building Energy Efficiency Standards, for residential and nonresidential buildings (2008).

  10. Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of the 22nd International Conference on World Wide Web, WWW 2013, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp. 355–366 (2013)

  11. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 1535–1545 (2011)

  12. Hadjieleftheriou, M., Srivastava, D.: Weighted set-based string similarity. IEEE Data Eng. Bull. 33(1), 25–36 (2010)

  13. Horridge, M., Jupp, S., Moulton, G., Rector, A., Stevens, R., Wroe, C.: A Practical Guide To Building OWL Ontologies Using Protégé 4 and CO-ODE Tools Edition 1.2. The University of Manchester, Manchester (2009)

  14. Kacfah Emani, C.H., Ferreira Da Silva, C., Fiès, B., Ghodous, P.: Improving open information extraction using domain knowledge. In: Surfacing the Deep and the Social Web (SDSW), Co-Located with The 13th ISWC, October 2014

  15. Kacfah Emani, C.H., Ferreira Da Silva, C., Fiès, B., Ghodous, P., Khosrowshahi, F.: Structural sentence decomposition via open information extraction. In: 18th International Conference Information Visualisation (IV2014), July 2014

  16. Lehmann, J., Auer, S., Bühmann, L., Tramp, S.: Class expression learning for ontology engineering. Web Semant. Sci. Serv. Agents World Wide Web 9(1), 71–81 (2011)

  17. Mausam, Schmitz, M., Bart, R., Soderland, S., Etzioni, O.: Open language learning for information extraction. In: EMNLP-CoNLL, pp. 523–534. Association for Computational Linguistics (2012)

  18. Nguyen, V.T., Sallaberry, C., Gaio, M.: Mesure de la similarité entre termes et labels de concepts ontologiques. In: Conférence en Recherche D’information et Applications, pp. 415–430 (2013)

  19. Sayah, K.: Automated Norm Extraction from Legal Texts. Master’s thesis, Utrecht University, August 2004

  20. Tsatsaronis, G., Petrova, A., Kissa, M., Ma, Y., Distel, F., Baader, F., Schroeder, M.: Learning formal definitions for biomedical concepts. In: OWLED (2013)

  21. Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.C., Gerber, D., Cimiano, P.: Template-based question answering over RDF data. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012, pp. 639–648. ACM, New York (2012)

  22. Unger, C., Cimiano, P.: Pythia: compositional meaning construction for ontology-based question answering on the semantic web. In: Muñoz, R., Montoyo, A., Métais, E. (eds.) NLDB 2011. LNCS, vol. 6716, pp. 153–160. Springer, Heidelberg (2011)

  23. Völker, J., Hitzler, P., Cimiano, P.: Acquisition of OWL DL axioms from lexical resources. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 670–685. Springer, Heidelberg (2007)

  24. Völker, J., Rudolph, S.: Lexico-logical acquisition of OWL DL axioms. In: Medina, R., Obiedkov, S. (eds.) ICFCA 2008. LNCS (LNAI), vol. 4933, pp. 62–77. Springer, Heidelberg (2008)

  25. Wächter, T., Schroeder, M.: Semi-automated ontology generation within OBO-Edit. Bioinformatics 26(12), i88–i96 (2010)

  26. Winkler, W.E.: The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Census Bureau (1999)

Author information


Corresponding author

Correspondence to Cheikh Kacfah Emani.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Emani, C.K., Da Silva, C.F., Fiès, B., Ghodous, P. (2016). Improving Open Information Extraction for Semantic Web Tasks. In: Nguyen, N.T., Kowalczyk, R., Rupino da Cunha, P. (eds) Transactions on Computational Collective Intelligence XXI. Lecture Notes in Computer Science, vol 9630. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49520-9

  • Online ISBN: 978-3-662-49521-6

  • eBook Packages: Computer Science, Computer Science (R0)
