A Multi-strategy Approach for Lexicalizing Linked Open Data

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Abstract

This paper exploits Linked Data to generate natural-language text, a task often referred to as lexicalization. We propose a framework that generates patterns for lexicalizing Linked Data triples. Linked Data is structured knowledge organized as triples, each consisting of a subject, a predicate, and an object. We use DBpedia as the Linked Data source; it is not only free but is also currently the fastest-growing data source organized as Linked Data. The proposed framework uses Open Information Extraction (OpenIE) to extract relations from natural text, and these relations are then aligned with triples to identify lexicalization patterns. We also exploit lexical semantic resources, which encode lexical, semantic, and syntactic information about entities; our framework uses VerbNet and WordNet for this purpose. The extracted patterns are ranked and categorized based on the DBpedia ontology class hierarchy. The pattern collection is then sorted by assigned score and stored in an indexed embedded database, for use in the framework and as a future lexical resource. The framework was evaluated for syntactic accuracy and validity by measuring the Mean Reciprocal Rank (MRR) of the first correct pattern. The results indicate that the framework achieves 70.36% accuracy and an MRR of 0.72 across five DBpedia ontology classes, generating 101 accurate lexicalization patterns.
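The MRR measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard definition: for each evaluated item, take the reciprocal of the rank of the first correct pattern (0 if none is correct), then average over all items. The function names and the judgement data below are hypothetical and are not taken from the paper or its results.

```python
from typing import Sequence


def reciprocal_rank(ranked_is_correct: Sequence[bool]) -> float:
    """Return 1/rank of the first correct pattern, or 0.0 if none is correct."""
    for rank, is_correct in enumerate(ranked_is_correct, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(judgements: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal ranks over all evaluated ranked lists."""
    if not judgements:
        raise ValueError("at least one ranked list is required")
    return sum(reciprocal_rank(j) for j in judgements) / len(judgements)


# Hypothetical correctness judgements for four ranked pattern lists
# (illustrative only; not data from the paper).
example_judgements = [
    [True, False, False],          # first correct pattern at rank 1
    [False, True, False],          # first correct pattern at rank 2
    [False, False, False],         # no correct pattern
    [False, False, False, True],   # first correct pattern at rank 4
]

mrr = mean_reciprocal_rank(example_judgements)
print(f"MRR = {mrr:.4f}")
```

Under this definition, an MRR close to 1 means the first correct pattern usually sits at or near the top of the ranked list.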

Author information

Corresponding author

Correspondence to Rivindu Perera.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Perera, R., Nand, P. (2015). A Multi-strategy Approach for Lexicalizing Linked Open Data. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_26

  • DOI: https://doi.org/10.1007/978-3-319-18117-2_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2

  • eBook Packages: Computer Science, Computer Science (R0)
