The Role of Linked Data in Content Selection

  • Rivindu Perera
  • Parma Nand
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8862)


This paper explores the appropriateness of using Linked Data as a knowledge source for content selection. Content selection is a crucial subtask in Natural Language Generation: it determines which contents of a knowledge source are relevant to a given communicative goal. The online era has allowed extensive amounts of generic knowledge to accumulate, some of which has been made available as structured knowledge sources for computational natural language processing. This paper proposes a model for content selection that uses a generic structured knowledge source, DBpedia, which is a structured replica of its unstructured counterpart, Wikipedia. The proposed model uses log likelihood to rank the contents of DBpedia Linked Data by their relevance to a communicative goal. We performed experiments with DBpedia as the Linked Data resource and two keyword datasets as communicative goals: keywords extracted from the QALD-2 training dataset were used to optimize parameters, and the QALD-2 testing dataset was used for testing. The results were evaluated against a verbatim-based selection strategy and showed that our model can perform 18.03% better than verbatim selection.
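To make the ranking idea concrete, the following is a minimal sketch of log-likelihood scoring, assuming the standard corpus-comparison form of the statistic (observed versus expected frequencies in a target and a reference collection). The abstract does not give the paper's exact formulation, so the function names `log_likelihood` and `rank_terms`, the toy counts, and the choice of reference collection are all illustrative assumptions rather than the authors' implementation.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood score for one term (illustrative, not the paper's code).

    a: frequency of the term in the target collection
    b: frequency of the term in the reference collection
    c: total size of the target collection
    d: total size of the reference collection
    """
    # Expected frequencies if the term were equally likely in both collections.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_terms(target_counts, reference_counts):
    """Rank the terms of the target collection by descending score."""
    c = sum(target_counts.values())
    d = sum(reference_counts.values())
    scores = {
        term: log_likelihood(count, reference_counts.get(term, 0), c, d)
        for term, count in target_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy counts: terms found in text about one entity versus
# terms found in a general reference collection.
target = {"capital": 8, "the": 10}
reference = {"capital": 1, "the": 10, "other": 9}
ranking = rank_terms(target, reference)
```

In this toy input, "capital" is ranked above "the" because its frequency differs sharply between the two collections, while "the" is about equally common in both. Note that the statistic measures the size of the difference, not its direction, so a real system would also need to check that a term is over-represented in the target.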


Keywords: Linked Data, Content selection, Log likelihood distance, Text mining





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Rivindu Perera (1)
  • Parma Nand (1)
  1. School of Computer and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand
