LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs

  • Priyansh Trivedi
  • Gaurav Maheshwari
  • Mohnish Dubey
  • Jens Lehmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10588)


Being able to access knowledge bases in an intuitive way has been an active area of research in recent years. In particular, several question answering (QA) approaches that allow querying RDF datasets in natural language have been developed, as they let end users access knowledge without having to learn the schema of a knowledge base or a formal query language. To foster this research area, several training datasets have been created, e.g. in the QALD (Question Answering over Linked Data) initiative. However, existing datasets are insufficient in size, variety, or complexity to apply and evaluate a range of machine learning based QA approaches for learning complex SPARQL queries. With the provision of the Large-Scale Complex Question Answering Dataset (LC-QuAD), we close this gap by providing a dataset of 5000 questions and their corresponding SPARQL queries over the DBpedia dataset. In this article, we describe the dataset creation process and how we ensure a high variety of questions, which should enable assessment of the robustness and accuracy of the next generation of QA systems for knowledge graphs.
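To make the shape of such data concrete, the sketch below shows a hypothetical LC-QuAD-style entry: a natural-language question paired with a SPARQL query over DBpedia. The field names, the helper function, and the exact DBpedia predicates (`dbo:birthPlace`, `dbo:country`) are illustrative assumptions, not the dataset's actual format.

```python
# Hypothetical illustration of a question/SPARQL pair of the kind the
# dataset provides. Field names and predicates are assumptions.
entry = {
    "question": "Which people were born in cities whose country is Germany?",
    "sparql": (
        "PREFIX dbo: <http://dbpedia.org/ontology/> "
        "PREFIX dbr: <http://dbpedia.org/resource/> "
        "SELECT DISTINCT ?person WHERE { "
        "?person dbo:birthPlace ?city . "
        "?city dbo:country dbr:Germany . }"
    ),
}

def triple_patterns(query: str) -> int:
    """Rough count of triple patterns via ' . ' separators in the
    WHERE clause (illustrative heuristic, not a real SPARQL parser)."""
    body = query.split("WHERE {", 1)[1]
    return body.count(" . ")

# A question is "complex" in this sense when its query needs more than
# one triple pattern; this example needs two.
print(triple_patterns(entry["sparql"]))
```

A simple factoid question ("Where was X born?") would compile to a single triple pattern, whereas complex questions like the one above chain several patterns, which is what makes them harder for QA systems to learn.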



This work was partly supported by grants from the European Union’s Horizon 2020 research and innovation programme for the projects Big Data Europe (GA no. 644564), HOBBIT (GA no. 688227) and WDAqua (GA no. 642795).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Priyansh Trivedi (1)
  • Gaurav Maheshwari (1)
  • Mohnish Dubey (1)
  • Jens Lehmann (1, 2)
  1. University of Bonn, Bonn, Germany
  2. Fraunhofer IAIS, Bonn, Germany
