TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets

  • Dezhao Song
  • Frank Schilder
  • Charese Smiley
  • Chris Brew
  • Tom Zielund
  • Hiroko Bretz
  • Robert Martin
  • Chris Dale
  • John Duprey
  • Tim Miller
  • Johanna Harrison
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9367)


Currently, the dominant technology for providing non-technical users with access to Linked Data is keyword-based search. This is problematic because keywords are often inadequate as a means for expressing user intent. In addition, while a structured query language can provide convenient access to the information needed by advanced analytics, unstructured keyword-based search cannot meet this extremely common need. This makes it harder than necessary for non-technical users to generate analytics. We address these difficulties by developing a natural language-based system that allows non-technical users to create well-formed questions. Our system, called TR Discover, maps from a fragment of English into an intermediate First Order Logic representation, which is in turn mapped into SPARQL or SQL. The mapping from natural language to logic makes crucial use of a feature-based grammar with full formal semantics. The fragment of English covered by the natural language grammar is domain specific and tuned to the kinds of questions that the system can handle. Because users will not necessarily know what the coverage of the system is, TR Discover offers a novel auto-suggest mechanism that can help users to construct well-formed and useful natural language questions. TR Discover was developed for future use with Thomson Reuters Cortellis, which is an existing product built on top of a linked data system targeting the pharmaceutical domain. Currently, users access it via a keyword-based query interface. We report results and performance measures for TR Discover on Cortellis, and in addition, to demonstrate the portability of the system, on the QALD-4 dataset, which is associated with a public shared task. We show that the system is usable and portable, and report on the relative performance of queries using SQL and SPARQL back ends.
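
The second stage of the pipeline described above (logic to query language) can be illustrated with a minimal sketch. It is not the TR Discover implementation: the atom classes, the `to_sparql` function, the `ex:` namespace, and the predicate names are all hypothetical, and the sketch assumes the First Order Logic form has already been reduced to a conjunction of unary (type) and binary (relation) atoms. Only the SPARQL target is shown.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Type:
    """Unary atom such as Drug(x). Hypothetical representation."""
    name: str
    var: str


@dataclass(frozen=True)
class Rel:
    """Binary atom such as developedBy(x, Merck). Hypothetical representation."""
    name: str
    subj: str
    obj: str


Atom = Union[Type, Rel]


def _term(t: str, prefix: str) -> str:
    # Assumed convention: names starting with "?" are variables,
    # everything else is a constant in the example namespace.
    return t if t.startswith("?") else f"{prefix}:{t}"


def to_sparql(atoms: List[Atom], answer_var: str, prefix: str = "ex") -> str:
    """Translate a conjunction of atoms into a SPARQL SELECT query."""
    patterns = []
    for atom in atoms:
        if isinstance(atom, Type):
            # Drug(x)  ->  ?x a ex:Drug .
            patterns.append(f"  {_term(atom.var, prefix)} a {prefix}:{atom.name} .")
        else:
            # developedBy(x, Merck)  ->  ?x ex:developedBy ex:Merck .
            patterns.append(
                f"  {_term(atom.subj, prefix)} {prefix}:{atom.name} "
                f"{_term(atom.obj, prefix)} ."
            )
    body = "\n".join(patterns)
    # The namespace IRI below is a placeholder, not a real vocabulary.
    return (
        f"PREFIX {prefix}: <http://example.org/>\n"
        f"SELECT DISTINCT {answer_var} WHERE {{\n{body}\n}}"
    )


if __name__ == "__main__":
    # "drugs developed by Merck" as Drug(x) AND developedBy(x, Merck)
    logic = [Type("Drug", "?x"), Rel("developedBy", "?x", "Merck")]
    print(to_sparql(logic, "?x"))
```

Keeping the logical form separate from the query string is what would let the same parse feed either back end; an SQL generator would walk the same list of atoms and emit joins instead of triple patterns.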


Keywords

Natural language interface, Question answering, Feature-based grammar, Auto-suggestion, Analytics





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Dezhao Song (1, corresponding author)
  • Frank Schilder (1)
  • Charese Smiley (1)
  • Chris Brew (2)
  • Tom Zielund (1)
  • Hiroko Bretz (1)
  • Robert Martin (3)
  • Chris Dale (1)
  • John Duprey (3)
  • Tim Miller (4)
  • Johanna Harrison (5)

  1. Research and Development, Thomson Reuters, Eagan, USA
  2. Research and Development, Thomson Reuters, London, UK
  3. Research and Development, Thomson Reuters, Rochester, USA
  4. Intellectual Property and Science, Thomson Reuters, London, UK
  5. Intellectual Property and Science, Thomson Reuters, Philadelphia, USA
