deqa: Deep Web Extraction for Question Answering

  • Jens Lehmann
  • Tim Furche
  • Giovanni Grasso
  • Axel-Cyrille Ngonga Ngomo
  • Christian Schallhart
  • Andrew Sellers
  • Christina Unger
  • Lorenz Bühmann
  • Daniel Gerber
  • Konrad Höffner
  • David Liu
  • Sören Auer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7650)

Abstract

Despite decades of effort, intelligent object search remains elusive. Neither search engine nor semantic web technologies alone have managed to provide usable systems for simple questions such as “find me a flat with a garden and more than two bedrooms near a supermarket.”

We introduce deqa, a conceptual framework that achieves this elusive goal by combining state-of-the-art semantic technologies with effective data extraction. To that end, we apply deqa to the UK real estate domain and show that it can answer a significant percentage of such questions correctly. deqa achieves this by mapping natural language questions to SPARQL patterns. These patterns are then evaluated on an RDF database of current real estate offers. The offers are obtained using OXPath, a state-of-the-art data extraction system, on the major agencies in the Oxford area and linked through LIMES to background knowledge such as the location of supermarkets.
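
As a rough illustration of the question-to-pattern step described above, the following minimal Python sketch fills a fixed SPARQL pattern with constraints taken from the example question. The `ex:` vocabulary, the property names, and the `FlatQuery` and `to_sparql` helpers are assumptions made for illustration only; they are not the schema or the templates used by deqa.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: the ex: prefix and all property names below are
# illustrative assumptions, not the schema used by deqa.
PREFIXES = "PREFIX ex: <http://example.org/realestate#>"


@dataclass
class FlatQuery:
    """Constraints a question parser might extract from the example question."""
    min_bedrooms_exclusive: int  # "more than two bedrooms" -> 2
    needs_garden: bool           # "with a garden"
    near_class: str              # "near a supermarket" -> "Supermarket"


def to_sparql(q: FlatQuery) -> str:
    """Fill a fixed SPARQL pattern with the extracted constraints."""
    lines = [
        PREFIXES,
        "SELECT DISTINCT ?flat WHERE {",
        "  ?flat a ex:Flat ;",
        "        ex:bedrooms ?n .",
        f"  FILTER(?n > {q.min_bedrooms_exclusive})",
    ]
    if q.needs_garden:
        lines.append("  ?flat ex:hasFeature ex:Garden .")
    # Join against linked background knowledge (assumed ex:near links).
    lines.append(f"  ?flat ex:near ?poi . ?poi a ex:{q.near_class} .")
    lines.append("}")
    return "\n".join(lines)


query = to_sparql(
    FlatQuery(min_bedrooms_exclusive=2, needs_garden=True, near_class="Supermarket")
)
print(query)
```

The resulting string could then be sent to any SPARQL endpoint holding the extracted offers; the `ex:near` triples stand in for links to background knowledge that a linking step would have to produce beforehand.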

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jens Lehmann (1, 2)
  • Tim Furche (1)
  • Giovanni Grasso (1)
  • Axel-Cyrille Ngonga Ngomo (2)
  • Christian Schallhart (1)
  • Andrew Sellers (1)
  • Christina Unger (3)
  • Lorenz Bühmann (2)
  • Daniel Gerber (2)
  • Konrad Höffner (2)
  • David Liu (1)
  • Sören Auer (2)

  1. Department of Computer Science, Oxford University, Oxford, UK
  2. Institute of Computer Science, University of Leipzig, Leipzig, Germany
  3. CITEC, Bielefeld University, Bielefeld, Germany
