Portable Extraction of Partially Structured Facts from the Web

  • Andrew Salway
  • Liadh Kelly
  • Inguna Skadiņa
  • Gareth J. F. Jones
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6233)


A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that useful partially structured facts can be extracted about different kinds of entities in a broad domain, namely all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it was ported easily and cheaply between two rather different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information; for example, the partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for their utility in enhancing image captions.
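The template-based text generation mentioned in the abstract might look roughly like the following sketch. The abstract does not specify the fact representation, so everything here is an assumption made for illustration: the `PartialFact` fields, the `TEMPLATES` table, the `render` function, and the lighthouse example are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialFact:
    """Hypothetical partially structured fact (not the paper's schema)."""
    entity: str      # structured slot: the place the fact is about
    relation: str    # structured slot: a cue phrase such as "was built"
    remainder: str   # unstructured slot: free text kept verbatim


# Hypothetical caption templates, keyed by the relation slot.
TEMPLATES = {
    "was built": "Did you know? {entity} was built {remainder}.",
}

# Fallback used when no relation-specific template exists.
DEFAULT_TEMPLATE = "{entity} {relation} {remainder}."


def render(fact: PartialFact) -> str:
    """Fill the template chosen by the fact's relation slot."""
    template = TEMPLATES.get(fact.relation, DEFAULT_TEMPLATE)
    return template.format(
        entity=fact.entity,
        relation=fact.relation,
        remainder=fact.remainder,
    )


if __name__ == "__main__":
    # Made-up example fact, for illustration only.
    fact = PartialFact("The lighthouse", "was built",
                       "to guide ships past the reef")
    print(render(fact))
```

The point of the sketch is that only the entity and relation need to be structured for a template to be selected and filled; the explanatory remainder can stay as free text.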


Keywords: Augmented Reality, Image Caption, Broad Domain, Fact Extraction, Extraction Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Andrew Salway (1)
  • Liadh Kelly (1)
  • Inguna Skadiņa (2)
  • Gareth J. F. Jones (1)

  1. Centre for Digital Video Processing, School of Computing, Dublin City University, Dublin 9, Ireland
  2. Tilde, Rīga, Latvia
