Knowledge on the Web: Robust and Scalable Harvesting of Entity-Relationship Facts
The proliferation of knowledge-sharing communities like Wikipedia and the advances in automatic information extraction from semistructured and textual Web data have enabled the construction of very large knowledge bases. These knowledge collections contain facts about many millions of entities and the relationships between them, and can be conveniently represented in the RDF data model. Prominent examples include DBpedia, YAGO, Freebase, and Trueknowledge.
These structured knowledge collections can be viewed as “Semantic Wikipedia Databases”, and they can answer many advanced questions through SPARQL-like query languages combined with appropriate ranking models. In addition, the knowledge bases can boost the semantic capabilities and precision of entity-oriented Web search, and they enable value-added knowledge services and applications in enterprises and online communities.
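As a rough illustration of the subject-predicate-object view behind the RDF data model and of a SPARQL-like conjunctive query, the following minimal Python sketch may help. The facts, the relation names (bornIn, hasWonPrize, locatedIn), and the helper functions are hypothetical and chosen only for illustration; they are not taken from DBpedia, YAGO, or any other knowledge base named above, and no ranking model is included.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy facts; the relation names are illustrative assumptions.
FACTS: List[Triple] = [
    ("Max_Planck", "bornIn", "Kiel"),
    ("Max_Planck", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Kiel", "locatedIn", "Germany"),
    ("Ulm", "locatedIn", "Germany"),
]


def match(facts: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern.

    A value of None acts as a wildcard, similar to a variable in a
    SPARQL triple pattern.
    """
    for fs, fp, fo in facts:
        if (s is None or s == fs) and (p is None or p == fp) and (o is None or o == fo):
            yield (fs, fp, fo)


def laureates_born_in(facts: List[Triple], prize: str, country: str) -> List[str]:
    """Join three triple patterns: ?x hasWonPrize prize, ?x bornIn ?c, ?c locatedIn country."""
    result = []
    for person, _, _ in match(facts, p="hasWonPrize", o=prize):
        for _, _, city in match(facts, s=person, p="bornIn"):
            # any() is true as soon as one matching triple exists.
            if any(match(facts, s=city, p="locatedIn", o=country)):
                result.append(person)
    return sorted(result)


if __name__ == "__main__":
    print(laureates_born_in(FACTS, "Nobel_Prize_in_Physics", "Germany"))
```

A real system would evaluate such queries with an RDF store and a SPARQL engine rather than nested loops; the sketch only shows how a question decomposes into triple patterns that share variables.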
This talk discusses recent advances in the large-scale harvesting of entity-relationship facts from Web sources, and it points out the next frontiers in building comprehensive knowledge bases and enabling semantic search services. In particular, it discusses the benefits and problems of extending prior work along three dimensions: temporal knowledge, to capture the time context and evolution of facts; multilingual knowledge, to interconnect the plurality of languages and cultures; and multimodal knowledge, to also include photos and video footage of entities. All of these dimensions pose grand challenges for the robustness and scalability of knowledge harvesting.