Browsing DBpedia Entities with Summaries
The term “Linked Data” describes online-retrievable formal descriptions of entities and their links to each other. Machines and humans alike can retrieve these descriptions and discover information about links to other entities. However, for human users it becomes difficult to browse descriptions of single entities because, in many cases, they are referenced in more than a thousand statements.
In this demo paper we present summarum, a system that ranks triples and enables entity summaries for improved navigation within Linked Data. In its current implementation, the system focuses on DBpedia with the summaries being based on the PageRank scores of the involved entities.
Keywords: Entity summarization · DBpedia · Linked data · Statement ranking
The goal of the Linked Data movement is to enrich the Web with structured data. While the formal nature of these knowledge descriptions targets machines as immediate consumers, the final product is typically consumed by humans. Examples like Wikipedia Infoboxes show that, in many cases, users want to browse structured data alongside textual descriptions in order to get a quick overview of the common or main facts about a data object. However, state-of-the-art interfaces like the one of DBpedia deliver all known facts about an entity in a single Web page. Often, the first thing users see when browsing a DBpedia entity is the values of dbpedia-owl:abstract in ten different languages. As a first attempt to overcome this issue, we introduce summarum, a system that ranks triples according to popularity and enables entity summaries for improved navigation within Linked Data. In its current implementation, the system focuses on DBpedia with the summaries being based on the PageRank scores of the involved entities. We also adopted navigation elements from Semantic MediaWiki [3] in order to enable more flexible browsing.
The system is available at http://km.aifb.kit.edu/services/summa/.
2 Related Work
The field of browsing Linked Data entities has already been explored thoroughly. For the sake of conciseness, we focus on the most related and/or recent work in this field.
Recent efforts for producing user-friendly interfaces for Linked Data entities include the new DBpedia interface (currently available via DBpedia Live) and Magnus Manske’s Reasonator tool, which is based on Wikidata. In the new DBpedia interface, all property-value pairs are ordered in the traditional DBpedia fashion, with values sorted alphabetically according to their labels. In the Reasonator tool, the listings of statements do not seem to follow a particular order.
Similar tools are aemoo [4] and LODPeas [2]. aemoo focuses on schema information: which class an entity belongs to and which other classes the currently browsed entity relates to. Further interaction with the related classes makes it possible to discover additional entities of the respective type, which can then be browsed. LODPeas makes it possible to browse further entities that are related to the currently browsed entity. The system makes use of a “concurrence index” to suggest entities that share common property-value pairs. Both systems focus on presenting entities that are not necessarily directly attached to the currently browsed entity.
Semantic MediaWiki [3] offers search by property-value pairs, e.g. by specifying [[Born In::Hawaii]]. We adopt this scheme in order to enable users to discover entities which share a specific attribute with the currently browsed one. Thus, when browsing dbpedia:Barack_Obama, it is possible to discover who else was born in dbpedia:Hawaii.
The three major search engines, Google, Yahoo, and Bing, also offer summaries of entities. Bing and Google also retrieve lists of entities that are focused on a property-value pair, e.g. “movies directed by Quentin Tarantino”. However, this seems to work only in specific domains, as querying for “people born in Hawaii” does not result in a list of entities.
3 DBpedia PageRank
For our popularity-based approach, we computed the PageRank [1] scores for each DBpedia entity. As a basis for this, we used DBpedia’s Wikipedia Pagelinks (en) dataset. This dataset contains triples of the form “Wikipedia page A links to Wikipedia page B”. We only use these untyped links, i.e. we do not make use of typed links (e.g., dbpedia-owl:birthplace) for the computation; thus, the computed scores reflect the PageRank of the associated Wikipedia pages. However, we call the dataset “DBpedia PageRank” as the link extraction is performed by the DBpedia framework and the resources are identified with DBpedia URIs.
For the computation of PageRank we used the original formula as described in [1] with a damping factor of \(0.85\). The number of iterations was set to 40; the score changes from 20 iterations onwards were marginal, which suggests convergence. We publish the computed PageRank scores for the English language DBpedia versions 3.8 and 3.9 at http://people.aifb.kit.edu/ath/#DBpedia_PageRank. The dataset is available in tab-separated values and also in Turtle format. For the Turtle representation we used the vRank vocabulary [5].
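The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the published dataset: it assumes the page links are available as (source, target) pairs, uses the non-normalized form of the formula from [1], and does nothing special for pages without outgoing links. The function name `pagerank` and its parameters are chosen here for illustration only.

```python
from collections import defaultdict


def pagerank(links, damping=0.85, iterations=40):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages T linking to A.

    `links` is an iterable of (source, target) pairs (assumed input format).
    """
    out_links = defaultdict(set)
    nodes = set()
    for source, target in links:
        out_links[source].add(target)
        nodes.update((source, target))

    # Every page starts with the same score.
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_scores = {node: 1.0 - damping for node in nodes}
        for source, targets in out_links.items():
            # Each page passes an equal share of its damped score to every page it links to.
            share = damping * scores[source] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Toy graph: A and B link to each other, C links to A.
toy_scores = pagerank([("A", "B"), ("B", "A"), ("C", "A")])
```

In this toy graph, C has no incoming links and therefore keeps the base score of 1 - d, while A collects score from both B and C.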
entity*: the URI of a DBpedia entity that the user wants to browse.
k*: the maximum number of statements the user wants to retrieve about the entity.
predicate: the URI of a DBpedia predicate. If this parameter is present, the system focuses on statements that involve the given entity in combination with the given predicate.
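As an illustration of how the three parameters could be combined, the following sketch builds a request URL. It assumes that the parameters are passed as a URL query string appended to the service's base URL; the text does not spell out the exact request pattern, so the function name `build_summary_url` and the resulting URL layout are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the service as given in the text; the query-string layout is an assumption.
BASE_URL = "http://km.aifb.kit.edu/services/summa/"


def build_summary_url(entity, k, predicate=None, base_url=BASE_URL):
    """Build a request URL from the entity and k parameters plus an optional predicate."""
    params = {"entity": entity, "k": k}
    if predicate is not None:
        params["predicate"] = predicate
    # urlencode percent-encodes the URIs so they are safe inside a query string.
    return base_url + "?" + urlencode(params)


url = build_summary_url("http://dbpedia.org/resource/Barack_Obama", 10)
```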
The system currently focuses on statements that involve two DBpedia entities and, as such, does not consider statements with literal values, classes, or external resources. For each entity we use its incoming and outgoing typed links. Thus, the result is a mix of statements where the summarized entity is either in the subject or object position. This also applies to the results of queries where the predicate parameter was given. For example, using dbpedia-owl:order in combination with dbpedia:Apodiformes will retrieve statements where the entity is in the subject or object position of dbpedia-owl:order.
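The selection of candidate statements can be sketched as a simple filter. The sketch assumes that triples are plain (subject, predicate, object) string tuples and that a "DBpedia entity" is any URI in the http://dbpedia.org/resource/ namespace; both are assumptions made for this example, as is the function name `candidate_statements`. The sample triples are made up for illustration.

```python
# Assumption: DBpedia entities are identified by URIs in this namespace.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def candidate_statements(triples, entity, predicate=None):
    """Keep statements that link `entity` to another DBpedia entity in either direction."""
    selected = []
    for s, p, o in triples:
        if entity not in (s, o):
            continue  # the summarized entity must be the subject or the object
        if predicate is not None and p != predicate:
            continue  # optional restriction to a single predicate
        if not (s.startswith(DBPEDIA_RESOURCE) and o.startswith(DBPEDIA_RESOURCE)):
            continue  # drops literals, classes, and external resources
        selected.append((s, p, o))
    return selected


R = "http://dbpedia.org/resource/"
O = "http://dbpedia.org/ontology/"
sample = [
    (R + "Hummingbird", O + "order", R + "Apodiformes"),
    (R + "Apodiformes", O + "class", R + "Bird"),
    (R + "Apodiformes", "http://www.w3.org/2000/01/rdf-schema#label", "Apodiformes"),
]
only_order = candidate_statements(sample, R + "Apodiformes", O + "order")
```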
The decision on whether to include a statement in the top-k summary depends on its rank position. The score of a statement is the sum of the PageRank scores of the subject and the object. Note that, with the focus on a specific entity, its own score is not needed for the ranking and appears superfluous, as the entity’s score influences each ranked statement equally. In fact, we add the score for reasons of consistency, as we publish each statement’s score in the Turtle output of the service. Using only the subject’s (resp. object’s) score for ranking the statement would produce the same ranking but two different versions of the statement’s score, depending on whether the subject or the object is currently in focus.
In many cases, there is more than one statement with the same subject-object pair. Often, this is due to the distinction between DBpedia “property” and “ontology” predicates. For these cases, we apply a simple heuristic to decide which statement we present: First, we prefer statements with the entity in the subject role over those with the entity in the object role. Second, we prefer the DBpedia “ontology” over “property” predicates. In all other cases, we select the first statement with the respective subject-object pair.
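The scoring rule can be illustrated with a short sketch. It assumes that PageRank scores are held in a dictionary keyed by entity URI and that entities without a score count as 0.0; the function name `rank_statements` and the toy identifiers are hypothetical.

```python
def rank_statements(statements, pagerank, k):
    """Score each statement as PR(subject) + PR(object) and return the top-k."""
    scored = [
        (pagerank.get(s, 0.0) + pagerank.get(o, 0.0), (s, p, o))
        for s, p, o in statements
    ]
    # Highest score first; Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy data: "e" is the summarized entity, so its score is added to every statement.
toy_pagerank = {"e": 2.0, "a": 10.0, "b": 5.0, "c": 1.0}
toy_statements = [("e", "p", "c"), ("a", "q", "e"), ("e", "r", "b")]
top_two = rank_statements(toy_statements, toy_pagerank, 2)
```

Because the score of "e" is a constant offset on every statement, the resulting order depends only on the scores of the other entities, which matches the observation above.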
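The heuristic above could be implemented roughly as follows. This sketch reads "same subject-object pair" as "same pair of entities regardless of direction", since otherwise the first preference rule would never apply; that reading, the http://dbpedia.org/ontology/ namespace check, and the function name `pick_representatives` are assumptions made for illustration.

```python
# Assumption: "ontology" predicates are those in this namespace.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def pick_representatives(statements, entity):
    """Keep one statement per related entity, following the two preference rules."""

    def preference(statement):
        s, p, _ = statement
        # Lower tuples are preferred: entity as subject first, then ontology predicates.
        return (0 if s == entity else 1, 0 if p.startswith(ONTOLOGY_NS) else 1)

    chosen = {}
    for statement in statements:
        s, _, o = statement
        other = o if s == entity else s
        # A strict comparison keeps the first statement when preferences are equal.
        if other not in chosen or preference(statement) < preference(chosen[other]):
            chosen[other] = statement
    return list(chosen.values())
```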
The summarum system supports two types of output via content negotiation: HTML (text/html) and Turtle (text/turtle).
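On the client side, content negotiation amounts to setting the HTTP Accept header. The sketch below only constructs the request object with Python's standard library and does not send it; the helper name `summary_request` and the `fmt` argument are hypothetical, and the query parameters are omitted for brevity.

```python
from urllib.request import Request

# The two media types named in the text.
MEDIA_TYPES = {"html": "text/html", "turtle": "text/turtle"}


def summary_request(url, fmt="turtle"):
    """Prepare a GET request that asks for the given output format via the Accept header."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


request = summary_request("http://km.aifb.kit.edu/services/summa/", "turtle")
```

Passing the prepared object to `urllib.request.urlopen` would send the request, provided the service is reachable.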
5 Conclusion and Future Work
Our work adds popularity-based entity summaries to known Linked Data browsing interfaces in order to enhance user experience. We show a live demonstration online and also provide machine-readable output for further reuse of the rankings.
Predicates. In our next major release we plan to focus on the predicate component of the triple.
Literal values. We plan to include literal values as descriptors of the entities. The selection of these values is planned to be based on predicate-statistics about the entity’s RDF-type.
i18n and time. One of our further contributions will be the exploitation and combination of browsing context for region, language, and timeline-focused summaries.
Data sources. We are investigating how to extend the summarization engine with further data sources such as Freebase and Wikidata.
Visualization and media. The HTML output of the system is currently very basic. We plan to put significant effort into the design of a more appealing showcase.
Evaluation. We plan to extend our previous efforts [6] in designing evaluation scenarios for entity summarization.
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 611346.
- 1. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the 7th International Conference on World Wide Web 7, WWW7, pp. 107–117. Elsevier Science Publishers B. V., Amsterdam (1998)
- 2. Hogan, A., Munoz, E., Umbrich, J.: Lodpeas: like peas in a lod (cloud). In: Proceedings of the Billion Triple Challenge (2012)
- 3. Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 935–942. Springer, Heidelberg (2006)
- 4. Musetti, A., Nuzzolese, A.G., Draicchio, F., Presutti, V., Blomqvist, E., Gangemi, A., Ciancarini, P.: Aemoo: exploratory search based on knowledge patterns over the semantic web. In: Semantic Web Challenge (2012)
- 5. Roa-Valverde, A., Thalhammer, A., Toma, I., Sicilia, M.-A.: Towards a formal model for sharing and reusing ranking computations. In: Proceedings of the 6th International Workshop on Ranking in Databases, in conjunction with VLDB 2012 (2012)
- 6. Thalhammer, A., Knuth, M., Sack, H.: Evaluating entity summarization using a game-based ground truth. In: Cudré-Mauroux, P., Heflin, J., Sirin, E., Tudorache, T., Euzenat, J., Hauswirth, M., Parreira, J.X., Hendler, J., Schreiber, G., Bernstein, A., Blomqvist, E. (eds.) ISWC 2012, Part II. LNCS, vol. 7650, pp. 350–361. Springer, Heidelberg (2012)