1 Introduction

In many knowledge bases, entities are described with numerous properties. However, not all properties are equally important. Some properties are considered keys for performing instance matching tasks, while others are generally chosen to quickly provide a summary of the key facts attached to an entity. Our motivation is to provide a method for selecting the properties that should be used when depicting the summary of an entity, for example in a multimedia question answering system such as QakisMedia (Footnote 1) or in a second-screen application providing more information about a particular TV program (Footnote 2).

Our approach consists of: (i) reverse engineering the Google Knowledge Panel by extracting the properties that Google considers important enough to show (Sect. 2), and (ii) analyzing users’ preferences by conducting a user survey and comparing the results (Sect. 3). Finally, we show how this knowledge of the preferred properties to attach to an entity can be represented explicitly using the Fresnel vocabulary, before concluding (Sect. 4).

2 Reverse Engineering the Google KG Panel

Web scraping is a technique for extracting data from Web pages. We aim to capture the properties depicted in the Google Knowledge Panel (GKP) that is injected in search result pages [1]. We have developed a Node.js application that queries for all DBpedia concepts having at least one instance that is owl:sameAs with a Freebase resource, in order to increase the probability that the search engine result page (SERP) for such a resource will contain a GKP. We assume in our experiment that the properties displayed for an entity are “entity-type dependent” and that the context (country, query, time, etc.) can affect the results. Moreover, we filter out generic concepts by excluding those that are direct subclasses of owl:Thing, since they would trigger ambiguous queries. We obtained a list of 352 concepts (Footnote 3).
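The concept selection step can be sketched as a SPARQL query against the DBpedia endpoint. This is an illustrative reconstruction under our own assumptions (prefixes, the Freebase URI test, and the generic-concept filter are ours), not the exact query used by the tool:

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?class WHERE {
  # classes with at least one instance aligned to Freebase
  ?instance a ?class ;
            owl:sameAs ?same .
  FILTER (STRSTARTS(STR(?same), "http://rdf.freebase.com/"))
  # filter out generic concepts sitting directly under owl:Thing
  FILTER NOT EXISTS { ?class rdfs:subClassOf owl:Thing }
}
```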

For each of these concepts, we retrieve n instances (Footnote 4). For each instance, we issue a Google search query containing the instance label. Google does not serve the GKP to all user agents, so we had to mimic browser behavior by setting the User-Agent header to that of a particular browser. We use CSS selectors to extract data from a GKP. An example of a query selector is ._om (all elements with class name _om), which returns the property DOM element(s) for the concept described in the GKP. From our experiments, we found that we do not always get a GKP in a SERP. When this happens, we disambiguate the instance by issuing a new query with the concept type appended. If no GKP is found even then, we record the instance for later manual inspection. Listing 1 gives the high-level algorithm for extracting the GKP. The full implementation can be found at https://github.com/ahmadassaf/KBE.
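As a rough sketch of this loop (the helper names and the commented-out network layer are our assumptions; only the ._om selector and the disambiguation strategy come from the description above):

```javascript
// Hypothetical sketch of the GKP extraction loop; not the actual KBE code.

// Google only serves the GKP to browser-like clients, hence a browser UA.
const BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36';

// Build the search query for an instance. On a first miss we retry with
// the concept type appended to disambiguate the query.
function buildQuery(label, conceptType, disambiguate) {
  return disambiguate ? `${label} ${conceptType}` : label;
}

// Extract property names from a parsed SERP document using the CSS class
// observed in the GKP markup at the time of the experiment.
function extractProperties(doc) {
  return Array.from(doc.querySelectorAll('._om'))
              .map(el => el.textContent.trim());
}

// High-level flow (network and HTML parsing omitted):
// for each concept C:
//   for each of the n instances I of C:
//     serp  = fetch SERP for buildQuery(I.label, C, false), UA = BROWSER_UA
//     props = extractProperties(serp)
//     if props is empty: retry with buildQuery(I.label, C, true)
//     if still empty:    record I for later manual inspection
```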

[Listing 1: high-level algorithm for extracting a GKP from a SERP]

3 Evaluation

We conducted a user survey in order to compare which properties users think should be displayed for a particular entity with the properties the GKP actually shows.

User survey. We set up a survey (Footnote 5) that ran for three weeks from February 25th, 2014, in order to collect users’ preferences regarding the properties they would like to be shown for a particular entity. We selected one representative entity for each of nine classes: TennisPlayer, Museum, Politician, Company, Country, City, Film, SoccerClub and Book. 152 participants provided answers: 72 % from academia, 20 % from industry and 8 % who did not declare their affiliation. 94 % of the respondents had heard about the Semantic Web, while 35 % were not familiar with specific visualization tools. The detailed results (Footnote 6) show the ranking of the top properties for each entity. We only keep the properties that received at least 10 % of the votes when comparing with the properties depicted in a GKP. Hence, users do not seem to be interested in the INSEE code identifying a French city, while they expect to see its population or points of interest.

Comparison with the Knowledge Graphs. The results of the Google Knowledge Panel (GKP) extraction (Footnote 7) clearly show a long-tail distribution of the properties depicted by Google, with the top N properties (N being 4, 5 or 6 depending on the entity) accounting for 98 % of the properties shown for a given type. We compare those properties with the ones revealed by the user study. Table 1 shows the agreement between the users and the choices made by Google in the GKP for the 9 classes. The highest agreement concerns the type Museum (66.97 %), while the lowest one is for the TennisPlayer concept (20 %). We believe that the properties for museums or books are more stable (little variety), while for subcategories of Person/Agent they vary considerably with status, function, etc., and are therefore more subjective.
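The agreement figures in Table 1 can be obtained by overlapping the two property sets. The exact measure is not spelled out in the text, so the sketch below is one plausible reading, assuming agreement is the share of GKP properties that users also voted for:

```javascript
// Illustrative agreement measure between the properties users chose
// (those with at least 10% of the votes) and the properties shown in the
// GKP. The actual measure behind Table 1 may differ; this is one
// plausible definition: % of GKP properties also present in user choices.
function agreement(userProps, gkpProps) {
  const users = new Set(userProps);
  const matched = gkpProps.filter(p => users.has(p)).length;
  return gkpProps.length === 0 ? 0 : (100 * matched) / gkpProps.length;
}
```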

Table 1. Agreement on properties between the users and the Knowledge Graph Panel

With this set of 9 concepts, we cover 301,189 DBpedia entities that also exist in Freebase, and for each of them, we can now empirically define the most important properties whenever there is an agreement between one of the biggest knowledge bases (Google’s) and users’ preferences.

Modeling the preferred properties with Fresnel. Fresnel (Footnote 8) is a presentation vocabulary for displaying RDF data. It specifies which information contained in an RDF graph should be presented, with fresnel:Lens as the core concept [2]. We use the Fresnel and PROV-O ontologies (Footnote 9) to explicitly represent which properties should be depicted when displaying an entity.

[Listing 2: Fresnel lens declaring the preferred properties of an entity type]
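As an illustration of this modeling, a Fresnel lens declaring preferred properties for the City class could look as follows in Turtle. The lens URI and the chosen properties are hypothetical; only fresnel:Lens, fresnel:classLensDomain and fresnel:showProperties come from the Fresnel vocabulary:

```turtle
@prefix fresnel: <http://www.w3.org/2004/09/fresnel#> .
@prefix dbo:     <http://dbpedia.org/ontology/> .
@prefix :        <http://example.org/lenses#> .

# Hypothetical lens: which properties to show when displaying a dbo:City
:CityDefaultLens a fresnel:Lens ;
    fresnel:classLensDomain dbo:City ;
    fresnel:showProperties ( dbo:populationTotal
                             dbo:country
                             dbo:areaTotal
                             dbo:mayor ) .
```

An application rendering an entity of type dbo:City would read this lens and display only the listed properties, in order.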

4 Conclusion and Future Work

We have shown that it is possible to reveal the “important” properties of entities by reverse engineering the choices made by Google when creating Knowledge Graph panels and by comparing them with users’ preferences obtained from a survey. Our motivation is to represent this choice explicitly, using the Fresnel vocabulary, so that any application can read this configuration file to decide which properties of an entity are worth visualizing. This is fundamentally different from the work in [4], where the authors created a generalizable approach to open up closed knowledge bases like Google’s by crowdsourcing the knowledge extraction task. We are aware that this knowledge is highly dynamic: the Google Knowledge Graph panel varies across geolocations and over time. We have provided code that enables new computations at run time, and we aim to study the temporal evolution of the important properties over a longer period. The knowledge captured so far will shortly be made available in a SPARQL endpoint. We are also investigating the use of Mechanical Turk to conduct a larger survey covering the complete set of DBpedia classes.