Abstract
In knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven, that is, as we show, mistake frequency for interestingness. In this work, we have developed a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then develop a technique based on a combination of general frequency, applicability to similar entities and semantic similarity that achieves 74% precision. The preference dataset is available at https://www.kaggle.com/srazniewski/wikidatapropertyranking.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
- 3.
For instance, as of March 21st, 2016, there were 2202 properties, while as of February 7, 2017, there are 2719 according to https://tools.wmflabs.org/hay/propbrowse/.
- 4.
- 5.
- 6.
Not to be mixed with ensemble learning, a machine learning approach where consecutive instances of the same classifier are trained especially on records that previous instances predicted wrongly. Ensemble learning requires a sufficient amount of labeled training data, which is not available in our case.
References
Abedjan, Z., Naumann, F.: Improving RDF data through association rule mining. Datenbank-Spektrum 13(2), 111–120 (2013)
Ahmeti, A., Razniewski, S., Polleres, A.: Assessing the completeness of entities in knowledge bases. In: ESWC P&D (2017)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data (2007)
Bast, H., Buchhold, B., Haussmann, E.: Relevance scores for triples from type-like relations. In: SIGIR, pp. 243–252. New York (2015)
Blei, D.M., Ng, Y.M., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., Li, H.: Learning to rank: from pairwise approach to listwise approach. In: ICML, pp. 129–136 (2007)
Carterette, B., Bennett, P.N., Chickering, D.M., Dumais, S.T.: Here or there. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds.) ECIR 2008. LNCS, vol. 4956, pp. 16–27. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78646-7_5
de Condorcet, M.: Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris (1785)
Deerwester, S.: Improving information retrieval with latent semantic indexing (1988)
Dessi, A., Atzori, M.: A machine-learning approach to ranking RDF properties. FGCS 54, 366–377 (2016)
Fatma, N., Chinnakotla, M., Shrivastava, M.: The unusual suspects: deep learning based mining of interesting entity trivia from knowledge graphs. In: AAAI 2017 (2017)
Gassler, W., Zangerle, E., Specht, G.: Guided curation of semistructured data in collaboratively-built knowledge bases. FGCS 31, 111–119 (2014)
Heindorf, S., Potthast, M., Bast, H., Buchhold, Haussmann, E.: WSDM cup 2017: vandalism detection and triple scoring (2017)
Jones, N., Brun, A., Boyer, A.: Comparisons instead of ratings: towards more stable preferences. In: WI-IAT, pp. 451–456 (2011)
Kalloori, S., Ricci, F., Tkalcic, M.: Pairwise preferences based matrix factorization and nearest neighbor recommendation techniques (2016)
Langer, P., Schulze, P., George, S., Kohnen, M., Metzke, T., Abedjan, Z., Kasneci, G.: Assigning global relevance scores to dbpedia facts. In: ICDE Workshops, pp. 248–253 (2014)
Li, H.: A short introduction to learning to rank. In: IEICE Transactions (2011)
Mousavi, H., Gao, S., Zaniolo, C.: IBminer: a text mining tool for constructing and populating infobox databases and knowledge bases. In: VLDB (2013)
Pan, S.J., Yang, Q.: A survey on transfer learning. In: TKDE (2010)
Prakash, A., Chinnakotla, M.K., Patel, D., Garg, P.: Did you know?-mining interesting trivia for entities from wikipedia (2015)
Razniewski, S., Suchanek, F.M., Nutt, W.: But what do we actually know. In: AKBC, pp. 40–44 (2016)
Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledge base. CACM 57(10), 78–85 (2014)
Zangerle, E., Gassler, W., Pichl, M., Steinhauser, S., Specht, G.: An empirical evaluation of property recommender systems for Wikidata and collaborative knowledge bases. In: Opensym (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Razniewski, S., Balaraman, V., Nutt, W. (2017). Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science(), vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_37
Download citation
DOI: https://doi.org/10.1007/978-3-319-69179-4_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69178-7
Online ISBN: 978-3-319-69179-4
eBook Packages: Computer ScienceComputer Science (R0)