Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10604)

Abstract

In knowledge bases such as Wikidata, a large set of properties can be asserted for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking within this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven and, as we show, mistake frequency for interestingness. In this work, we develop a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then develop a technique based on a combination of general frequency, applicability to similar entities and semantic similarity that achieves 74% precision. The preference dataset is available at https://www.kaggle.com/srazniewski/wikidatapropertyranking.
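
To make the evaluation setting concrete, the following is a minimal sketch of how precision on pairwise preference judgments might be computed, first for a frequency-only scorer and then for a scorer that also uses applicability to similar entities. It is an illustration under stated assumptions, not the method of the paper: the function names, the weights, the entity labels and all numbers are invented toy values, and the semantic-similarity signal is left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

# One judgment: (entity, property_a, property_b, property preferred by annotators).
Judgment = Tuple[str, str, str, str]
Scorer = Callable[[str, str], float]


def frequency_scorer(freq: Dict[str, int]) -> Scorer:
    """Score a property by its global frequency, ignoring the entity."""
    def score(entity: str, prop: str) -> float:
        return float(freq.get(prop, 0))
    return score


def combined_scorer(freq: Dict[str, int],
                    similar_share: Dict[Tuple[str, str], float],
                    w_freq: float = 0.1,
                    w_similar: float = 0.9) -> Scorer:
    """Weighted sum of normalised global frequency and the share of similar
    entities that have the property. The weights are arbitrary (assumption)."""
    max_freq = max(freq.values()) if freq else 1
    def score(entity: str, prop: str) -> float:
        f = freq.get(prop, 0) / max_freq
        s = similar_share.get((entity, prop), 0.0)
        return w_freq * f + w_similar * s
    return score


def pairwise_precision(judgments: List[Judgment], score: Scorer) -> float:
    """Fraction of decided pairs where the higher-scored property matches
    the human preference. Ties are skipped (the scorer abstains)."""
    correct = decided = 0
    for entity, a, b, preferred in judgments:
        sa, sb = score(entity, a), score(entity, b)
        if sa == sb:
            continue
        decided += 1
        if (a if sa > sb else b) == preferred:
            correct += 1
    return correct / decided if decided else 0.0


# Toy data, invented for illustration only.
FREQ = {"place of birth": 1000, "doctoral advisor": 50, "medical condition": 20}
SIMILAR_SHARE = {
    ("researcher_x", "doctoral advisor"): 0.9,
    ("researcher_x", "place of birth"): 0.6,
    ("patient_y", "medical condition"): 0.9,
    ("patient_y", "place of birth"): 0.6,
}
JUDGMENTS: List[Judgment] = [
    ("researcher_x", "doctoral advisor", "place of birth", "doctoral advisor"),
    ("researcher_x", "place of birth", "medical condition", "place of birth"),
    ("patient_y", "medical condition", "place of birth", "medical condition"),
    ("patient_y", "place of birth", "doctoral advisor", "place of birth"),
]

if __name__ == "__main__":
    print(pairwise_precision(JUDGMENTS, frequency_scorer(FREQ)))
    print(pairwise_precision(JUDGMENTS, combined_scorer(FREQ, SIMILAR_SHARE)))
```

On this toy data the frequency-only scorer always prefers place of birth and therefore misses both pairs in which annotators favoured the entity-specific property, while the combined scorer recovers them. The toy outcome only illustrates the failure mode named above and says nothing about the figures reported for the real dataset.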

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Simon Razniewski (1, 2)
  • Vevake Balaraman (3)
  • Werner Nutt (1)

  1. Free University of Bozen-Bolzano, Bolzano, Italy
  2. Max-Planck-Institute for Informatics, Saarbrücken, Germany
  3. University of Trento, Trento, Italy
