Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties

  • Simon Razniewski
  • Vevake Balaraman
  • Werner Nutt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10604)

Abstract

In knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven and, as we show, thereby mistake frequency for interestingness. In this work, we develop a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then develop a technique based on a combination of general frequency, applicability to similar entities and semantic similarity that achieves 74% precision. The preference dataset is available at https://www.kaggle.com/srazniewski/wikidatapropertyranking.
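The abstract names three ranking signals (general frequency, applicability to similar entities, semantic similarity) and evaluates rankers by their precision on the high-agreement subset of pairwise judgments. The sketch below illustrates that evaluation loop under loudly stated assumptions: the weighted linear combination, the weights in `WEIGHTS`, the word-overlap stand-in for semantic similarity, the 0.75 agreement threshold, and all toy statistics, entity names (`E1`, `E2`) and vote counts are hypothetical. It is not the authors' method or data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical toy statistics for illustration only; NOT taken from the paper's dataset.
GLOBAL_FREQ = {"place of birth": 0.6, "doctoral advisor": 0.01, "medical condition": 0.005}
SIMILAR_FREQ = {  # share of entities similar to the given one that carry the property
    ("E1", "doctoral advisor"): 0.8, ("E1", "place of birth"): 0.5,
    ("E2", "medical condition"): 0.3, ("E2", "place of birth"): 0.5,
}
DESCRIPTION = {"E1": "physicist doctoral research", "E2": "footballer"}
# Assumed weights: (frequency, similar-entity applicability, semantic similarity).
WEIGHTS = (0.2, 0.5, 0.3)


@dataclass(frozen=True)
class Judgment:
    """One pairwise question: which of two properties matters more for this entity?"""
    entity: str
    prop_a: str
    prop_b: str
    votes_a: int
    votes_b: int

    @property
    def agreement(self) -> float:
        # Share of annotators who side with the majority choice.
        return max(self.votes_a, self.votes_b) / (self.votes_a + self.votes_b)

    @property
    def preferred(self) -> str:
        # Ties resolve to prop_b, but tied pairs fall below any sensible agreement threshold.
        return self.prop_a if self.votes_a > self.votes_b else self.prop_b


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a crude stand-in for real semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def frequency_score(entity: str, prop: str) -> float:
    """Purely data-driven baseline: how common the property is across all entities."""
    return GLOBAL_FREQ.get(prop, 0.0)


def combined_score(entity: str, prop: str) -> float:
    """Hypothetical weighted sum of the three signals named in the abstract."""
    w_freq, w_sim, w_sem = WEIGHTS
    return (w_freq * GLOBAL_FREQ.get(prop, 0.0)
            + w_sim * SIMILAR_FREQ.get((entity, prop), 0.0)
            + w_sem * token_overlap(DESCRIPTION.get(entity, ""), prop))


def precision(judgments: Iterable[Judgment],
              scorer: Callable[[str, str], float],
              min_agreement: float = 0.75) -> float:
    """Fraction of high-agreement pairs where the higher-scored property is the majority choice."""
    subset = [j for j in judgments if j.agreement >= min_agreement]
    if not subset:
        return 0.0
    hits = 0
    for j in subset:
        a, b = scorer(j.entity, j.prop_a), scorer(j.entity, j.prop_b)
        predicted = j.prop_a if a >= b else j.prop_b
        hits += predicted == j.preferred
    return hits / len(subset)


judgments = [  # hypothetical votes, not real annotations
    Judgment("E1", "doctoral advisor", "place of birth", votes_a=8, votes_b=2),
    Judgment("E2", "medical condition", "place of birth", votes_a=1, votes_b=9),
    Judgment("E2", "place of birth", "doctoral advisor", votes_a=5, votes_b=5),  # tie: filtered out
]
print(precision(judgments, frequency_score))  # 0.5
print(precision(judgments, combined_score))   # 1.0
```

On this toy data the frequency-only baseline ranks place of birth above doctoral advisor for `E1` and reaches 0.5 precision, while the combined score reaches 1.0. The numbers only illustrate the mechanics of pairwise precision on a high-agreement subset; they say nothing about the 61.3% and 74% reported above.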

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Simon Razniewski (1, 2)
  • Vevake Balaraman (3)
  • Werner Nutt (1)

  1. Free University of Bozen-Bolzano, Bolzano, Italy
  2. Max-Planck-Institute for Informatics, Saarbrücken, Germany
  3. University of Trento, Trento, Italy