Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties

  • Conference paper
Advanced Data Mining and Applications (ADMA 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10604)

Abstract

In knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven and, as we show, thereby mistake frequency for interestingness. In this work, we construct a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs on which human annotators show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then develop a technique based on a combination of general frequency, applicability to similar entities, and semantic similarity that achieves 74% precision. The preference dataset is available at https://www.kaggle.com/srazniewski/wikidatapropertyranking.
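The evaluation described above (keep only the pairs with high annotator agreement, then measure how often a scoring method prefers the same property as the human majority) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Judgment` record, the vote counts, the 0.8 default agreement threshold, the abstain-on-tie rule, and the toy scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scorer type: (entity, property) -> interestingness score.
Scorer = Callable[[str, str], float]


@dataclass
class Judgment:
    """One pairwise preference question with hypothetical vote counts."""
    entity: str
    prop_a: str
    prop_b: str
    votes_a: int  # annotators who preferred prop_a
    votes_b: int  # annotators who preferred prop_b

    @property
    def agreement(self) -> float:
        """Share of annotators who sided with the majority."""
        total = self.votes_a + self.votes_b
        return max(self.votes_a, self.votes_b) / total if total else 0.0

    @property
    def preferred(self) -> Optional[str]:
        """Majority choice, or None when the votes are tied."""
        if self.votes_a == self.votes_b:
            return None
        return self.prop_a if self.votes_a > self.votes_b else self.prop_b


def high_agreement(judgments: List[Judgment], threshold: float = 0.8) -> List[Judgment]:
    """Keep pairs with a clear majority whose share reaches the assumed threshold."""
    return [j for j in judgments if j.preferred is not None and j.agreement >= threshold]


def pairwise_precision(judgments: List[Judgment], score: Scorer) -> float:
    """Fraction of decided pairs where the scorer agrees with the human majority.

    Assumption: the scorer abstains when both properties receive the same score.
    """
    correct = decided = 0
    for j in judgments:
        sa, sb = score(j.entity, j.prop_a), score(j.entity, j.prop_b)
        if sa == sb:
            continue  # abstain on ties
        decided += 1
        predicted = j.prop_a if sa > sb else j.prop_b
        if predicted == j.preferred:
            correct += 1
    return correct / decided if decided else 0.0


# Toy data, invented for illustration only.
judgments = [
    Judgment("e1", "p_a", "p_b", 7, 1),  # agreement 0.875, majority p_a
    Judgment("e1", "p_c", "p_d", 2, 6),  # agreement 0.75, majority p_d
    Judgment("e2", "p_a", "p_c", 4, 4),  # tie, no majority
]
toy_scores: Dict[Tuple[str, str], float] = {
    ("e1", "p_a"): 0.9, ("e1", "p_b"): 0.2,
    ("e1", "p_c"): 0.5, ("e1", "p_d"): 0.4,
}


def toy_score(entity: str, prop: str) -> float:
    return toy_scores.get((entity, prop), 0.0)


subset = high_agreement(judgments, threshold=0.75)
print(len(subset), pairwise_precision(subset, toy_score))  # 2 0.5
```

The tied pair is filtered out before scoring, so precision is measured only where annotators clearly agree. Under the abstain-on-tie rule, precision is computed over decided pairs; for a scorer that never abstains it coincides with accuracy on the subset.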

Notes

  1.

    https://github.com/Wikidata-lib/PropertySuggester.

  2.

    https://meta.wikimedia.org/wiki/Research:Wikidata_gap_analysis#Conclusion.

  3.

    For instance, on March 21, 2016, there were 2202 properties, while on February 7, 2017, there were 2719, according to https://tools.wmflabs.org/hay/propbrowse/.

  4.

    https://www.crowdflower.com.

  5.

    https://www.wikidata.org/wiki/Property:P155.

  6.

    Not to be confused with ensemble learning in the machine learning sense (e.g., boosting), where consecutive instances of the same classifier are trained with emphasis on records that previous instances predicted wrongly. Such ensemble learning requires a sufficient amount of labeled training data, which is not available in our case.
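Because labeled data is scarce, the individual signals have to be combined without training. One simple unsupervised option, sketched here purely as an assumption (the abstract does not state how frequency, applicability to similar entities, and semantic similarity are combined), is a majority vote: each signal independently picks the more interesting property of a pair, and the property with more votes wins. The function names, signal tables, and property names below are hypothetical toy values.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical scorer type: (entity, property) -> score; higher means more interesting.
Scorer = Callable[[str, str], float]


def majority_vote(scorers: Sequence[Scorer], entity: str,
                  prop_a: str, prop_b: str) -> Optional[str]:
    """Return the property preferred by more scorers, or None on a tie.

    A scorer that gives both properties the same score abstains.
    No training data is needed: every scorer's vote counts equally.
    """
    for_a = for_b = 0
    for score in scorers:
        sa, sb = score(entity, prop_a), score(entity, prop_b)
        if sa > sb:
            for_a += 1
        elif sb > sa:
            for_b += 1
    if for_a == for_b:
        return None
    return prop_a if for_a > for_b else prop_b


def table_scorer(table: Dict[str, float]) -> Scorer:
    """Wrap a property -> score table; these toy tables ignore the entity."""
    return lambda entity, prop: table.get(prop, 0.0)


# Toy signals, invented for illustration only.
frequency = table_scorer({"generic_prop": 0.95, "specific_prop": 0.05})
similar_entities = table_scorer({"generic_prop": 0.6, "specific_prop": 0.9})
semantic_similarity = table_scorer({"generic_prop": 0.3, "specific_prop": 0.8})

signals = [frequency, similar_entities, semantic_similarity]
print(majority_vote(signals, "e1", "generic_prop", "specific_prop"))  # specific_prop
```

In this toy case raw frequency alone favours the generic property, while the two entity-aware signals outvote it. With an odd number of scorers, a tie can only arise when some scorer abstains.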

Author information

Corresponding author

Correspondence to Simon Razniewski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Razniewski, S., Balaraman, V., Nutt, W. (2017). Doctoral Advisor or Medical Condition: Towards Entity-Specific Rankings of Knowledge Base Properties. In: Cong, G., Peng, WC., Zhang, W., Li, C., Sun, A. (eds) Advanced Data Mining and Applications. ADMA 2017. Lecture Notes in Computer Science, vol 10604. Springer, Cham. https://doi.org/10.1007/978-3-319-69179-4_37

  • DOI: https://doi.org/10.1007/978-3-319-69179-4_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69178-7

  • Online ISBN: 978-3-319-69179-4

  • eBook Packages: Computer Science, Computer Science (R0)
