Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases

Part of the Lecture Notes in Computer Science book series (LNISA, volume 10988)


Recent years have witnessed the proliferation of large-scale Knowledge Bases (KBs). However, many entities in KBs have incomplete type information, and some are entirely untyped. Even worse, fine-grained types (e.g., BasketballPlayer), which carry rich semantic meaning, are more likely to be incomplete because they are more difficult to obtain. Existing machine-based algorithms use the predicates (e.g., birthPlace) of entities to infer their missing types, but these predicates may be insufficient for inferring fine-grained types. In this paper, we utilize crowdsourcing to solve the problem and address the challenge of controlling crowdsourcing cost. To this end, we propose a hybrid machine-crowdsourcing approach for fine-grained entity type completion. It first determines the types of some “representative” entities via crowdsourcing and then infers the types of the remaining entities based on the crowdsourcing results. To support this approach, we first propose an embedding-based influence for type inference, which considers not only the distances between entity embeddings but also the distances between entity and type embeddings. Second, we propose a new difficulty model for entity selection, which better captures the uncertainty of the machine algorithm when identifying entity types. We demonstrate the effectiveness of our approach through experiments on real crowdsourcing platforms. The results show that our method outperforms state-of-the-art algorithms, improving the effectiveness of fine-grained type completion at an affordable crowdsourcing cost.
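The abstract does not state the formulas behind the influence or the difficulty model, so the following is only a toy sketch of the two-step workflow it describes, not the authors' method. Everything in it is an assumption made for illustration: the softmax-over-distances machine model, entropy as the difficulty score, the product-of-exponentials influence, the way the machine prior and crowd influence are summed, and all names and embedding values.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def type_probs(vec, type_emb):
    """Assumed machine model: softmax over negative entity-to-type distances."""
    weights = {t: math.exp(-dist(vec, tv)) for t, tv in type_emb.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def difficulty(vec, type_emb):
    """Assumed difficulty: entropy of the machine's type distribution."""
    probs = type_probs(vec, type_emb).values()
    return -sum(p * math.log(p) for p in probs if p > 0)

def influence(labeled_vec, target_vec, type_vec):
    """Assumed influence: decays with both the entity-entity distance
    and the distance between the target entity and the type."""
    return math.exp(-dist(labeled_vec, target_vec)) * math.exp(-dist(target_vec, type_vec))

def complete_types(entity_emb, type_emb, ask_crowd, budget):
    """Crowdsource the `budget` most difficult entities, then infer the rest."""
    ranked = sorted(entity_emb,
                    key=lambda e: difficulty(entity_emb[e], type_emb),
                    reverse=True)
    crowd_labels = {e: ask_crowd(e) for e in ranked[:budget]}
    result = dict(crowd_labels)
    for e in ranked[budget:]:
        # Start from the machine's own belief, then add crowd influence.
        scores = type_probs(entity_emb[e], type_emb)
        for src, t in crowd_labels.items():
            scores[t] += influence(entity_emb[src], entity_emb[e], type_emb[t])
        result[e] = max(scores, key=scores.get)
    return result

# Hypothetical 2-D embeddings and crowd answers, for illustration only.
type_emb = {"BasketballPlayer": [1.0, 0.0], "Politician": [0.0, 1.0]}
entity_emb = {
    "e1": [0.9, 0.1],
    "e2": [0.5, 0.5],  # equidistant from both types, so most uncertain
    "e3": [0.1, 0.9],
    "e4": [0.6, 0.4],
}
crowd_answers = {"e1": "BasketballPlayer", "e2": "BasketballPlayer",
                 "e3": "Politician", "e4": "BasketballPlayer"}
asked = []

def ask_crowd(entity):
    """Stand-in for a real crowdsourcing task; records which entities were asked."""
    asked.append(entity)
    return crowd_answers[entity]

result = complete_types(entity_emb, type_emb, ask_crowd, budget=2)
print(sorted(asked), result)
```

In this toy setting the budget goes to the two entities whose machine-predicted type distribution is flattest, and the remaining entities are typed from the crowd answers plus the machine's own prior. A real system would replace the hand-written vectors with learned KB embeddings and would need to aggregate noisy answers from multiple workers.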


  • Crowdsourcing
  • Entity type completion
  • Knowledge base

This work is partially supported by National Natural Science Foundation of China (No. 61602488, No. 61632016 and No. 61472427) and Academy of Finland (No. 310321).

  • DOI: 10.1007/978-3-319-96893-3_19
  • Chapter length: 16 pages



Author information


Corresponding author

Correspondence to Ju Fan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Dong, Z., Fan, J., Lu, J., Du, X., Ling, T.W. (2018). Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10988. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96892-6

  • Online ISBN: 978-3-319-96893-3

  • eBook Packages: Computer Science, Computer Science (R0)