A Scalable Approach to Incrementally Building Knowledge Graphs

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Abstract

We work on converting the metadata of 13 American art museums and archives into Linked Data, to be able to integrate and query the resulting data. While there are many good sources of artist data, no single source covers all artists. We thus address the challenge of building a comprehensive knowledge graph of artists that we can then use to link the data from each of the individual museums. We present a framework to construct and incrementally extend a knowledge graph, describe and evaluate techniques for efficiently building knowledge graphs through the use of the MinHash/LSH algorithm for generating candidate matches, and conduct an evaluation that demonstrates our approach can efficiently and accurately build a knowledge graph about artists.
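The candidate-generation step named in the abstract can be illustrated with a minimal MinHash/LSH sketch. This is not the authors' implementation: the function names, the record identifiers, the number of hash functions (20), and the number of bands (10) are all assumptions chosen for illustration, and names are tokenized into padded character 2-grams.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def char_ngrams(text, n=2, pad="_"):
    """Return the set of padded character n-grams of text."""
    padded = pad * (n - 1) + text + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted with a seed (illustrative choice)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=20):
    """MinHash signature: the minimum hash value under each seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_candidates(names, num_hashes=20, bands=10):
    """Return pairs of record keys that share at least one LSH band.

    names maps a (hypothetical) record identifier to a name string.
    The parameter values are assumptions, not taken from the paper.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for key, name in names.items():
        sig = minhash_signature(char_ngrams(name.lower()), num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(key)
    pairs = set()
    for keys in buckets.values():
        for a, b in combinations(sorted(keys), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical records: two sources describing the same artist, plus an unrelated string.
records = {
    "source_a:1": "Roy Lichtenstein",
    "source_b:7": "Roy Lichtenstein",
    "source_c:3": "zzzz",
}
candidates = lsh_candidates(records)
```

Only the pairs returned by `lsh_candidates` would then need an exact similarity comparison, which is what keeps the matching step from comparing every record against every other record.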

Notes

  1. http://americanartcollaborative.org/.

  2. http://www.getty.edu/.

  3. Please note that not all of the people in DBpedia and VIAF are artists.

  4. The 2-gram of the first name ‘Roy’ consists of {_R, Ro, oy, y_}.

  5. The Jaccard similarity between sets S and T is defined as \(\frac{|S \cap T|}{|S \cup T|}\).

  6. http://www.w3.org/TR/prov-o/.

  7. https://bitbucket.org/GlebGawriljuk/aifb-isi-knowledgegraphconstruction/raw/168b6ec21654e1de01d546567f7232b77daaf1a2/groundTruth_final_2015.tsv.

  8. http://vocab.getty.edu/ulan/500018769.

  9. http://edan.si.edu/saam/id/person-institution/121.

  10. For example, the series of workshops on Automated Knowledge Base Construction (AKBC), http://www.akbc.ws/.

  11. http://www.geonames.org/.
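Notes 4 and 5 can be checked with a short sketch. The function names are hypothetical, and treating two empty sets as having similarity 1.0 is an assumption made here to avoid division by zero; the notes do not address that case.

```python
def padded_ngrams(text, n=2, pad="_"):
    """Return the set of character n-grams of text, padded with pad on both ends."""
    padded = pad * (n - 1) + text + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(s, t):
    """Jaccard similarity |S ∩ T| / |S ∪ T| of two sets."""
    if not s and not t:
        return 1.0  # assumed convention for two empty sets
    return len(s & t) / len(s | t)


roy = padded_ngrams("Roy")   # {'_R', 'Ro', 'oy', 'y_'}, as in note 4
ray = padded_ngrams("Ray")   # {'_R', 'Ra', 'ay', 'y_'}
similarity = jaccard(roy, ray)  # 2 shared 2-grams out of 6 distinct ones
```

Here ‘Roy’ and ‘Ray’ share the 2-grams `_R` and `y_`, so their Jaccard similarity is 2/6.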

Author information

Corresponding author

Correspondence to Andreas Harth.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gawriljuk, G., Harth, A., Knoblock, C.A., Szekely, P. (2016). A Scalable Approach to Incrementally Building Knowledge Graphs. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_15

  • DOI: https://doi.org/10.1007/978-3-319-43997-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43996-9

  • Online ISBN: 978-3-319-43997-6

  • eBook Packages: Computer Science (R0)
