A Scalable Approach to Incrementally Building Knowledge Graphs

Gawriljuk, Gleb; Harth, Andreas; Knoblock, Craig A.; Szekely, Pedro

doi:10.1007/978-3-319-43997-6_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

We work on converting the metadata of 13 American art museums and archives into Linked Data, to be able to integrate and query the resulting data. While there are many good sources of artist data, no single source covers all artists. We thus address the challenge of building a comprehensive knowledge graph of artists that we can then use to link the data from each of the individual museums. We present a framework to construct and incrementally extend a knowledge graph, describe and evaluate techniques for efficiently building knowledge graphs through the use of the MinHash/LSH algorithm for generating candidate matches, and conduct an evaluation that demonstrates our approach can efficiently and accurately build a knowledge graph about artists.
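
The candidate-generation step named in the abstract can be illustrated with a minimal sketch. It assumes that names are compared as padded character 2-grams (as in Note 4), that MinHash signatures are split into bands, and that two records become a candidate pair when they share at least one band. The record keys, the function names, and the parameters `num_hashes` and `bands` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def bigrams(name):
    """Character 2-grams of a name, padded with '_' on both sides."""
    padded = f"_{name}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def stable_hash(seed, token):
    """Deterministic 64-bit hash; varying the seed simulates a hash family."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """One minimum per simulated hash function over the token set."""
    return [min(stable_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def lsh_candidates(records, num_hashes=64, bands=32):
    """Return pairs of record keys that share at least one signature band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for key, name in records.items():
        signature = minhash_signature(bigrams(name), num_hashes)
        for band in range(bands):
            chunk = tuple(signature[band * rows:(band + 1) * rows])
            buckets[(band, chunk)].add(key)
    pairs = set()
    for keys in buckets.values():
        # Only records that landed in the same bucket are ever compared.
        pairs.update(combinations(sorted(keys), 2))
    return pairs


# Hypothetical records from two sources (keys are made up for illustration).
records = {
    "source_a/1": "Roy Lichtenstein",
    "source_b/7": "Roy Lichtenstein",
    "source_b/8": "Georgia O'Keeffe",
}
candidates = lsh_candidates(records)
```

Records with identical names produce identical signatures and therefore always form a candidate pair. Whether near-identical names collide depends on the number of bands and rows per band, so in practice candidate pairs would still be verified with an exact similarity measure before linking.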

Notes

1. http://americanartcollaborative.org/.
2. http://www.getty.edu/.
3. Please note that not all of the people in DBpedia and VIAF are artists.
4. The 2-gram of the first name ‘Roy’ consists of {_R, Ro, oy, y_}.
5. The Jaccard similarity between sets S and T is defined as \(\frac{\lvert S \cap T \rvert}{\lvert S \cup T \rvert}\).
6. http://www.w3.org/TR/prov-o/.
7. https://bitbucket.org/GlebGawriljuk/aifb-isi-knowledgegraphconstruction/raw/168b6ec21654e1de01d546567f7232b77daaf1a2/groundTruth_final_2015.tsv.
8. http://vocab.getty.edu/ulan/500018769.
9. http://edan.si.edu/saam/id/person-institution/121.
10. For example, the series of workshops on Automated Knowledge Base Construction (AKBC), http://www.akbc.ws/.
11. http://www.geonames.org/.
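
Notes 4 and 5 can be made concrete with a short sketch. It assumes one padding character on each side of the string, which reproduces the ‘Roy’ example for n = 2. The function names, the comparison name ‘Ray’, and the convention that two empty sets have similarity 1.0 are illustrative assumptions, not details from the paper.

```python
def char_ngrams(text, n=2, pad="_"):
    """Set of character n-grams of text, padded once on each side."""
    padded = f"{pad}{text}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(s, t):
    """Jaccard similarity |S ∩ T| / |S ∪ T| between two sets."""
    if not s and not t:
        return 1.0  # assumed convention for two empty sets
    return len(s & t) / len(s | t)


roy = char_ngrams("Roy")  # {'_R', 'Ro', 'oy', 'y_'}
ray = char_ngrams("Ray")  # {'_R', 'Ra', 'ay', 'y_'}
similarity = jaccard(roy, ray)  # 2 shared 2-grams out of 6 distinct ones
```

Here ‘Roy’ and ‘Ray’ share the 2-grams `_R` and `y_`, and their union has six elements, so the similarity is 2/6 = 1/3.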

Author information

Authors and Affiliations

Institute of Applied Informatics and Formal Description Methods (AIFB), Karlsruhe Institute of Technology, 76128, Karlsruhe, Germany
Gleb Gawriljuk & Andreas Harth
Information Sciences Institute, University of Southern California, Marina Del Rey, CA, 90292, USA
Craig A. Knoblock & Pedro Szekely

Corresponding author

Correspondence to Andreas Harth.

Editor information

Editors and Affiliations

Universität Duisburg-Essen, Duisburg, Germany
Norbert Fuhr
Hungarian Academy of Science, Budapest, Hungary
László Kovács
Leibniz Universität Hannover, Hannover, Germany
Thomas Risse
Leibniz Universität Hannover, Hannover, Germany
Wolfgang Nejdl

About this paper

Cite this paper

Gawriljuk, G., Harth, A., Knoblock, C.A., Szekely, P. (2016). A Scalable Approach to Incrementally Building Knowledge Graphs. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_15

DOI: https://doi.org/10.1007/978-3-319-43997-6_15
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science, Computer Science (R0)
