Harvesting Knowledge from Cultural Heritage Artifacts in Museums of India

Part of the Lecture Notes in Computer Science book series (LNAI, volume 10938)

Abstract

Recent efforts to digitize cultural heritage artifacts have produced a surge of information about them. However, the way these artifacts are currently organized makes it hard to access facts across entities. In this paper, we present a method that harvests knowledge from the digitized artifacts in the Museums of India repository via distant supervision and forms a knowledge graph, improving access to the facts and enabling new insights about the artifacts. Triples extracted by an open information extractor are first canonicalized to a standard taxonomy using metric-based scoring. Since a standard taxonomy cannot capture all the relationships, we propose a sequential-clustering-based approach that adds artifact-specific relationships to the taxonomy (and to the knowledge graph). The graph is then enriched by inferring missing facts with a probabilistic soft logic approach seeded from a frequent itemset framework. Human evaluation of the final knowledge graph showed an accuracy of 75%, on par with knowledge bases like DBpedia.
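
The canonicalization and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which scoring metric or clustering procedure is used, so token-level Jaccard similarity, the 0.5 threshold, the leader-style sequential clustering, the toy taxonomy and the sample relation phrases are all assumptions chosen for illustration.

```python
# Illustrative sketch only. The metric (token Jaccard), the threshold,
# the taxonomy and the phrases are assumptions, not taken from the paper.

def tokens(phrase):
    """Lower-cased set of whitespace-separated tokens."""
    return set(phrase.lower().split())

def jaccard(a, b):
    """Token-overlap similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def canonicalize(phrase, taxonomy, threshold=0.5):
    """Return the best-scoring taxonomy relation, or None if below threshold."""
    best = max(taxonomy, key=lambda rel: jaccard(phrase, rel))
    return best if jaccard(phrase, best) >= threshold else None

def sequential_cluster(phrases, threshold=0.5):
    """One pass over the phrases: join the first cluster whose leader
    (its first member) is similar enough, otherwise start a new cluster."""
    clusters = []
    for phrase in phrases:
        for cluster in clusters:
            if jaccard(phrase, cluster[0]) >= threshold:
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters

# Hypothetical taxonomy relations and extracted relation phrases.
TAXONOMY = ["created by", "located in", "made of"]
extracted = [
    "was created by",
    "is located in",
    "excavated from",
    "was excavated from",
    "carved during",
]

# Step 1: map each phrase to a taxonomy relation where the score allows it.
mapped = {p: canonicalize(p, TAXONOMY) for p in extracted}

# Step 2: cluster the leftovers into candidate artifact-specific relations.
unmapped = [p for p, rel in mapped.items() if rel is None]
new_relations = sequential_cluster(unmapped)
```

In this toy run, phrases that overlap a taxonomy relation are mapped to it, and the remaining phrases are grouped into candidate new relations that could be added to the taxonomy. A real system would likely use a semantic similarity measure rather than raw token overlap.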

Author information

Corresponding author

Correspondence to Balaji Vasan Srinivasan.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sancheti, A., Maheshwari, P., Chaturvedi, R., Monsy, A.V., Goyal, T., Srinivasan, B.V. (2018). Harvesting Knowledge from Cultural Heritage Artifacts in Museums of India. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_25

  • DOI: https://doi.org/10.1007/978-3-319-93037-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93036-7

  • Online ISBN: 978-3-319-93037-4

  • eBook Packages: Computer Science, Computer Science (R0)