PageRank and Generic Entity Summarization for RDF Knowledge Bases

  • Dennis Diefenbach
  • Andreas Thalhammer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)

Abstract

Ranking and entity summarization are operations that are tightly connected and recurrent in many different domains. Possible application fields include information retrieval, question answering, named entity disambiguation, co-reference resolution, and natural language generation. Still, the use of these techniques is limited because there are few accessible resources. PageRank computations are resource-intensive and entity summarization is a complex research field in itself.

We present two generic and highly reusable resources for RDF knowledge bases: a component for PageRank-based ranking and a component for entity summarization. The two components, PageRankRDF and summaServer, are provided as open source code along with example datasets and deployments. In addition, this work outlines how the two components are applied in the question answering project WDAqua.
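To make the idea of PageRank-based RDF ranking concrete, the following is a minimal sketch of power-iteration PageRank over a list of triples. It is not the PageRankRDF implementation: the function name `pagerank_rdf`, the damping factor of 0.85, the fixed iteration count, and the `ex:` identifiers are all assumptions chosen for illustration. The sketch treats every subject and object as a graph node and ignores predicates, so a caller would need to filter out literal objects beforehand.

```python
from collections import defaultdict


def pagerank_rdf(triples, damping=0.85, iterations=50):
    """Hypothetical helper: PageRank over the directed graph subject -> object.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Predicates are ignored; duplicate edges count once.
    """
    out_links = defaultdict(set)
    nodes = set()
    for s, _p, o in triples:
        nodes.update((s, o))
        out_links[s].add(o)

    n = len(nodes)
    if n == 0:
        return {}

    # Start from the uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly,
        # so the scores keep summing to 1.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for s, targets in out_links.items():
            share = damping * rank[s] / len(targets)
            for o in targets:
                new_rank[o] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    example = [
        ("ex:A", "ex:knows", "ex:B"),
        ("ex:C", "ex:knows", "ex:B"),
        ("ex:B", "ex:knows", "ex:A"),
    ]
    for node, score in sorted(pagerank_rdf(example).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

In this toy graph, `ex:B` receives links from both other nodes and therefore ranks highest, while `ex:C` has no incoming links and ranks lowest. For knowledge bases with millions of triples, an in-memory dictionary approach like this one would likely be too slow and memory-hungry, which is the kind of resource cost the abstract refers to.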

Keywords

RDF, Ranking, PageRank, Entity summarization, Question answering, Linked data

Notes

Acknowledgments

Parts of this work received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795, project: Answering Questions using Web Data (WDAqua).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Université de Lyon, CNRS UMR 5516 Laboratoire Hubert Curien, Lyon, France
  2. Roche Pharma Research and Early Development Informatics, Roche Innovation Center Basel, Basel, Switzerland