CALVADOS: A Tool for the Semantic Analysis and Digestion of Web Contents

  • Govind
  • Amit Kumar
  • Céline Alec
  • Marc Spaniol
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11762)


Web users today face an abundance of information. While this is beneficial in general, it carries a risk of “information overload”. Consequently, there is an increasing need for filtering, classifying and/or summarizing Web contents automatically. To help consumers efficiently derive the semantics of Web contents, we have developed the CALVADOS (Content AnaLytics ViA Digestion Of Semantics) system. CALVADOS raises contents to the entity level and digests their inherent semantics. In this demo, we present how entity-level analytics can be employed to automatically classify the main topic of a Web document and reveal the semantic building blocks associated with it.
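To make the idea of entity-level classification more concrete, the following is a minimal toy sketch, not the CALVADOS implementation. It assumes that named entities have already been extracted from a document, maps each entity to semantic types through a hypothetical lookup table (`ENTITY_TYPES`), aggregates the types into a count vector, and picks the topic whose hypothetical reference profile (`TOPIC_PROFILES`) is most similar under cosine similarity. All names, types, and profile weights are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

# Hypothetical entity -> semantic types lookup (assumption for illustration;
# a real system would obtain types from a knowledge base).
ENTITY_TYPES = {
    "Lionel Messi": ["Person", "Athlete"],
    "FC Barcelona": ["Organization", "SportsClub"],
    "Camp Nou": ["Place", "Stadium"],
    "Angela Merkel": ["Person", "Politician"],
    "Bundestag": ["Organization", "Legislature"],
}

# Hypothetical per-topic type profiles (illustrative weights, not learned).
TOPIC_PROFILES = {
    "sports": Counter({"Athlete": 3, "SportsClub": 2, "Stadium": 1, "Person": 3}),
    "politics": Counter({"Politician": 3, "Legislature": 2, "Person": 3, "Organization": 2}),
}


def type_profile(entities):
    """Aggregate the semantic types of all known entities into a count vector."""
    counts = Counter()
    for entity in entities:
        counts.update(ENTITY_TYPES.get(entity, []))  # unknown entities add nothing
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(entities):
    """Return the topic whose profile is most similar to the document's type profile."""
    profile = type_profile(entities)
    return max(TOPIC_PROFILES, key=lambda topic: cosine(profile, TOPIC_PROFILES[topic]))


if __name__ == "__main__":
    print(classify(["Lionel Messi", "FC Barcelona", "Camp Nou"]))
    print(classify(["Angela Merkel", "Bundestag"]))
```

In this sketch the aggregated type counts play the role of the "semantic building blocks" of a document, and the best-matching profile gives its main topic. The actual system may use a different representation and classifier.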


Entity-level web analytics · Semantic content digestion · Web semantics · Analytics interface



This work was supported by the RIN RECHERCHE Normandie Digitale research project ASTURIAS contract no. 18E01661. We thank our colleagues for inspiring discussions.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Université de Caen Normandie, Caen Cedex, France
