Semantic Fingerprinting: A Novel Method for Entity-Level Content Classification

  • Govind
  • Céline Alec
  • Marc Spaniol
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10845)


As the Web keeps growing, there is a need to analyze, interpret, and organize its contents automatically. A particular need arises in managing Web contents with respect to classification systems, e.g. those based on ontologies in the LOD (Linked Open Data) cloud. Research in deep learning has recently shown great progress in classifying data based on large volumes of training data. However, “targeted” and fine-grained information systems require classification methods that work with a relatively small number of “representative” samples. For that purpose, we present an approach, “Semantic Fingerprinting”, that allows semantic exploitation of Web contents and, at the same time, computationally efficient processing. To this end, we raise Web contents to the entity level and exploit entity-related information that allows “distillation” and fine-grained classification of the Web content by its “semantic fingerprint”. In experiments on Web contents classified in Wikipedia, we show the superiority of our approach over state-of-the-art methods.
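
The abstract does not spell out how a fingerprint is computed, so the following is only an illustrative sketch of the general idea under stated assumptions, not the authors' method. It assumes that documents have already been linked to entities, that each entity maps to a set of types (the `ENTITY_TYPES` table is a hypothetical stand-in for a knowledge base), that a fingerprint is a normalized vector of type counts, and that a document is assigned the label of the most similar labeled sample by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical entity-to-type lookup (assumption); a real system would
# query a knowledge base instead of a hard-coded table.
ENTITY_TYPES = {
    "Lionel_Messi": ["person", "athlete", "footballer"],
    "FC_Barcelona": ["organization", "sports_club"],
    "Angela_Merkel": ["person", "politician"],
    "Bundestag": ["organization", "legislature"],
    "Cristiano_Ronaldo": ["person", "athlete", "footballer"],
}


def fingerprint(entities):
    """Count entity types over a document and L2-normalize the counts."""
    counts = Counter(t for e in entities for t in ENTITY_TYPES.get(e, []))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two sparse vectors that are already L2-normalized."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify(entities, samples):
    """Return the label of the labeled sample most similar to the document."""
    fp = fingerprint(entities)
    return max(samples, key=lambda label: cosine(fp, samples[label]))


# One "representative" sample per class (assumed toy data).
samples = {
    "sports": fingerprint(["Lionel_Messi", "FC_Barcelona"]),
    "politics": fingerprint(["Angela_Merkel", "Bundestag"]),
}
label = classify(["Cristiano_Ronaldo"], samples)
print(label)
```

In this toy setup the unseen document shares three types with the sports sample and only one with the politics sample, so it is labeled as sports even though it has no entity in common with either sample. That illustrates why working at the type level may help when only a few labeled samples are available.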


Entity-level web analytics; Semantically-enriched web content classification; Web semantics



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Université de Caen Normandie, Caen Cedex, France
