Any Suggestions? Active Schema Support for Structuring Web Information

  • Silviu Homoceanu
  • Felix Geilert
  • Christian Pek
  • Wolf-Tilo Balke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8422)


Backed by major Web players, schema.org is the latest broad initiative for structuring Web information. Unfortunately, a representative analysis of a corpus of 733 million Web documents shows that, a year after its introduction, only 1.56% of documents featured any annotations. A probable reason is that providing annotations is quite tiresome, which hinders widespread adoption. Even state-of-the-art tools like Google’s Structured Data Markup Helper offer only limited support. In this paper we propose SASS, a system for automatically finding high-quality schema suggestions for page content, to ease the annotation process. SASS intelligently blends supervised machine learning techniques with simple user feedback. Moreover, additional support features for binding attributes to values further reduce the necessary effort. We show that SASS is superior to current annotation tools.

Keywords: semantic annotation, metadata, structuring unstructured data





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Silviu Homoceanu (1)
  • Felix Geilert (1)
  • Christian Pek (1)
  • Wolf-Tilo Balke (1)

  1. IFIS, TU Braunschweig, Braunschweig, Germany
