Finding Quality Issues in SKOS Vocabularies

  • Christian Mader
  • Bernhard Haslhofer
  • Antoine Isaac
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7489)

Abstract

The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.
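To make the idea of a computable quality checking function concrete, the following is a minimal sketch and not the qSKOS implementation. It assumes a vocabulary given as plain (subject, predicate, object) URI triples and looks for concepts that take part in no `skos:broader`, `skos:narrower`, or `skos:related` statement. The function name `find_unlinked_concepts`, the choice of check, and the `http://example.org/` data are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of one possible quality check (an illustration only;
# the names and the specific check are assumptions, not the paper's code).

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Properties counted as links between concepts in this sketch (an assumption).
LINK_PROPERTIES = {SKOS + name for name in ("broader", "narrower", "related")}


def find_unlinked_concepts(triples):
    """Return the set of skos:Concept URIs that appear in no link statement."""
    concepts = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == SKOS + "Concept"
    }
    linked = set()
    for s, p, o in triples:
        if p in LINK_PROPERTIES:
            # Both ends of a link statement count as linked.
            linked.add(s)
            linked.add(o)
    return concepts - linked


# Hypothetical example vocabulary with three concepts and one link.
EX = "http://example.org/"
vocabulary = [
    (EX + "animals", RDF_TYPE, SKOS + "Concept"),
    (EX + "mammals", RDF_TYPE, SKOS + "Concept"),
    (EX + "doorbells", RDF_TYPE, SKOS + "Concept"),
    (EX + "mammals", SKOS + "broader", EX + "animals"),
]

affected = find_unlinked_concepts(vocabulary)
print(sorted(affected))
```

A function of this shape returns the affected resources rather than a single score, so a taxonomist can inspect each reported concept and decide whether it is a genuine problem.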

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Christian Mader (1)
  • Bernhard Haslhofer (2)
  • Antoine Isaac (3)

  1. Faculty of Computer Science, University of Vienna, Austria
  2. Department of Information Science, Cornell University, USA
  3. Europeana & Vrije Universiteit Amsterdam, The Netherlands
