Assessing and Improving the Quality of SKOS Vocabularies

  • Original Article
Journal on Data Semantics

Abstract

Controlled vocabularies are increasingly made available on the Web of Data using the Simple Knowledge Organization System (SKOS) ontology. Assessment of vocabulary quality is important for determining the suitability of vocabularies for reuse in applications and for improving vocabulary development processes. We define 26 quality issues, i.e., computable functions that expose potential quality problems. In an analysis of a representative set of 24 SKOS vocabularies, we found all of them to contain structural errors and/or other quality problems. We propose a set of correction heuristics which we have used to automatically correct a significant proportion of the identified problems. Our reference implementations of these methods, the quality assessment tool qSKOS and the quality improvement tool Skosify, are available for reuse as open-source software.

Notes

  1. https://github.com/cmader/qSKOS/

  2. http://code.google.com/p/skosify/

  3. http://demo.semantic-web.at:8080/SkosServices/check

  4. http://www.ontotext.com/owlim

  5. http://www.w3.org/RDF/Validator/

  6. http://spinrdf.org

  7. http://topquadrant.com/products/TB_Composer.html

  8. http://pedantic-web.org

  9. In particular, neither OWL nor OWL 2 includes any means to express the integrity condition S14: “A resource has no more than one value of skos:prefLabel per language tag.”

  10. https://github.com/cmader/qskos/wiki

  11. e.g., public-esw-thes@w3.org and public-lod@w3.org.

  12. http://demo.seco.tkk.fi/skosify/

  13. http://wifo5-03.informatik.uni-mannheim.de/lodcloud/state/#domains

  14. http://www.w3.org/2001/sw/wiki/SKOS/Datasets

  15. http://datahub.io/

  16. http://www.peroxisomekb.nl/

  17. The script, sparqldump.py, is included in the Skosify distribution.

  18. http://jena.apache.org

  19. Missing namespace declarations were added manually for UMBEL. In NYTL, the invalid language tag fr_1793 was manually changed into fr-1793 in order to comply with BCP47 and the Turtle specification. In Reegle, an unparseable line in the original RDF dump was manually removed. For GEMET, the source file containing Arabic labels was excluded as it contained labels with improper Unicode encoding that caused the Jena toolkit to fail in parsing it.

  20. The Turtle files were condensed by removing extra whitespace, including all indentation, and using short 0–2 character namespace prefixes.

  21. Typographical note: words set in typewriter style that do not include a namespace prefix, such as Concept and prefLabel, refer to terms defined by SKOS [28].

  22. http://tools.ietf.org/html/bcp47

  23. http://www.iso.org/iso/language_codes

  24. http://www.w3.org/2009/08/skos-reference/skos.rdf

  25. http://sindice.com/ indexes the Web of Data, which is composed of pages with semantic markup in RDF, RDFa, Microformats or Microdata. Currently, it covers approximately 230 M documents with over 11 billion triples.

  26. http://datahub.io/ is a “community-run catalogue” of currently 5,045 datasets, many of them following the Linked Data guidelines.

  27. See http://www.w3.org/TR/skos-reference/#namespace

  28. http://www.w3.org/DesignIssues/LinkedData.html

  29. http://code.google.com/p/skosify/downloads/list

  30. SKOS-XL is an extension schema to SKOS that enhances the labeling capabilities by treating labels as resources and not as literals.

  31. TheSoz Thesaurus for the Social Sciences, http://datahub.io/dataset/gesis-thesoz

  32. In the most common case, there is only one concept scheme (often the one created in the previous step), and that will be selected as the default concept scheme; otherwise, the default concept scheme will be chosen arbitrarily and a warning message shown by Skosify.
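The integrity condition S14 quoted in note 9 can be illustrated with a minimal sketch. It is not taken from qSKOS or Skosify: the function name, the tuple representation of skos:prefLabel statements, and the case-insensitive comparison of language tags are all assumptions made for illustration.

```python
from collections import defaultdict


def find_s14_violations(pref_labels):
    """Return {(resource, lang): sorted labels} for every resource/language
    pair that carries more than one distinct preferred label.

    pref_labels is a hypothetical in-memory form of the vocabulary: an
    iterable of (resource, label, language_tag) tuples, one per
    skos:prefLabel statement.
    """
    labels_by_key = defaultdict(set)
    for resource, label, lang in pref_labels:
        # Assumption: language tags are compared case-insensitively.
        labels_by_key[(resource, lang.lower())].add(label)
    return {
        key: sorted(labels)
        for key, labels in labels_by_key.items()
        if len(labels) > 1
    }


# Illustrative data only; the ex: identifiers are made up.
sample = [
    ("ex:cat", "cat", "en"),
    ("ex:cat", "feline", "en"),  # second English prefLabel violates S14
    ("ex:cat", "chat", "fr"),
    ("ex:dog", "dog", "en"),
]
violations = find_s14_violations(sample)
```

With this sample data, only the pair (ex:cat, en) is reported, because it is the only resource and language combination with two distinct preferred labels.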

References

  1. ISO 25964–1 (2011) Information and documentation—Thesauri and interoperability with other vocabularies—Part 1: Thesauri for information retrieval. Norm, International Organization for Standardization

  2. Abdul Manaf NA, Bechhofer S, Stevens R (2012) Common modelling slips in SKOS vocabularies. In: Klinov P, Horridge M (eds) Proceedings of OWL: experiences and directions workshop (OWLED 2012), CEUR Workshop Proceedings, vol 849. CEUR-WS.org. http://ceur-ws.org/Vol-849/paper_2.pdf

  3. Abdul Manaf NA, Bechhofer S, Stevens R (2012) The current state of SKOS vocabularies on the Web. In: Simperl E, Cimiano P, Polleres A, Corcho O, Presutti V (eds) Proceedings of the 9th extended semantic web conference (ESWC 2012), Lecture notes in computer science, vol 7295. Springer, Berlin, pp 270–284

  4. Aitchison J, Gilchrist A, Bawden D (2000) Thesaurus construction and use: a practical manual. Aslib IMI, London

  5. Allemang D, Hendler J (2011) Semantic web for the working ontologist: effective modeling in RDFS and OWL. Morgan Kaufmann, Los Altos

  6. van Assem M, Malaisé V, Miles A, Schreiber G (2006) A method to convert thesauri to SKOS. In: Sure Y, Domingue J (eds) Proceedings of the third European semantic web conference (ESWC’06). Lecture notes in computer science, vol 4011. Springer, Berlin, pp 95–109

  7. Batini C, Cappiello C, Francalanci C, Maurino A (2009) Methodologies for data quality assessment and improvement. ACM Comput Surv 41(3):16

  8. Berrueta D, Fernández S, Frade I (2008) Cooking HTTP content negotiation with Vapour. In: Bizer C, Auer S, Aastrand Grimnes G, Heath T (eds) Proceedings of the 4th workshop on scripting for the semantic web (SFSW 2008). CEUR Workshop Proceedings, vol 368. CEUR-WS.org. http://CEUR-WS.org/Vol-368/paper3.pdf

  9. Bizer C, Lehmann J, Kobilarov G, Auer S, Becker C, Cyganiak R, Hellmann S (2009) DBpedia—a crystallization point for the web of data. Web Semant Sci Serv Agents World Wide Web 7(3):154–165. doi:10.1016/j.websem.2009.07.002

  10. Borst T, Fingerle B, Neubert J, Seiler A (2010) How do libraries find their way onto the semantic web? Liber Q 19(3/4)

  11. Byrne G, Goddard L (2010) The strongest link: libraries and linked data. D-Lib Magazine 16(11/12). doi:10.1045/november2010-byrne

  12. de Coronado S, Wright LW, Fragoso G, Haber MW, Hahn-Dantona EA, Hartel FW, Quan SL, Safran T, Thomas N, Whiteman L (2009) The NCI thesaurus quality assurance life cycle. J Biomed Inform 42(3):530–539

  13. Ding L, Finin T (2006) Characterizing the semantic web on the web. Electr Eng 4273(August):5–9

  14. Fürber C, Hepp M (2010) Using semantic web resources for data quality management. In: Proceedings of the 17th international conference on knowledge engineering and management by the masses (EKAW 2010). Lecture notes in computer science, vol 6317. Springer, Berlin, pp 211–225

  15. Harpring P (2010) Introduction to controlled vocabularies: terminology for art, architecture, and other cultural works. Getty Publications, Los Angeles

  16. Heath T, Bizer C (2011) Linked data: evolving the web into a global data space. Morgan & Claypool, San Rafael. http://linkeddatabook.com

  17. Hedden H (2010) The accidental taxonomist. Information Today, Medford

  18. Hogan A, Harth A, Passant A, Decker S, Polleres A (2010) Weaving the pedantic web. In: Bizer C, Heath T, Berners-Lee T, Hausenblas M (eds) Proceedings of WWW2010 workshop on linked data on the web (LDOW 2010). CEUR Workshop Proceedings, vol 628. CEUR-WS.org. http://ceur-ws.org/Vol-628/ldow2010_paper04.pdf

  19. Hogan A, Umbrich J, Harth A, Cyganiak R, Polleres A, Decker S (2012) An empirical survey of linked data conformance. Web Semant Sci Serv Agents World Wide Web 14:14–44

  20. Hopcroft JE, Tarjan RE (1973) Algorithm 447: efficient algorithms for graph manipulation. Commun ACM 16(6):372–378

  21. Horridge M, Parsia B, Sattler U (2009) Explaining inconsistencies in OWL ontologies. In: Godo L, Pugliese A (eds) Proceedings of the 3rd international conference on scalable uncertainty management (SUM ’09). Lecture notes in computer science, vol 5785. Springer, Berlin, pp 124–137. doi:10.1007/978-3-642-04388-8_11

  22. Isaac A, Summers E (2009) SKOS Simple knowledge organization system primer. Working Group Note, W3C. http://www.w3.org/TR/skos-primer

  23. Kalyanpur A (2006) Debugging and repair of OWL ontologies. Ph.D. thesis, University of Maryland, College Park, MD, USA

  24. Kless D, Milton S (2010) Towards quality measures for evaluating thesauri. In: Sánchez-Alonso S, Athanasiadis I (eds) Proceedings of the 4th metadata and semantics research conference (MTSR 2010). Communications in computer and information science, vol 108. Springer, Berlin, pp 312–319. doi:10.1007/978-3-642-16552-8_28

  25. Mader C, Haslhofer B (2011) Quality criteria for controlled web vocabularies. In: Proceedings of the 10th European networked knowledge organisation systems workshop (NKOS 2011). Berlin, Germany. http://eprints.cs.univie.ac.at/2923

  26. Mader C, Haslhofer B, Isaac A (2012) Finding quality issues in SKOS vocabularies. In: Zaphiris P, Buchanan G, Rasmussen E, Loizides F (eds) Proceedings of the second international conference on theory and practice of digital libraries (TPDL 2012). Lecture notes in computer science, vol 7489. Springer, Berlin, pp 222–233. doi:10.1007/978-3-642-33290-6_25

  27. Malmsten M (2008) Making a library catalogue part of the semantic web. In: Greenberg J, Klas W (eds) Metadata for semantic and social applications. Proceedings of the international conference on Dublin core and metadata applications (DC-2008). Universitätsverlag Göttingen, Göttingen, Germany, pp 146–152

  28. Miles A, Bechhofer S (2009) SKOS simple knowledge organization system reference. Recommendation, W3C. http://www.w3.org/TR/skos-reference

  29. Miles A, Rogers N, Beckett D (2004) Migrating thesauri to the semantic web—guidelines and case studies for generating RDF encodings of existing thesauri. SWAD-Europe project deliverable 8.8, SWAD-Europe. http://www.w3.org/2001/sw/Europe/reports/thes/8.8/

  30. Mougin F, Bodenreider O (2005) Approaches to eliminating cycles in the UMLS Metathesaurus: naïve vs. formal. In: Proceedings of the AMIA annual symposium, vol 2005. American Medical Informatics Association, pp 550–554

  31. Nagy H, Pellegrini T, Mader C (2011) Exploring structural differences in thesauri for SKOS-based applications. In: Ghidini C, Ngonga Ngomo AC, Lindstaedt S, Pellegrini T (eds) Proceedings of the 7th international conference on semantic systems (I-Semantics ’11). ACM, New York, pp 187–190. doi:10.1145/2063518.2063546

  32. Neubert J (2009) Bringing the “Thesaurus for Economics” on to the web of linked data. In: Proceedings of the WWW2009 workshop on linked data on the web (LDOW2009). CEUR Workshop Proceedings, vol 538. CEUR-WS.org. http://ceur-ws.org/Vol-538/ldow2009_paper7.pdf

  33. NISO (2005) ANSI/NISO Z39.19—guidelines for the construction, format, and management of monolingual controlled vocabularies. Standard, National Information Standards Organization

  34. Ovchinnikova E, Wandmacher T, Kühnberger K (2007) Solving terminological inconsistency problems in ontology design. Int J Interoperabil Bus Inf Syst 2(1):65–80

  35. Pipino L, Lee Y, Wang R (2002) Data quality assessment. Commun ACM 45(4):211–218

  36. Popitsch NP, Haslhofer B (2010) DSNotify: handling broken links in the web of data. In: Proceedings of the 19th international conference on World Wide Web (WWW 2010). ACM, New York, pp 761–770. doi:10.1145/1772690.1772768

  37. Poveda-Villalón M, Suárez-Figueroa M, Gómez-Pérez A (2012) Validating ontologies with OOPS! In: Teije A, Völker J, Handschuh S, Stuckenschmidt H, d’Aquin M, Nikolov A, Aussenac-Gilles N, Hernandez N (eds) Proceedings of the 18th international conference on knowledge engineering and knowledge management (EKAW 2012). Lecture notes in computer science, vol 7603. Springer, Berlin, pp 267–281. doi:10.1007/978-3-642-33876-2_24

  38. Schandl T, Blumauer A (2010) PoolParty: SKOS thesaurus management utilizing linked data. In: Aroyo L, Antoniou G, Hyvönen E, ten Teije A, Stuckenschmidt H, Cabral L, Tudorache T (eds) Proceedings of the 7th extended semantic web conference (ESWC2010). Lecture notes in computer science, vol 6088. Springer, Berlin, pp 421–425

  39. Soergel D (2002) Thesauri and ontologies in digital libraries: tutorial. In: Proceedings of the 2nd ACM/IEEE-CS joint conference on digital libraries (JCDL 2002). ACM, New York, p 415

  40. Summers E, Isaac A, Redding C, Krech D (2008) LCSH, SKOS and Linked Data. In: Greenberg J, Klas W (eds) Metadata for semantic and social applications. Proceedings of the International Conference on Dublin Core and Metadata Applications (DC-2008). Universitätsverlag Göttingen, Göttingen, pp 25–33

  41. Suominen O, Hyvönen E (2012) Improving the quality of SKOS vocabularies with Skosify. In: Aroyo L, Antoniou G, Hyvönen E, ten Teije A, Stuckenschmidt H, Cabral L, Tudorache T (eds) Proceedings of the 18th international conference on knowledge engineering and knowledge management (EKAW 2012). Lecture notes in computer science, vol 7603. Springer, Berlin, pp 383–397. doi:10.1007/978-3-642-33876-2_34

  42. Svenonius E (1997) Definitional approaches in the design of classification and thesauri and their implications for retrieval and for automatic classification. In: Knowledge organization for information retrieval: Proceedings of the 6th international study conference on classification research. International Federation for Information and Documentation, pp 12–16

  43. Tuominen J, Frosterus M, Viljanen K, Hyvönen E (2009) ONKI SKOS server for publishing and utilizing SKOS vocabularies and ontologies as services. In: Aroyo L, Traverso P, Ciravegna F, Cimiano P, Heath T, Hyvönen E, Mizoguchi R, Oren E, Sabou M, Simperl E (eds) Proceedings of the 6th European semantic web conference (ESWC 2009). Lecture notes in computer science, vol 5554. Springer, Berlin, pp 768–780

  44. Vrandecic D (2010) Ontology evaluation. Ph.D. thesis, KIT, Fakultät für Wirtschaftswissenschaften, Karlsruhe

Acknowledgments

We thank Eero Hyvönen, Jouni Tuominen, and Miika Alonen for giving insightful comments and support; Andreas Blumauer and Alexander Kreiser for technical assistance with the PoolParty checker; and Andrew Gibson and Tom Dent for providing RDF dumps of the Peroxisome Knowledge Base and the Integrated Public Sector Vocabulary. The work is supported by the FWF P21571 Meketre project, the National Semantic Web Ontology project in Finland FinnONTO (2003–2012), and the Linked Data Finland project (2012–2014).

Author information

Corresponding author

Correspondence to Osma Suominen.

About this article

Cite this article

Suominen, O., Mader, C. Assessing and Improving the Quality of SKOS Vocabularies. J Data Semant 3, 47–73 (2014). https://doi.org/10.1007/s13740-013-0026-0

  • DOI: https://doi.org/10.1007/s13740-013-0026-0
