Journal on Data Semantics, Volume 3, Issue 1, pp 47–73

Assessing and Improving the Quality of SKOS Vocabularies

Original Article


Controlled vocabularies are increasingly made available on the Web of Data using the Simple Knowledge Organization System (SKOS) ontology. Assessment of vocabulary quality is important for determining the suitability of vocabularies for reuse in applications and for improving vocabulary development processes. We define 26 quality issues, i.e., computable functions that expose potential quality problems. In an analysis of a representative set of 24 SKOS vocabularies, we found all of them to contain structural errors and/or other quality problems. We propose a set of correction heuristics which we have used to automatically correct a significant proportion of the identified problems. Our reference implementations of these methods, the quality assessment tool qSKOS and the quality improvement tool Skosify, are available for reuse as open-source software.
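To make the notion of a quality issue as a "computable function" concrete, here is a minimal sketch of two such checks over a vocabulary given as plain (subject, predicate, object) triples. The two checks chosen here (concepts without a preferred label, and concepts involved in a cycle of `skos:broader` links) are illustrative assumptions, not a restatement of the 26 issues defined in the article. The function names are hypothetical and do not reflect the qSKOS or Skosify APIs.

```python
from collections import defaultdict

# Standard namespace IRIs; the function names below are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concepts(triples):
    """Return all resources explicitly typed as skos:Concept."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == SKOS + "Concept"}


def unlabeled_concepts(triples):
    """Illustrative check: concepts that have no skos:prefLabel."""
    labeled = {s for s, p, _ in triples if p == SKOS + "prefLabel"}
    return concepts(triples) - labeled


def concepts_in_broader_cycles(triples):
    """Illustrative check: concepts that reach themselves via skos:broader."""
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == SKOS + "broader":
            broader[s].add(o)

    def reaches_itself(start):
        # Iterative depth-first search along skos:broader links.
        stack, seen = list(broader[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(broader.get(node, ()))
        return False

    return {c for c in list(broader) if reaches_itself(c)}


# Toy vocabulary (made-up data): ex:a and ex:b point at each other as broader,
# and ex:c lacks a preferred label.
example = [
    ("ex:a", RDF_TYPE, SKOS + "Concept"),
    ("ex:b", RDF_TYPE, SKOS + "Concept"),
    ("ex:c", RDF_TYPE, SKOS + "Concept"),
    ("ex:a", SKOS + "prefLabel", "A"),
    ("ex:b", SKOS + "prefLabel", "B"),
    ("ex:a", SKOS + "broader", "ex:b"),
    ("ex:b", SKOS + "broader", "ex:a"),
    ("ex:c", SKOS + "broader", "ex:a"),
]

print(sorted(unlabeled_concepts(example)))
print(sorted(concepts_in_broader_cycles(example)))
```

Each check returns the set of offending resources, so a result can be reported as a count or inspected item by item. Whether a finding is an actual error or only a potential problem still depends on the vocabulary's intended design.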


Controlled vocabularies, Linked Data, Semantic Web, Quality assessment, Data quality



We thank Eero Hyvönen, Jouni Tuominen, and Miika Alonen for giving insightful comments and support; Andreas Blumauer and Alexander Kreiser for technical assistance with the PoolParty checker; and Andrew Gibson and Tom Dent for providing RDF dumps of the Peroxisome Knowledge Base and the Integrated Public Sector Vocabulary. The work is supported by the FWF P21571 Meketre project, the National Semantic Web Ontology project in Finland FinnONTO (2003–2012), and the Linked Data Finland project (2012–2014).



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Semantic Computing Research Group, Department of Media Technology, Aalto University, Espoo, Finland
  2. Multimedia Information Systems Group, Faculty of Computer Science, University of Vienna, Vienna, Austria
