Enhancing the Quality of Open Data

  • Kieron O’Hara
Part of the Synthese Library book series (SYLI, volume 358)


This paper looks at some of the quality issues relating to open data. Assessing the quality of open data is problematic because of a paradox specific to such data: most metrics of quality are user-relative, but open data are aimed at no specific user and are simply made available online under an open licence, so there is no particular user to whom the data must be relevant. Nevertheless, it is argued that opening data to scrutiny can improve quality by building feedback into the data production process, although much depends on the context of publication. The paper discusses various heuristics for addressing quality, and also looks at institutional approaches. Furthermore, if open data can be published in linkable or bookmarkable form using Semantic Web technologies, that will provide further mechanisms to improve quality.


Open Data, Primary User, Data Provider, Open Licence, Open Government Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work is supported by SOCIAM: The Theory and Practice of Social Machines, funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/1. Thanks are owing to audiences at a number of talks and conferences, including the Information Quality Symposium at the AISB/IACAP World Congress 2012, Birmingham, July 2012.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Electronics and Computer Science, University of Southampton, Southampton, UK
