Managing Information Quality in e-Science Using Semantic Web Technology

  • Alun Preece
  • Binling Jin
  • Edoardo Pignotti
  • Paolo Missier
  • Suzanne Embury
  • David Stead
  • Al Brown
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4011)

Abstract

We outline a framework for managing information quality (IQ) in e-Science, using ontologies, semantic annotation of resources, and data bindings. Scientists define the quality characteristics that are of importance in their particular domain by extending an OWL DL IQ ontology, which classifies and organises these domain-specific quality characteristics within an overall quality management framework. RDF is used to annotate data resources, with reference to IQ indicators defined in the ontology. Data bindings — again defined in RDF — are used to represent mappings between data elements (e.g. defined in XML Schemas) and the IQ ontology. As a practical illustration of our approach, we present a case study from the domain of proteomics.
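The annotation and binding ideas in the abstract can be sketched in a few lines. The example below is only an illustration, not the authors' implementation: it models RDF statements as plain Python tuples rather than using an RDF library, and every namespace, resource name, indicator name (`hitRatio`), property (`bindsTo`), and element path is a hypothetical placeholder that does not come from the paper.

```python
# Minimal sketch: RDF-style (subject, predicate, object) triples as tuples.
# All URIs, indicator names, and paths are hypothetical illustrations.

IQ = "http://example.org/iq#"       # assumed namespace for an IQ ontology
DATA = "http://example.org/data/"   # assumed namespace for data resources
BINDS_TO = IQ + "bindsTo"           # assumed property linking schema elements to IQ classes

triples = set()


def annotate(resource, indicator, value):
    """Annotate a data resource with the value of one IQ indicator."""
    triples.add((resource, IQ + indicator, value))


def bind(element_path, iq_class):
    """Record a data binding from a schema element path to an IQ ontology class."""
    triples.add((element_path, BINDS_TO, IQ + iq_class))


def indicators_for(resource):
    """Return {indicator URI: value} for all annotations on one resource."""
    return {p: o for (s, p, o) in triples if s == resource and p != BINDS_TO}


def bindings():
    """Return {element path: IQ class URI} for all recorded data bindings."""
    return {s: o for (s, p, o) in triples if p == BINDS_TO}


# Annotate a (hypothetical) protein identification result with a quality indicator.
annotate(DATA + "proteinHit/42", "hitRatio", 0.8)

# Bind a (hypothetical) XML element to the ontology class it provides evidence for.
bind("/experiment/proteinHit/score", "HitRatio")
```

Keeping annotations and bindings in the same triple set mirrors the abstract's point that both are expressed in RDF; a real system would use an RDF store and an OWL DL ontology instead of tuples and string prefixes.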

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alun Preece (1)
  • Binling Jin (1)
  • Edoardo Pignotti (1)
  • Paolo Missier (2)
  • Suzanne Embury (2)
  • David Stead (3)
  • Al Brown (3)

  1. Computing Science, University of Aberdeen, Aberdeen, UK
  2. School of Computer Science, University of Manchester, Manchester, UK
  3. Molecular and Cell Biology, University of Aberdeen, Aberdeen, UK
