Abstract
We outline a framework for managing information quality (IQ) in e-Science, using ontologies, semantic annotation of resources, and data bindings. Scientists define the quality characteristics that matter in their particular domain by extending an OWL DL IQ ontology, which classifies and organises these domain-specific characteristics within an overall quality management framework. Data resources are annotated in RDF with reference to IQ indicators defined in the ontology. Data bindings, also expressed in RDF, represent mappings between data elements (e.g. those defined in XML Schemas) and the IQ ontology. As a practical illustration of our approach, we present a case study from the domain of proteomics.
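The relationship between the three ingredients named above (ontology extension, RDF annotation, and data bindings) can be illustrated with a minimal Python sketch. It is not the implementation described in the paper: triples are modelled as plain tuples rather than with an RDF library, and every URI, indicator name (`HitRatio`, `MassCoverage`), property (`bindsTo`, `qualityIndicator`), and element path is a hypothetical placeholder chosen for illustration.

```python
# Illustrative sketch only: all namespaces, terms, and paths are assumptions,
# not taken from the paper. Triples are plain (subject, predicate, object) tuples.
IQ = "http://example.org/iq#"       # hypothetical IQ ontology namespace
DATA = "http://example.org/data/"   # hypothetical data resource namespace

SUB_PROPERTY_OF = "rdfs:subPropertyOf"
QUALITY_INDICATOR = IQ + "qualityIndicator"  # hypothetical root indicator property
BINDS_TO = IQ + "bindsTo"                    # hypothetical binding property

# Domain-specific extension of the IQ ontology: two indicator properties.
ontology = {
    (IQ + "HitRatio", SUB_PROPERTY_OF, QUALITY_INDICATOR),
    (IQ + "MassCoverage", SUB_PROPERTY_OF, QUALITY_INDICATOR),
}

# Annotations: statements attaching indicator values to data resources.
annotations = {
    (DATA + "hit/1", IQ + "HitRatio", 0.82),
    (DATA + "hit/1", IQ + "MassCoverage", 0.31),
    (DATA + "hit/2", IQ + "HitRatio", 0.40),
}

# Data bindings: map schema-level data elements to ontology terms.
bindings = {
    ("/ProteinHit/@hitRatio", BINDS_TO, IQ + "HitRatio"),
    ("/ProteinHit/@coverage", BINDS_TO, IQ + "MassCoverage"),
}


def is_indicator(term):
    """True if the ontology declares `term` as a quality indicator property."""
    return (term, SUB_PROPERTY_OF, QUALITY_INDICATOR) in ontology


def indicators_for(resource):
    """Collect the indicator values annotated on one data resource."""
    return {
        predicate: value
        for (subject, predicate, value) in annotations
        if subject == resource and is_indicator(predicate)
    }


def indicator_for_element(path):
    """Resolve a data element path to its bound ontology term, if any."""
    for (subject, predicate, target) in bindings:
        if subject == path and predicate == BINDS_TO:
            return target
    return None


if __name__ == "__main__":
    print(indicators_for(DATA + "hit/1"))
    print(indicator_for_element("/ProteinHit/@coverage"))
```

In this sketch the ontology decides which annotation predicates count as quality indicators, while the bindings keep the schema-level element names separate from the ontology terms, so either side can change without rewriting the other.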
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Preece, A. et al. (2006). Managing Information Quality in e-Science Using Semantic Web Technology. In: Sure, Y., Domingue, J. (eds) The Semantic Web: Research and Applications. ESWC 2006. Lecture Notes in Computer Science, vol 4011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11762256_35
DOI: https://doi.org/10.1007/11762256_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34544-2
Online ISBN: 978-3-540-34545-9
eBook Packages: Computer Science, Computer Science (R0)