A Hybrid Declarative/Procedural Metadata Mapping Language Based on Python

  • Greg Janée
  • James Frew
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3652)

Abstract

The Alexandria Digital Library (ADL) project has been working on automating the processes of building ADL collections and gathering the collection statistics on which ADL’s discovery system is based. As part of this effort, we have created a language and supporting programmatic framework for expressing mappings from XML metadata schemas to the required ADL metadata views. This language, based on the Python scripting language, is largely declarative in nature, corresponding to the fact that mappings can be largely—though not entirely—specified by crosswalk-type specifications. At the same time, the language allows mappings to be specified procedurally, which we argue is necessary to deal effectively with the realities of poor quality, highly variable, and incomplete metadata. An additional key feature of the language is the ability to derive new mappings from existing mappings, thereby making it easy to adapt generic mappings to the idiosyncrasies of particular metadata providers. We evaluate this language on three metadata standards (ADN, FGDC, and MARC) and three corresponding collections of metadata. We also note limitations, future research directions, and generalizations of this work.
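The three ideas in the abstract (declarative crosswalk-style rules, procedural rules for messy metadata, and deriving one mapping from another) can be illustrated with a minimal Python sketch. This is not the ADL mapping language or its API. Every name here (`GENERIC_MAPPING`, `clean_date`, `derive`, `apply_mapping`) and the sample record are hypothetical, chosen only to show the general shape such a hybrid approach might take.

```python
import xml.etree.ElementTree as ET

# Hypothetical generic mapping: each target field is bound either to an
# element path (a declarative, crosswalk-style rule) or to a callable that
# receives the record element (a procedural rule).
GENERIC_MAPPING = {
    "title": "title",
    "originator": "creator",
    "date": "date",
}


def clean_date(record):
    """Procedural rule (assumed example): tolerate sloppy or missing dates."""
    text = (record.findtext("date") or "").strip()
    # Keep only a leading four-digit year; otherwise report no value.
    return text[:4] if text[:4].isdigit() else None


def derive(base, **overrides):
    """Derive a new mapping from an existing one by replacing selected rules."""
    derived = dict(base)
    derived.update(overrides)
    return derived


def apply_mapping(mapping, record):
    """Evaluate every rule against one XML record, skipping empty results."""
    result = {}
    for field, rule in mapping.items():
        value = rule(record) if callable(rule) else record.findtext(rule)
        if value is not None and value.strip():
            result[field] = value.strip()
    return result


# A provider-specific mapping adapts the generic one: this imagined provider
# stores the originator under <author> and writes free-form dates.
PROVIDER_MAPPING = derive(GENERIC_MAPPING, date=clean_date, originator="author")

record = ET.fromstring(
    "<record><title> Sample map </title><author>J. Doe</author>"
    "<date>1998-06 (approx.)</date></record>"
)
mapped = apply_mapping(PROVIDER_MAPPING, record)
print(mapped)
```

In this sketch the derived mapping overrides only the two rules that differ for the provider and inherits the `title` rule unchanged, which is the kind of adaptation the abstract describes for handling the idiosyncrasies of particular metadata providers.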

Keywords

Access Point; Digital Library; Mapping Language; Mapping Framework; Metadata Standard
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Greg Janée (1)
  • James Frew (2)
  1. Alexandria Digital Library Project, Institute for Computational Earth System Science, University of California, Santa Barbara, Santa Barbara
  2. Donald Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara
