An OAI-Based Filtering Service for CITIDEL from NDLTD

  • Baoping Zhang
  • Marcos André Gonçalves
  • Edward A. Fox
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2911)

Abstract

One goal of the Computing and Information Technology Interactive Digital Educational Library (CITIDEL) is to make as many computing-related resources as possible available to computer science scholars and practitioners. In this paper, we describe a set of experiments designed to further this goal by adding to CITIDEL a sub-collection of computing-related electronic theses and dissertations (ETDs) automatically extracted from the Networked Digital Library of Theses and Dissertations (NDLTD) OAI Union Catalog. We analyze the metadata quality of the NDLTD OAI Union Catalog and describe three experiments that combine different sources of evidence to improve the accuracy of identifying the computing-related entries.
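
As a rough illustration of what "combining different sources of evidence" can mean for metadata filtering, the sketch below scores a single Dublin Core record by looking at several of its fields at once. It is not the authors' method: the keyword list, the per-field weights, the threshold, and the sample record are all hypothetical, and a simple keyword score stands in for whatever trained classifier the experiments actually use.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used in oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical keyword list; not taken from the paper.
COMPUTING_TERMS = {"computer", "computing", "software",
                   "algorithm", "database", "network"}

# Hypothetical weights: each metadata field is one source of evidence.
FIELD_WEIGHTS = {"title": 2.0, "subject": 3.0, "description": 1.0}


def field_text(record, field):
    """Concatenate the text of every occurrence of one Dublin Core field."""
    return " ".join((el.text or "") for el in record.iter(DC + field)).lower()


def computing_score(record):
    """Weighted count of distinct computing terms found in each field."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = set(re.findall(r"[a-z]+", field_text(record, field)))
        score += weight * len(words & COMPUTING_TERMS)
    return score


def is_computing_related(record, threshold=3.0):
    """Keep a record when the combined evidence reaches the threshold."""
    return computing_score(record) >= threshold


# Made-up record in the oai_dc format, for illustration only.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A Fast Algorithm for Database Joins</dc:title>
  <dc:subject>Computer Science</dc:subject>
  <dc:description>We study join processing.</dc:description>
</oai_dc:dc>"""

if __name__ == "__main__":
    record = ET.fromstring(SAMPLE)
    print(computing_score(record), is_computing_related(record))
```

Missing fields simply contribute nothing to the score, which matters when metadata quality varies from record to record.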

Keywords

Support Vector Machine; Digital Library; Virginia Tech; Subject Field; Sequential Minimal Optimization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Baoping Zhang
    • 1
  • Marcos André Gonçalves
    • 1
  • Edward A. Fox
    • 1
  1. Digital Library Research Laboratory, Virginia Tech, Blacksburg, USA