Leveraging the Legacy of Conventional Libraries for Organizing Digital Libraries

  • Arash Joorabchi
  • Abdulhussain E. Mahdi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5714)


With the significant growth in the number of electronic documents available on the Internet, intranets, and digital libraries, effective methods and systems for indexing and organizing E-documents are needed more than ever. In this paper we introduce a new automatic text classification method that categorizes E-documents by utilizing the classification metadata of books, journals and other library holdings that already exists in the online catalogues of libraries. The method identifies all references cited in a given document and, using the classification metadata of these references as catalogued in a physical library, derives an appropriate class for the document itself according to a standard library classification scheme with the help of a weighting mechanism. We demonstrate the application of the proposed method and assess its performance by developing a prototype system that classifies electronic syllabus documents archived in the Irish National Syllabus Repository according to the well-known Dewey Decimal Classification (DDC) scheme.
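The core idea of the abstract, deriving a document's class from the catalogued classes of the works it cites, can be illustrated with a minimal sketch. The abstract does not specify the weighting mechanism, so the code below assumes a simple weighted vote over three-digit DDC classes. The function names, the per-reference weights, and the sample DDC numbers are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ddc_main_class(ddc_number: str) -> str:
    """Truncate a DDC number such as '006.35' to its three-digit class '006'."""
    return ddc_number.split(".")[0].zfill(3)


def classify_document(cited_refs):
    """Return the three-digit DDC class with the highest total weight.

    cited_refs is a list of (ddc_number, weight) pairs, one per cited
    reference found in a library catalogue. The weight is an assumption
    of this sketch (for example, how often the reference is cited in the
    document); the paper's actual weighting scheme may differ.
    """
    scores = defaultdict(float)
    for ddc_number, weight in cited_refs:
        scores[ddc_main_class(ddc_number)] += weight
    if not scores:
        return None  # no catalogued references, so no class can be derived
    # Highest score wins; ties are broken by the smaller class number.
    return min(scores, key=lambda c: (-scores[c], c))


# Made-up example: two references under 006 and one under 025.
refs = [("006.35", 2.0), ("006.4", 1.0), ("025.04", 2.5)]
print(classify_document(refs))  # prints 006
```

Voting at a fixed three-digit level keeps the sketch short. A fuller system could also credit deeper DDC subdivisions so that more specific classes can be selected when the cited references agree on them.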


Keywords: digital library organization; text classification; collective classification; library classification schemes; bibliography





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Arash Joorabchi (1)
  • Abdulhussain E. Mahdi (1)
  1. Department of Electronic and Computer Engineering, University of Limerick, Ireland
