Development of a National Syllabus Repository for Higher Education in Ireland

  • Arash Joorabchi
  • Abdulhussain E. Mahdi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5173)


With the significant growth in electronic education materials such as syllabus documents and lecture notes available on the Internet and intranets, there is a need to develop structured central repositories of such materials so that both educators and learners can easily share, search and access them. This paper reports on our ongoing work to develop a national repository for course syllabi in Ireland. Specifically, it describes a prototype syllabus repository system for higher education in Ireland, developed using a number of information extraction and document classification techniques, including a new fully unsupervised document classification method that uses a web search engine to automatically collect the training set for the classification algorithm. Preliminary experimental results evaluating the system’s performance are presented and discussed.
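
The idea of collecting a training set through a search engine can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes that each subject field name is used as a search query, that the returned documents are labelled with that field, and that a multinomial naive Bayes classifier with add-one smoothing is trained on them. The function names (`collect_training_set`, `train_naive_bayes`, `classify`) and the `fake_search` stub with its tiny in-memory corpus are hypothetical; no real search API is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def collect_training_set(subject_fields, search):
    """Query the search engine once per subject field (assumption: the
    field name itself is the query) and label the results with that field."""
    return {field: search(field) for field in subject_fields}


def train_naive_bayes(training_set):
    """Build per-class word counts for a multinomial naive Bayes model."""
    word_counts = {label: Counter() for label in training_set}
    vocabulary = set()
    for label, documents in training_set.items():
        for document in documents:
            tokens = tokenize(document)
            word_counts[label].update(tokens)
            vocabulary.update(tokens)
    return word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-likelihood, using a uniform
    class prior and add-one smoothing; out-of-vocabulary words are ignored."""
    word_counts, vocabulary = model
    tokens = [t for t in tokenize(text) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values()) + len(vocabulary)
        score = sum(math.log((counts[t] + 1) / total) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fake_search(query):
    """Hypothetical stand-in for a web search engine: returns canned
    document texts instead of live search results."""
    corpus = {
        "computing": [
            "programming algorithms software data structures",
            "software engineering and operating systems",
        ],
        "chemistry": [
            "organic chemistry reactions and molecules",
            "laboratory analysis of acids and molecules",
        ],
    }
    return corpus[query]


model = train_naive_bayes(
    collect_training_set(["computing", "chemistry"], fake_search)
)
label = classify("Module on algorithms and data structures in software", model)
print(label)
```

In a real deployment the stub would be replaced by calls to an actual search service, and the quality of the classifier would depend on how well the search results represent each subject field.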


Information Extractor, Page Number, Subject Field, Training Document, Word Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Arash Joorabchi (1)
  • Abdulhussain E. Mahdi (1)
  1. Department of Electronic and Computer Engineering, University of Limerick, Ireland
