Extracting Structured Subject Information from Digital Document Archives

  • Jyi-Shane Liu
  • Ching-Ying Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4312)


Information extraction (IE) techniques can decode targeted subject information in documents and reduce text data to a set of structured core information. For digital libraries, this means IE can potentially serve as an enabling tool that extends the value of digital document archives. We present an approach, called the sandwich extraction pattern, to address closely coupled template relation tasks. The approach provides interactive capabilities for task specification, domain knowledge acquisition, and output evaluation. This gives users (e.g., librarians) direct control over the design of value-added content products and the performance of IE tools. We validated the approach empirically by implementing an IE system, called SEP, and field-testing it on a practical document archive. Encouraged by successful test runs, the NCCU library has formally initiated a project to develop a value-added content product of government personnel gazettes, including document images, electronic texts, and a personnel changes database.


information extraction, digital document archives, value-added services





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jyi-Shane Liu (1, 2)
  • Ching-Ying Lee (3)
  1. Department of Computer Science, National Chengchi University, Taiwan, R.O.C.
  2. University Library, National Chengchi University, Taiwan, R.O.C.
  3. Department of English, National Taiwan Normal University, Taiwan, R.O.C.
