A Frame-Based Approach for Reference Metadata Extraction

  • Yu-Lun Hsieh
  • Shih-Hung Liu
  • Ting-Hao Yang
  • Yu-Hsuan Chen
  • Yung-Chun Chang
  • Gladys Hsieh
  • Cheng-Wei Shih
  • Chun-Hung Lu
  • Wen-Lian Hsu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8916)

Abstract

In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performs equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA reduces the average field error rate across all four independent test sets by about 70% in relative terms (2.24% vs. 7.54%).
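The abstract does not spell out the frame matching algorithm, so the following is only a minimal sketch of the general idea it names: aligning the tokens of a reference string against an ordered frame of slots, with approximate matching to tolerate abbreviations and typos. Everything in it is an assumption made for illustration, not the authors' method: the slot names, the `approx` scoring rule, the gap penalties, the threshold, and the use of a Needleman-Wunsch style global alignment.

```python
from difflib import SequenceMatcher

def approx(token, word):
    """Hypothetical approximate matcher: score in [0, 1] for token vs. keyword."""
    t = token.strip(".,()").lower()
    w = word.lower()
    if len(t) >= 3 and w.startswith(t):
        return 0.9  # plausible abbreviation, e.g. "Proc." for "Proceedings"
    return SequenceMatcher(None, t, w).ratio()  # tolerates small typos

def is_year(token):
    """Score 1.0 for a four-digit number, ignoring surrounding punctuation."""
    t = token.strip(".,()")
    return 1.0 if len(t) == 4 and t.isdigit() else 0.0

# A hypothetical frame: an ordered list of (slot name, scoring function).
FRAME = [
    ("booktitle_marker", lambda tok: approx(tok, "Proceedings")),
    ("pages_marker", lambda tok: approx(tok, "pp")),
    ("year", is_year),
]

def align_frame(tokens, frame, token_gap=-0.1, slot_gap=-0.5, threshold=0.6):
    """Globally align tokens with frame slots; return (score, {slot: token})."""
    n, m = len(tokens), len(frame)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # leading tokens left unlabelled
        score[i][0], back[i][0] = i * token_gap, "skip_token"
    for j in range(1, m + 1):  # leading slots left unfilled
        score[0][j], back[0][j] = j * slot_gap, "skip_slot"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j] + token_gap, "skip_token"),
                (score[i][j - 1] + slot_gap, "skip_slot"),
            ]
            s = frame[j - 1][1](tokens[i - 1])
            if s >= threshold:  # only sufficiently similar pairs may match
                options.append((score[i - 1][j - 1] + s, "match"))
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right cell to recover the slot assignments.
    fields, i, j = {}, n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            fields[frame[j - 1][0]] = tokens[i - 1]
            i, j = i - 1, j - 1
        elif move == "skip_token":
            i -= 1
        else:
            j -= 1
    return score[n][m], fields

if __name__ == "__main__":
    # "Procedings" is a deliberate typo to exercise the approximate matcher.
    tokens = "In: Procedings of the ACM Conf., pp. 20-29 (2004)".split()
    print(align_frame(tokens, FRAME))
```

In this toy run the misspelled "Procedings" still fills the `booktitle_marker` slot, "pp." fills `pages_marker`, and "(2004)" fills `year`, while the remaining tokens are skipped at a small cost. A rule that demanded the exact string "Proceedings" would have failed on the same input, which is the kind of rigidity the abstract attributes to traditional rule matching.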

Keywords

Reference metadata extraction; Knowledge representation; Frame-based approach

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yu-Lun Hsieh (1)
  • Shih-Hung Liu (1)
  • Ting-Hao Yang (1)
  • Yu-Hsuan Chen (1)
  • Yung-Chun Chang (1)
  • Gladys Hsieh (1)
  • Cheng-Wei Shih (1)
  • Chun-Hung Lu (2)
  • Wen-Lian Hsu (1)

  1. Institute of Information Science, Academia Sinica, Taipei, Taiwan
  2. Innovative Digitech-Enabled Applications & Services Institute, III, Taiwan
