Comparison of Numeral Strings Interpretation: Rule-Based and Feature-Based N-Gram Methods

  • Kyongho Min
  • William H. Wilson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4304)


This paper compares the performance of two approaches to numeral string interpretation: manually generated rule-based interpretation of numerals and strings that include numerals [8], and automatically generated feature-based interpretation. The system employs three interpretation processes: word trigram construction with a tokeniser, rule-based processing of number strings, and n-gram based classification. We extracted numeral strings from 378 online newspaper articles and found that, on average, they made up about 2.2% of the words in the articles. For feature-based interpretation, we tested on 11 datasets, randomly selecting sample data from which to extract tabular feature-based constraints. The rule-based approach achieved a precision ratio of 86.8% and a recall ratio of 77.1%; the feature-based approach achieved a precision ratio of 83.1% and a recall ratio of 74.5%.
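As a rough illustration of the first process named above, word trigram construction around numeral strings, the following minimal sketch may help. It is not the authors' tokeniser: the whitespace-based `tokenise`, the "any token containing a digit" definition of a numeral string, and the function names `numeral_trigrams` and `numeral_share` are all assumptions made for this example.

```python
import re

# Assumption: a "numeral string" is any token containing at least one digit.
NUMERAL = re.compile(r"\d")


def tokenise(text):
    """Split on whitespace and strip surrounding punctuation (simplified)."""
    tokens = (tok.strip(".,;:!?()\"'") for tok in text.split())
    return [tok for tok in tokens if tok]


def numeral_trigrams(tokens):
    """Return (left, numeral, right) for each numeral token.

    None marks a missing neighbour at the start or end of the text.
    """
    padded = [None] + tokens + [None]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
        if NUMERAL.search(padded[i])
    ]


def numeral_share(tokens):
    """Fraction of tokens that are numeral strings (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if NUMERAL.search(tok)) / len(tokens)


if __name__ == "__main__":
    sample = tokenise("The price rose 20% to $3.50 on 5 May.")
    for trigram in numeral_trigrams(sample):
        print(trigram)
    print(f"numeral share: {numeral_share(sample):.3f}")
```

Each trigram keeps the word on either side of the numeral string, which is the kind of local context that a rule-based or n-gram based classifier could then use to decide whether, say, "5" is a day, a quantity, or something else.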


Entity Recognition · Precision Ratio · Disjunctive Constraint · Disambiguation Method · Recall Ratio
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Asahara, M., Matsumoto, Y.: Japanese Named Entity Extraction with Redundant Morphological Analysis. In: Proceedings of HLT-NAACL 2003, pp. 8–15 (2003)
  2. Black, W., Rinaldi, F., Mowatt, D.: FACILE: Description of the NE system used for MUC-7. In: Proceedings of MUC-7 (1998)
  3. Chieu, L., Ng, T.: Named Entity Recognition: A Maximum Entropy Approach Using Global Information. In: Proceedings of the 19th COLING, pp. 190–196 (2002)
  4. CoNLL-2003 Language-Independent Named Entity Recognition (2003),
  5. Dale, R.: A Framework for Complex Tokenisation and its Application to Newspaper Text. In: Proceedings of the second Australian Document Computing Symposium (1997)
  6. Earley, J.: An Efficient Context-Free Parsing Algorithm. CACM 13(2), 94–102 (1970)
  7. Maynard, D., Tablan, V., Ursu, C., Cunningham, H., Wilks, Y.: Named Entity Recognition from Diverse Text Types. In: Proceedings of Recent Advances in NLP (2001)
  8. Min, K., Wilson, W.H., Moon, Y.: Syntactic and Semantic Disambiguation of Numeral Strings Using an N-gram Method. In: Zhang, S., Jarvis, R. (eds.) AI 2005. LNCS (LNAI), vol. 3809, pp. 82–91. Springer, Heidelberg (2005)
  9. Nelson, G., Wallis, S., Aarts, B.: Exploring Natural Language - working with the British Component of the International Corpus of English. John Benjamins, The Netherlands (2002)
  10. Polanyi, L., van den Berg, M.: Logical Structure and Discourse Anaphora Resolution. In: Proceedings of ACL 1999 Workshop on The Relation of Discourse/Dialogue Structure and Reference, pp. 110–117 (1999)
  11. Reiter, E., Sripada, S.: Learning the Meaning and Usage of Time Phrases from a parallel Text-Data Corpus. In: Proceedings of HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data, vol. 11, pp. 78–85 (2003)
  12. Siegel, M., Bender, E.M.: Efficient Deep Processing of Japanese. In: Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization (2002)
  13. Torii, M., Kamboj, S., Vijay-Shanker, K.: An investigation of Various Information Sources for Classifying Biological Names. In: Proceedings of ACL 2003 Workshop on Natural Language Processing in Biomedicine, pp. 113–120 (2003)
  14. Wang, H., Yu, S.: The Semantic Knowledge-base of Contemporary Chinese and its Application in WSD. In: Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, pp. 112–118 (2003)
  15. Zhou, G., Su, J.: Named Entity Recognition using an HMM-based Chunk Tagger. In: Proceedings of ACL 2002, pp. 473–480 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kyongho Min (1)
  • William H. Wilson (2)

  1. School of Computer and Information Sciences, Auckland University of Technology, New Zealand
  2. School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
