Effectiveness of Methods for Syntactic and Semantic Recognition of Numeral Strings: Tradeoffs Between Number of Features and Length of Word N-Grams

  • Conference paper
AI 2007: Advances in Artificial Intelligence (AI 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4830)

Abstract

This paper describes and compares methods based on word N-grams (specifically trigrams and pentagrams), used together with five features, for recognising the syntactic and semantic categories of numeral strings that represent money, number, date, and similar values in text. The system applies three interpretation processes: construction of word N-grams with a tokeniser, rule-based processing of numeral strings, and N-gram-based classification. We extracted numeral strings from 1,111 online newspaper articles. To evaluate numeral string interpretation, we held out 112 (10%) of the 1,111 articles as unseen test data (1,278 numeral strings) and used the remaining 999 articles, which supplied 11,525 numeral strings, to extract N-gram-based constraints for disambiguating the meanings of numeral strings. The word trigram method achieved 83.8% precision, 81.2% recall, and an F-measure of 82.5%. The word pentagram method achieved 86.6% precision, 82.9% recall, and an F-measure of 84.7%.
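
As a rough illustration of the terms used in the abstract, the Python sketch below builds a word trigram and a word pentagram around a numeral string and checks that the reported F-measures match the harmonic mean of the reported precision and recall. It is not the authors' implementation: the regular-expression tokeniser, the choice to centre the window on the numeral string, the `<pad>` boundary token, and all function names are assumptions made for illustration only.

```python
import re

PAD = "<pad>"  # hypothetical boundary marker, not taken from the paper

def tokenise(text):
    """Toy tokeniser (assumption): numbers with optional '$' and internal
    ',' or '.' stay whole; other words and punctuation are split apart."""
    return re.findall(r"\$?\d+(?:[.,]\d+)*|\w+|[^\w\s]", text)

def is_numeral_string(token):
    """Treat any token containing a digit as a numeral string."""
    return any(ch.isdigit() for ch in token)

def context_window(tokens, i, n):
    """Return the n-token window centred on tokens[i] (n odd), padded at
    the text boundaries. n=3 gives a trigram, n=5 a pentagram."""
    half = n // 2
    padded = [PAD] * half + tokens + [PAD] * half
    return padded[i:i + n]

def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

tokens = tokenise("The company paid $1,200 for 15 shares in 2007.")
for i, tok in enumerate(tokens):
    if is_numeral_string(tok):
        print(tok, context_window(tokens, i, 3), context_window(tokens, i, 5))

# Figures reported in the abstract, in percent.
print(round(f_measure(83.8, 81.2), 1))  # trigrams
print(round(f_measure(86.6, 82.9), 1))  # pentagrams
```

Under these assumptions the two F-measure calls reproduce the 82.5% and 84.7% quoted above, which suggests the reported figures are balanced F-measures. The wider pentagram window gives a classifier two more context words per numeral string, at the cost of sparser N-gram statistics.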

Editor information

Mehmet A. Orgun, John Thornton

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Min, K., Wilson, W.H., Kang, B.H. (2007). Effectiveness of Methods for Syntactic and Semantic Recognition of Numeral Strings: Tradeoffs Between Number of Features and Length of Word N-Grams. In: Orgun, M.A., Thornton, J. (eds) AI 2007: Advances in Artificial Intelligence. AI 2007. Lecture Notes in Computer Science, vol 4830. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76928-6_46

  • DOI: https://doi.org/10.1007/978-3-540-76928-6_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76926-2

  • Online ISBN: 978-3-540-76928-6

  • eBook Packages: Computer Science, Computer Science (R0)
