Text-Mining: Application Development Challenges

  • Conference paper
Applications and Innovations in Intelligent Systems X

Abstract

This paper reviews best practices and challenges for project managers and developers who implement text-mining applications. With a focus on rule-based information extraction, and with reference to actual cases, the authors share their experience of developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process for capturing business requirements and mapping them into features and linguistic patterns, development of linguistic rules, rule development standards, performance metrics, and an evaluation methodology. Linguistic representations such as sub-syntactic, syntactic, semantic, and application-specific rules are identified. Special emphasis is placed on processing that follows information extraction, such as improving the relevance of the extracted information, summarization models, techniques for handling typographical errors, resolution of temporal information, and anaphora resolution, along with a discussion of shallow versus full parsing. Lastly, the paper discusses various utilities that help with the development of a text-mining application, such as feature analysis, visualization, source document pre-processing, and rule authoring tools.

Copyright information

© 2003 Springer-Verlag London Limited

About this paper

Cite this paper

Varadarajan, S., Kasravi, K., Feldman, R. (2003). Text-Mining: Application Development Challenges. In: Macintosh, A., Ellis, R., Coenen, F. (eds) Applications and Innovations in Intelligent Systems X. Springer, London. https://doi.org/10.1007/978-1-4471-0649-4_17

  • DOI: https://doi.org/10.1007/978-1-4471-0649-4_17

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-673-8

  • Online ISBN: 978-1-4471-0649-4

  • eBook Packages: Springer Book Archive
