Abstract
This paper reviews best practices and challenges for project managers and developers implementing text-mining applications. With a focus on rule-based information extraction, and with reference to actual cases, the authors share their experience of developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process for capturing business requirements and mapping them into features and linguistic patterns, the development of linguistic rules, rule development standards, performance metrics, and an evaluation methodology. Linguistic representations such as sub-syntactic, syntactic, semantic, and application-specific rules are identified. Special emphasis is placed on processing that follows information extraction, such as improving the relevance of the extracted information, summarization models, techniques for handling typographical errors, resolution of temporal information, and anaphora resolution; shallow versus full parsing is also discussed. Lastly, the paper discusses various utilities that help with the development of a text-mining application, such as feature analysis, visualization, source document pre-processing, and rule authoring tools.
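As an illustration of two ideas named in the abstract, rule-based information extraction and performance metrics, the following is a minimal sketch and not the authors' implementation. The rule, the sample text, the gold annotations, and the function names `extract_acquisitions` and `precision_recall` are all hypothetical, chosen only to show how a single linguistic pattern might be scored against hand-labeled data.

```python
import re

# Hypothetical linguistic rule (an assumption for illustration):
# "<Capitalized name> acquired|bought <Capitalized name>"
ACQUISITION_RULE = re.compile(
    r"\b(?P<buyer>[A-Z][A-Za-z]+) (?:acquired|bought) (?P<target>[A-Z][A-Za-z]+)\b"
)


def extract_acquisitions(text):
    """Return the set of (buyer, target) pairs matched by the rule."""
    return {
        (m.group("buyer"), m.group("target"))
        for m in ACQUISITION_RULE.finditer(text)
    }


def precision_recall(extracted, gold):
    """Score extracted items against a hand-labeled gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample text and gold annotations, not data from the paper.
sample = (
    "Acme acquired Globex last year. Later, Initech bought Hooli. "
    "Umbrella was purchased by Wayne."
)
gold = {("Acme", "Globex"), ("Initech", "Hooli"), ("Wayne", "Umbrella")}

extracted = extract_acquisitions(sample)
precision, recall = precision_recall(extracted, gold)
print(extracted, precision, recall)
```

In this toy setup the active-voice rule misses the passive construction in the last sentence, which lowers recall while precision stays high. A gap of this kind is what an evaluation methodology is meant to surface before further rules are written.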
Copyright information
© 2003 Springer-Verlag London Limited
About this paper
Cite this paper
Varadarajan, S., Kasravi, K., Feldman, R. (2003). Text-Mining: Application Development Challenges. In: Macintosh, A., Ellis, R., Coenen, F. (eds) Applications and Innovations in Intelligent Systems X. Springer, London. https://doi.org/10.1007/978-1-4471-0649-4_17
DOI: https://doi.org/10.1007/978-1-4471-0649-4_17
Publisher Name: Springer, London
Print ISBN: 978-1-85233-673-8
Online ISBN: 978-1-4471-0649-4
eBook Packages: Springer Book Archive