Text-Mining: Application Development Challenges

Varadarajan, Sundar; Kasravi, Kas; Feldman, Ronen

doi:10.1007/978-1-4471-0649-4_17

Abstract

This paper reviews best practices and challenges for project managers and developers who implement text-mining applications. With a focus on rule-based information extraction, and with reference to actual cases, the authors share their experience of developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process for capturing business requirements and mapping them to features and linguistic patterns, the development of linguistic rules, rule development standards, performance metrics, and an evaluation methodology. Linguistic representations such as sub-syntactic, syntactic, semantic, and application-specific rules are identified. Special emphasis is placed on processing that follows information extraction: improving the relevance of the extracted information, summarization models, techniques for handling typographical errors, resolution of temporal information, and anaphora resolution, along with a discussion of shallow vs. full parsing. Lastly, the paper discusses utilities that support the development of a text-mining application, such as feature analysis, visualization, source document pre-processing, and rule authoring tools.
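
As a rough illustration of the rule-based extraction and performance metrics mentioned in the abstract, the following is a minimal sketch and not the authors' method. It assumes a single hypothetical surface pattern ("X acquired Y"), invented company names, and a small hand-made gold set. The rule is applied with a regular expression and scored by precision and recall, two metrics commonly used to evaluate information extraction.

```python
import re

# Hypothetical rule (an assumption for illustration): "<Company> acquired <Company>".
# Real linguistic rules would be far richer than a single regular expression.
ACQUISITION_RULE = re.compile(
    r"(?P<buyer>[A-Z][A-Za-z]+) acquired (?P<target>[A-Z][A-Za-z]+)"
)


def extract(text):
    """Return the set of (buyer, target) pairs matched by the rule."""
    return {
        (m.group("buyer"), m.group("target"))
        for m in ACQUISITION_RULE.finditer(text)
    }


def precision_recall(predicted, gold):
    """Score predicted facts against a manually labelled gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented sample text and gold annotations, used only to exercise the sketch.
sample = "Acme acquired Initech. Later, Globex acquired Hooli in a stock deal."
gold = {("Acme", "Initech"), ("Globex", "Hooli"), ("Umbrella", "Soylent")}

predicted = extract(sample)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup every extracted pair is correct, but one gold fact never appears in a form the rule covers, so recall is lower than precision. That gap is the kind of signal an evaluation methodology would use to decide which rules to add or refine.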

Author information

Authors and Affiliations

ClearForest Corp., 15 E. 26th St., Suite 1711, New York, NY, 10010, USA
Ronen Feldman
Electronic Data Systems Corp., 5555 New King St., Troy, MI, 48098, USA
Sundar Varadarajan & Kas Kasravi

Editor information

Editors and Affiliations

International Teledemocracy Centre, Napier University, 10 Colinton Road, Edinburgh, EH10 5DT, UK
Ann Macintosh BSc, CEng
Crew Services Ltd, Fairfield House, Kingston Crescent, Portsmouth, PO2 8AA, UK
Richard Ellis BSc, MSc
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen PhD

About this paper

Cite this paper

Varadarajan, S., Kasravi, K., Feldman, R. (2003). Text-Mining: Application Development Challenges. In: Macintosh, A., Ellis, R., Coenen, F. (eds) Applications and Innovations in Intelligent Systems X. Springer, London. https://doi.org/10.1007/978-1-4471-0649-4_17

DOI: https://doi.org/10.1007/978-1-4471-0649-4_17
Publisher Name: Springer, London
Print ISBN: 978-1-85233-673-8
Online ISBN: 978-1-4471-0649-4
eBook Packages: Springer Book Archive
