
TEA: A Text Analysis Tool for the Intelligent Text Document Filtering

  • Conference paper
Text, Speech and Dialogue (TSD 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1902)

Abstract

This paper describes results achieved with TEA (TExt Analyzer), a text-document classification tool based on the naïve Bayes algorithm. TEA also provides a set of additional functions that can assist users in fine-tuning the text classifiers and improving classification accuracy, mainly by modifying the dictionaries generated during the training phase. The experiments described in the paper aimed at supporting work with unstructured medical text documents downloaded from the Internet. Good and stable results (around 97% classification accuracy) were achieved when selecting documents from a certain area of interest among a large number of documents from different areas.
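The abstract names the naïve Bayes algorithm and dictionaries built during training, but gives no further internals. The following is a minimal Python sketch of how such a classifier and a dictionary edit could fit together, assuming a multinomial (word-frequency) naïve Bayes model with Laplace smoothing as presented in Mitchell's textbook. The class name `NaiveBayesTextClassifier`, the `remove_words` fine-tuning step, and the toy training documents are hypothetical illustrations, not TEA's actual interface or data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with an editable word dictionary.

    Assumption: word-frequency event model with Laplace smoothing;
    this is an illustrative sketch, not TEA's implementation.
    """

    def __init__(self):
        self.class_doc_counts = Counter()        # number of training documents per class
        self.word_counts = defaultdict(Counter)  # class -> word -> occurrence count
        self.vocabulary = set()                  # the dictionary built during training

    def train(self, documents):
        """documents: iterable of (list_of_tokens, class_label) pairs."""
        for tokens, label in documents:
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def remove_words(self, words):
        """Hypothetical fine-tuning step: delete words from the dictionary."""
        for word in words:
            self.vocabulary.discard(word)
            for counts in self.word_counts.values():
                counts.pop(word, None)

    def classify(self, tokens):
        """Return the class maximising log P(class) + sum of log P(word | class)."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # class prior
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # words outside the dictionary are ignored
                # Laplace-smoothed estimate of P(token | label)
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data for illustration only.
training = [
    ("patient therapy dosage clinical trial".split(), "relevant"),
    ("clinical study patient outcome".split(), "relevant"),
    ("football match score goal".split(), "irrelevant"),
    ("stock market price score".split(), "irrelevant"),
]

clf = NaiveBayesTextClassifier()
clf.train(training)
print(clf.classify("clinical patient dosage".split()))  # → relevant
```

In this sketch, removing a word from the dictionary drops it from every class's counts and from the vocabulary size used in the smoothing denominator, so the remaining word probabilities shift slightly. This is one plausible reading of "modifying dictionaries", not necessarily the mechanism TEA uses.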

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Žižka, J., Bourek, A., Frey, L. (2000). TEA: A Text Analysis Tool for the Intelligent Text Document Filtering. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_26

  • DOI: https://doi.org/10.1007/3-540-45323-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41042-3

  • Online ISBN: 978-3-540-45323-9

  • eBook Packages: Springer Book Archive