Abstract
This paper describes results achieved with TEA (TExt Analyzer), a text-document classification tool based on the naïve Bayes algorithm. TEA also provides a set of additional functions that can help users fine-tune the text classifiers and improve classification accuracy, mainly by modifying the dictionaries generated during the training phase. The experiments described in the paper aimed to support work with unstructured medical text documents downloaded from the Internet. Good and stable results (a classification accuracy of around 97%) were achieved when selecting documents in a certain area of interest from a large number of documents covering different areas.
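The abstract does not spell out TEA's exact model. What follows is therefore only a minimal sketch, assuming a multinomial naïve Bayes classifier with Laplace (add-one) smoothing. It shows how per-class word dictionaries built during training drive classification, and how editing those dictionaries changes what the classifier sees. Here the edit is made through `remove_words`, a hypothetical method. The tokenizer, the class names, and the toy training documents are also illustrative assumptions. They are not TEA's implementation and not the paper's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens (a simplifying assumption, not TEA's tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # training documents per class
        self.word_counts = defaultdict(Counter)  # per-class dictionary: word -> count
        self.vocab = set()                       # global dictionary of known words

    def train(self, docs):
        """Build the dictionaries from (text, label) pairs."""
        for text, label in docs:
            self.class_docs[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def remove_words(self, words):
        """Hypothetical dictionary edit: drop words a user judges uninformative."""
        for w in words:
            self.vocab.discard(w)
            for counts in self.word_counts.values():
                counts.pop(w, None)

    def classify(self, text):
        """Return the class with the highest log posterior score."""
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior P(class)
            for w in tokenize(text):
                if w in self.vocab:  # words outside the dictionary are ignored
                    score += math.log((counts[w] + 1) / (total_words + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with invented documents (illustration only).
nb = NaiveBayesText()
nb.train([
    ("patient diagnosis treatment clinical trial", "medical"),
    ("tumor therapy patient dose", "medical"),
    ("football match goal league", "other"),
    ("stock market price shares", "other"),
])
print(nb.classify("clinical therapy for the patient"))  # → medical
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied. Removing a word from the dictionary simply excludes it from every class score. This is one plausible reading of how dictionary edits could shift accuracy.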
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Žižka, J., Bourek, A., Frey, L. (2000). TEA: A Text Analysis Tool for the Intelligent Text Document Filtering. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_26
DOI: https://doi.org/10.1007/3-540-45323-7_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41042-3
Online ISBN: 978-3-540-45323-9
eBook Packages: Springer Book Archive