TEA: A Text Analysis Tool for the Intelligent Text Document Filtering

Žižka, Jan; Bourek, Aleš; Frey, Luděk

doi:10.1007/3-540-45323-7_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1902)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

This paper describes results achieved with TEA (TExt Analyzer), a text-document classification tool based on the naïve Bayes algorithm. TEA also provides a set of additional functions that can help users fine-tune the text classifiers and improve classification accuracy, mainly by modifying the dictionaries generated during the training phase. The experiments described in the paper aimed to support work with unstructured medical text documents downloaded from the Internet. Good and stable results (a classification accuracy of around 97%) were achieved when selecting documents from a certain area of interest among a large number of documents from different areas.
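
The abstract does not spell out TEA's internals, so the following is only a minimal sketch of how a naïve Bayes text filter with an editable training dictionary could work. It assumes a multinomial model with add-one (Laplace) smoothing and plain lowercase whitespace tokenization; the class name `NaiveBayesFilter` and the `remove_words` method (a stand-in for the dictionary modifications described in the abstract) are hypothetical and are not TEA's actual interface.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes text filter (not TEA's actual code)."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents seen per class
        self.word_counts = defaultdict(Counter)  # per-class dictionary: word -> count
        self.vocabulary = set()                  # all words currently in the dictionary

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace tokenization; TEA's preprocessing may differ.
        return text.lower().split()

    def train(self, text, label):
        # Build the dictionary during the training phase.
        self.doc_counts[label] += 1
        for word in self.tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def remove_words(self, words):
        # Hypothetical stand-in for manual dictionary editing:
        # the removed words no longer count as evidence for any class.
        for word in words:
            self.vocabulary.discard(word)
            for counts in self.word_counts.values():
                counts.pop(word, None)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(class) + sum over words of log P(word | class), add-one smoothed
            score = math.log(n_docs / total_docs)
            for word in self.tokenize(text):
                if word not in self.vocabulary:
                    continue  # words outside the dictionary are ignored
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayesFilter()
    nb.train("insulin dosage in diabetic patients", "relevant")
    nb.train("glucose monitoring for diabetes", "relevant")
    nb.train("football match results", "irrelevant")
    nb.train("stock market prices today", "irrelevant")
    print(nb.classify("new insulin therapy for diabetes"))  # → relevant
```

Scores are summed in log space to avoid floating-point underflow on long documents, and add-one smoothing keeps a word that never occurred in one class from forcing that class's probability to zero. Deleting a word from the dictionary simply removes it as evidence for every class, which is one plausible reading of how editing the generated dictionaries can change classification results.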

Author information

Authors and Affiliations

Department of Information Technologies, Faculty of Informatics, Masaryk University in Brno, Botanická 68a, 602 00, Brno, Czech Republic
Jan Žižka & Luděk Frey
Department of Biophysics, Faculty of Medicine, Masaryk University in Brno, Joštova 10, 662 43, Brno, Czech Republic
Aleš Bourek

Editor information

Editors and Affiliations

Faculty of Informatics, Department of Programming Systems and Communication, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Žižka, J., Bourek, A., Frey, L. (2000). TEA: A Text Analysis Tool for the Intelligent Text Document Filtering. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_26

Published: 15 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41042-3
Online ISBN: 978-3-540-45323-9
eBook Packages: Springer Book Archive
