Incorporating External Information in Bayesian Classifiers Via Linear Feature Transformations

  • Tapio Pahikkala
  • Jorma Boberg
  • Aleksandr Mylläri
  • Tapio Salakoski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


The naive Bayes classifier is a frequently used method in various natural language processing tasks. Inspired by a modified version of the method called the flexible Bayes classifier, we explore the use of linear feature transformations together with Bayesian classifiers, because this provides an elegant way to endow the classifier with external information relevant to the task. While the flexible Bayes classifier is based on the idea of using kernel density estimation to obtain the class conditional probabilities of continuously valued attributes, we use the linear transformations to smooth the feature frequency counts of discrete valued attributes. We evaluate the method on the context-sensitive spelling error correction problem using the Reuters corpus. For this particular task, we define a positional feature transformation and a word feature transformation that take advantage of the positional information of the context words and the part-of-speech information of the words, respectively. Our experimental results show that the performance of Bayesian classifiers in natural language disambiguation tasks can be improved with the proposed transformations, and that incorporating external information via linear feature transformations is a promising research direction.
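The abstract's central idea, smoothing discrete feature frequency counts with a linear transformation before estimating class-conditional probabilities, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the tridiagonal `positional_transform` matrix and its `spread` parameter are hypothetical stand-ins for the paper's positional feature transformation.

```python
import numpy as np

def positional_transform(n_features, spread=0.25):
    """Hypothetical positional smoothing matrix: each feature keeps
    (1 - spread) of its count and passes spread/2 to each neighbour."""
    T = np.eye(n_features) * (1.0 - spread)
    for i in range(n_features - 1):
        T[i, i + 1] = spread / 2
        T[i + 1, i] = spread / 2
    # boundary features keep the mass they cannot pass on
    T[0, 0] += spread / 2
    T[-1, -1] += spread / 2
    return T

def class_log_probs(counts, T, alpha=1.0):
    """Apply the linear transformation T to raw per-class counts,
    then Laplace-smooth and normalise to log-probabilities."""
    smoothed = counts @ T.T + alpha
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# Toy example: 2 classes, 5 positional features. Without T, positions
# never seen in training would rely on Laplace smoothing alone; with T,
# a count observed at one position also supports its neighbours.
counts = np.array([[4.0, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 4.0]])
T = positional_transform(5)
log_p = class_log_probs(counts, T)
```

At classification time, the resulting `log_p` rows would be used as the class-conditional log-likelihood terms of a standard naive Bayes decision rule; the transformation only changes how those terms are estimated.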


Keywords: Kernel Density Estimation · External Information · Ambiguous Word · Word Sense Disambiguation · Positional Transformation





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tapio Pahikkala¹
  • Jorma Boberg¹
  • Aleksandr Mylläri¹
  • Tapio Salakoski¹

  1. Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Turku, Finland
