T-HMM: A Novel Biomedical Text Classifier Based on Hidden Markov Models

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 294)

Abstract

In this paper, we propose an original model for the classification of biomedical texts stored in large document corpora. The model classifies scientific documents according to their content using information retrieval techniques and Hidden Markov Models.

To demonstrate the efficiency of the model, we present a set of experiments performed on the OHSUMED biomedical corpus (a subset of the MEDLINE database) and on the Allele and GO TREC corpora. Our classifier is also compared with Naive Bayes, k-NN and SVM techniques.

Experiments illustrate the effectiveness of the proposed approach. Results show that the model is comparable to the SVM technique in the classification of biomedical texts.
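
The abstract does not describe the internal structure of T-HMM, so the following is only a generic sketch of how Hidden Markov Models can be used for classification, not the authors' method. It assumes one small discrete HMM per class, with a document reduced to a sequence of symbol indices (here, hypothetically, 0 for a low-relevance term and 1 for a high-relevance term), and assigns the document to the class whose model gives the highest forward-algorithm likelihood. All parameter values, class labels and function names are invented for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs   : non-empty list of observation symbol indices
    start : start[s]     = P(first state is s)
    trans : trans[p][s]  = P(next state is s | current state is p)
    emit  : emit[s][o]   = P(symbol o | state s)
    """
    if not obs:
        raise ValueError("obs must contain at least one symbol")
    n_states = len(start)
    log_lik = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialisation: start probability times emission probability.
            alpha = [start[s] * emit[s][symbol] for s in range(n_states)]
        else:
            # Induction: sum over predecessor states, then emit the symbol.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
                for s in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(obs, models):
    """Pick the label whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Hypothetical hand-set models: (start, trans, emit) per class.
MODELS = {
    "relevant": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.2, 0.8], [0.5, 0.5]],  # tends to emit symbol 1
    ),
    "non_relevant": (
        [0.5, 0.5],
        [[0.8, 0.2], [0.3, 0.7]],
        [[0.9, 0.1], [0.6, 0.4]],  # tends to emit symbol 0
    ),
}

if __name__ == "__main__":
    print(classify([1, 1, 1, 1], MODELS))
    print(classify([0, 0, 0, 0], MODELS))
```

In a real setting the parameters would be estimated from labelled training documents rather than set by hand, and the mapping from document terms to observation symbols would come from whatever information retrieval preprocessing the system uses.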

Keywords

Hidden Markov Model; Text classification; Bioinformatics

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Computer Science Dept., Escola Superior de Enxeñería Informática, Univ. of Vigo, Ourense, Spain
