Ontology-Based Text Classification for Filtering Cholangiocarcinoma Documents from PubMed

  • Chumsak Sibunruang
  • Jantima Polpinij
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8609)


PubMed is a search engine used to access the MEDLINE database, which comprises a massive amount of biomedical literature. This volume makes it difficult to find the relevant medical literature, and this work addresses that problem. We present a solution to retrieve the most relevant biomedical literature relating to Cholangiocarcinoma in clinical trials from PubMed. The proposed methodology is called ontology-based text classification (On-TC). We provide an ontology, called Cancer Technical Term Net (CCT-Net), that serves as a semantic tool. This ontology is integrated into the methodology to support automatic semantic interpretation during text processing, especially in the case of synonyms or term variations.
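As a rough illustration of how an ontology can help with synonyms and term variations, the sketch below maps surface terms to canonical concepts before deciding whether a document is relevant. It is not the On-TC method or the CCT-Net ontology: the dictionary `TOY_ONTOLOGY`, its entries, and the functions `extract_concepts` and `is_relevant` are hypothetical names invented for this example, and a simple rule stands in for the classifier.

```python
import re

# Hypothetical stand-in for an ontology: each term variant maps to a
# canonical concept. The entries are illustrative and NOT taken from CCT-Net.
TOY_ONTOLOGY = {
    "cholangiocarcinoma": "cholangiocarcinoma",
    "bile duct cancer": "cholangiocarcinoma",
    "bile duct carcinoma": "cholangiocarcinoma",
    "cca": "cholangiocarcinoma",
    "clinical trial": "clinical_trial",
    "clinical trials": "clinical_trial",
    "randomized controlled trial": "clinical_trial",
}


def extract_concepts(text, ontology=TOY_ONTOLOGY):
    """Return the set of canonical concepts whose variants occur in text."""
    lowered = text.lower()
    concepts = set()
    for variant, concept in ontology.items():
        # Word boundaries keep short variants such as "cca" from matching
        # inside unrelated words.
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            concepts.add(concept)
    return concepts


def is_relevant(text, required=("cholangiocarcinoma", "clinical_trial")):
    """Toy filter (an assumption): relevant if every required concept occurs."""
    return set(required) <= extract_concepts(text)


if __name__ == "__main__":
    abstract = "A clinical trial of gemcitabine in bile duct cancer."
    print(extract_concepts(abstract))
    print(is_relevant(abstract))
```

Because both "bile duct cancer" and "CCA" resolve to the same concept, documents that never use the word "cholangiocarcinoma" can still be matched. A real system would presumably feed such concept features to a trained classifier instead of the fixed rule used here.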


Keywords: PubMed, Ontology, CCT-Net, Text Classification, Cholangiocarcinoma





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Chumsak Sibunruang (1)
  • Jantima Polpinij (1)
  1. Intellect Laboratory, Faculty of Informatics, Mahasarakham University, Mahasarakham, Thailand
