Mining Cell Cycle Literature Using Support Vector Machines

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7297)


While the biomedical literature is growing rapidly, text classification remains a challenge for researchers, curators and librarians. In this work, we use the Caipirini service to explore a literature corpus related to the G1, S, G2 and M phases of the human cell cycle. We use Support Vector Machines (SVMs) and a well-studied dataset to compare each cell cycle phase against all the others, in order to find abstracts that relate to one specific phase at a time. We then measure performance using the standard accuracy, precision and recall metrics. We find differences between the results for the four phases and compare them with previous findings of related work. We conclude that the results concur and help to interpret the observed classification performance.
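To make the evaluation setup concrete, the following is a minimal sketch of how one phase can be scored against all others with accuracy, precision and recall. It is not the authors' implementation and trains no SVM; the helper names `binarize` and `binary_scores` and the toy labels are assumptions made for illustration only.

```python
from collections import namedtuple

# Hypothetical container for the three metrics named in the abstract.
Scores = namedtuple("Scores", "accuracy precision recall")


def binarize(phases, target):
    """One-vs-rest labels: True for the target phase, False for all others."""
    return [phase == target for phase in phases]


def binary_scores(actual, predicted):
    """Accuracy, precision and recall for two equal-length boolean lists."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = fp = fn = tn = 0
    for is_pos, said_pos in zip(actual, predicted):
        if is_pos and said_pos:
            tp += 1
        elif said_pos:
            fp += 1
        elif is_pos:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Scores(accuracy, precision, recall)


# Toy data, invented for illustration (not taken from the paper).
true_phases = ["G1", "G1", "S", "G2", "M", "M", "S", "G2"]
# Output of a hypothetical "G1 vs. rest" classifier for the same abstracts.
predicted_is_g1 = [True, False, False, False, False, True, False, False]

g1_scores = binary_scores(binarize(true_phases, "G1"), predicted_is_g1)
print("G1 vs. rest:", g1_scores)
```

Repeating the same call with "S", "G2" and "M" as the target, each paired with the output of its own classifier, gives the per-phase figures that such a comparison relies on.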


supervised machine learning; biomedical literature; cell cycle; support vector machines





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Life Biosystems GmbH, Heidelberg, Germany
  2. ESAT-SCD / IBBT-K.U.Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, Belgium
