Journal of Shanghai Jiaotong University (Science), Volume 17, Issue 4, pp 494–499

Web-based biomedical literature mining

  • Jian-fu An (安建福)
  • Hui-ping Xue (薛惠平)
  • Ying Chen (陈 瑛)
  • Jian-guo Wu (吴建国)
  • Lu Zhang (章 鲁)


With the upsurge in biomedical literature, using data-mining methods to discover new knowledge from the literature has drawn increasing attention from scholars. In this study, taking the mining of non-coding gene literature from the online PubMed database as an example, we first preprocessed the abstract data, then applied the term occurrence frequency (TF) and inverse document frequency (IDF) method (TF-IDF) to select features, and finally established a biomedical literature data-mining model based on a Bayesian algorithm. We assessed the model using the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, precision rate and recall rate. When 1000 features were selected, the AUC, specificity, sensitivity, accuracy rate, precision rate and recall rate were 0.8683, 84.63%, 89.02%, 86.83%, 89.02% and 98.14%, respectively. These results indicate that our method can effectively identify the targeted literature related to a particular topic.
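The pipeline described in the abstract (preprocessing, TF-IDF feature selection, a Bayesian classifier, then evaluation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the toy abstracts, the whitespace tokenizer, the choice of a multinomial naive Bayes model with add-one smoothing, and all function names (`tokenize`, `select_features`, `train_nb`, `predict`, `evaluate`) are hypothetical. AUC is omitted for brevity, since it requires ranking scores rather than hard labels.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def select_features(docs, k):
    """Keep the k terms with the highest TF-IDF score summed over all documents."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    score = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log(len(docs) / doc_freq[term])
            score[term] += tf * idf
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(score, key=lambda t: (-score[t], t))[:k]


def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing (an assumed model choice)."""
    vocab = set(vocab)
    class_docs = Counter(labels)
    term_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        term_counts[label].update(t for t in tokenize(doc) if t in vocab)
    model = {}
    for c in class_docs:
        total = sum(term_counts[c].values()) + len(vocab)
        log_prior = math.log(class_docs[c] / len(docs))
        log_lik = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
        model[c] = (log_prior, log_lik)
    return model


def predict(model, doc):
    """Return the class with the highest log posterior; unknown terms are ignored."""
    scores = {
        c: prior + sum(lik[t] for t in tokenize(doc) if t in lik)
        for c, (prior, lik) in model.items()
    }
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive=1):
    """Sensitivity (recall), specificity, accuracy and precision from hard labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
    }


# Toy data, invented for illustration: 1 = on-topic (non-coding), 0 = off-topic.
train_docs = [
    "noncoding rna regulates gene expression",
    "microrna noncoding transcript silencing",
    "long noncoding rna chromatin",
    "protein kinase signaling pathway",
    "surgery outcome patient cohort",
    "kinase inhibitor patient trial",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = [
    "noncoding rna in cancer",
    "long noncoding transcript",
    "kinase inhibitor patient",
    "patient cohort surgery",
]
test_labels = [1, 1, 0, 0]

vocab = select_features(train_docs, k=8)
model = train_nb(train_docs, train_labels, vocab)
predictions = [predict(model, d) for d in test_docs]
print(vocab)
print(evaluate(test_labels, predictions))
```

In this sketch the number of retained features `k` plays the role of the 1000 features reported in the abstract; on a real corpus it would be tuned against held-out data rather than fixed in advance.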

Key words

Bayesian algorithm; term occurrence frequency (TF) and inverse document frequency (IDF) (TF-IDF); data-mining

CLC number

TP 182, Q 31





Copyright information

© Shanghai Jiaotong University and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jian-fu An (安建福): 1, 4
  • Hui-ping Xue (薛惠平): 2
  • Ying Chen (陈 瑛): 1
  • Jian-guo Wu (吴建国): 3
  • Lu Zhang (章 鲁): 1

  1. Department of Biomedical Engineering, Basic Medical College, Shanghai Jiaotong University School of Medicine, Shanghai, China
  2. Division of Gastroenterology and Hepatology, Renji Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
  3. Department of Nuclear Medicine, Renji Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
  4. Information and Resource Center, Shanghai Jiaotong University School of Medicine, Shanghai, China
