Identification of Biomedical Articles with Highly Related Core Contents

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10191)

Abstract

Given a biomedical article a, identifying the articles whose core contents (including research goals, backgrounds, and conclusions) are similar to those of a is essential for surveying and cross-validating the highly related biomedical evidence presented in a. We therefore present CCSE (Core Content Similarity Estimation), a technique that retrieves these highly related articles by estimating and integrating three kinds of inter-article similarity: goal similarity, background similarity, and conclusion similarity. CCSE works on the titles and abstracts of biomedical articles, which are publicly available. Experimental results show that CCSE outperforms PubMed (a popular biomedical search engine) and typical techniques in identifying the scholarly articles that biomedical experts judged to have core contents focusing on the same gene-disease associations. This contribution is essential for the retrieval, clustering, mining, and validation of biomedical evidence in the literature.
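
As a rough illustration of what "estimating and integrating three kinds of inter-article similarity" could look like, the Python sketch below scores two articles on goal, background, and conclusion text and combines the three scores with a weighted mean. It is not the CCSE method: the abstract does not specify how each similarity is estimated or how the three are integrated, so the bag-of-words cosine measure, the weighted mean, the function names, and the dictionary keys `goal`, `background`, and `conclusion` are all assumptions made for this example. The split of a title and abstract into those three parts is assumed to have been done beforehand.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a simple stand-in for real term weighting."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def core_content_similarity(x, y, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of goal, background, and conclusion similarity.

    x and y are dicts with the hypothetical keys "goal", "background",
    and "conclusion", each holding the text assumed to express that part.
    The equal default weights are an assumption, not taken from the paper.
    """
    parts = ("goal", "background", "conclusion")
    sims = [cosine(bag_of_words(x[p]), bag_of_words(y[p])) for p in parts]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

Under these assumptions, candidate articles would be ranked by `core_content_similarity` against the target article a, with the highest-scoring candidates treated as the most closely related.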

Keywords

Biomedical article; Highly-related evidence; Core content; Inter-article similarity estimation

Notes

Acknowledgment

This research was supported by the Ministry of Science and Technology (grant ID: MOST 105-2221-E-320-004) and Tzu Chi University (grant IDs: TCRPP103020 and TCRPP104010), Taiwan.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
