Identification of Conclusive Association Entities by Biomedical Association Mining

  • Rey-Long Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11431)


Conclusive association entities (CAEs) in the title and the abstract of an article a are those biomedical entities (e.g., genes, diseases, and chemicals) that are specific targets on which conclusive findings about their associations are reported in a. Identification of the CAEs is essential for the analysis of conclusive associations, which is a task routinely conducted by many biomedical scientists. However, CAE identification is challenging, as it is difficult to identify the specific entities and then estimate how conclusive the findings on the entities are. In this paper we present an association mining technique to improve CAE identification. The technique is based on a hypothesis: two candidate entities in an article are likely to be CAEs of the article if a strong association between them is mined from a collection of articles. Experimental results show that, by integrating the technique with representative keyword identification indicators, CAE identification can be significantly improved. The results are of technical and practical significance to the indexing, curation, and exploration of conclusive associations reported in biomedical literature.
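
The hypothesis above can be illustrated with a minimal sketch. The abstract does not say which association measure the paper uses or how it is combined with the keyword identification indicators, so the following is an assumption-laden illustration rather than the author's method: association strength is approximated by pointwise mutual information (PMI) over article-level co-occurrence, and each candidate entity is scored by its strongest association with another candidate in the same article. All function names and the toy collection are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mine_associations(collection):
    """Count, per article, the occurrence of each entity and each unordered
    entity pair across a collection (each article is a list of entity names)."""
    entity_df = Counter()
    pair_df = Counter()
    for entities in collection:
        unique = sorted(set(entities))  # sorted so pair keys are canonical
        entity_df.update(unique)
        pair_df.update(combinations(unique, 2))
    return entity_df, pair_df, len(collection)


def association_strength(a, b, entity_df, pair_df, n_articles):
    """PMI of two entities co-occurring in an article (an assumed measure,
    not taken from the paper); 0.0 if they never co-occur."""
    co = pair_df.get(tuple(sorted((a, b))), 0)
    if co == 0:
        return 0.0
    p_ab = co / n_articles
    p_a = entity_df[a] / n_articles
    p_b = entity_df[b] / n_articles
    return math.log(p_ab / (p_a * p_b))


def score_candidates(candidates, entity_df, pair_df, n_articles):
    """Score each candidate entity of one article by its strongest mined
    association with any other candidate entity of that article."""
    scores = {}
    for c in candidates:
        others = [o for o in candidates if o != c]
        scores[c] = max(
            (association_strength(c, o, entity_df, pair_df, n_articles)
             for o in others),
            default=0.0,
        )
    return scores


if __name__ == "__main__":
    # Hypothetical toy collection: entities recognized in four articles.
    collection = [
        ["BRCA1", "breast cancer"],
        ["BRCA1", "breast cancer"],
        ["TP53", "lung cancer"],
        ["aspirin", "headache"],
    ]
    entity_df, pair_df, n = mine_associations(collection)
    candidates = ["BRCA1", "breast cancer", "aspirin"]
    for entity, score in score_candidates(candidates, entity_df, pair_df, n).items():
        print(entity, round(score, 3))
```

In this sketch, a candidate that is strongly associated with another candidate in the collection receives a higher score than one with no mined association. Such a score could then serve as one additional feature alongside keyword identification indicators, which is how the abstract describes the technique being integrated.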


Biomedical literature, Conclusive association entity, Association mining



This research was supported by the Ministry of Science and Technology, Taiwan (grant ID: MOST 107-2221-E-320-004).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan