
MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing

  • Shengwen Peng
  • Hiroshi Mamitsuka
  • Shanfeng Zhu
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1807)

Abstract

The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing poses challenges on two sides: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) out of all 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, represent text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. Using a “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
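
As a rough illustration of the ranking idea mentioned in the abstract, the sketch below combines several evidence scores for each candidate MeSH term and keeps the highest-ranked terms. It is a minimal sketch under stated assumptions, not the MeSHLabeler implementation: the function `rank_mesh_terms`, the feature names `classifier` and `neighbors`, the weights, and all score values are hypothetical, and a fixed weighted sum stands in for a trained learning-to-rank model.

```python
from typing import Dict, List, Tuple

# Hypothetical evidence scores per candidate MeSH term for one citation.
# Feature names and values are illustrative assumptions, not from the chapter.
Candidates = Dict[str, Dict[str, float]]


def rank_mesh_terms(candidates: Candidates,
                    weights: Dict[str, float],
                    top_k: int) -> List[Tuple[str, float]]:
    """Score each candidate by a weighted sum of its evidence and keep top_k.

    A trained learning-to-rank model would replace this fixed linear scorer.
    """
    scored = [
        (term, sum(weights.get(name, 0.0) * value for name, value in feats.items()))
        for term, feats in candidates.items()
    ]
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


candidates = {
    "Humans": {"classifier": 0.9, "neighbors": 0.8},
    "Neoplasms": {"classifier": 0.6, "neighbors": 0.1},
    "Algorithms": {"classifier": 0.2, "neighbors": 0.7},
}
weights = {"classifier": 0.5, "neighbors": 0.5}
top_terms = rank_mesh_terms(candidates, weights, top_k=2)
print([term for term, _ in top_terms])
```

In this toy setup, a term that looks weak to one evidence source can still be ranked highly if another source supports it, which is the general motivation for integrating multiple types of information.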

Key words

MeSH indexing, Text categorization, Multi-label classification, Medical subject headings, MEDLINE, Machine learning

Notes

Acknowledgments

This work has been partially supported by the National Natural Science Foundation of China (Grant No. 61572139), MEXT KAKENHI #16H02868, and FiDiPro by Tekes.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Shengwen Peng (1, 2)
  • Hiroshi Mamitsuka (3, 4)
  • Shanfeng Zhu (1, 2, 5)
  1. School of Computer Science, Fudan University, Shanghai, China
  2. Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
  3. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
  4. Department of Computer Science, Aalto University, Espoo, Finland
  5. Center for Computational System Biology, Fudan University, Shanghai, China
