Annotating Patents with Medline MeSH Codes via Citation Mapping

  • Thomas D. Griffin
  • Stephen K. Boyer
  • Isaac G. Councill
Conference paper
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 680)


Both patents and Medline are important document collections for discovering new relationships between chemicals and biology, searching for prior art for patent applications, and retrieving background knowledge for current research activities. Finding patents relevant to a topic is often made difficult by poor categorization, badly written descriptions, and even intentional obfuscation. Unlike patents, the Medline corpus has Medical Subject Heading (MeSH) keywords manually added to its articles, giving a medically relevant taxonomy to its 18 million article abstracts. Our work attempts to accurately recognize the citations that patents make to Medline-indexed articles and link each one to its corresponding PubMed ID. It then exploits the associated MeSH headings to enhance patent search by annotating each citing patent with the MeSH codes of the Medline articles it cites. The techniques, system features, and benefits are explained.
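The linking and annotation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the `MedlineRecord` index, the helper names, the title-plus-year matching rule, and the toy records (with made-up PMIDs, titles, and journals) are all hypothetical, and MeSH descriptor names stand in for MeSH codes.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MedlineRecord:
    """Hypothetical minimal stand-in for a Medline record and its MeSH headings."""
    pmid: int
    title: str
    year: int
    mesh: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def match_citation(citation: str, index: list[MedlineRecord]) -> Optional[int]:
    """Return the PMID of the first record whose normalized title occurs as a
    whole-word run inside the citation and whose year appears in it, else None.
    (Assumed naive rule; not the paper's citation recognizer.)"""
    padded = f" {normalize(citation)} "
    years = {int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", citation)}
    for rec in index:
        if f" {normalize(rec.title)} " in padded and rec.year in years:
            return rec.pmid
    return None


def annotate_patent(citations: list[str], index: list[MedlineRecord]) -> Counter[str]:
    """Link each patent citation to a PMID and count the MeSH headings of all
    matched articles; the counts serve as the patent's MeSH annotation."""
    by_pmid = {rec.pmid: rec for rec in index}
    mesh_counts: Counter[str] = Counter()
    for citation in citations:
        pmid = match_citation(citation, index)
        if pmid is not None:
            mesh_counts.update(by_pmid[pmid].mesh)
    return mesh_counts


# Toy, entirely hypothetical data.
index = [
    MedlineRecord(1, "Kinase inhibition in tumor cell lines", 2004,
                  ["Humans", "Neoplasms", "Protein Kinase Inhibitors"]),
    MedlineRecord(2, "Statin therapy and serum cholesterol", 2001,
                  ["Humans", "Cholesterol", "Hydroxymethylglutaryl-CoA Reductase Inhibitors"]),
]
citations = [
    "Smith J. et al., Kinase inhibition in tumor cell lines. J. Hypothetical Oncol. 12:34-40, 2004.",
    "Doe A., Statin therapy and serum cholesterol, Hypothetical Med. J., 2001.",
    "Unrelated non-patent reference, 1999",
]
annotations = annotate_patent(citations, index)
print(annotations.most_common())
```

Exact title matching is brittle against the abbreviated, misspelled, or truncated reference strings that are common in patents. A practical system would need a structured reference parser and fuzzier matching against the full Medline index, which this sketch does not attempt.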


MeSH, PubMed, Patent, Medline, Citation


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Thomas D. Griffin (1)
  • Stephen K. Boyer
  • Isaac G. Councill
  1. IBM Almaden Research Center, San Jose, USA
