Tagging Sentence Boundaries in Biomedical Literature

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

Identifying sentence boundaries is an indispensable task for most natural language processing (NLP) systems. While extensive efforts have been devoted to mining biomedical text with NLP techniques, few attempts have specifically targeted sentence boundary disambiguation in biomedical literature, which has a number of unique features that can significantly reduce the accuracy of algorithms designed for general English text. To increase the accuracy of sentence boundary identification in biomedical literature, we developed a method that combines heuristic and statistical strategies. Our approach requires neither part-of-speech taggers nor training procedures. Experiments on biomedical test corpora show that our system significantly outperforms existing sentence boundary determination algorithms, particularly on full-text biomedical literature. Our system is very fast, and it should also be easily adaptable to sentence boundary determination in scientific literature from non-biomedical fields.
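The abstract describes the method only at a high level: heuristic rules combined with a statistical strategy, with no part-of-speech tagger and no training step. As a rough illustration of what such a combination can look like, here is a minimal Python sketch. It is not the authors' system. The abbreviation seed list, the document-level "lowercase continuation" statistic, and all function names (`learn_abbreviations`, `is_boundary`, `split_sentences`) are hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical sketch, not the method from the paper.
# Hypothetical seed list of abbreviations common in biomedical prose
# (illustrative only; a real system would need a larger, curated list).
KNOWN_ABBREVIATIONS = {"fig", "figs", "al", "approx", "ca", "vs", "no", "eq", "ref", "dr"}

CLOSERS = ")]}\"'"   # trailing characters that may follow final punctuation
OPENERS = "([{\"'"   # leading characters that may precede a sentence's first word
DOTTED_ABBREV = re.compile(r"(?:[A-Za-z]\.)+[A-Za-z]")  # "e.g", "i.e", "U.S" (last period removed)


def _core(token):
    """Token with trailing closing brackets/quotes removed."""
    return token.rstrip(CLOSERS)


def learn_abbreviations(tokens):
    """Document-level statistic (assumed): any period-final word that is followed
    by a lowercase-initial token anywhere in the text is treated as an abbreviation."""
    learned = set()
    for tok, nxt in zip(tokens, tokens[1:]):
        core = _core(tok)
        if core.endswith(".") and nxt.lstrip(OPENERS)[:1].islower():
            learned.add(core[:-1].lower())
    return learned


def is_boundary(token, next_token, abbreviations):
    """Heuristic decision: does this token end a sentence?"""
    core = _core(token)
    if not core or core[-1] not in ".?!":
        return False
    if next_token is None:            # punctuated last token closes the final sentence
        return True
    if core[-1] in "?!":
        return True
    word = core[:-1]
    if word.lower() in abbreviations:     # "Fig.", "et al.", learned forms
        return False
    if len(word) == 1 and word.isalpha():  # initials and genus names: "E. coli"
        return False
    if DOTTED_ABBREV.fullmatch(word):      # "e.g.", "i.e.", "U.S."
        return False
    # A lowercase continuation suggests the period is not sentence-final.
    return not next_token.lstrip(OPENERS)[:1].islower()


def split_sentences(text):
    """Split text into sentences, preserving the original inner spacing."""
    matches = list(re.finditer(r"\S+", text))
    tokens = [m.group() for m in matches]
    abbreviations = KNOWN_ABBREVIATIONS | learn_abbreviations(tokens)
    sentences, start = [], None
    for i, m in enumerate(matches):
        if start is None:
            start = m.start()
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if is_boundary(tokens[i], nxt, abbreviations):
            sentences.append(text[start:m.end()])
            start = None
    if start is not None:             # trailing text without final punctuation
        sentences.append(text[start:matches[-1].end()])
    return sentences
```

For example, `split_sentences("Results are shown in Fig. 2. The approx. value was 1.5 mM.")` keeps "Fig." and "approx." inside their sentences and returns two sentences. Because the sketch treats a lowercase-initial next token as evidence against a boundary, a sentence that legitimately begins with a lowercase gene name such as "p53" would be merged with the preceding one; this is exactly the kind of biomedical-specific case a general-English heuristic handles poorly.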

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xuan, W., Watson, S.J., Meng, F. (2007). Tagging Sentence Boundaries in Biomedical Literature. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_17

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
