Disentangling the Structure of Tables in Scientific Literature

Part of the Lecture Notes in Computer Science book series (LNISA, volume 9612)

Abstract

Within the scientific literature, tables are commonly used to present factual and statistical information in a compact form that readers can easily digest. The ability to “understand” the structure of tables is key to information extraction in many domains. However, the complexity and variety of presentation layouts and value formats make it difficult to automatically extract the roles of table cells and the relationships between them. In this paper, we present a model that structures tables in a machine-readable way, together with a methodology to automatically disentangle tables and transform them into the modelled data structure. The method was tested in the domain of clinical trials, where it achieved an F-score of 94.26% for cell function identification and 94.84% for the identification of inter-cell relationships.
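The abstract describes the approach only at a high level. To make the idea of "cell functions" and "inter-cell relationships" more concrete, here is a minimal Python sketch of what such a machine-readable table model could look like. Everything in it is an assumption for illustration: the role set (`header`, `stub`, `super-row`, `data`), the `Role` and `Cell` types, and the layout heuristic in the hypothetical `disentangle` function are not the authors' model or method. The `f_score` helper is simply the standard balanced F-measure (harmonic mean of precision and recall), the kind of metric behind the reported scores.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Role(Enum):
    # Illustrative cell functions (assumed, not taken from the paper).
    HEADER = "header"
    STUB = "stub"
    SUPER_ROW = "super-row"
    DATA = "data"


@dataclass
class Cell:
    row: int
    col: int
    text: str
    role: Role
    # Inter-cell relationships: navigational cells that give this cell its meaning.
    headers: List["Cell"] = field(default_factory=list)
    stub: Optional["Cell"] = None
    super_row: Optional["Cell"] = None


def disentangle(grid: List[List[str]], header_rows: int = 1) -> List[Cell]:
    """Hypothetical layout heuristic:
    - the first `header_rows` rows are column headers;
    - a body row with text only in column 0 is a super-row (a group label);
    - in other body rows, column 0 is the stub and the rest are data cells.
    """
    cells: List[Cell] = []
    col_headers: Dict[int, List[Cell]] = {}  # column index -> its header cells
    current_super: Optional[Cell] = None
    for r, row in enumerate(grid):
        if r < header_rows:
            for c, text in enumerate(row):
                cell = Cell(r, c, text, Role.HEADER)
                col_headers.setdefault(c, []).append(cell)
                cells.append(cell)
        elif row[0] and not any(row[1:]):
            # Label-only row: groups the body rows that follow it.
            current_super = Cell(r, 0, row[0], Role.SUPER_ROW)
            cells.append(current_super)
        else:
            stub = Cell(r, 0, row[0], Role.STUB,
                        headers=col_headers.get(0, []), super_row=current_super)
            cells.append(stub)
            for c in range(1, len(row)):
                # Link each data cell to its column headers, row stub and super-row.
                cells.append(Cell(r, c, row[c], Role.DATA,
                                  headers=col_headers.get(c, []),
                                  stub=stub, super_row=current_super))
    return cells


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy table: one header row, one super-row, two body rows.
    table = [
        ["Characteristic", "Placebo", "Drug"],
        ["Baseline", "", ""],
        ["Age (years)", "54.1", "55.3"],
        ["BMI (kg/m2)", "27.9", "28.4"],
    ]
    for cell in disentangle(table):
        if cell.role is Role.DATA:
            # e.g. "Baseline | Age (years) | Placebo = 54.1"
            print(cell.super_row.text, "|", cell.stub.text, "|",
                  cell.headers[0].text, "=", cell.text)
```

Representing each data cell together with explicit links to its navigational cells means a value such as `54.1` can be read back as a complete statement (group, row label, column label, value), which is what makes the structure usable for downstream information extraction. Real tables in the literature need far more than this positional heuristic (multi-row headers, spanning cells, inconsistent formatting).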

Keywords

  • Table mining
  • Text mining
  • Data management
  • Data modelling
  • Natural language processing

Notes

  1. http://www.ncbi.nlm.nih.gov/pmc/

Acknowledgments

This research is funded by a doctoral funding grant from the Engineering and Physical Sciences Research Council (EPSRC) and AstraZeneca Ltd.

Author information

Correspondence to Nikola Milosevic.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G. (2016). Disentangling the Structure of Tables in Scientific Literature. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_14

  • DOI: https://doi.org/10.1007/978-3-319-41754-7_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41753-0

  • Online ISBN: 978-3-319-41754-7

  • eBook Packages: Computer Science, Computer Science (R0)