Table Detection from Slide Images

  • Xiaoyin Che
  • Haojin Yang
  • Christoph Meinel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9431)

Abstract

In this paper we propose a solution for detecting tables in slide images. Presentation slides are a type of document of growing importance, but the layout differences between slides and traditional documents make many existing table detection methods less effective on slides. The proposed solution works with both high-resolution slide images from digital files and low-resolution slide screenshots from videos. Taking OCR (Optical Character Recognition) as the initial step, a heuristic analysis of the page layout considers not only the table structure but also the textual content. The evaluation shows that the proposed solution achieves an accuracy of approximately 80%. It performs substantially better than the open-source academic solution Tesseract and also outperforms the commercial software ABBYY FineReader, which is regarded as one of the best table detection tools.

Keywords

Table detection, Slide image, Table structure

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
