Graph Neural Network Approach to Semantic Type Detection in Tables

Conference paper, Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14650)

Abstract

This study addresses the challenge of detecting semantic column types in relational tables, a key task in many real-world applications. While language models like BERT have improved prediction accuracy, their token input constraints limit the simultaneous processing of intra-table and inter-table information. We propose a novel approach using Graph Neural Networks (GNNs) to model intra-table dependencies, allowing language models to focus on inter-table information. Our proposed method not only outperforms existing state-of-the-art algorithms but also offers novel insights into the utility and functionality of various GNN types for semantic type detection. The code is available at https://github.com/hoseinzadeehsan/GAIT
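The idea of letting a graph carry intra-table context can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the method from the paper or its repository: it assumes each column of a table is a node, that columns of the same table are fully connected, and that each node starts with a vector of per-type scores (for instance, produced by a language model). The function names, the `self_weight` parameter, and the untrained mean aggregation are all hypothetical stand-ins for a learned GNN layer.

```python
# Illustrative sketch only; names, topology, and aggregation rule are assumptions.
from typing import List

Vector = List[float]


def fully_connected_adjacency(num_columns: int) -> List[List[int]]:
    """Link every column of one table to every other column (assumed topology)."""
    return [[j for j in range(num_columns) if j != i] for i in range(num_columns)]


def message_passing_step(
    features: List[Vector],
    neighbors: List[List[int]],
    self_weight: float = 0.5,
) -> List[Vector]:
    """One round of mean aggregation.

    Each column's vector is blended with the mean of its neighbors' vectors.
    A trained GNN would apply learned weights here; this version has none.
    """
    updated = []
    for i, own in enumerate(features):
        if not neighbors[i]:
            # A single-column table has no intra-table context to aggregate.
            updated.append(list(own))
            continue
        dim = len(own)
        mean = [
            sum(features[j][d] for j in neighbors[i]) / len(neighbors[i])
            for d in range(dim)
        ]
        updated.append(
            [self_weight * own[d] + (1.0 - self_weight) * mean[d] for d in range(dim)]
        )
    return updated


def predict_types(scores: List[Vector], labels: List[str]) -> List[str]:
    """Pick the highest-scoring label for each column."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in scores]


# Hypothetical per-column scores over two made-up semantic types.
labels = ["city", "country"]
column_scores = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
graph = fully_connected_adjacency(len(column_scores))
smoothed = message_passing_step(column_scores, graph)
predictions = predict_types(smoothed, labels)
```

In this sketch the graph step only mixes scores among columns of the same table, which is the intra-table role the abstract assigns to the GNN; whatever produces the initial scores is free to spend its input budget on other information.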

Notes

  1. https://github.com/ysunbp/RECA-paper

Acknowledgement

The work of Ke Wang is supported in part by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Corresponding author

Correspondence to Ehsan Hoseinzade.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hoseinzade, E., Wang, K. (2024). Graph Neural Network Approach to Semantic Type Detection in Tables. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14650. Springer, Singapore. https://doi.org/10.1007/978-981-97-2266-2_10

  • DOI: https://doi.org/10.1007/978-981-97-2266-2_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2265-5

  • Online ISBN: 978-981-97-2266-2

  • eBook Packages: Computer Science, Computer Science (R0)
