Graph Neural Network Approach to Semantic Type Detection in Tables

Hoseinzade, Ehsan; Wang, Ke

doi:10.1007/978-981-97-2266-2_10


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14650)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

This study addresses the challenge of detecting semantic column types in relational tables, a key task in many real-world applications. While language models like BERT have improved prediction accuracy, their token input constraints limit the simultaneous processing of intra-table and inter-table information. We propose a novel approach using Graph Neural Networks (GNNs) to model intra-table dependencies, allowing language models to focus on inter-table information. Our proposed method not only outperforms existing state-of-the-art algorithms but also offers novel insights into the utility and functionality of various GNN types for semantic type detection. The code is available at https://github.com/hoseinzadeehsan/GAIT
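To make the idea of modeling intra-table dependencies with a graph concrete, here is a minimal sketch. It is not the authors' architecture. It assumes each column of a table is a node, that all columns of the same table are connected, and that each node starts from hypothetical per-column class scores. The helper names (`fully_connected_edges`, `propagate`) and the fixed 50/50 mixing weight are illustrative assumptions; a real GNN layer would use learned weights.

```python
# Illustrative sketch only: one round of neighbour averaging over a table's columns.
# Assumptions (not from the paper): nodes are columns, the graph is fully connected,
# and each node mixes its own scores with the mean of its neighbours' scores.

def fully_connected_edges(num_columns):
    """Return undirected edges linking every pair of columns in one table."""
    return [(i, j) for i in range(num_columns) for j in range(i + 1, num_columns)]


def propagate(features, edges, self_weight=0.5):
    """Mix each node's vector with the mean of its neighbours' vectors."""
    neighbours = {i: [] for i in range(len(features))}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    dim = len(features[0])
    updated = []
    for i, own in enumerate(features):
        nbrs = neighbours[i]
        if not nbrs:
            # An isolated column keeps its own scores unchanged.
            updated.append(list(own))
            continue
        mean_nbr = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        updated.append([
            self_weight * own[d] + (1.0 - self_weight) * mean_nbr[d]
            for d in range(dim)
        ])
    return updated


# Hypothetical scores over two semantic types for a three-column table.
column_scores = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
edges = fully_connected_edges(len(column_scores))
smoothed = propagate(column_scores, edges)
```

After one round, each column's scores reflect both its own evidence and that of the other columns in the same table, which is the kind of intra-table context the abstract refers to.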

Notes

1.
https://github.com/ysunbp/RECA-paper.

Acknowledgement

The work of Ke Wang is supported in part by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Simon Fraser University, Burnaby, Canada
Ehsan Hoseinzade & Ke Wang

Corresponding author

Correspondence to Ehsan Hoseinzade.

Editor information

Editors and Affiliations

Taipei, Taiwan
De-Nian Yang
Microsoft Research Asia, Beijing, China
Xing Xie
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Duke University, Durham, NC, USA
Jian Pei
National Cheng Kung University, Tainan, Taiwan
Jen-Wei Huang
Silesian University of Technology, Gliwice, Poland
Jerry Chun-Wei Lin

About this paper

Cite this paper

Hoseinzade, E., Wang, K. (2024). Graph Neural Network Approach to Semantic Type Detection in Tables. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14650. Springer, Singapore. https://doi.org/10.1007/978-981-97-2266-2_10

DOI: https://doi.org/10.1007/978-981-97-2266-2_10
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2265-5
Online ISBN: 978-981-97-2266-2
eBook Packages: Computer Science, Computer Science (R0)
