Dependency-Aware Core Column Discovery for Table Understanding

  • Conference paper
The Semantic Web – ISWC 2023 (ISWC 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14265))

Abstract

In a relational table, core columns represent the primary subject entities that other columns in the table depend on. While discovering core columns is crucial for understanding a table’s semantic column types, column relations, and entities, it is often overlooked. Previous methods typically rely on heuristic rules or contextual information, which can fail to accurately capture the dependencies between columns and make it difficult to preserve their relationships. To address these challenges, we introduce Dependency-aware Core Column Discovery (DaCo), an iterative method that uses a novel rough matching strategy to identify both inter-column dependencies and core columns. Unlike other methods, DaCo does not require labeled data or contextual information, making it suitable for practical scenarios. Additionally, it can identify multiple core columns within a table, which is common in real-world tables. Our experimental results demonstrate that DaCo outperforms existing core column discovery methods, substantially improving the efficiency of table understanding tasks.
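The notion of one column depending on another can be illustrated with a small sketch. The code below is not DaCo and does not use its rough matching strategy; it only checks exact, value-level dependencies, assuming a table is given as a list of dictionaries. The function names `determines` and `dependents` and the sample rows are hypothetical, chosen for illustration.

```python
def determines(rows, x, y):
    """Return True if every value of column x maps to exactly one value of column y."""
    seen = {}
    for row in rows:
        key, val = row[x], row[y]
        # setdefault stores the first y value seen for this x value;
        # a later, different y value means x does not determine y.
        if seen.setdefault(key, val) != val:
            return False
    return True


def dependents(rows, columns):
    """Map each column to the other columns it determines (exact match only)."""
    return {
        x: [y for y in columns if y != x and determines(rows, x, y)]
        for x in columns
    }


# Hypothetical sample table, for illustration only.
rows = [
    {"city": "Paris", "country": "France", "population": 2100000},
    {"city": "Lyon", "country": "France", "population": 520000},
    {"city": "Berlin", "country": "Germany", "population": 3600000},
]
deps = dependents(rows, ["city", "country", "population"])
```

In this sample, `city` determines both other columns while `country` determines none. Note that `population` also determines every other column simply because its values happen to be unique, which shows that an exact value-level check alone cannot separate a subject column from a unique attribute.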

Notes

  1. By dependency, we mean that column y is an attribute of column x if y depends on x [28].

  2. Our solution can be applied to table understanding tasks, since research on table understanding assumes an overlap between the table and the KG.

  3. https://github.com/barrel-0314/daco

  4. It includes tough tables generated from SemTab for the tabular data to KG matching problem.

References

  1. T2D gold standard for matching web tables to DBpedia (2015). http://webdatacommons.org/webtables/goldstandard.html

  2. GitTables benchmark - column type detection (2021). https://zenodo.org/record/5706316#.YxAVU9NBw2x

  3. SemTab 2021: Semantic web challenge on tabular data to knowledge graph matching (2021). http://www.cs.ox.ac.uk/isg/challenges/sem-tab/2021/

  4. Bhagavatula, C.S., Noraset, T., Downey, D.: TabEL: entity linking in web tables. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 425–441. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_25

  5. Birnick, J., Blasius, T., Friedrich, T., Naumann, F., Papenbrock, T., Schirneck, M.: Hitting set enumeration with partial information for unique column combination discovery. In: Proceedings of the VLDB Endowment, vol. 13, pp. 2070–2083 (2020)

  6. Bornemann, L., Bleifuß, T., Kalashnikov, D.V., Naumann, F., Srivastava, D.: Natural key discovery in wikipedia tables. In: Proceedings of The Web Conference 2020, pp. 2789–2795 (2020)

  7. Cafarella, M.J., Halevy, A., Wang, D.: WebTables: exploring the power of tables on the web. In: Proceedings of the VLDB Endowment, pp. 538–549 (2008)

  8. Cafarella, M.J., Halevy, A., Wang, D., Wu, E., Zhang, Y.: Uncovering the relational web. In: Proceedings of the 11th International Workshop on Web and Databases (2008)

  9. Chen, J., Jiménez-Ruiz, E., Horrocks, I., Sutton, C.: ColNet: embedding the semantics of web tables for column type prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 29–36 (2019)

  10. Chen, Z., Trabelsi, M., Heflin, J., Xu, Y., Davison, B.D.: Table search using a deep contextualized language model. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 589–598 (2020)

  11. Chirigati, F., Liu, J., Korn, F., Wu, Y., Yu, C., Zhang, H.: Knowledge exploration using tables on the web. In: Proceedings of the VLDB Endowment, vol. 10, pp. 193–204 (2016)

  12. Deng, X., Sun, H., Lees, A., Wu, Y., Yu, C.: TURL: table understanding through representation learning. In: Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, vol. 14, pp. 33–40 (2022)

  13. Efthymiou, V., Hassanzadeh, O., Rodriguez-Muro, M., Christophides, V.: Matching web tables with knowledge base entities: from entity lookups to entity embeddings. In: Proceedings of the International Semantic Web Conference, pp. 260–277 (2017)

  14. Ermilov, I., Ngomo, A.-C.N.: TAIPAN: automatic property mapping for tabular data. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds.) EKAW 2016. LNCS (LNAI), vol. 10024, pp. 163–179. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49004-5_11

  15. Fan, W., Wu, Y., Xu, J.: Functional dependencies for graphs. In: Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data, pp. 1843–1857 (2016)

  16. Gentile, A.L., Ristoski, P., Eckel, S., Ritze, D., Paulheim, H.: Entity matching on web tables: a table embeddings approach for blocking. In: Proceedings of the 20th International Conference on Extending Database Technology, pp. 510–513 (2017)

  17. Harmouch, H., Papenbrock, T., Naumann, F.: Relational header discovery using similarity search in a table corpus. In: 2021 IEEE 37th International Conference on Data Engineering, pp. 444–455. IEEE (2021)

  18. Ho, V.T., Pal, K., Razniewski, S., Berberich, K., Weikum, G.: Extracting contextualized quantity facts from web tables. In: Proceedings of the Web Conference 2021, pp. 4033–4042 (2021)

  19. Ibrahim, Y., Riedewald, M., Weikum, G., Zeinalipour-Yazti, D.: Bridging quantities in tables and text. In: Proceedings of IEEE 35th International Conference on Data Engineering, pp. 1010–1021 (2019)

  20. Khatiwada, A., et al.: Santos: relationship-based semantic table union search. CoRR abs/2209.13589 (2022)

  21. Korini, K., Peeters, R., Bizer, C.: SOTAB: the WDC schema.org table annotation benchmark. In: Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference, vol. 3320, pp. 14–19 (2022)

  22. Kruit, B., Boncz, P., Urbani, J.: Extracting N-ary facts from wikipedia table clusters. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 655–664 (2020)

  23. Kruit, B., Boncz, P., Urbani, J.: TAKCO: a platform for extracting novel facts from tables. In: Companion Proceedings of the Web Conference, pp. 705–707 (2021)

  24. Kruse, S., Naumann, F.: Efficient discovery of approximate dependencies. In: Proceedings of the VLDB Endowment, vol. 11, pp. 759–772 (2018)

  25. Lehmann, J., et al.: Dbpedia - a large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web 6(2), 167–195 (2014)

  26. Lehmberg, O., Bizer, C.: Web table column categorisation and profiling. In: Proceedings of the 19th International Workshop on Web and Databases, pp. 1–7 (2016)

  27. Lehmberg, O., Bizer, C.: Stitching web tables for improving matching quality. In: Proceedings of the VLDB Endowment, vol. 10, pp. 1502–1513 (2017)

  28. Lehmberg, O., Bizer, C.: Profiling the semantics of N-ary web table data. In: Proceedings of the International Workshop on Semantic Big Data, vol. 5, pp. 1–6 (2019)

  29. Lehmberg, O., Bizer, C.: Synthesizing N-ary relations from web tables. In: Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, vol. 17, pp. 1–12 (2019)

  30. Li, Z.: Cauchy convergence topologies on the space of continuous functions. Topol. Appl. 161, 321–329 (2014)

  31. Luzuriaga, J., Munoz, E., Rosales-Mendez, H., Hogan, A.: Merging web tables for relation extraction with knowledge graphs. IEEE Trans. Knowl. Data Eng. 35(2), 1803–1816 (2023)

  32. Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., Palmonari, M.: MammoTab: a giant and comprehensive dataset for semantic table interpretation. In: Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference, vol. 3320, pp. 28–33 (2022)

  33. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. The MIT Press (2018)

  34. Nargesian, F., Zhu, E., Pu, K.Q., Miller, R.J.: Table union search on open data. In: Proceedings of the VLDB Endowment, vol. 11, pp. 813–825 (2018)

  35. Neumaier, S., Umbrich, J., Parreira, J.X., Polleres, A.: Multi-level semantic labelling of numerical values. In: Groth, P., et al. (eds.) Proceedings of the 15th International Semantic Web Conference, pp. 428–445 (2016)

  36. Nguyen, P., Kertkeidkachorn, N., Ichise, R., Takeda, H.: TabEAno: table to knowledge graph entity annotation. CoRR abs/2010.01829 (2020)

  37. Pham, M., Alse, S., Knoblock, C.A., Szekely, P.: Semantic labeling: a domain-independent approach. In: Groth, P., et al. (eds.) Proceedings of the 15th International Semantic Web Conference, pp. 446–462 (2016)

  38. Ritze, D., Lehmberg, O., Bizer, C.: Matching html tables to DBpedia. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, pp. 1–6 (2015)

  39. Shyu, S.J., Yin, P., Lin, B.M.T.: An ant colony optimization algorithm for the minimum weight vertex cover problem. Ann. Oper. Res. 131, 283–304 (2004)

  40. Sismanis, Y., Brown, P., Haas, P.J., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: Proceedings of the VLDB Endowment, pp. 691–702 (2006)

  41. Sun, H., Ma, H., Yih, W.T., Yan, X.: Table cell search for question answering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 771–782 (2016)

  42. Takeoka, K., Oyamada, M., Nakadai, S., Okadome, T.: Meimei: an efficient probabilistic approach for semantically annotating tables. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 281–288 (2019)

  43. Tan, Z., Ran, A., Ma, S., Qin, S.: Fast incremental discovery of pointwise order dependencies. In: Proceedings of the VLDB Endowment, vol. 13, pp. 1669–1681 (2020)

  44. Trabelsi, M., Chen, Z., Zhang, S., Davison, B.D., Heflin, J.: StruBERT: structure-aware BERT for table search and matching. In: Proceedings of the Web Conference 2022, pp. 442–451 (2022)

  45. Venetis, P., et al.: Recovering semantics of tables on the web. In: Proceedings of the VLDB Endowment, vol. 4, pp. 528–538 (2011)

  46. Wang, N., Ren, X.: Identifying multiple entity columns in web tables. Int. J. Softw. Eng. Knowl. Eng. 28(3), 287–309 (2018)

  47. Wei, Z., Hartmann, S., Link, S.: Discovery algorithms for embedded functional dependencies. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 833–843 (2020)

  48. Yin, P., Neubig, G., Yih, W.T., Riedel, S.: TaBERT: pretraining for joint understanding of textual and tabular data. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 8413–8426 (2020)

  49. Zhang, M., Chakrabarti, K.: InfoGather+ semantic matching and annotation of numeric and time-varying attributes in web tables. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 145–156 (2013)

  50. Zhang, S., Balog, K.: Ad hoc table retrieval using semantic similarity. In: Proceedings of the World Wide Web Conference, pp. 1553–1562 (2018)

  51. Zhang, S., Balog, K.: On-the-fly table generation. In: Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 595–604 (2018)

  52. Zhang, S., Balog, K.: Auto-completion for data cells in relational tables. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 761–770 (2019)

  53. Zhang, S., Balog, K.: Web table extraction, retrieval, and augmentation: a survey. ACM Trans. Intell. Syst. Technol. 11, 13:1-13:35 (2020)

  54. Zhang, S., Meij, E., Balog, K., Reinanda, R.: Novel entity discovery from web tables. In: Proceedings of International World Wide Web Conference, pp. 1298–1308 (2020)

  55. Zhang, X., Chen, Y., Chen, J., Du, X., Zou, L.: Mapping entity-attribute web tables to web-scale knowledge bases. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds.) DASFAA 2013. LNCS, vol. 7826, pp. 108–122. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37450-0_8

  56. Zhang, Z.: Towards efficient and effective semantic table interpretation. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 487–502. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_31

  57. Zhang, Z.: Effective and efficient semantic table interpretation using TableMiner+. Semantic Web 8(6), 921–957 (2017)

  58. Zhu, G., Iglesias, C.A.: Computing semantic similarity of concepts in knowledge graphs. IEEE Trans. Knowl. Data Eng. 29(1), 72–89 (2017)

Acknowledgements

We would like to thank Jiaoyan Chen for his useful comments on this paper. This work is supported by the State Grid Technology Project “research and application of key technologies for automatic graphic construction of power grid control system driven by model and data”, the National Natural Science Foundation of China under the grant numbers [62061146001, 62072099, 62232004], the “Zhishan” Scholars Programs of Southeast University, and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Jiahui Jin.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qiu, J. et al. (2023). Dependency-Aware Core Column Discovery for Table Understanding. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_9

  • DOI: https://doi.org/10.1007/978-3-031-47240-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47239-8

  • Online ISBN: 978-3-031-47240-4

  • eBook Packages: Computer Science, Computer Science (R0)
