Schema Matching Based on Source Codes

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9052)

Abstract

Schema matching is a critical step in numerous database applications, such as web data source integration, data warehouse loading, and information exchange among several authorities. Existing techniques for schema matching are classified as schema-based, instance-based, or a combination of both. In this paper, we propose a new class of techniques, called schema matching based on source codes. The idea is to exploit the exterior schema extracted from the source codes to find semantic correspondences between attributes in the schemas to be matched. Essentially, the exterior schema is the schema exposed to end users, and it resides in the outermost shell of an application. Thus, it typically carries the complete semantics of the data, which is very helpful for solving the schema matching problem. We present a framework for schema matching based on source codes, which includes three key components: extracting the exterior schema, evaluating the quality of matching, and finding the optimal mapping. We also present some helpful features and rules of the source codes for the implementation of each component, and address the corresponding challenges in detail.
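The three components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the exterior schema is taken to be the text of HTML `<label for="column">` elements, matching quality is approximated by plain string similarity (`difflib.SequenceMatcher`), and the optimal mapping is found by exhaustive search, which is only feasible for very small schemas. The function names and sample sources are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations

# Assumed convention (hypothetical): a user-facing label is bound to a
# database column through <label for="column">Text shown to users</label>.
LABEL_RE = re.compile(r'<label\s+for="([^"]+)"\s*>([^<]*)</label>')


def extract_exterior_schema(source):
    """Component 1: map each column name to its user-facing label."""
    return {col: text.strip() for col, text in LABEL_RE.findall(source)}


def label_similarity(a, b):
    """Component 2 (stand-in): case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_mapping(schema_a, schema_b):
    """Component 3 (stand-in): exhaustive search over one-to-one mappings."""
    cols_a, cols_b = list(schema_a), list(schema_b)
    if len(cols_a) > len(cols_b):
        raise ValueError("schema_a must not have more columns than schema_b")
    best, best_score = None, -1.0
    for perm in permutations(cols_b, len(cols_a)):
        score = sum(
            label_similarity(schema_a[a], schema_b[b])
            for a, b in zip(cols_a, perm)
        )
        if score > best_score:
            best, best_score = dict(zip(cols_a, perm)), score
    return best, best_score


# Hypothetical source fragments with opaque column names.
SOURCE_A = '''
<label for="c1">Customer Name</label><input name="c1">
<label for="c2">Shipping Address</label><input name="c2">
'''
SOURCE_B = '''
<label for="addr">Address</label><input name="addr">
<label for="nm">Name</label><input name="nm">
'''

exterior_a = extract_exterior_schema(SOURCE_A)
exterior_b = extract_exterior_schema(SOURCE_B)
mapping, score = best_mapping(exterior_a, exterior_b)
print(mapping)
```

The sample columns `c1` and `c2` carry no meaning on their own, so the sketch can only pair them with `nm` and `addr` through the labels recovered from the source code, which is the intuition the abstract describes.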

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant No. 61303016) and the Normal Project Foundation of Education Department of Liaoning Province (Grant No. L2012045).

Author information

Corresponding author

Correspondence to Guohui Ding.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ding, G., Wang, G., Fan, C., Chen, S. (2015). Schema Matching Based on Source Codes. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_8

  • DOI: https://doi.org/10.1007/978-3-319-22324-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22323-0

  • Online ISBN: 978-3-319-22324-7

  • eBook Packages: Computer Science, Computer Science (R0)