Abstract
Instance-based schema matching determines the correspondences between heterogeneous databases by comparing their instances. Heterogeneous databases consist of an enormous number of tables containing various attributes, which causes data heterogeneity. In such cases, it is effective to consider semantic information. In this paper, we propose an instance-based schema matching method that considers the semantics of attributes. We use Word2Vec to match attributes whose values are character strings. The results suggest that matches between attributes with high semantic similarity can be detected.
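The general idea of matching string attributes through word vectors can be illustrated with a minimal sketch. It is not the procedure evaluated in the paper: the tiny hand-written vectors in `TOY_VECTORS` are hypothetical stand-ins for pretrained Word2Vec embeddings, and both the averaging of instance vectors per attribute and the helper names (`attribute_vector`, `cosine`, `best_matches`) are assumptions made only for illustration.

```python
import math

# Hypothetical 3-dimensional vectors standing in for pretrained Word2Vec
# embeddings; real embeddings typically have hundreds of dimensions.
TOY_VECTORS = {
    "tokyo": [0.9, 0.1, 0.0],
    "kyoto": [0.8, 0.2, 0.1],
    "osaka": [0.85, 0.15, 0.05],
    "apple": [0.1, 0.9, 0.1],
    "banana": [0.0, 0.8, 0.2],
    "grape": [0.1, 0.85, 0.15],
}


def attribute_vector(instances, vectors):
    """Average the vectors of the instance values found in the vocabulary.

    Averaging is an assumption of this sketch, not the paper's stated method.
    """
    known = [vectors[v.lower()] for v in instances if v.lower() in vectors]
    if not known:
        return None  # no instance value has a vector
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(source, target, vectors):
    """Map each source attribute to its most similar target attribute."""
    result = {}
    for s_name, s_values in source.items():
        s_vec = attribute_vector(s_values, vectors)
        if s_vec is None:
            continue
        scored = []
        for t_name, t_values in target.items():
            t_vec = attribute_vector(t_values, vectors)
            if t_vec is not None:
                scored.append((cosine(s_vec, t_vec), t_name))
        if scored:
            score, name = max(scored)
            result[s_name] = (name, score)
    return result


# Two toy schemas whose attribute names differ but whose instances are related.
source = {"city": ["Tokyo", "Kyoto"], "fruit": ["Apple", "Banana"]}
target = {"location": ["Osaka", "Tokyo"], "product": ["Grape", "Apple"]}

matches = best_matches(source, target, TOY_VECTORS)
for attr, (matched, score) in matches.items():
    print(f"{attr} -> {matched} (cosine similarity {score:.3f})")
```

With these toy vectors, `city` is paired with `location` and `fruit` with `product`, even though the attribute names share no characters and the instance sets only partly overlap. In practice the lookup table would be replaced by vectors loaded from a pretrained model, and values missing from the vocabulary would need separate handling.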
Rights and permissions
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Nozaki, K., Hochin, T. & Nomiya, H. Semantic Schema Matching for String Attribute with Word Vectors and its Evaluation. Int J Netw Distrib Comput 7, 100–106 (2019). https://doi.org/10.2991/ijndc.k.190710.001
DOI: https://doi.org/10.2991/ijndc.k.190710.001