DASFAA 2016: Database Systems for Advanced Applications, pp. 415–431
PBA: Partition and Blocking Based Alignment for Large Knowledge Bases
Abstract
The rapid development of the Semantic Web has enabled the creation of a growing number of large-scale knowledge bases across various domains. Because different knowledge bases contain overlapping and complementary information, automatically integrating them by aligning their classes and instances can improve both their quality and their coverage. Existing knowledge-base alignment algorithms suffer from three limitations: (1) poor scalability, (2) poor alignment quality, and (3) a lack of full automation. To address these limitations, we develop a scalable partition-and-blocking based alignment framework, named PBA, which can automatically and efficiently align knowledge bases with tens of millions of instances. PBA consists of three steps. (1) Partition: we propose a new hierarchical agglomerative co-clustering algorithm that partitions the class hierarchy of each knowledge base into multiple class partitions. (2) Blocking: we divide the instances in the same class partition into small blocks to further improve performance. (3) Alignment: we compute the similarity of the instances in each block using a vector space model and align the instances whose similarity is high. Experimental results on real and synthetic datasets show that our algorithm outperforms state-of-the-art approaches in efficiency by up to an order of magnitude while maintaining high alignment quality.
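The blocking and alignment steps can be illustrated with a minimal Python sketch. It is not the PBA implementation: the abstract does not specify how blocks are formed or how vectors are weighted, so the blocking key (the first token of an instance description), the raw term-count vectors, the `threshold` value, and all function names here are assumptions chosen for illustration. The partition step (co-clustering of class hierarchies) is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def block_key(description):
    """Hypothetical blocking key: the first token of the description."""
    tokens = tokenize(description)
    return tokens[0] if tokens else ""


def align(kb1, kb2, threshold=0.5):
    """Align instances of two knowledge bases, comparing only within blocks.

    kb1, kb2: dict mapping instance id -> textual description.
    Returns a list of (id1, id2, similarity) for the best match of each
    kb1 instance whose similarity reaches the threshold.
    """
    # Blocking: group instances from both sides by the same key.
    blocks = defaultdict(lambda: ([], []))
    for iid, desc in kb1.items():
        blocks[block_key(desc)][0].append(iid)
    for iid, desc in kb2.items():
        blocks[block_key(desc)][1].append(iid)

    # Vector space model: one term-count vector per instance.
    vec1 = {iid: Counter(tokenize(desc)) for iid, desc in kb1.items()}
    vec2 = {iid: Counter(tokenize(desc)) for iid, desc in kb2.items()}

    # Alignment: pairwise comparison is restricted to each block.
    matches = []
    for left, right in blocks.values():
        for i in left:
            best = max(((cosine(vec1[i], vec2[j]), j) for j in right), default=None)
            if best is not None and best[0] >= threshold:
                matches.append((i, best[1], best[0]))
    return matches


# Toy data, invented for this example only.
kb1 = {"a1": "paris capital of france", "a2": "berlin capital of germany"}
kb2 = {
    "b1": "paris city in france",
    "b2": "berlin city in germany",
    "b3": "paris hilton american celebrity",
}
matches = align(kb1, kb2)
```

Restricting comparisons to blocks is what keeps the cost down: instead of comparing every instance of one knowledge base with every instance of the other, only instances that share a block are compared.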
Keywords
Resource Description Framework, Priority Queue, Vector Space Model, Alignment Process, Uniform Resource Identifier
Acknowledgement
This work was supported by the National Grand Fundamental Research 973 Program of China (2015CB358700), the National Natural Science Foundation of China (61422205, 61373024, 61472198), Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, “NExT Research Center”, Singapore (WBS:R-252-300-001-490), Huawei, Shenzhou, FDCT/116/2013/A3, MYRG105(Y1-L3)-FST13-GZ, National 863 Program of China (2012AA012600), Chinese Special Project of Science and Technology (2013zx01039-002-002) and the National Center for International Joint Research on E-Business Information Processing (2013B01035).