PBA: Partition and Blocking Based Alignment for Large Knowledge Bases

  • Yan Zhuang
  • Guoliang Li (Email author)
  • Zhuojian Zhong
  • Jianhua Feng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9642)

Abstract

The rapid growth of the Semantic Web has enabled the creation of a growing number of large-scale knowledge bases across various domains. Because different knowledge bases contain overlapping and complementary information, automatically integrating them by aligning their classes and instances can improve their quality and coverage. Existing knowledge-base alignment algorithms have three limitations: they (1) do not scale, (2) yield poor alignment quality, and (3) are not fully automatic. To address these limitations, we develop a scalable partition-and-blocking based alignment framework, named PBA, which can automatically and efficiently align knowledge bases with tens of millions of instances. PBA consists of three steps. (1) Partition: we propose a new hierarchical agglomerative co-clustering algorithm that partitions the class hierarchy of each knowledge base into multiple class partitions. (2) Blocking: we judiciously divide the instances in the same class partition into small blocks to further improve performance. (3) Alignment: we compute the similarity of the instances in each block using a vector space model and align the instances with large similarities. Experimental results on real and synthetic datasets show that our algorithm significantly outperforms state-of-the-art approaches in efficiency, in some cases by an order of magnitude, while maintaining high alignment quality.
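
The blocking and alignment steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the blocking criterion, the term weighting, or the similarity threshold, so the first-character blocking key, raw term-frequency vectors, cosine similarity, and the threshold of 0.5 are all assumptions, and every function name and sample record is hypothetical. The partition step (co-clustering of class hierarchies) is omitted.

```python
import math
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase whitespace tokenization (assumed; the abstract names no tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def block_key(text):
    """Hypothetical blocking key: the first character of the description."""
    return text[:1].lower()


def align(kb1, kb2, threshold=0.5):
    """Align instances of two knowledge bases, comparing only within blocks.

    kb1, kb2: dict mapping instance id -> textual description.
    Returns a list of (id1, id2, similarity) with similarity >= threshold.
    """
    # Blocking: group instances from both sides by a shared key.
    blocks = defaultdict(lambda: ([], []))
    for i, text in kb1.items():
        blocks[block_key(text)][0].append(i)
    for j, text in kb2.items():
        blocks[block_key(text)][1].append(j)

    # Vector space model: one term-frequency vector per instance.
    vec1 = {i: Counter(tokens(t)) for i, t in kb1.items()}
    vec2 = {j: Counter(tokens(t)) for j, t in kb2.items()}

    # Alignment: keep the best candidate in the block if it is similar enough.
    matches = []
    for left, right in blocks.values():
        for i in left:
            best = max(((cosine(vec1[i], vec2[j]), j) for j in right), default=None)
            if best is not None and best[0] >= threshold:
                matches.append((i, best[1], best[0]))
    return matches


# Hypothetical toy data, for illustration only.
kb1 = {"a1": "barack obama president", "a2": "beijing city china"}
kb2 = {
    "b1": "barack obama us president",
    "b2": "beijing capital china",
    "b3": "berlin city germany",
}
matches = align(kb1, kb2)
```

Restricting comparisons to instances that share a block is what reduces the number of pairwise similarity computations; the trade-off is that a poorly chosen blocking key can separate true matches into different blocks.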

Keywords

Resource Description Framework; Priority Queue; Vector Space Model; Alignment Process; Uniform Resource Identifier
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgement

This work was supported by the National Grand Fundamental Research 973 Program of China (2015CB358700), the National Natural Science Foundation of China (61422205, 61373024, 61472198), Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, “NExT Research Center”, Singapore (WBS:R-252-300-001-490), Huawei, Shenzhou, FDCT/116/2013/A3, MYRG105(Y1-L3)-FST13-GZ, National 863 Program of China (2012AA012600), Chinese Special Project of Science and Technology (2013zx01039-002-002) and the National Center for International Joint Research on E-Business Information Processing (2013B01035).

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Yan Zhuang (1, 2)
  • Guoliang Li (1), Email author
  • Zhuojian Zhong (3)
  • Jianhua Feng (1)

  1. Tsinghua University, Beijing, China
  2. PLA Navy General Hospital, Beijing, China
  3. Beijing University of Posts and Telecommunications, Beijing, China
