Entity resolution for media metadata based on structural clustering

  • Qi Gu
  • Jian Cao
  • Yancen Liu


An increasing amount of media metadata is published on the Web by different organizations, which leads to a fragmented dataset landscape. Identifying media metadata across disparate datasets and integrating heterogeneous datasets have many applications but also pose significant challenges. To tackle this problem, entity resolution methods are commonly used as an essential prerequisite for integrating media information from different sources, and they effectively foster the reuse of existing data sources. As the amount of media metadata published on the Web grows steadily, a critical challenge is how to scale entity resolution to large media knowledge bases while maintaining high matching quality. This article investigates the relationships between media entities. To that end, the media database is formulated as a knowledge graph with entities as nodes and the associations between related entities as edges, so that media entities can be grouped into communities according to the neighbors they share. A structural clustering-based model is then proposed to detect communities and to discover anchor vertices as well as isolated vertices. Specifically, an initial seed set of matched anchor vertex pairs is obtained. Furthermore, an iterative propagation approach is developed for identifying the matched entities in the whole graph, in which community similarity is introduced into the measure function to control the total measurement of candidate pairs. Starting with the elements of the initial seed set, the entity resolution algorithm iteratively updates the matching information over the whole network along the neighbor relationships. Extensive experiments are conducted on real datasets to evaluate how the seed set impacts the matching process and performance. The experimental results show that this model achieves an excellent balance between accuracy and efficiency and is a clear improvement over state-of-the-art methods.
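The two ideas in the abstract, grouping entities by shared neighbors and propagating matches outward from a seed set, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the SCAN-style similarity formula, the greedy one-pair-per-round strategy, the `threshold` parameter, and the toy graphs are all assumptions made for illustration, and the community-similarity term of the measure function is omitted because the abstract does not define it.

```python
from math import sqrt


def structural_similarity(graph, u, v):
    """SCAN-style similarity (assumed here): overlap of the closed
    neighborhoods of u and v, normalized by their geometric mean size."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def propagate_matches(g1, g2, seeds, threshold=0.5):
    """Greedy seed-based propagation (illustrative only).

    g1, g2: dicts mapping a node to the set of its neighbors.
    seeds:  dict of already matched pairs {node_in_g1: node_in_g2}.
    Each round accepts the single best-scoring unmatched pair whose
    score reaches the threshold, then rescans with the new match.
    """
    matched = dict(seeds)
    used = set(matched.values())
    while True:
        best = None
        for a in g1:
            if a in matched:
                continue
            # Images in g2 of the neighbors of a that are already matched.
            mapped = {matched[n] for n in g1[a] if n in matched}
            if not mapped:
                continue
            for b in g2:
                if b in used or not g2[b]:
                    continue
                score = len(mapped & g2[b]) / sqrt(len(g1[a]) * len(g2[b]))
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, a, b)
        if best is None:
            return matched
        _, a, b = best
        matched[a] = b
        used.add(b)


# Two toy graphs with the same shape but different identifiers.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {"1": {"2", "3"}, "2": {"1", "3"}, "3": {"1", "2", "4"}, "4": {"3"}}

result = propagate_matches(g1, g2, seeds={"a": "1", "b": "2"})
print(result)  # {'a': '1', 'b': '2', 'c': '3', 'd': '4'}
```

In this toy run, `c` is matched first because two of its three neighbors are seeds, and only then can `d` be reached through `c`. That is the sense in which matching information spreads along neighbor relationships. A real system would also need candidate blocking, since the nested loops above compare every unmatched pair in each round.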


Entity resolution, Structural clustering, Iterative propagation, Graph structure



This work is partially supported by the National Key Research and Development Plan (No. 2018YFB1003800).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Department of Computer Science, School of Information Science and Technology, Nantong University, Nantong, China
