Abstract
We propose a principled approach to the problem of aligning multiple partially overlapping networks. The objective is to map multiple graphs into a single graph while preserving vertex and edge similarities. The problem is inspired by the task of integrating partial views of a family tree (genealogical network) into one unified network, but it also has applications in, for example, social and biological networks. Our approach, called Flan, generalizes the facility location problem by adding a non-linear term to capture edge similarities and to infer the underlying entity network. The problem is solved using an alternating optimization procedure with a Lagrangian relaxation. Flan can leverage prior information on the number of entities; when this information is available, Flan is shown to work robustly without any ground-truth data for tuning the method's parameters. Additionally, we present three multiple-network extensions to an existing state-of-the-art pairwise alignment method called Natalie. Extensive experiments on synthetic as well as real-world datasets from social and genealogical networks attest to the effectiveness of the proposed approaches, which clearly outperform a popular multiple-network alignment method called IsoRankN.
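For orientation, the classical uncapacitated facility location problem that the abstract refers to can be written as follows. This is the standard textbook formulation, not the Flan objective itself: the symbols \(F\), \(C\), \(f_i\), \(c_{ij}\), \(x_{ij}\) and \(y_i\) are notation assumed here for illustration, and the non-linear edge-similarity term that Flan adds is not reproduced. One natural reading, stated as an assumption, is that opened facilities play the role of entities and clients play the role of input vertices.

\[
\begin{aligned}
\min_{x,\,y}\quad & \sum_{i \in F} f_i\, y_i \;+\; \sum_{i \in F}\sum_{j \in C} c_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{i \in F} x_{ij} = 1 \quad \forall j \in C, \\
& x_{ij} \le y_i \quad \forall i \in F,\ j \in C, \\
& x_{ij},\, y_i \in \{0,1\}.
\end{aligned}
\]

Here \(y_i = 1\) opens facility \(i\) at cost \(f_i\), and \(x_{ij} = 1\) assigns client \(j\) to facility \(i\) at cost \(c_{ij}\); every client must be assigned to exactly one open facility.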
Notes
The code is available at: https://github.com/ekQ/flan.
As in the case of Natalie, we assume an ordering of the graphs and consider aligning a vertex i of graph g only with itself or with any vertex from graphs \(g'=1,\ldots ,g-1\). In other words, we avoid simultaneously considering vertex i as an entity for vertex j and j as an entity for i, which we have observed to result in larger duality gaps.
The implementation of the feasibility heuristics is available at: https://github.com/ekQ/flan.
For simplicity, we write “\(\min _{B} \text {objective}\)” although the objective is being minimized only w.r.t. elements \(B_{j\ell }\), where \((j,\ell ) \notin E_I\).
The implementation is available at https://www.cs.purdue.edu/homes/dgleich/codes/netalign/ and has been used in Bayati et al. (2013) and Malmi et al. (2016).
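The graph-ordering assumption in the note on Natalie above can be illustrated with a minimal sketch. It is not taken from the Flan code base: the function name `candidate_entities` and the `graph_of` mapping are hypothetical, introduced only to show how restricting candidates to the vertex itself and to vertices of earlier graphs rules out symmetric pairs.

```python
from typing import Dict, Hashable, List


def candidate_entities(graph_of: Dict[Hashable, int]) -> Dict[Hashable, List[Hashable]]:
    """Return, for each vertex, the vertices it may be aligned with.

    graph_of maps a vertex to the index (1, 2, ...) of the graph it belongs to.
    A vertex of graph g may be aligned with itself or with any vertex of
    graphs 1, ..., g-1 (hypothetical helper, for illustration only).
    """
    candidates: Dict[Hashable, List[Hashable]] = {}
    for v, g in graph_of.items():
        # Vertices from strictly earlier graphs are allowed as entities for v.
        earlier = [u for u, gu in graph_of.items() if gu < g]
        candidates[v] = [v] + earlier
    return candidates


# Toy input: two vertices in graph 1, one in graph 2, one in graph 3.
graph_of = {"a1": 1, "a2": 1, "b1": 2, "c1": 3}
cands = candidate_entities(graph_of)
```

Because candidates always come from strictly earlier graphs, no two distinct vertices can each be a candidate entity for the other, which is the situation the note says leads to larger duality gaps.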
References
Althaus E, Canzar S (2008) A Lagrangian relaxation approach for the multiple sequence alignment problem. J Comb Optim 16(2):127–154
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Bayati M, Gleich DF, Saberi A, Wang Y (2013) Message-passing algorithms for sparse network alignment. ACM Trans Knowl Discov Data 7(1):3
Bezdek JC, Hathaway RJ (2003) Convergence of alternating optimization. Neural Parallel Sci Comput 11(4):351–368
Bhattacharya I, Getoor L (2007) Collective entity resolution in relational data. ACM Trans Knowl Discov Data 1(1):5
Christen P (2012) Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer, Berlin
Christen P, Vatsalan D, Fu Z (2015) Advanced record linkage methods and privacy aspects for population reconstruction—a survey and case studies. In: Population reconstruction. Springer, pp 87–110
Clark C, Kalita J (2014) A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics 30(16):2351–2359
Conte D, Foggia P, Sansone C, Vento M (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell 18(3):265–298
Cornuejols G, Fisher ML, Nemhauser GL (1977) Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Manag Sci 23(8):789–810
Efremova J, Ranjbar-Sahraei B, Rahmani H, Oliehoek FA, Calders T, Tuyls K, Weiss G (2015) Multi-source entity resolution for genealogical data. In: Population reconstruction. Springer, pp 129–154
El-Kebir M, Heringa J, Klau GW (2015) Natalie 2.0: sparse global network alignment as a special case of quadratic assignment. Algorithms 8(4):1035–1051
Elmsallati A, Clark C, Kalita J (2015) Global alignment of protein–protein interaction networks: a survey. IEEE/ACM Trans Comput Biol Bioinform PP(99):1-1. doi:10.1109/TCBB.2015.2474391
Fisher ML (1981) The Lagrangian relaxation method for solving integer programming problems. Manag Sci 27:1–18
Goga O, Loiseau P, Sommer R, Teixeira R, Gummadi KP (2015) On the reliability of profile matching across large online social networks. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1799–1808
Hochbaum DS (1982) Heuristics for the fixed cost median problem. Math Program 22(1):148–162
Hu J, Kehr B, Reinert K (2013) NetCoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks. Bioinformatics 30(4):540–548
Klau GW (2009) A new graph-based method for pairwise global network alignment. BMC Bioinform 10(Suppl 1):S59
Kouki P, Marcum C, Koehly L, Getoor L (2016) Entity resolution in familial networks. In: Proceedings of the 12th workshop on mining and learning with graphs
Liao CS, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258. doi:10.1093/bioinformatics/btp203
Magnani M, Micenkova B, Rossi L (2013) Combinatorial analysis of multiple networks. arXiv:1303.4986
Malmi E, Terzi E, Gionis A (2016) Active network alignment: a matching-based approach. arXiv:1610.05516
Sahraeian SME, Yoon BJ (2013) SMETANA: accurate and scalable algorithm for probabilistic alignment of large-scale biological networks. PLOS ONE 8(7):e67995
Shor NZ (2012) Minimization methods for non-differentiable functions, vol 3. Springer, New York
Singh R, Xu J, Berger B (2008) Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci 105(35):12763–12768
Singla P, Domingos P (2006) Entity resolution with Markov logic. In: Proceedings of the sixth international conference on data mining, ICDM’06. IEEE, pp 572–582
Vazirani VV (2001) Approximation algorithms. Springer, New York
Winkler WE (1990) String comparator metrics and enhanced decision rules in the Fellegi–Sunter model of record linkage. In: Proceedings of the section on survey research methods. American Statistical Association, pp 354–359
Zhai Y, Liu B (2005) Web data extraction based on partial tree alignment. In: Proceedings of the 14th international conference on world wide web. ACM, pp 76–85
Zhang J, Yu PS (2015) Multiple anonymized social networks alignment. In: Proceedings of the IEEE international conference on data mining, ICDM’15. IEEE
Acknowledgements
The authors are grateful to Pekka Valta and the Genealogical Society of Finland for providing the family tree dataset, to Jukka Suomela for useful discussions on Flan, to Gunnar W. Klau for his advice on extending Natalie to multiple networks, and to the anonymous reviewers for their constructive comments. This work was supported by Academy of Finland Project “Nestor” (286211).
Additional information
Responsible editors: Thomas Gärtner, Mirco Nanni, Andrea Passerini and Celine Robardet.
About this article
Cite this article
Malmi, E., Chawla, S. & Gionis, A. Lagrangian relaxations for multiple network alignment. Data Min Knowl Disc 31, 1331–1358 (2017). https://doi.org/10.1007/s10618-017-0505-2