
Visual analytics of genealogy with attribute-enhanced topological clustering


Clustering can give a concise overview of families of interest and significant patterns in large-scale genealogical datasets. Traditional clustering methods rely mostly on topological features to summarize and organize family trees. However, they ignore the many attributes that are also important for understanding and interpreting genealogical clusters. It is therefore crucial to combine structures and attributes in a single clustering model for exploring genealogy datasets. In this paper, we propose an attribute-enhanced topological clustering method for exploring genealogy datasets based on partial least squares (PLS). First, a graphlet kernel method is used to measure the structural difference between family trees. Then, we leverage PLS to combine the learned vectors with multiple attributes, and apply a joint dimensionality reduction method to project the high-dimensional vectors into a two-dimensional space, where a distance-based clustering method aggregates similar family trees by taking both topological structures and attribute features into account. Further, we implement a visual analysis system with multiple coordinated views, including glyphs, a family tree view and a parallel coordinate view, to represent, evaluate and explore the clustering features. Case studies and quantitative comparisons on real-world genealogy datasets demonstrate the effectiveness of our method in genealogical clustering and exploration.
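As a rough illustration of the first step only, the sketch below compares two toy trees with a size-3 graphlet kernel: each tree is summarized by the normalized counts of its 3-node induced subgraphs, grouped by number of edges, and two trees are compared through the dot product of those vectors. The graphlet size, the function names and the toy trees are assumptions made for illustration; the abstract does not specify the kernel variant used in the paper, and the PLS, dimensionality reduction and clustering steps are not shown.

```python
from itertools import combinations
from math import sqrt


def graphlet_vector(num_nodes, edges):
    """Normalized histogram of 3-node induced subgraphs, indexed by edge count (0..3).

    Hypothetical helper: nodes are assumed to be labeled 0..num_nodes-1 and
    edges are undirected (parent, child) pairs.
    """
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(range(num_nodes), 3):
        # Count how many of the three possible pairs are connected.
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 4


def graphlet_kernel(vec_a, vec_b):
    """Kernel value: dot product of two graphlet frequency vectors."""
    return sum(a * b for a, b in zip(vec_a, vec_b))


def kernel_distance(vec_a, vec_b):
    """Distance induced by the kernel: sqrt(k(a,a) - 2 k(a,b) + k(b,b))."""
    squared = (
        graphlet_kernel(vec_a, vec_a)
        - 2 * graphlet_kernel(vec_a, vec_b)
        + graphlet_kernel(vec_b, vec_b)
    )
    return sqrt(max(squared, 0.0))  # guard against tiny negative rounding errors


# Two toy "family trees" with four members each (assumed data).
chain = graphlet_vector(4, [(0, 1), (1, 2), (2, 3)])  # one child per generation
star = graphlet_vector(4, [(0, 1), (0, 2), (0, 3)])   # one parent, three children

print(chain, star, kernel_distance(chain, star))
```

A tree never contains a triangle, so the last entry of each vector is always zero here. Enumerating all triples costs cubic time in the number of nodes, so large genealogies would need sampling or a more efficient counting scheme.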



We would like to thank the reviewers for their thoughtful comments. The work is supported in part by the National Natural Science Foundation of China (Nos. 61872314 and 61802339), the Open Project Program of the State Key Lab of CADCG of Zhejiang University (No. A2001) and the Natural Science Foundation of Zhejiang Province, China (Nos. LY21F020029 and LY19F020011).

Author information



Corresponding author

Correspondence to Zhiguang Zhou.


About this article


Cite this article

Sun, L., Zhang, X., Pan, X. et al. Visual analytics of genealogy with attribute-enhanced topological clustering. J Vis (2021).



  • Visualization in the humanities
  • Data aggregation
  • Data clustering
  • Compression techniques
  • Dimensionality reduction
  • Hierarchical data