Visual analytics of genealogy with attribute-enhanced topological clustering

Abstract

Clustering can provide a concise overview of families of interest and significant patterns within large-scale genealogical datasets. Traditional clustering methods rely mostly on topological features to summarize and organize family trees. However, they ignore the many attributes that are also important for understanding and interpreting genealogical clusters. It is therefore crucial to combine structure and attributes in a single clustering model when exploring genealogy datasets. In this paper, we propose an attribute-enhanced topological clustering method for exploring genealogy datasets based on partial least squares (PLS). First, a graphlet kernel method is used to measure the structural differences between family trees. Then, we leverage PLS to combine the learned vectors with multiple attributes, and apply a joint dimensionality reduction method to project the resulting high-dimensional vectors into a two-dimensional space. In that space, a distance-based clustering method groups similar family trees, taking both topological structure and attribute features into consideration. Further, we implement a visual analysis system with multiple coordinated views, including glyphs, a family tree view and a parallel coordinates view, to represent, evaluate and explore the clustering results. Case studies and quantitative comparisons on real-world genealogy datasets demonstrate the effectiveness of our method for genealogical clustering and exploration.
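The structural comparison step can be illustrated with a minimal graphlet kernel sketch. The abstract does not state which graphlet size, sampling scheme or normalization the authors use, so this sketch assumes the common size-3 formulation: each family tree is treated as an undirected graph, every 3-node subset is classified by its number of edges (0 to 3, which determines the induced subgraph up to isomorphism), and two trees are compared by the dot product of their normalized graphlet frequency vectors. The function names `graphlet3_distribution`, `graphlet_kernel` and `graphlet_distance` are hypothetical, and the PLS fusion, joint dimensionality reduction and clustering steps are not shown.

```python
from itertools import combinations
from math import sqrt


def graphlet3_distribution(nodes, edges):
    """Normalized frequencies of the four 3-node graphlet classes.

    Edges are treated as undirected (an assumption for this sketch).
    A 3-node induced subgraph is determined up to isomorphism by its
    edge count, so class k collects the triples with exactly k edges.
    """
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        # Count how many of the three possible pairs are connected.
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        counts[k] += 1
    total = sum(counts)
    if total == 0:
        # Fewer than three nodes: there are no graphlets to count.
        return [0.0] * 4
    return [c / total for c in counts]


def graphlet_kernel(p, q):
    """Linear kernel: dot product of two graphlet distributions."""
    return sum(a * b for a, b in zip(p, q))


def graphlet_distance(p, q):
    """Euclidean distance induced by the kernel in its feature space."""
    d2 = graphlet_kernel(p, p) + graphlet_kernel(q, q) - 2 * graphlet_kernel(p, q)
    return sqrt(max(d2, 0.0))


# Hypothetical example trees: nodes are people, edges are parent-child links.
chain = graphlet3_distribution(range(4), [(0, 1), (1, 2), (2, 3)])
star = graphlet3_distribution(range(4), [(0, 1), (0, 2), (0, 3)])
print(chain, star)
print(graphlet_kernel(chain, star), graphlet_distance(chain, star))
```

For a four-generation chain and a parent with three children, the distributions are [0, 0.5, 0.5, 0] and [0.25, 0, 0.75, 0], giving a kernel value of 0.375. Because a family tree contains no cycles, the triangle class is always zero here. Enumerating every triple costs O(n³) in the number of people, so large trees would typically call for graphlet sampling instead.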

Graphic abstract

Figs. 1–8

Notes

  1. Toward Better Bus Networks: A Visual Analytics Approach.

  2. Facilitating discourse analysis with interactive visualization.

  3. Pathline: A Tool For Comparative Functional Genomics. Manually linearize a complete network and render attributes next to the linear layout.

  4. Tac-Miner: Visual Tactic Mining for Multiple Table Tennis Matches.

Acknowledgements

We would like to thank the reviewers for their thoughtful comments. This work was supported in part by the National Natural Science Foundation of China (Nos. 61872314 and 61802339), the Open Project Program of the State Key Lab of CADCG of Zhejiang University (No. A2001) and the Natural Science Foundation of Zhejiang Province, China (Nos. LY21F020029 and LY19F020011).

Author information

Corresponding author

Correspondence to Zhiguang Zhou.

About this article

Cite this article

Sun, L., Zhang, X., Pan, X. et al. Visual analytics of genealogy with attribute-enhanced topological clustering. J Vis (2021). https://doi.org/10.1007/s12650-021-00802-x

Keywords

  • Visualization in the humanities
  • Data aggregation
  • Data clustering
  • Compression techniques
  • Dimensionality reduction
  • Hierarchical data