What makes a scatterplot hard to comprehend: data size and pattern salience matter

Abstract

As visualizations grow more popular across fields, visualization comprehension has attracted considerable attention. In this work, we focus on how data size and pattern salience affect the comprehension of scatterplots, a popular visualization type. We began with a preliminary study in which we interviewed 50 people about the difficulties they had in comprehending 90 different visualizations. The results reveal that data size is one of the top three factors affecting visualization comprehension, and that its effect probably depends on the salience of patterns within the data. We therefore conducted an experiment on the effect of data size and data-related pattern salience on three intermediate-level comprehension tasks: finding anomalies, judging correlation, and identifying clusters. We ran the tasks on scatterplots because users are familiar with them and they support diverse tasks. The experiment revealed a significant interaction effect of data size and pattern salience on the comprehension of trends in scatterplots. Under specific pattern-salience conditions, data size also affects the judgment of anomalies and cluster centers. We discuss these findings and further summarize the factors in visualization comprehension.
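The abstract does not describe how the experimental stimuli were generated. As a hypothetical illustration of a data size × pattern salience design for the correlation-judgment task only, the Python sketch below crosses assumed data sizes with assumed target correlation strengths, treating the strength of a linear correlation as a stand-in for trend salience. The factor levels, the function names (`make_correlated_points`, `pearson_r`, `build_conditions`), and the use of bivariate-normal data are all assumptions made for this example, not the authors' procedure.

```python
import math
import random

# Hypothetical factor levels; the paper's actual levels are not given in the abstract.
DATA_SIZES = [50, 200, 1000]
SALIENCE_LEVELS = [0.3, 0.6, 0.9]  # hypothetical target correlations: weak, moderate, strong trend


def make_correlated_points(n, rho, seed=0):
    """Hypothetical stimulus generator: n points from a standard bivariate
    normal distribution whose population correlation is rho."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # y = rho*x + sqrt(1 - rho^2)*e has unit variance and corr(x, y) = rho.
        y = rho * x + math.sqrt(1.0 - rho * rho) * e
        pts.append((x, y))
    return pts


def pearson_r(pts):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    return sxy / math.sqrt(sxx * syy)


def build_conditions(seed=0):
    """Cross every data size with every salience level (a full factorial design)."""
    conditions = []
    for n in DATA_SIZES:
        for rho in SALIENCE_LEVELS:
            pts = make_correlated_points(n, rho, seed)
            conditions.append(
                {"n": n, "target_r": rho, "observed_r": pearson_r(pts), "points": pts}
            )
    return conditions


if __name__ == "__main__":
    for c in build_conditions():
        print(f"n={c['n']:5d}  target r={c['target_r']:.1f}  observed r={c['observed_r']:.3f}")
```

Reusing the same seed across salience levels for a given data size keeps the random draws identical, so within one data size only the strength of the trend changes. This is one possible design choice; with small data sizes the observed r can drift noticeably from the target, which is itself one way data size and salience can interact.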

Figures 1–7

Notes

  1. https://www.kaggle.com/.

  2. https://github.com/VisWang/scatterplots-dataset.

Acknowledgements

We thank all participants and reviewers for their thoughtful feedback and comments. The work was supported by Zhejiang Provincial Natural Science Foundation (LR18F020001).

Author information

Corresponding author

Correspondence to Yingcai Wu.

About this article

Cite this article

Wang, J., Cai, X., Su, J. et al. What makes a scatterplot hard to comprehend: data size and pattern salience matter. J Vis (2021). https://doi.org/10.1007/s12650-021-00778-8

Keywords

  • Visualization comprehension
  • Data size
  • Pattern salience