Visualizing Large Collections of URLs Using the Hilbert Curve

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13480)

Abstract

Search engines like Google provide an aggregation mechanism for the web and constitute the main access point to the Internet for a large part of the population. For this reason, biases and personalization schemes in search results may have huge societal implications that require scientific inquiry and monitoring. This work is dedicated to visualizing the data that such inquiry produces and to understanding how these data change and develop over time. We argue that these data, collections of URLs, are akin to text corpora but possess distinct characteristics that require novel visualization methods. The key differences between URLs and other textual data are their lack of internal cohesion, their relatively short lengths, and, most importantly, their semi-structured nature, which is attributable to their standardized constituents (protocol, top-level domain, country domain, etc.). We present a technique to spatially represent such data while retaining comparability over time: a corpus of URLs in alphabetical order is evenly distributed onto the so-called Hilbert curve, a space-filling curve that can be used to map one-dimensional spaces into higher dimensions. Rank and other associated metadata can then be mapped to other visualization primitives. We demonstrate the viability of this technique by applying it to a data set of Google search result lists. The data retain much of their spatial structure (i.e., the closeness between similar URLs), and the spatial stability of the Hilbert curve enables comparisons over time. To make our technique accessible, we provide an R package compatible with the ggplot2 package.
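The mapping described above can be illustrated with a minimal Python sketch. It is not the authors' R package: the function names `hilbert_d2xy` and `layout_urls`, the choice of the smallest grid that holds all URLs, and the even-spacing rule `i * cells // n` are assumptions made for illustration only. The index-to-coordinate conversion follows the widely used iterative Hilbert curve algorithm.

```python
def hilbert_d2xy(order, d):
    """Convert position d along a Hilbert curve into (x, y) grid coordinates.

    The grid has side length 2**order, so d ranges over 0 .. 4**order - 1.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_urls(urls, order=None):
    """Place alphabetically sorted, de-duplicated URLs evenly on the curve.

    Hypothetical helper: if no order is given, the smallest curve with at
    least as many cells as URLs is used (an assumption of this sketch).
    """
    ordered = sorted(set(urls))
    n = len(ordered)
    if n == 0:
        return {}
    if order is None:
        order = 0
        while 4 ** order < n:
            order += 1
    cells = 4 ** order
    if n > cells:
        raise ValueError("curve order too small for the number of URLs")
    # Spread the n URLs evenly over all curve positions.
    return {
        url: hilbert_d2xy(order, i * cells // n)
        for i, url in enumerate(ordered)
    }


if __name__ == "__main__":
    sample = [
        "https://b.example/x",
        "https://a.example/",
        "https://c.example/",
        "https://a.example/y",
    ]
    for url, xy in layout_urls(sample).items():
        print(xy, url)
```

Because neighbouring positions on the curve are always neighbouring grid cells, URLs that sort next to each other (for example, pages of the same domain) tend to land close together in the plane. The resulting coordinates could then be combined with rank or other metadata as colour or size in a plotting library.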

Keywords

  • Visualization techniques
  • Text visualization
  • URL collections
  • Computational social science

Acknowledgement

We would like to thank Nils Plettenberg for his help in developing the initial ideas of this project. This research was supported by the Digital Society research program funded by the Ministry of Culture and Science of the German State of North Rhine-Westphalia. We would further like to thank the authors of the packages we have used.

Author information

Corresponding author

Correspondence to André Calero Valdez.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Belavadi, P., Nakayama, J., Calero Valdez, A. (2022). Visualizing Large Collections of URLs Using the Hilbert Curve. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_18

  • DOI: https://doi.org/10.1007/978-3-031-14463-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14462-2

  • Online ISBN: 978-3-031-14463-9

  • eBook Packages: Computer Science, Computer Science (R0)