Abstract
Search engines like Google provide an aggregation mechanism for the web and constitute the main access point to the Internet for a large part of the population. For this reason, biases and personalization schemes of search results may have huge societal implications that require scientific inquiry and monitoring. This work is dedicated to visualizing the data that such inquiry produces and to understanding how that data changes and develops over time. We argue that this kind of data, collections of URLs, is closely akin to text corpora but possesses distinct characteristics that require novel visualization methods. The key differences between URLs and other textual data are their lack of internal cohesion, their relatively short lengths, and, most importantly, their semi-structured nature, which is attributable to their standardized constituents (protocol, top-level domain, country domain, etc.). We present a technique to spatially represent such data while retaining comparability over time: a corpus of URLs in alphabetical order is evenly distributed onto the so-called Hilbert curve, a space-filling curve that can be used to map one-dimensional spaces into higher dimensions. Rank and other associated metadata can then be mapped to other visualization primitives. We demonstrate the viability of this technique by applying it to a data set of Google search result lists. The data retains much of its spatial structure (i.e., the closeness between similar URLs), and the spatial stability of the Hilbert curve enables comparisons over time. To make our technique accessible, we provide an R package compatible with the ggplot2 package.
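The mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R package: the function names `d2xy` and `layout_urls`, the plain lexicographic sort, and the integer spacing rule are all assumptions made for illustration. The index-to-coordinate conversion is the widely used iterative Hilbert curve algorithm.

```python
def d2xy(side, d):
    """Convert index d along a Hilbert curve into (x, y) grid coordinates.

    `side` is the grid side length and must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square.
                x = s - 1 - x
                y = s - 1 - y
            # Transpose the current sub-square.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_urls(urls, order):
    """Place URLs on a 2**order by 2**order grid along a Hilbert curve.

    Assumption: URLs are deduplicated, sorted lexicographically, and spread
    evenly over the curve indices; the paper's exact ordering may differ.
    """
    side = 2 ** order
    cells = side * side
    ordered = sorted(set(urls))
    if len(ordered) > cells:
        raise ValueError("grid too small for the number of URLs")
    count = len(ordered)
    # Integer spacing keeps indices distinct whenever count <= cells.
    return {url: d2xy(side, (i * cells) // count) for i, url in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical example URLs, not taken from the paper's data set.
    sample = [
        "https://example.org/news",
        "https://example.com/a",
        "https://example.com/b",
        "https://example.net/",
    ]
    for url, xy in layout_urls(sample, order=2).items():
        print(xy, url)
```

Because neighbouring indices on the curve are neighbouring grid cells, URLs that sort close together (for instance, pages of the same domain) tend to land near each other in the plane. Rank or other metadata could then be mapped to colour or size on top of these coordinates.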
Keywords
- Visualization techniques
- Text visualization
- URL collections
- Computational social science
Acknowledgement
We would like to thank Nils Plettenberg for his help in developing the initial ideas of this project. This research was supported by the Digital Society research program funded by the Ministry of Culture and Science of the German State of North Rhine-Westphalia. We would further like to thank the authors of the packages we have used.
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Belavadi, P., Nakayama, J., Calero Valdez, A. (2022). Visualizing Large Collections of URLs Using the Hilbert Curve. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_18
DOI: https://doi.org/10.1007/978-3-031-14463-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14462-2
Online ISBN: 978-3-031-14463-9
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://www.ifip.org/