Suzanne Briet’s Theory of Documentation

Is a star a document? Is a pebble rolled by a torrent a document? Is a living animal a document? No. But the photographs and the catalogues of stars, the stones in a museum of mineralogy, and the animals that are catalogued and shown in a zoo, are documents. (Briet 1951/2006: 10)

The scholar Michael Buckland marked the centenary of the birth of Suzanne Briet in his short piece ‘The centenary of Madame Documentation’, in which he describes her as ‘…librarian, documentalist, historian, organizer, and feminist’ (Buckland 1995: 1). Briet was one of the first women to be employed as a librarian at the Bibliothèque Nationale in Paris and was a leading figure in the theory of documentation, with her manifesto, Qu’est-ce que la documentation? [What is Documentation?] (Briet 1951/2006), continuing to be a key text in information science to this day. Briet founded the Union Française des Organismes de Documentation (UFOD) and was appointed as the Vice-President of the International Federation for Documentation (FID) (Buckland 1995). Qu’est-ce que la documentation? (Briet 1951/2006) was published as a 48-page pamphlet, and as Buckland sets out, it redrew the boundaries of what should be considered a document ‘beyond texts to include any material form of evidence’ (Buckland 1995: 1). In doing so, Briet (1951/2006) expanded the scope of information science (Buckland 1991). Famously, Briet (1951/2006) posited that a living animal could be considered a document, giving the imagined example of an antelope captured in the wild and taken to a zoo. She describes the ensuing events, writing in Paris in 1951:

A press release makes the event known by newspaper, by radio, and by newsreels. The discovery becomes the topic of an announcement at the Academy of Sciences. A professor at the Museum discusses it in his courses. The living animal is placed in a cage and catalogued (zoological garden). Once it is dead, it will be stuffed and preserved (in the Museum). It is loaned to an Exposition. It is played on a soundtrack at the cinema. Its voice is recorded on a disk. (Briet 1951/2006: 10)

She continues, describing how the antelope is then featured in a monograph and encyclopaedias, which are catalogued in a library. These documents are then copied, leading to the production of drawings, paintings, statues, photos, films, and microfilms, which are in turn rendered into a series of ‘documentary productions’ in which they are ‘…selected, analysed, described, translated’. For Briet (1951/2006: 11), ‘[t]he catalogued antelope is an initial document, and the other documents are secondary or derived’.

As Day (2006) puts it in his preface to the English translation, ‘What is Documentation?’, Briet’s treatise

offers a vision beyond that of libraries and books, seeing in documentation an unlimited horizon of physical forms and aesthetic formats for documents and an unlimited horizon of techniques and technologies (and of ‘documentary agencies’ employing these) in the service of multitudes of particular cultures. (Day 2006: v)

Day points out that Briet’s work embraced new technologies and expressed ‘the desire to see the merging of human techniques and technologies in larger harmonies’ (Day 2006: vi). Day also emphasises the importance of Briet’s observation that ‘documentary forms are increasingly taking the shape of “substitutes for lived experiences”—that is, representational forms that assume the illusion of lived experience itself’ (Day 2006: vi) giving the example of film and photography. As he points out, Briet’s vision of documentation as a dynamic field, looking for new forms of knowledge and new forms of production enabled by new technologies, sets her work apart:

These and other theoretical observations regarding documentation’s relation to culture make Briet’s book of value not only to Library and Information Studies, but also to cultural studies, rhetoric, and science and technology studies. The rhetorical and theoretical brilliance that characterize Briet’s book have, perhaps, never been replicated in library and information publications and have rarely been seen in professional texts of any type. Not again—until Actor Network Theory at the end of the twentieth century— would a social network account of technical production, and specifically, documentary production, be articulated. (Day 2006: viii)

Briet’s radical and prescient proposal lay in the insight that entities, and even living beings, are not simply documented but are themselves rendered ontologically into ‘initial documents’ by those processes. As Kosciejew (2017: 101) puts it, ‘on its own, the antelope is just an antelope; however, when these material assemblages and components surround it, it becomes a document’. In this paper, I will make the argument that data visualisation in learning analytics can be analysed in terms of Briet’s (1951/2006) theory of documentation. However, before setting that out, I will turn my attention to a consideration of the ontological and epistemological status of the data visualisation itself.

The Sublime and the Diagrammatic

McCosker and Wilken (2014) trace the rise of a ‘fascination’ at that time with data visualisation and its potentials for the generation of knowledge, citing early work such as Dodge and Kitchin’s (2001) Atlas of Cyberspace, and later publications such as Lima’s (2011) Visual Complexity: Mapping Patterns of Information. They point out that many accounts of data and visualisation are dominated by ‘an emphasis on the explanatory power of beauty and the role of the “dataviz” designer, as artist, mathematician and coder, in extracting information from immense data sets and generating knowledge in the form of innovative visual design’ (McCosker and Wilken 2014: 155). They refer to Zikopoulos et al.’s (2012) definition of big data as being characterised by the ‘three v’s’ which are volume, variety, and velocity. They highlight how data visualisation has been based on a belief that it can ‘offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy’ (Boyd and Crawford 2012: 663 in McCosker and Wilken 2014: 157).

McCosker and Wilken point out the emphasis placed in the literature about data visualisation on beauty and clarity (e.g., Segaran and Hammerbacher 2009; Tufte 2006), which they see as ‘a kind of aesthetic engagement with big data, a form of knowledge encounter that turns on the complexity and aura of an unimaginable object’ (McCosker and Wilken 2014: 157). They go on to suggest that data visualisation can be considered in terms of Kant’s theory of the ‘mathematical sublime’ as set out in the Critique of Judgement (2007), which pertains to encounters with extreme vastness, defined as ‘estimation of magnitude by means of concepts of number’ (Kant 2007: 251). Kant (2007: 256) sets out that ‘instead of the object, it is rather the cast of mind in appreciating it that we have to estimate as sublime’. McCosker and Wilken (2014: 158) propose that in the contemporary age, an attempt to comprehend the vastness of big data may be seen as ‘the product of a sublime cast of mind appealing to techniques of reason and rationalisation in the face of the conflicting sense of fear and pleasure of big data’.

Kant’s (2007: 256) argument is that the subject moves from humiliation and awe in the face of the sublime ‘to a heightened sense of the power of reason in comprehending the phenomenon just experienced’. McCosker and Wilken (2014: 158) suggest that attempts at visualisation of big data are akin to this, in the sense that they arise from a desire for ‘total knowledge’. However, they argue that data visualisations may in fact fail to aid comprehension and might instead simply reinforce the sense of the unknowability of the vast data set, invoking a sense of awe in place of aiding understanding. They propose a turn towards what they call ‘diagrammatic thinking’ (McCosker and Wilken 2014: 160), in which the diagram serves not as an end point, but rather as a starting point to understanding phenomena in the world.

McCosker and Wilken draw on Deleuze’s (1988, 2003) approach to the diagram as ‘non-representational’, in a departure from Kantian reason and rationality. They give the example of the network graph or diagram, which they argue ‘poses problems and offers a virtual space for exploration’ (McCosker and Wilken 2014: 162). A distributed network is always incomplete, partial, fluid, and emergent; and they propose that the diagram is less a representation than a ‘mode of discovery’. McCosker and Wilken give the example of the Cascade Project (New York Times Research and Development 2011), which visualised how news stories were shared on Twitter using an animated three-dimensional image which could be scaled in and out. They suggest ‘[t]he element of time in this animated network diagram is by design “exploratory” and manipulable, and introduces new problems or questions that might be posed for the data’. They conclude that ‘the diagram does not “demonstrate” but rather casts light on the creative acts through which concepts, constructions and knowledge might emerge’ (McCosker and Wilken 2014: 162).

Although this paper was written ten years ago, I would argue that it continues to provide a thought-provoking disquisition on the possible ontological status of a data visualisation. The authors set this out as one of two possibilities: either an awe-inspiring depiction of the vastness of a data set which might inculcate a sublime ‘cast of mind’ in Kant’s terms, or an emergent, rhizomatic, partial type of visualisation which does not seek to be all-encompassing but instead functions more as a kind of problem-posing heuristic, in their notion derived from Deleuze. However, the status of any human individual or action under the gaze of data is not considered by McCosker and Wilken, nor is the specific effect on the viewer discussed in terms of how it might influence their ontology, or indeed action. In the next section, I will return to Briet, to address this point.

Learning Analytics and Documentation

I would like to extend a brief initial analysis I set out in Gourlay (2022), in which I proposed that data visualisation as used in learning analytics could be considered a documenting practice in Briet’s (1951/2006) terms. I considered a statement produced by the UK government agency Joint Information Systems Committee (JISC), which proposed the notion of students engaged digitally in the university as leaving a ‘digital footprint’ (Sclater et al. 2016: 4). I critiqued the emphasis in the document on the use of learning analytics for ‘quality assurance and quality improvement’, and its imbrication with the use of such documentation to provide evidence for regimes of audit such as the UK ‘Teaching Excellence Framework’ (Sclater et al. 2016: 5). I also argued that this use of learning analytics shifts the locus of ‘student engagement’ towards the digitally mediated Learning Management System (LMS), which documents the students’ ‘footprints’, therefore potentially occluding or even rendering as deviant the student who is not documented by it. In this paper, I would like to turn my attention in more detail to, in Briet’s (1951/2006) terms, the derived documents of learning analytics, specifically the data visualisations that are produced using dashboards.

Learning analytics is defined by the Society for Learning Analytics Research (2024) as ‘[t]he measurement, collection, analysis and reporting of data about learners and their contexts for purposes of understanding and optimising learning and the environments in which it occurs’. The website sets out that learning analytics ‘seeks to exploit the new opportunities once we capture new forms of digital data from students’ learning activity, and use computational analysis techniques from data science and AI’ (Society for Learning Analytics Research 2024). They set out what they describe as the ‘most popular goals of learning analytics’:

  • Supporting student development of lifelong learning skills and strategies

  • Provision of personalised and timely feedback to students regarding their learning

  • Supporting development of important skills such as collaboration, critical thinking, communication, and creativity

  • Develop student awareness by supporting self-reflection

  • Support quality learning and teaching by providing empirical evidence on the success of pedagogical innovations (Society for Learning Analytics Research 2024).

The Society for Learning Analytics Research (2024) divides learning analytics into three categories. Descriptive analytics focuses on providing insights about the past and ‘uses data aggregation and data mining to understand trends and evaluative metrics over time’. They note that this data can include student feedback, data from admission processes, student orientation, enrolments, pastoral care, study support, exams, and graduation. Diagnostic analytics seeks to understand what happened in the past, using ‘drill down, data discovery, data mining and correlations to examine data or content to answer the question “Why did it happen?”’. Predictive analytics focuses on the future, ‘recommending one or more choices using a combination of machine learning, algorithms, business rules and computational modelling’. They list the techniques used in predictive analytics as ‘[d]ata visualisations via specific tools to provide program/degree level metrics on student enrolments, program stage, results and survey feedback to give teaching staff visual snapshots of students in their programs’ (Society for Learning Analytics Research 2024).

Masiello et al. (2024) provide a current overview of the use of learning analytics dashboards (LADs), which they define as ‘control panels that can be tailored to display LA components that update according to the learning processes, also in real time’. Visualisations ‘mainly exemplify descriptive information, such as time spent on an online task, access to various online resources, and learning progression in a course or subject/task, but also to provide means to compare user results’. Using an ‘umbrella review’ approach (Choi and Kang 2022), Masiello et al. (2024) compile evidence from multiple systematic reviews on the use of LADs in education, posing the research question: ‘What is the current use of learning analytics dashboards in education?’

Masiello et al. (2024) found that the common visualisation techniques were ‘graphs, such as line charts, bar charts, progress bars, pie charts, and timelines’. Importantly, they found that the design of visualisation elements of LADs was explicitly guided by ‘relevant frameworks of pedagogy and learning to facilitate educational practices’, with self-regulated learning and game-based learning being the most commonly adopted. The visualisations were used to track a range of what they term as ‘target outcomes’. They list them as:

learner performance, progress, and competency level, learning difficulties SRL (self-regulated learning), awareness, reflection, and/or self-thinking, affective measures such as motivation, anxiety, and satisfaction, feedback practices, acceptance, such as ease of use and perceived usefulness, behavioral measures such as time spent on activities and number of clicks; sequence of actions, knowledge creation, and sensory measures … privacy, study skills, and learning strategies, group/collaborative learning … The target outcomes fell within the cognitive, behavioral, contextual, and affective domains. (Masiello et al. 2024)

What is striking here is the range of ‘outcomes’ being measured and represented visually via LADs across the studies reviewed.

A Proposed Learning Analytics Dashboard

Figure 1 shows an example of a proposed learner-facing LAD from a recent paper (Susnjak et al. 2022: 16). The authors claim that this is the first of its kind ‘to integrate all levels of analytics capabilities’, with descriptive, predictive, and prescriptive components.

Fig. 1 Proposed learning analytics dashboard (Susnjak et al. 2022: 16)

Panel 1 ‘highlights student engagement levels’ (Susnjak et al. 2022: 16), using descriptive analytics, comparing the individual student to the cohort average. This is calculated using weekly login counts to the learning management system, the number of resources accessed by the student, and the number of posts to the forum; the latter is used ‘as a measure of communication exchange levels’ (Susnjak et al. 2022: 16). The second panel focuses on the student’s academic performance, with descriptive and predictive elements. It includes the student’s grades, and results from tests and quizzes, compared to the results of the cohort mean. It also gives the student estimates for their future assignments, based on the performance of similar students in the past. The third panel contains both predictive and prescriptive analytic features, in which

an overall prediction is made regarding the learner’s estimated risk profile for meeting the course’s learning outcomes … The model reasoning provides the learners with a suggestion of what they can alter in their learning behaviour in order to alter their outcomes. (Susnjak et al. 2022: 17)
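To make the panel 1 comparison concrete, the following is a minimal sketch in Python of how a student’s activity counts might be set against a cohort average. The data, the metric names, and the function names are all hypothetical and are not taken from Susnjak et al. (2022), whose implementation is not reproduced here. In particular, the assumption that the cohort average includes the student in question is mine.

```python
from statistics import mean

# Hypothetical weekly LMS activity counts per student (illustrative data only).
cohort = {
    "student_a": {"logins": 12, "resources": 30, "forum_posts": 4},
    "student_b": {"logins": 5, "resources": 10, "forum_posts": 0},
    "student_c": {"logins": 7, "resources": 20, "forum_posts": 2},
}

# The three measures described for panel 1; the key names are assumptions.
METRICS = ("logins", "resources", "forum_posts")


def cohort_average(cohort):
    """Mean of each metric across every student in the cohort."""
    return {m: mean(s[m] for s in cohort.values()) for m in METRICS}


def engagement_gap(student_id, cohort):
    """Difference between one student's counts and the cohort mean."""
    avg = cohort_average(cohort)
    return {m: cohort[student_id][m] - avg[m] for m in METRICS}


gaps = engagement_gap("student_b", cohort)
for metric, gap in gaps.items():
    print(f"{metric}: {gap:+}")
```

Even in so small a sketch, the only things that can register as ‘engagement’ are the three counts the system happens to log.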

The data used for the dashboard is derived from Moodle, a widely used learning management system. The authors go on to elaborate on their methodology, including the software, programming language, and algorithm used to create the dashboard, the analysis of which is beyond the scope of this paper. However, what is of interest is their presentation of ‘counterfactuals’ to the students:

In order to realize prescriptive capabilities, we used data-driven counterfactuals (Wachter et al. 2017) to suggest to students how an adjustment in certain behavioral learning patterns would result in a more positive prediction. For example, the counterfactual may suggest to a learner that an increase in their next assignment mark by a specific amount would change their classification from high-risk to low-risk. Such data-driven counterfactuals are based on correlation and do not guarantee causal links; however, in many cases when features are judiciously selected some degree of potential causality can safely be assumed. (Susnjak et al. 2022: 19)
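The logic of such a counterfactual can be illustrated with a short Python sketch. It is a toy illustration under stated assumptions, not the authors’ method: the scoring rule, weights, and threshold in `predicted_risk` are invented stand-ins for a trained model, and `counterfactual_mark` is a hypothetical helper that simply searches for the smallest mark increase that changes the predicted class.

```python
def predicted_risk(logins_per_week, next_assignment_mark):
    """Hypothetical stand-in for a trained classifier.

    The weights (5 and 8) and the threshold (500) are invented for
    illustration; a real model would be fitted to past cohort data.
    """
    score = 5 * logins_per_week + 8 * next_assignment_mark
    return "low" if score >= 500 else "high"


def counterfactual_mark(logins_per_week, current_mark, max_mark=100):
    """Smallest whole-mark increase that flips 'high' risk to 'low'.

    Returns 0 if the student is already classed as low-risk, and None
    if no mark up to max_mark changes the prediction.
    """
    if predicted_risk(logins_per_week, current_mark) == "low":
        return 0
    for increase in range(1, max_mark - current_mark + 1):
        if predicted_risk(logins_per_week, current_mark + increase) == "low":
            return increase
    return None


increase = counterfactual_mark(logins_per_week=4, current_mark=50)
print(f"Suggested increase in next assignment mark: {increase}")
```

The suggestion produced is a property of the model rather than of the student: it reports what would change the prediction, which, as the authors acknowledge, is a matter of correlation and not a guaranteed causal link.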

The authors created this dashboard and prototyped it in response to a review of the literature covering LADs. They allude in their conclusions to the lack of clear evidence of efficacy of LADs, and also a lack of implementation in educational settings; these are important critical points which could be levelled at proponents of LADs. However, I would like to pursue an alternative critical line of argument, focusing on the effects of the creation of the LAD on the ontology of the students, with reference to Briet’s (1951/2006) theory of documentation.

Analysis

There are hundreds of LADs which could be analysed. I chose this one proposed by Susnjak et al. above as it is recent and incorporates the three types of analytics as set out by the Society for Learning Analytics Research (2024). The impetus is not to critique this dashboard in particular; on the contrary, it seems highly user-friendly and transparent and is laudably aimed at the student. Instead, my intention is to consider a broader philosophical point regarding the possible effects of using learning analytics dashboards in the university.

Returning to Briet’s (1951/2006) antelope, we can make the parallel with the student using this dashboard. It seems uncontroversial that the dashboard display, as shown in Fig. 1, can be regarded as a document. However, it is worth considering the effect that the creation of that document has on the student. Where Briet’s imaginary antelope was physically captured and placed in a zoo, in this case, the student is ‘captured’ by the LMS and other algorithmic procedures. My first contention is that consequently, the student can be regarded as catalogued in Briet’s terms, and also an initial document in her terms, with the display as shown in Fig. 1 therefore functioning as a derived document. The antelope was rendered into a document by the act of putting it on display and by multiple acts of measurement, and in the same manner, the student can be said to have been rendered into a document by the display, which results from various acts of surveillance and measurement.

It is worth considering in more detail the three forms of learning analytics in terms of the type of documentation they lead to. Descriptive analytics might be seen as akin to measurements of the weight, food consumed, activity, and size of the antelope over a period of time. Predictive analytics might then create a forecast of how the antelope is likely to behave or thrive or otherwise in future, on the basis of measurements taken in the past of similar antelopes. Prescriptive analytics might perhaps equate to a strategy of improving the health or behaviour of the antelope through variation in its treatment, or even perhaps training it using known techniques (one assumes that this is the approach taken to training a performing zoo animal). The ontological status of the antelope, for Briet, is fundamentally altered. In her brief treatise, she does not elaborate on this point, but I would contend that it is altered by the surveillant gaze, by being seen as much as by being measured. The gaze and the measurement technologies alter and disrupt the singular, whole, boundaried, embodied nature of the animal. Its being is splintered and fractured into a range of measures and also becomes multiple through reproductions of its image and associated ‘data’.

I would argue that in the same manner, the student’s embodied, physical, singular, evolved being is breached by the ‘gaze’ of the LMS and is splintered by the disaggregation of their self into demographic attributes, performative and observable micro-behaviours such as logging on to the LMS, and ideological values attributed to these actions such as the construct of ‘self-regulated learning’. A counterargument to this might be that the student is not forced to consult the dashboard. However, Briet’s antelope was not aware of its altered status either, and the argument still holds that it was fundamentally altered nonetheless.

Looking in more detail at the LAD, we can consider the type of document that is generated by the different ‘gazes’ of the technology. Panel 1 deploys descriptive analytics focusing on ‘student engagement levels’ (Susnjak et al. 2022: 16). The student can make a comparison with the cohort average in terms of the number of logins to the LMS, the quantity of resources accessed, and the number of posts that the student has written on the discussion board. In terms of Briet’s (1951/2006) concept of the document, the visual representation is rendered complex, as it is composed of both the individual student as a document, and as aggregated group documents of the rest of the cohort, presented as an average across more than one metric. These documents as conjured by panel 1 consist of behaviours, or actions. Although they are digital, they are fundamentally embodied actions of the hands and eyes, leading to traces of activity on the LMS.

The second panel generates a different type of document, based on test or assignment performance, not solely activity. It is also complex, as it renders the student as a document, and again an aggregated group document of the median performance of the rest of the cohort. However, it also takes a further step by creating the student as a potential future document, in terms of a predicted exam score. This is ontologically distinct in the sense that it is a document that derives not from traces of that individual student, but from a predictive algorithm based on multiple previous students in previous cohorts with a similar profile. The third panel generates the student as a document and subject which could exist in the future, in terms of the degree of ‘risk’ that the current student may be facing of failing the module, instructing the student on which activities they should undertake in order to minimise this risk.

Viewed in Briet’s (1951/2006) terms, this is not simply a risk assessment; it is the generation of a new ontological being in a possible future, one that can be directly manipulated and altered by the student’s actions in the present. In that sense, the student is invited to actively participate in the creation of themselves as a document. Susnjak et al. (2022) refer to these prescriptive elements as ‘data-driven counterfactuals’, drawing on Wachter et al. (2017): ‘Such data-driven counterfactuals are based on correlation and do not guarantee causal links; however, in many cases when features are judiciously selected some degree of potential causality can safely be assumed’ (Susnjak et al. 2022: 19).

Discussion

Returning to McCosker and Wilken’s (2014) distinction, it could be argued that the dashboard fits more in the latter category, encouraging ‘diagrammatic thinking’ in the sense that it invites the student to address and respond to it in specific ways. It does not simply invoke a sense of the sublime in contemplation of a large number of data points but instead is personalised to an individual, therefore retaining a sense of human scale in relation to the data. What is also noteworthy is the extent to which it is directive, in the high-stakes context of a university module. Unlike McCosker and Wilken’s two types of visualisation, this does not only seek to inspire awe or allow for the investigation of specific questions. It also actively constitutes the individual subject as a document and is performative in that it also constitutes them and moulds them as a human acting in the world, by explicitly encouraging certain types of behaviour and engagement with the technology which itself produces the emergent visualisation. In that sense, it is recursive and complex, as the student is encouraged to modify the shifting document of herself via her actions in relation to the LMS.

In order to diminish the level of ‘risk’ portrayed by the visualisation, the student must prioritise frequency of logins to the LMS, access to the resources, and the discussion forum. In this regard, the student and the visualisation are in a close co-constitutive relationship, which is more intimately entangled than that of the viewer struck by the sublime nature of a vast data set, or an analyst posing questions via emergent diagrammatic thinking. Instead, the relationship here is an ontological one in which the data visualisation not only renders the student a document in Briet’s (1951/2006) terms but also ‘trains’ the student to behave in certain ways in order to alter herself as a document.

Beer (2019: 1), in his book The Data Gaze, raises the question ‘…with all these amassing data about people, places, organisations, and nation states, who has the power to speak with these data? Or, perhaps more fittingly, who has the power to speak with our data?’ Beer draws on Kitchin (2014), who defines data as being

commonly understood to be the raw materials produced by abstracting the world into categories, measures and other representational forms - numbers, characters, symbols, images, sounds, electro-magnetic waves, bits – that constitute the building blocks from which information and knowledge are created (Kitchin 2014: 1 in Beer 2019: 2).

What is relevant for the present analysis is Beer’s concept of the data gaze and how the ‘social expansion of the ordering role of data’ has led to new ways of knowing (2019: 5). He explores how notions of value and ideals of living are built into ways in which data are used and what he calls ‘the type of prosthetic vision that is said to be provided by data analytics packages of various kinds’ (Beer 2019: 5) and deploys the concept of data imaginary to conceptualise the way that data analytics and its underpinning rationalities are envisaged.

It is clear that the proposed dashboard provides information to the student which is based on the behaviours and performance of previous students. However, it might be argued that certain ideological positions are also encoded into the way in which the student’s engagement is analysed and visually represented. One striking feature is the value placed on logging on, accessing resources, and posting on forums on the learning management system. These are used here as the sole measures of student engagement, raising the question of how other forms of engagement might be regarded, such as private reading and offline conversation. These analogue (or non-LMS-based) activities are effectively rendered invisible by the sole focus on LMS activity as a measure of student engagement here. Arguably, the dashboard has a normative force here in rewarding ‘busy-ness’ and frequent use of the LMS. However, it might be argued that a student could score highly in all of these measures, without necessarily being successful in learning on the module. Another notable feature is the comparison of the individual with the cohort average in terms of these and other measures. This arguably reinforces the sense of engagement with the module as a competition against others, as opposed to a learning experience in itself.

In a criterion-referenced assessment system, one’s position in the class should not be a primary concern. Although the predictive element may be useful to students, the effect of the dashboard could possibly be to encourage a risk-averse conformity to engagement as recognised by the dashboard, or even a ‘gaming’ of the system when under its gaze. What is also noteworthy is the lack of any specific reference to module content, or feedback. Presumably, this would be provided by other means, but the effect is highly ‘stripped back’, instrumental, and strategic. It may be of utility to students, particularly those at risk of failing, but it would be fair to say that it does not emphasise learning for learning’s sake but instead reinscribes a focus on working towards the assessment as the main value being upheld.

Conclusions

Tourney (2003) provides a detailed overview of Briet’s (1951/2006) contribution to documentation theory, with a particular emphasis on how little attention her groundbreaking work received in her lifetime from archivists and records managers. Her work has subsequently been resurrected and promoted by the scholars Buckland (1991, 1995, 1997, 1998, 2014, 2017) and Day (1991, 2001, 2007, 2014). In this analysis, I have attempted to apply her notion of the document to the contemporary university student caught up in the data gaze (Beer 2019) of learning analytics, rather like the antelope in Briet’s (1951/2006) zoo. My tentative conclusion is that learning analytics and the derived document of the visual dashboard serve to render the student into a document and that he or she is not rendered whole but is splintered in the various visualisations. His or her previous actions online are isolated from all other aspects of their practices or actions as a student. Test and quiz performances are also separated and represented. A putative future self in terms of possible exam results is also created as a visual representation. In this regard, it might be argued that the LAD does not render the student into a singular document as an ontological entity as proposed by Briet (1951/2006) but could be seen to refract the student into multiple entities, reminiscent of Mol’s (2002) notion of ‘the body multiple’ being created by multiple medical tests and scans.

A critical assessment could make the argument that this apparently useful and informative facility does not simply aid the student in monitoring his or her performance on a module but in fact brings about an ontological change to the student’s being. In this regard, certain behaviours recorded digitally come to stand for learning, and for the student themselves. Their embodied self and various forms of unseen and ephemeral ‘analogue’ engagement are elided by the process, potentially encouraging a performative form of engagement which may seek to ‘game the system’, particularly in terms of the LMS engagement analysed in the first panel of the dashboard above. I would suggest that this type of analysis, inspired by Briet’s (1951/2006) provocation, ‘Is a star a document?’, may have some utility in advancing our understanding of the effects of the data gaze and data visualisations of various kinds in digital higher education and beyond.