Figures don’t lie, but liars do figure. (Attributed to Carroll D. Wright, 1889)

5.1 Poker (Inter)faces

Data visualisation and information visualisation are commonly used as synonyms, but it has been argued that they in fact mean different things (Spence 2014; Falkowitz 2019; Ware 2021). The main difference would lie in the basic distinction between data and information in computer science: data is understood as raw materials (e.g., numbers), that is, the input, and believed not to carry any specific meaning per se, whereas information is the output, i.e., the meaning carried by a set of data. Following this definition, information visualisation is understood as a cognitive activity (Spence 2014, 2), the process of discovering the meaning associated with a set of data, whereas data visualisation is the process of exploring data that may or may not uncover meaning, i.e., result in information visualisation. Another way to look at it is to consider the purpose of these two activities: data visualisation would essentially be a heuristic activity, whereas the main goal of information visualisation would be to influence a decision-making process (Falkowitz 2019). The two types of visualisation would accordingly translate into distinct products: data visualisations would allow several levels of interaction (e.g., filtering, zooming, selecting, aggregating), whereas information visualisations would show only one or a limited number of viewpoints while obscuring other perspectives more or less deliberately. According to this logic, only data that function as cognitive tools become information, and therefore not all data is information.

The post-authentic framework that I advance in this book argues against binary conceptualisations that misleadingly suggest and continue to perpetuate the artificial notion of ‘raw data’, as if data could naturally pre-exist in a pristine, untouched environment, as if all the steps preceding the visualisation, for example selection, collection, compilation, categorisation and storage, were not already acts of interpretation and creation (Manovich 2002; Gitelman 2013; Drucker 2020). The post-authentic framework therefore transcends the distinction between data and information and between data visualisation and information visualisation; it acknowledges that data always embeds the interpretative dimensions that originated it. It also recognises that not just the processes of data creation but the very tools and methods adopted for creating data are equally situated, limited and partial. Actions, tools, algorithms, platforms, infrastructures and methods are never neutral because they themselves stem from systems that are in turn situated and therefore already interpreted. Hence, whether the intent is to explore data or to persuade through data, the post-authentic framework to visualisation advocates transparency in the way the data is created and conclusions are drawn. In light of the considerations reasoned in the previous chapters, I will therefore use these terms interchangeably to signal that we need to move beyond the distinction between data and information and consequently between data visualisation and information visualisation, because data is always, to varying degrees, produced.

Historically, innovations in data visualisation have originated from concrete, often practical goals (Friendly 2008, 30), so it is no surprise that the explosion of data of the last two decades, and the subsequent need to analyse and interpret it, paired with advances in technology and statistical theory, have greatly impacted the field. Indeed, as it is praised for its capacity to promptly display emerging properties in the data as well as to enhance access, visualisation has increasingly become an integral part of the digital. For example, using information visualisation to better understand the complex, internal processes according to which ML models elaborate data and provide results has been shown to offer insights that may lead to more transparency and increased trustworthiness in ML outputs, and it has therefore become very popular in recent years (Chatzimparmpas et al. 2020).

Visualisation has also gained a significant role in the context of analytical methods, including topic modelling. Studies have argued that graphic display tools are valuable not only for understanding the models’ results but, because similarity measures and human interpretation are partially misaligned (cfr. Chap. 4), also for a general assessment of whether topic modelling is at all a suitable technique for AI and cognitive modelling applications (Murdock and Allen 2015, 4284). Several visualisation solutions have therefore been proposed over the years to address some of the already discussed challenges around topic modelling. These can be roughly divided into two research directions: the use of visualisation to improve the interpretation of the results and, stemming from the first one, the use of visualisation to improve the results themselves. Solutions in the first category try to enhance topics’ interpretability by visualising the results in a variety of ways using different statistical measures. Termite (Chuang et al. 2012), for example, allows the comparison of terms within and across topics using saliency measures based on the concept of weight (cfr. Chap. 4), but it does not allow for document interactivity. Chaney and Blei (2012) propose a web-based interface that allows nontechnical users to navigate the output of a topic model, but it is not possible to draw comparisons of the topics’ distribution across documents. TopicNets (Gretarsson et al. 2012) visualises the relations between a set of documents (or parts of documents) and their discovered topics in the form of an interactive network-type graph (i.e., nodes and edges), but it does not show topic or document composition. LDAvis (Sievert and Shirley 2014) visualises terms within a topic according to weighted topic-word and topic-topic relationships, but the connection with the documents is lost. Finally, Topic Explorer (Murdock and Allen 2015) builds on LDAvis by visualising topic-document and document-document relationships as well as topic distribution and document composition.
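What these interfaces visualise are, at bottom, the topic-word and topic-document weights produced by the model. Below is a minimal sketch (assuming gensim and a toy corpus, not the code of any of the tools cited above) of how such weights can be extracted from a trained LDA model for display.

```python
# A minimal sketch (assuming gensim and a toy corpus, not the code of the tools
# cited above) of how the quantities these interfaces visualise -- topic-word
# and topic-document weights -- can be extracted from a trained LDA model.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["emigrazione", "lavoro", "fabbrica", "sciopero"],
    ["chiesa", "festa", "famiglia", "paese"],
    ["sciopero", "lavoro", "anarchia", "giustizia"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]              # bag-of-words representation
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=1)

# Topic-word weights: the raw material for saliency- or relevance-based term views
for topic_id in range(lda.num_topics):
    print(topic_id, lda.show_topic(topic_id, topn=5))

# Topic-document weights: the per-document mixtures that document-centric
# interfaces display
for i, bow in enumerate(corpus):
    print(i, lda.get_document_topics(bow, minimum_probability=0.0))
```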

Studies in the second category allow users to interact with the models through a variety of human-in-the-loop1 (HINTL) methods. For example, iVisClustering (Lee et al. 2012) allows users to manually create or remove topics, merge or split topics and reassign documents to another topic while visualising topic-document associations in a scatter plot. Using ITM—Interactive Topic Modelling (Hu et al. 2014)—users can add, emphasise or ignore words within topics, whereas with UTOPIAN (Choo et al. 2013), users can adjust the weights of words within topics, merge and split topics and generate new topics. Hoque and Carenini (2016) and Cai and Sun (2018) also propose visual methods to curate the topics by adding or removing terms within a topic, adjusting the weight, merging similar topics or splitting mixed ones, manually validating the results and finally generating new topics on the fly.

As this brief literature review shows, one of the main challenges of topic modelling, namely interpreting the results, has so far been tackled from a problem-solving point of view for which the main task is essentially to best exploit the model’s identified document structure (Blei 2012). Significantly, what all these studies have in common is the implementation of visualisation techniques exclusively in the final stage of a topic modelling workflow, that is, either to interpret the algorithm’s output or to train the algorithm itself. What these visualisation interfaces clearly show is the persistent conceptual disconnection between the results and the processes that generated them: the common belief that only interventions on the algorithm or on the final output are worthy of study and examination, so that interventions on the source data are dismissed as not immediately relevant. As I have argued in Chap. 3, these processes of manipulation are often seen as ‘standard’, unproblematic and inconsequential rather than as heavy interventions on the sources and therefore on the results. The post-authentic framework that I propose in this book, on the contrary, strives to preserve and maintain the connection between the analyst and the digital object, and it opposes any naïve conceptualisation of digital objects as finished, fixed, unproblematic entities. The post-authentic framework ultimately sees the human-digital object relationship as an essential component of the process of knowledge production in the digital. When applied to UI, the post-authentic framework is therefore not only mindful of such connection but in fact encourages the scholar to be critically aware of it. My efforts towards building a post-authentic interface for topic modelling that I present here are therefore guided by this intention to enable users to actively engage with their digital sources and take ownership of their interventions, but also to self-reflect on and critique those interventions, thus openly acknowledging the interpretative dimension of the digital research process.

The endless flow of digitised material and the need to store it, access it and analyse it have impacted the role of visualisation also in those fields that traditionally relied on material sources, for instance, cultural heritage, history, linguistics and more widely the humanities. With specific reference to cultural heritage, for instance, institutions have over the years resorted more and more to visual means—typically web-based interfaces—as a way to enhance access to cultural collections for users’ appreciation as well as for research purposes (Windhager et al. 2019a). In a survey of information visualisation approaches to digital cultural heritage collections from 2014 to 2017, for example, Windhager et al. (ibid.) found that visualisations of digital cultural heritage material have steadily increased, peaking in 2015. At the same time, however, these authors also highlighted that the seventy visualisation systems, prototypes and platforms they surveyed shared ‘overly narrow task- and deficiency-driven approaches to interface design that are grounded in a simplistic user-as-consumer- and problem solver-model’ (ibid., 13). Drucker (2013; 2014; 2020) has also long argued that graphical displays in the humanities often exhibit a function- and task-driven UI design and generally lack a critical stance towards visualisation, evidencing IR intentions rather than the elicitation of curiosity, thoughtful engagement and reflection.

The post-authentic framework that I advance in this book aims to contribute to the urgent need for the establishment of critical data literacy, including visualisation literacy. It conceptualises digital objects as unfinished, situated processes, and it acknowledges the limitations, biases and incompleteness of tools and methods adopted for the analysis and visual representation of digital content. It provides helpful concepts for a re-theorisation of the process of digital knowledge creation, including the implementation of re-devised practices which are also acknowledged as always being adapted, unfixed, unfinished, arranged and interpreted. Applied to visualisations and interfaces, it acknowledges them as problematic endeavours that embed a wide net of situated processes, and it caters for their novel conceptualisation as epistemic objects which themselves carry meanings and therefore bear consequences.

Post-authentic graphical displays counter what I call poker interfaces: attractive visualisations and sleek interfaces that tend to present information as detached from any subjectivity or that obscure or even break the connection with the digital object and the multiple layers of manipulation. In this chapter, I discuss two examples of how the post-authentic framework can be applied to visualisations: in Sect. 5.2, I examine prototypical work for designing a topic modelling interface, whereas in Sect. 5.3, I present the design choices we made whilst developing DeXTER, the interactive visualisation app to explore enriched cultural heritage material currently loaded with ChroniclItaly 3.0 (cfr. Sect. 2.4). My discussion will specifically revolve around the challenges of promoting symbiotic exchanges when engaging with software, especially focusing on the efforts we made to expose, rather than hide, the ambiguities and uncertainties of NA and SA. I end the chapter by acknowledging digital visualisation as fundamentally a curatorial operation which requires countless subjective decisions that intervene on the digital object with several layers of manipulation; the post-authentic framework to graphical display, I conclude, can guide the encoding of such processes in the visualisation.

5.2 Visualisation of Digital Objects: Towards a Post-authentic User Interface for Topic Modelling

The development of a post-authentic interface for topic modelling should be understood in the context of the wider project Digital History Advanced Research Projects Accelerator (DHARPA),2 within which software for DH research is currently being developed. Originally conceived by Sean Takats, the DHARPA project today is a team of developers and academics who continuously contribute to each other’s expertise by sharing knowledge and practices from a range of disciplines (computer programming, data engineering, data visualisation, linguistics, geography and various strains of history) (Cunningham et al. 2022). Like DeXTER, DHARPA is hosted at the C2DH (cfr. Chap. 2). At the heart of DHARPA is encoding criticism, the effort of advocating the active and reflexive participation of the scholar in the process of digital knowledge production (Viola et al. 2021). Digital tools and techniques have been harshly criticised for alienating humanities scholars from their sources (ibid.) (cfr. Chap. 1), a bond regarded as crucial for the pursuit of scholarly enquiry; the driving rationale of DHARPA is that through critical assessment, contextualisation and documentation of digital methodologies—which are understood as partial and situated—such a relationship can on the contrary be fortified and expanded. With this aim, DHARPA is developing software that operationalises critical epistemology by placing the scholar-source relationship at its centre. The efforts towards building a post-authentic interface for topic modelling that I present here are therefore guided by the very same intention to enable users to actively engage with the digital object and take ownership of their interventions. Moreover, through the post-authentic lens, my aim is to openly acknowledge the interpretative dimension of the digital research process and thus to embed self-reflection and critique into both the software’s back-end and front-end. The confluence of the post-authentic framework, DeXTER, DHARPA and the C2DH is a perfect example of how the notions of symbiosis and mutualism can guide the process of knowledge creation in the digital.

The post-authentic framework opposes any conceptualisation of digital objects as something disconnected from the material sources; when applied to UI, it is therefore oriented towards safeguarding such connection and encouraging the scholar to be critically aware of it. The example of the NLP software MALLET (McCallum 2002) illustrates a case in which this connection is obscured. MALLET is a widely used ML tool for a range of NLP tasks such as document classification, clustering, topic modelling, information extraction and others. During the steps of data preparation for topic modelling (cfr. Sect. 4.5), for example, the analyst is never prompted to view the results of their interventions and, overall, there is little chance of interacting with the digital object. This does not intrinsically mean that any topic modelling analysis based on MALLET is to be discarded, but it does mean that a distance is imposed between the sources and the analyst. I argue that it is this distance that inevitably causes disconnection and increases the risk of attributing meaning to spurious patterns (cfr. Sect. 4.5.3). Indeed, to ensure that the identified patterns carry actual significance, considerable efforts need to be subsequently directed towards regaining this connection, sometimes in the form of novel analytical methodologies such as the discourse-driven topic modelling approach (DDTM) we developed within OcEx (cfr. Sect. 2.4) (Viola and Verheul 2019b). This approach integrates topic modelling with the discourse-historical approach (DHA) (Reisigl and Wodak 2001), an applied method of critical discourse analysis theory (van Dijk 1993) which triangulates linguistic, social and historical data to understand language use in its full socio-historical context and as a reflection of its cultural values and political ideologies (Viola and Verheul 2019b). The integration of DHA into topic modelling is particularly useful for tasks such as topic interpretation and labelling, thus reducing the risk of attributing meaning to spurious patterns.

Applied to interface design, the post-authentic framework strives to avoid the human-digital object disconnection by prompting critical engagement with the specificity of the source. Taking once again the example of ChroniclItaly 3.0, the post-authentic framework devotes careful attention to never losing contact with the information embedded in the filenames themselves. Based on the Library of Congress cataloguing schema, the filenames carry valuable metadata including the reference code of the newspapers’ titles, the page number and the publication date of each issue (Viola and Fiscarelli 2021a). The reason why it is so important to engage critically with this information lies once more in the specificity of the source. Immigrant newspapers were constantly on the verge of bankruptcy, which often caused titles to be discontinued; for the same reason, some newspapers could afford to publish biweekly or even daily issues, while others could only publish intermittently (Viola and Verheul 2019a, b). This is naturally reflected in the composition of the collection; newspapers like L’Italia—one of the most mainstream Italian immigrant publications in the United States at the time—and Cronaca Sovversiva, the most important anarchic Italian American newspaper, managed to publish continuously for years, whilst others like La Rassegna or La Sentinella del West, which came into being as small, personal projects of their founders, could only survive for a few months. Although, across the entire period of coverage, the collection on the whole holds a fair balance between the number of issues, the type of newspaper, the geographical location, the time span and the political orientation of each title, the exploration of the collection’s metadata highlights factors such as the over- or under-representation of some titles, either on the whole or at specific points in time. Figure 5.1 displays how the issues are diversely distributed throughout the collection.

Fig. 5.1
A graph of titles versus the years from 1898 to 1937. The distribution of issues for the title L’Italia is dense between 1898 and 1919; La Rassegna and La Ragione have the fewest issues.

Distribution of issues within ChroniclItaly 3.0 per title. Red lines indicate at least one issue in a three-month period. Figure taken from Viola and Fiscarelli (2021b)
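The metadata behind a display like Fig. 5.1 comes from the filenames themselves. As a minimal sketch, parsing could look as follows, so that title code, publication date and page number are retained throughout processing; the filename pattern used here is a hypothetical Library-of-Congress-style example, not the collection's actual schema.

```python
# A minimal sketch of parsing issue-level metadata out of filenames so that it
# is never lost during processing. The filename pattern below is a hypothetical
# Library-of-Congress-style example, not the collection's actual schema.
import re
from datetime import date

FILENAME_RE = re.compile(
    r"(?P<title>sn\d+)_(?P<date>\d{4}-\d{2}-\d{2})_ed-(?P<edition>\d+)_seq-(?P<page>\d+)\.txt"
)

def parse_filename(name: str) -> dict:
    """Return the title code, publication date, edition and page encoded in a filename."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"Unrecognised filename: {name}")
    year, month, day = map(int, m.group("date").split("-"))
    return {
        "title": m.group("title"),
        "date": date(year, month, day),
        "edition": int(m.group("edition")),
        "page": int(m.group("page")),
    }

print(parse_filename("sn84037024_1898-07-02_ed-1_seq-1.txt"))
```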

The application of the post-authentic framework to digital objects recognises that factors like the heterogeneity of the digital object may result in the polarisation of topics and points of view; it therefore maintains a connection with the digital object by facilitating access to such information and allowing the researcher to engage critically with it. By embedding the option to explore the metadata information (if present), the post-authentic framework signals the acknowledgement of the continuous underlying structure of a digital object (cfr. Sect. 4.2) hidden by its digital transformation into discrete form, i.e., sequences of 0s and 1s. It is indeed this acknowledgement that allows the analyst to obtain a fuller understanding of the object itself, in turn facilitating fundamental tasks such as adjusting the research question, resizing expectations and making sense of the results.

This sustained connection with the materiality of the source has immediate relevance for computational techniques such as topic modelling. As discussed in Sect. 4.4, the LDA algorithm assumes that a fixed number of topics is represented in different proportions in all the documents; this is clearly a rather artificial and unrealistic assumption, as it is highly unlikely that one fixed—and to some extent arbitrary—number of topics could adequately represent the content of all the ingested documents. Allowing the analyst to see that the material for the digital analysis is unevenly distributed highlights those problematic aspects of digital research and digital objects that precede the analysis itself but which nevertheless influence how the technique may be applied and how the results may be interpreted. Figure 5.2 shows how this step could be handled in the interface. Once the documents are uploaded, the analyst is prompted by a question asking them about the potential presence of metadata information. With this question the intention is to maintain contact with the continuous aspect of the digital object hidden by its discrete representation and further altered by the topic modelling algorithm, which treats the documents, too, as a collection of discrete data.

Fig. 5.2
A screenshot explains how meta data elements guide the development of an interface. At the top, three blocks are present. The text inside the block reads data upload. On the left, the steps are present.

Wireframe of a post-authentic interface for topic modelling: sources upload. The wireframe displays how the post-authentic framework to metadata information could guide the development of an interface. Wireframe by the author and Mariella de Crouy Chanel

If the analyst chooses ‘yes’, the metadata information would then be used to create a dynamic, interactive visualisation inspired by the one displayed in Fig. 5.1; this would display how the files are distributed in the collection, ultimately creating room for reflection and awareness. In the case of ChroniclItaly 3.0, for example, this visualisation displays the number of published issues on a specific day, month or year and by which titles; the display of this information allows the analyst to promptly identify the difference in the frequency rate of publication across titles and potential gaps in the collection (Fig. 5.3). The post-authentic framework to visualisation signals the importance of maintaining the connection with the digital object, understood as an organic, problematic entity. Such connection is acknowledged as an essential element of the process of knowledge creation in the digital in that it favours a more engaged, critical approach to digital objects and creates a space in which more informed decisions can be made, ultimately answering the need for digital data and visualisation literacy.

Fig. 5.3
A screenshot titled timestamped corpus contains the post-authentic framework. It has color scheme, table, type for scaling the values, and label interval for the data. At the bottom, a graph of published issue versus the years is present.

Post-authentic framework to sources metadata information display. Interactive visualisation available at https://observablehq.com/@dharpa-project/timestamped-corpus. Visualisation by the author and Mariella de Crouy Chanel
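The aggregation behind this kind of display is deliberately simple. A minimal sketch (assuming pandas and a metadata table with 'title' and 'date' columns; the rows are toy values) counts issues per title per three-month period, so that periods with at least one issue would be drawn as marks while empty cells keep the gaps in the collection visible.

```python
# A minimal sketch (assuming pandas and a metadata table with 'title' and
# 'date' columns; the rows are toy values) of the aggregation behind such a
# display: issues counted per title per three-month period, with empty cells
# keeping the gaps in the collection visible.
import pandas as pd

meta = pd.DataFrame({
    "title": ["L'Italia", "L'Italia", "Cronaca Sovversiva", "La Rassegna"],
    "date": pd.to_datetime(["1898-07-02", "1898-09-15", "1903-06-06", "1898-08-01"]),
})

per_quarter = (
    meta.groupby(["title", meta["date"].dt.to_period("Q")])
        .size()                       # number of issues per title per quarter
        .unstack(fill_value=0)        # quarters as columns; zeros reveal gaps
)
print(per_quarter)
```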

The post-authentic framework to interface design aims to make explicit, at each stage of the digital knowledge creation process, the link between the analyst, the digital object’s discretised continuous information and the methods employed to manage, analyse and visualise it. Informed by these motivations, an interface for topic modelling would facilitate close engagement, for instance by allowing users to create and preview subsets of the digital object (e.g., through filtering, cfr. Sect. 4.5.2) for further exploration or to test hypotheses on a sample. In this way, the post-authentic framework signals the rejection of objectivist and positivist understandings of digital processes which depict data as pre-existing and somewhat fixed. The interface, on the contrary, would adopt a constructivist principle which exposes the management of data as a problematic enterprise, a subjective act made of constant interpretation, manipulation and decisions which transform, select, aggregate and ultimately create data (Drucker 2011). Following these principles, the wireframe in Fig. 5.4 displays how sources’ preview could be handled in the interface.

Fig. 5.4
A screenshot explains how post-authentic framework guides the development of an interface. At the top, three blocks are present. The text inside the second block reads data preview and selection. On the left, the subsets panel is present. At the center, a table of three columns is present. The headers are date, publication, and document.

Post-authentic interface for topic modelling: data preview. The wireframe displays how the post-authentic framework could guide the development of an interface for exploring the sources. Wireframe by the author and Mariella de Crouy Chanel
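A minimal sketch of the subsetting logic such a preview panel could sit on top of follows (assuming pandas; the column names echo the wireframe in Fig. 5.4 but are otherwise illustrative).

```python
# A minimal sketch of the subsetting logic a preview panel could sit on top of
# (assuming pandas; the column names echo the wireframe but are otherwise
# illustrative).
import pandas as pd

corpus = pd.DataFrame({
    "date": pd.to_datetime(["1898-07-02", "1905-03-11", "1917-04-20"]),
    "publication": ["L'Italia", "Cronaca Sovversiva", "L'Italia"],
    "document": ["text of issue 1", "text of issue 2", "text of issue 3"],
})

def make_subset(df, titles=None, start=None, end=None):
    """Filter by title and/or date range and return a previewable subset."""
    mask = pd.Series(True, index=df.index)
    if titles is not None:
        mask &= df["publication"].isin(titles)
    if start is not None:
        mask &= df["date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["date"] <= pd.Timestamp(end)
    return df[mask]

sample = make_subset(corpus, titles=["L'Italia"], start="1900-01-01")
print(sample.head())   # preview before committing to an analysis
```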

Research that adopts computational techniques rarely acknowledges the influential role of tools, infrastructures, software, categories, models and algorithms in the research process or the results, as these are typically reputed to be neutral. The researcher or curator often provides little or no documentation of the decisions and the mechanisms that transformed their sources into data (Viola and Fiscarelli 2021b). Through the chapters of this book, however, I have demonstrated that transformative operations such as those directed at the creation, enrichment, digital analysis and visualisation of a digital object involve an intricate network of complex interactions between countless elements and factors, including the materiality of the sources, the digital object and the analyst, as well as between the operations themselves. Although often presented as more or less ‘standard’, these operations on the contrary need to be problematised and tackled critically. The post-authentic framework to knowledge creation in the digital acknowledges them as limited and situated, and it prompts a fundamental rethink of how these operations impact the sources and produce a digital object; this challenge, I argue, can be met by maintaining engaged contact with the digital object. For problematic operations such as pre-processing, stemming and lemmatising (cfr. Sect. 4.5.2), this connection can be sustained by prompting engagement, for instance by making processes readily visible and intelligible to the analyst. The wireframes in Figs. 5.5 and 5.6 show how these operations would be handled in the interface. An expandable tool-tip asking ‘What is pre-processing?’ together with i buttons located next to each operation would give users access to detailed explanations of the available operations—often grouped under opaque labels such as ‘data cleaning’—to better understand the assumptions behind them. The UI would also allow data preview, thus making the impact of each intervention visible and accessible to the analyst. These features would create room for more conscious decisions and, at the same time, they would signal that data is always made.

Fig. 5.5
A screenshot illustrates the interface of data pre-processing. The text below the three blocks reads What is pre-processing. It has fields such as tokenize, lowercase, and filter tokens.

Interface for topic modelling: data pre-processing. The wireframe displays how the post-authentic framework to UI could make pre-processing more transparent to users. Wireframe by the author and Mariella de Crouy Chanel

Fig. 5.6
A screenshot of the data pre-processing interface. At the top, there are three blocks. The text inside the second block reads stem or lemmatize. The stem on lemmatize section is present at the bottom.

Interface for topic modelling: data pre-processing (stemming and lemmatising). The wireframe displays how the post-authentic framework to UI could make stemming and lemmatising more transparent to users. Wireframe by the author and Mariella de Crouy Chanel
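To give a sense of what making these interventions visible might involve, the sketch below (a plain-Python stand-in with invented stopwords and parameters, not the project's actual pipeline) reports how many tokens each pre-processing choice removes rather than silently 'cleaning' the text.

```python
# A minimal sketch (a plain-Python stand-in with invented stopwords and
# parameters, not the project's actual pipeline) of pre-processing whose impact
# is reported back to the analyst instead of being hidden behind a generic
# 'data cleaning' label.
import re

STOPWORDS = {"il", "la", "di", "e", "che"}   # illustrative only

def preprocess(text, lowercase=True, min_length=3, remove_stopwords=True):
    tokens = re.findall(r"\w+", text)
    report = {"tokens_in": len(tokens)}
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    tokens = [t for t in tokens if len(t) >= min_length]
    report["tokens_out"] = len(tokens)        # make the intervention visible
    return tokens, report

tokens, report = preprocess("La grande festa di San Gennaro e la comunità")
print(tokens)
print(report)   # how many tokens the chosen settings removed
```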

The post-authentic framework calls upon the scholar’s critical and active engagement in the process of knowledge creation in the digital and raises awareness of the limitations, biases and incompleteness of tools and methods; applied to interface design, it can therefore contribute to the establishment of critical data management and visualisation literacy. In the interface, this would be achieved by entering into a dialogue with the researcher, for instance, by asking the question ‘What is corpus preparation?’ (Fig. 5.7); the combination of expandable tool-tips and i buttons next to each operation would serve the dual purpose of making the process of data creation more intelligible to users and of maintaining the connection with the digital object. Indeed, more transparent processes enable a more conscious participation of the scholar in the fluid exchanges between computational and human processes, which are understood as part of a wider, complex system of interactions. The post-authentic framework attempts to reach symbiosis and mutualism (cfr. Sect. 2.2) by making these exchanges explicit, as opposed to a passive and dissociated fruition of such interactions. To the same aim, the output resulting from implementing the different methods for corpus preparation would be saved each time (left panel in Fig. 5.7) so that users could experiment with various methods and settings, compare results and make more informed decisions. In this way, the interface would actualise a counterbalancing narrative to the main positivist discourse that equates the removal of the human—which in any case is illusory—with the removal of biases. On the contrary, the argument I advance in this book is that it is only through the active and conscious participation of the human in processes of data creation, tool selection and the implementation of methods and algorithms that such biases can in fact be identified, acknowledged and, to an extent, addressed.

Fig. 5.7
A screenshot of the corpus preparation interface. The text below the three blocks reads what is corpus preparation? The filter extremes section is below. On the left, the table titled output is present.

Interface for topic modelling: corpus preparation. The wireframe displays how the post-authentic framework to UI could make corpus preparation more transparent to users. Wireframe by the author and Mariella de Crouy Chanel
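As an indication of what 'saving the output each time' could mean in practice, the sketch below (assuming gensim; the settings and toy documents are invented) keeps every corpus-preparation configuration side by side so that its effect on the vocabulary can be compared rather than overwritten.

```python
# A minimal sketch (assuming gensim; the settings and toy documents are
# invented) of keeping every corpus-preparation configuration side by side so
# that its effect on the vocabulary can be compared rather than overwritten.
from gensim.corpora import Dictionary

docs = [["lavoro", "sciopero", "fabbrica"],
        ["lavoro", "chiesa", "festa"],
        ["lavoro", "anarchia", "sciopero"]]

runs = {}
for no_below, no_above in [(1, 1.0), (2, 0.9)]:           # two candidate settings
    dictionary = Dictionary(docs)
    dictionary.filter_extremes(no_below=no_below, no_above=no_above)
    runs[(no_below, no_above)] = {
        "vocabulary_size": len(dictionary),
        "corpus": [dictionary.doc2bow(d) for d in docs],
    }

for setting, result in runs.items():
    print(setting, "->", result["vocabulary_size"], "terms kept")
```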

The post-authentic framework to knowledge creation in the digital advocates a more participatory, critical approach towards digital methods and tools, particularly if they are applied for humanistic enquiry. Against a purely correlations-driven big data approach, it offers a more complex and nuanced perspective that challenges current views sidelining human agency and criticality in favour of patterns and correlations. Applied to methods such as topic modelling, for instance, the post-authentic framework highlights the assumptions behind the technique, such as discreteness, a-causality, randomness and text disappearance. Whilst exploiting the new opportunities offered by computational technologies, it rejects a passive adoption of these methods, and it highlights the intrinsic dynamic, situated, interpreted and partial nature of the digital in contrast with the main discourse that still presents techniques and outputs as exact, final, objective and true. Applied to UI, it also provides helpful concepts for both its theorisation and the implementation of re-devised visualisation practices which are also acknowledged as being adapted, unfixed, unfinished, arranged and interpreted.

5.3 DeXTER: A Post-authentic Approach to Network and Sentiment Visualisation

In the context of visualisation, questions of criticality, transparency, trust and accountability have increasingly become part of the scientific discourse (see for instance Gaver et al. 2003; Drucker 2011, 2013, 2014, 2020; Glinka et al. 2015; Sánchez et al. 2019; Windhager et al. 2019a; Boyd Davis et al. 2021) and several recommendations for operationalising critical digital literacy in visual design have been suggested. For example, the interpretative and evaluative value of ambiguity for design has been praised by Gaver et al. (2003); Drucker (2020) has proposed a framework for visualisations that promotes plurality, critical engagement and data transparency; Windhager et al. (2019a) have suggested design guidelines that also promote contingency (i.e., acknowledging the incompleteness of user experience) and empowerment (i.e., encouraging users’ self-activation and engagement) (141), and Sánchez et al. (2019) have offered a framework for managing uncertainty in DH visualisations. Despite an increased awareness, however, research in this area points out how intrinsic aspects of knowledge creation such as ambiguity, uncertainty and errors are still largely hidden from view and how, instead, the majority of graphical displays tend to be sleek visualisations that convey exactness, neutrality and assertiveness, i.e., poker interfaces.

The post-authentic framework that this book suggests incorporates all these recent perspectives; at the same time, however, as it refers to the realm of digital knowledge that is created daily, it goes beyond them. With specific reference to visualisations, the post-authentic framework endorses ambiguity, uncertainty and transparency; it acknowledges the incompleteness and partiality of data, tools and methods and, rather than muddying this, it exposes their potential untrustworthiness. It is thanks to this awareness, I maintain, that the post-authentic framework contributes to keeping the process of knowledge creation in the digital honest and accountable, both for present and future generations. The visualisations for NA and SA in the DeXTER app that I present here are a good example of how the post-authentic framework can actualise these aims when visualising a digital object.

The DeXTER project is a post-authentic research activity which combines the creation of an enrichment workflow with a meta-reflection on the workflow itself as well as the creation of an interactive app to visualise enriched digital heritage collections. This means that the main intention guiding its design is to provoke independent assessment (Gaver et al. 2003), to expose inconsistencies and cast doubts on the digital object and to create a space for interpretation, rather than to provide one. This includes openly acknowledging that the implementation and potential value of the used methods are also inextricably intertwined with the specificity of the source as well as the research context of the related project. For example, when enriching ChroniclItaly 3.0, we used NA and SA to explore the several ways in which referential entities relate to each other in the collection; this included modelling their frequency of co-occurrence in a sentence and how this changes over time, the prevailing attitude towards such entities, and connections between entities at specific points in time (e.g., on the same day) across the different newspapers. These operations aimed to maximise the potential value of using referential entities as indicators of markers of identity (cfr. Chap. 3), that is, as a way to navigate the process of Italian Transatlantic migration as it was narrated by the different communities of Italian immigrants in the United States. Far from being standard, techniques and methods are therefore understood as adapted and chosen, and their suitability as in need of assessment rather than assumed to be intrinsically good (or bad).

The post-authentic framework can inform the selection of methods by warning the analyst that techniques developed in other fields for specific aims and with specific assumptions are not necessarily compatible across different data types. For example, NA is a method that originates in mathematics and graph theory (Biggs et al. 1986), and although it has long been applied across disciplines and for different purposes, it is typically used to answer questions mostly pertaining to the social sciences. This is because the underlying assumption is that the discrete modelling of how actors (e.g., entities) relate to each other (i.e., edges) provides adequate explanations of social phenomena. For a detailed overview of its application particularly in modern sociology, I refer the reader to Korom (2015).

Due to its characteristic feature of schematically representing abstract and often ambiguous information, NA has recently become popular also in the humanities. In linguistics, for example, NA has been applied to large textual corpora of naturally occurring language to analyse the relationship between language and identity in multilingual communities (Lanza and Svendsen 2007) or to explore complex syntactic and lexical patterns as networks, for example in language acquisition or language development studies (Barceló-Coblijn et al. 2017). It has also been argued that NA could be integrated in sociolinguistics as a way to provide insights into the relationship between the use of linguistic forms and culture (Diehl 2019). In branches of DH such as digital history and digital cultural heritage, NA is also considered to be an efficient method to intuitively reduce complexity (Düring et al. 2015). This may be due to the fact that this technique benefits particularly from attractive visualisations which support the impression that explanations for social events are accurate, complete, detailed and scientific, naturally adding to the allure of using it.

However, a typically omitted, yet rather critical issue of NA is that the graphs can only display the nodes and attributes that are modelled; as these stem from samples which are by definition incomplete and which undergo several layers of manipulation, transformation and selection, the conclusions the graphs suggest will always be partial and potentially based on over-represented actors or, conversely, on under-represented social categories. In the case of a digital object such as the cultural heritage collection ChroniclItaly 3.0, which aggregates heterogeneously distributed sources (cfr. Sect. 5.2), this issue is particularly significant as any resulting graph depends on the modelled newspaper (e.g., mainstream vs anarchic), on the type and number of entities included and excluded and on the attributes’ variables (e.g., frequency of co-occurrence, number of relations, sentiment polarity), to name but a few. Each one of these factors can dramatically influence the network displays and consequently the interpretation of the past that they provide.
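To make the dependence on these modelling choices concrete, here is a minimal sketch (toy sentence-level annotations, assuming networkx; the attribute names are illustrative) of how a sentence-level co-occurrence network with frequency and sentiment attributes can be built; changing which entities are included, which attributes are recorded or any threshold changes the resulting graph.

```python
# A minimal sketch (toy sentence-level annotations, assuming networkx; the
# attribute names are illustrative) of building a co-occurrence network with
# frequency and sentiment attributes: changing the entities, the attributes or
# any threshold changes the resulting graph.
import itertools
import networkx as nx

sentences = [
    {"entities": ["sicilia", "new york"], "sentiment": 0.3},
    {"entities": ["sicilia", "roma"], "sentiment": -0.2},
    {"entities": ["sicilia", "new york"], "sentiment": 0.1},
]

G = nx.Graph()
for s in sentences:
    for a, b in itertools.combinations(sorted(set(s["entities"])), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
            G[a][b]["sentiments"].append(s["sentiment"])
        else:
            G.add_edge(a, b, weight=1, sentiments=[s["sentiment"]])

for a, b, data in G.edges(data=True):
    mean_sentiment = sum(data["sentiments"]) / len(data["sentiments"])
    print(a, b, "co-occurrences:", data["weight"], "mean sentiment:", round(mean_sentiment, 2))
```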

The project’s GitHub repository3—which is to be understood as an integral part of the visualisation interface—is a good example of how the post-authentic framework can guide the actualisation of principles of transparency, accountability and reproducibility and how it values ambiguity and uncertainty. DeXTER’s GitHub repository documents, explains and motivates all the interventions on the data, including reporting on the processes of entity selection (cfr. Sect. 3.3). The aim is to warn the analyst that, despite being (too) often presented as a statement of fact, a visually displayed network is a mediated and heavily processed representation of the modelled actors. As such, the post-authentic framework does not solely aim to increase trust in the data and how it is transformed, but also to acknowledge uncertainty in both the data lifecycle and the resulting graphs and, finally, to expose and accept how these may be untrustworthy (Boyd Davis et al. 2021, 546). The act of making explicit the interpretative work of shaping the data is what Drucker calls ‘exposing the enunciative workings’ (2020, 149):

For data production, the task is to expose some of the procedures and steps by which data is created, selected, cleaned, and processed. Retracing the statistical processes, showing the datamodel and what has been eliminated, averaged, reduced, and changed in the course of the lifecycle would put the values of the data into a relative, rather than declarative, mode. This is one of the points of connection with the interface system and task of exposing the enunciative workings.

By acknowledging that the displayed entities are not all the entities in the collection but in fact a representative, yet small, selection, DeXTER encourages close engagement with the NA graphs; it does not try to remove uncertainty but points to where it is. At the same time, it recognises the management of data as an act of constant creation, rather than a mere observation of neutral phenomena. For example, the process of entity selection as I described it in Sect. 3.4 created a subset of the most frequently occurring entities distributed proportionately across the different newspapers. With this intervention, we aimed to alleviate the issue of source over-representation due to some titles being much larger than others and to reduce complexity in the resulting network graphs, notoriously considered the downside of NA. At the same time, however, this intervention may cause the least frequently occurring entities to be under-represented in the visualisations. Thus, the transparent and detailed documentation of how we intervened on the data from which the NA visualisations originate counterbalances the illusion of neutrality and completeness often conveyed by ultra-polished NA visualisations.

Another issue of NA data modelling concerns the theoretical assumption upon which the technique is based. As a bare minimum, a network visualisation connects nodes through a line (i.e., edge) that carries information on the type of relation between the nodes (i.e., attributes). Nodes are understood as discrete objects, i.e., completely independent of each other (cfr. Chap. 4); this ultimately means that the nodes are modelled to remain stable and that the emphasis is on the relations, as these are believed to provide adequate explanations of social phenomena. However, this type of modelling arguably paints a rather artificial picture of both the phenomena and the actors, who remain unaffected by the changing relationships between them. To put it in Drucker’s words:

This is a highly mechanistic characterization of nodes (and edges), whether they consist of human beings, institutions, or events which reduce[s] all relationships to the same presentation and make[s] static representations out of dynamic conditions. (2020, 180)

NA effectively transforms continuous (i.e., inseparable) elements such as cultural actors into discrete and fixed points; this transformation is further modelled visually, giving the impression of a neutral, exact and observable description of their entanglement. The possibility to historicise actors and relations in DeXTER is a concrete example of how the post-authentic framework to NA aims to counteract this inevitably artificial ‘flattening effect’. When developing the DeXTER interface, we decided to model the data points displayed in the graphs according to several parameters and attributes that reflect a conceptualisation of networks as lively and dynamic structures. By sliding the time bar (cfr. Fig. 5.8), the analyst can, for example, observe not just how the relationships between entities change over time but also how the entities themselves change. It is, for instance, possible to explore how entities of interest were mentioned by migrants over time by selecting/deselecting specific titles (cfr. Fig. 5.9) of different political orientation and geographical location, and by setting the frequency rate and sentiment polarity (cfr. Fig. 5.10) to observe the prevailing emotional attitude of the sentences in which the entities were mentioned together as well as their frequency of occurrence.

Fig. 5.8
A screenshot of the DeXTER default landing interface. On the left, the oval shape highlights the time bar. On the right, two sections, namely co-occurrence frequency and sentiment polarity, are present.

DeXTER default landing interface for NA. The red oval highlights the time bar (historicise feature)

Fig. 5.9
A screenshot of the DeXTER default landing interface. On the left, the vertical oval shape highlights the title parameters. They are L’Italia, La Rassegna, La Ragione, Cronaca Sovversiva, L’Indipendente, La Sentinella del West, and La Sentinella.

DeXTER default landing interface for NA. The red oval highlights the different title parameters

Fig. 5.10
A screenshot illustrates the DeXTER landing interface. Two horizontal oval shapes on the right illustrate the frequency and sentiment polarity parameters. Frequency ranges from 0 to 3, while sentiment ranges from −0.4 to 0.3.

DeXTER default landing interface for NA. The red ovals highlight the frequency and sentiment polarity parameters
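The controls highlighted in Figs. 5.8, 5.9 and 5.10 amount to re-deriving the displayed network from edge attributes whenever a parameter changes. A minimal sketch of that filtering step follows (toy graph, assuming networkx; the attribute names are assumptions, not DeXTER's actual data model).

```python
# A minimal sketch (toy graph, assuming networkx; attribute names are
# assumptions) of the filtering step behind these controls: the displayed
# network is re-derived from edge attributes whenever a parameter changes.
import networkx as nx

G = nx.Graph()
G.add_edge("sicilia", "new york", year=1905, title="L'Italia", weight=3, sentiment=0.2)
G.add_edge("sicilia", "roma", year=1917, title="Cronaca Sovversiva", weight=1, sentiment=-0.3)

def view(graph, years=None, titles=None, min_weight=1, polarity=None):
    """Return the subgraph matching the current interface parameters."""
    keep = []
    for u, v, d in graph.edges(data=True):
        if years and not (years[0] <= d["year"] <= years[1]):
            continue
        if titles and d["title"] not in titles:
            continue
        if d["weight"] < min_weight:
            continue
        if polarity == "positive" and d["sentiment"] <= 0:
            continue
        if polarity == "negative" and d["sentiment"] >= 0:
            continue
        keep.append((u, v))
    return graph.edge_subgraph(keep)

print(list(view(G, years=(1898, 1910), polarity="positive").edges(data=True)))
```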

By visualising both entities and relations and by creating dynamic and interactive NA visualisations, the DeXTER interface on the whole aims to provide several viewpoints on the same data, and it effectively shows how several dimensions of observation dramatically affect the graphical arrangements. In the case of the historicisation feature, for example, as the data is modelled in reference to the documents’ timestamp, the analyst can swipe the time bar on the top left of the interface to explore the changing relationships between entities over time and/or at specific intervals. This adds a historical dimension to the networks and allows the analyst to observe and engage with changes in the graphs interactively as they reflect how the displayed entities were mentioned by migrants according to changing temporal parameters. We also added informative tool-tips next to each available option to encourage close engagement with the interface, with the process of data creation, with the method of NA itself and with the meanings offered by these parameters (Gaver et al. 2003).

The post-authentic framework conceptualises ambiguity and uncertainty as intrinsic elements of knowledge creation in the digital; thus, rather than rejecting or obscuring them, it preserves them as opportunities to reduce the reliance on potentially biased methods and, on the whole, to remind us of the illusion of certainty (Edmond 2019). Applied to NA, this means creating a space for interpretation, for instance by exposing the data’s multi-dimensional complexity (Windhager et al. 2019b; Drucker 2020). In the DeXTER interface, this was implemented by providing multi-perspectivity on the same nodes. DeXTER allows users to explore three types of networks: two entity-focused graphs (i.e., egocentric networks) and one issue-focused network. We decided to visualise the networks as egocentric networks for two reasons. Egocentric networks are local networks with one central node, known as the ego. This type of network visualises all the nodes directly connected to the ego, i.e., the alters. Crossley et al. (2015) suggest that one main advantage of egocentric networks is that they allow for rich visualisations even when all the entities in a data-set cannot be mapped because of the network’s large size, which is indeed the case of ChroniclItaly 3.0 as discussed in Chap. 3. Furthermore, the extensive information provided about the ego may offer a personal perspective on the node and the alters; indeed, thanks to this property, egocentric networks are often referred to as cognitive networks (Perry et al. 2018). We therefore chose egocentric network visualisations for their potential ability to provide relevant material for the study of migration as experienced and narrated by the migrants themselves. Starting from a selected entity of their choice, users can explore several parameters: the net of entities most frequently mentioned in the same sentence as the ego, the prevailing emotional attitude in those sentences, the number of times entities were mentioned together and the titles in which they were mentioned. This information is encoded and made available to the analyst both through pop-up tool-tips and through the colour of the edges (i.e., pastel blue for negative sentiment, white for neutral and pastel red for positive). Figure 5.11 shows the egocentric network for the GPE entity sicilia (Sicily) across all the titles of the collection as mentioned in sentences with prevailing positive sentiment. If the ego-network option is not selected, the graph additionally displays the relations among the alters. As shown in Fig. 5.12, the representation of relations can react strongly to the tiniest modification of parameters (Windhager et al. 2019b); even when the same node is selected, the overall perspective offered on the relational structure of the graph can change significantly.

Fig. 5.11
A screenshot illustrates the time period on the left. On the right, it contains two sections, namely co-occurrence frequency and sentiment polarity, within the network tab. At the bottom, a network diagram with 9 nodes and 8 edges is present and is positive across all titles.

DeXTER: egocentric network for the node sicilia across all titles in the collection in sentences with prevailing positive sentiment

Fig. 5.12
A screenshot illustrates the time period on the left. On the right, it contains two sections, namely co-occurrence frequency and sentiment polarity, within the network tab. At the bottom, a network diagram with 56 nodes and 65 edges is present.

DeXTER: network for the ego sicilia and alters across titles in the collection in sentences with prevailing positive sentiment
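The difference between Figs. 5.11 and 5.12 corresponds to two ways of cutting the same graph around an ego. A minimal sketch of the two views (toy edges, assuming networkx):

```python
# A minimal sketch (toy edges, assuming networkx) of the two entity-focused
# views: the strict ego network keeps only ties to the ego, while the wider
# view also retains the relations among the alters.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("sicilia", "new york"), ("sicilia", "roma"),
    ("new york", "roma"), ("roma", "napoli"),
])

neighbourhood = nx.ego_graph(G, "sicilia")   # ego, alters and the alters' mutual ties
strict_ego = nx.Graph(
    (u, v) for u, v in neighbourhood.edges() if "sicilia" in (u, v)
)                                            # only edges incident to the ego

print(sorted(neighbourhood.edges()))
print(sorted(strict_ego.edges()))
```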

The third type of network visualisation (i.e., issue-focused network) allows the exploration of entities starting from a specific issue. Whereas in an egocentric network users observe a network which has an actor/entity of their choice as the focal node, this third visualisation displays the actors mentioned in specific newspapers on specific days. In this way, the issue-focused network offers an additional perspective on the same digital object, potentially contributing valuable insights for the analysis of how events and actors of interest were portrayed by migrants of different political affiliations based in different parts of the United States. Thus, instead of offering one obvious meaning, DeXTER offers multiple perspectives, and by capturing heterogeneous contexts, it creates a tension that the analyst is encouraged to resolve through independent assessment (Gaver et al. 2003). Figure 5.13 shows the default issue-focused network graph.

Fig. 5.13
A screenshot of the DeXTER interface illustrates the network graph within the network tab on the right. It contains 17 nodes and 10 edges that explain the issue-focused network.

DeXTER: default issue-focused network graph

DeXTER’s visualisation of sentiment as an attribute of NA is also guided by post-authentic principles. As already discussed in Sect. 3.4, SA is a computational technique that aims to identify the prevailing emotional attitude, i.e., the sentiment, in a given text (or portions of a text); the sentiment is then typically categorised according to three labels, i.e., positive, negative or neutral. A problematic aspect of the technique is that it presents these labels as unambiguous, universally accepted categories, providing a neutral and observable description of reality, and obscuring the highly problematic and interpretative quality of the very process of establishing such categories (cfr. Sect. 3.4) (Puschmann and Powell 2018). The concept of ‘sentiment score’ additionally reinforces the illusion of objectivity, and it further obfuscates the inherently vague, profoundly subjective dimension of emotions and their definitions, a process intrinsically open to multiple interpretations and subject to ambiguity. As a way to acknowledge the ambiguities of the assumptions behind the technique and of a ‘sentiment score’, DeXTER’s graph colouring scheme is fluid and nuanced (as opposed to solid colours): the colour gradients go from a darker shade of blue for the lowest score (i.e., negative) to a darker shade of red for the highest score (i.e., positive). DeXTER’s visual representation of sentiment results in a deliberately blurred graph: the borders of the edges are purposely smudged and pale, and pastel shades are preferred over bright, solid shades; the aim is to openly acknowledge SA as ambiguous, situated and therefore open to interpretation, rather than precise, neutral and certain. By exposing these inconsistencies, post-authentic visualisations on the whole question the main positivist discourse around technology. We achieved this goal by providing, in the openly available dedicated GitHub repository, a transparent documentation of how we identified the sentiment categories, how we aggregated the results, how we conducted the classification, how we interpreted the scores and how we rendered them in the visualisation; the repository also includes the code, links to the original and processed material and the files documenting the manual interventions.
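As an indication of what such a fluid colouring scheme amounts to computationally, the sketch below (the palette, the score range and the alpha value are assumptions, not DeXTER's actual code) maps a continuous sentiment score onto a blue-white-red gradient kept pale through the alpha channel.

```python
# A minimal sketch (the palette, score range and alpha value are assumptions,
# not DeXTER's actual code) of mapping a continuous sentiment score onto a
# blue-white-red gradient kept pale through the alpha channel, so that edges
# read as tentative rather than assertive.
def edge_colour(score: float, alpha: float = 0.5) -> tuple:
    """Blue for negative, white for neutral, red for positive; pale via low alpha."""
    s = max(-1.0, min(1.0, score))        # clamp scores to [-1, 1]
    if s < 0:                             # fade from white towards blue
        r, g, b = 1 + s, 1 + s, 1.0
    else:                                 # fade from white towards red
        r, g, b = 1.0, 1 - s, 1 - s
    return (round(r, 2), round(g, 2), round(b, 2), alpha)

for score in (-0.4, 0.0, 0.3):
    print(score, edge_colour(score))
```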

Finally, guided by the post-authentic framework, DeXTER emphasises the continuous making and re-making of data; this process of forming, arranging and interpreting data is encoded within the interface itself. Through the tab ‘Data’, users can at any point access and download the data behind the visualisations as it reflects their selection of filters and parameters (e.g., title, time interval, frequency, entity). The intention is to disrupt traditional notions that conceptualise data as fixed, unarguable and defined. At the same time, DeXTER acknowledges the collective responsibility of building a source of knowledge for current and future generations, and it frames the process of knowledge creation in the digital as accountable, unfinished and receptive to alternatives.

Through the exploration of several case studies, i.e., the creation, enrichment, analysis and visualisation of a digital object, this book argues that new theoretical paradigms are now urgently required; these must be centred on a reconceptualisation of digital objects as epistemic objects which themselves carry meanings and which therefore alter the perception of knowledge created in a digital environment. With specific reference to visualisations, interfaces and graphic display, the post-authentic framework that I propose in this book acknowledges them as problematic endeavours embedding a wide net of situated processes which require more systematic and sophisticated criteria than over-simplistic user-as-consumer and problem-solver models (Windhager et al. 2019a). The recognition of such complexities accepts and in fact embraces digital knowledge creation practices as being embedded in extremely convoluted networks of countless factors at play which cannot be fully trusted or predicted. The post-authentic framework therefore recognises the limitations and biases of specific tools and techniques and exposes problematic processes such as data creation, selection and manipulation by openly disclosing their complexities and lifecycle, by thoroughly documenting the decisions and actions and by allowing users to access the data behind the visualisations, including making the acts of transformation explicit.

In the post-authentic interface DeXTER, we actualised this by providing a space for interpretation and individual assessment, by favouring multi-perspectivity through different types of network visualisations and by offering dynamic and interactive graphs. This also arguably alleviates the issue of displaying artificial pictures of social phenomena that stems from the technique’s intrinsic assumption that actors remain stable and unaffected by the relations between them. While I am not implying that a post-authentic framework is the perfect approach to digital knowledge creation practices, I do argue that, by redefining our understanding of the theoretical dimensions of digital objects, tools, techniques, platforms, interfaces and infrastructures, especially for humanistic enquiry, the framework offers theoretical and methodological criteria that recognise the larger cultural relevance of digital objects, and it provides an urgently needed architecture for issues such as transparency, replicability, Open Access, sustainability, data manipulation, accountability and visual display.