$\chi$iplot: web-first visualisation platform for multidimensional data

$\chi$iplot is an HTML5-based system for interactive exploration of data and machine learning models. A key aspect is interaction, not only within individual plots but also between plots. Although $\chi$iplot is not restricted to any single application domain, we have developed and tested it with domain experts in quantum chemistry to study molecular interactions and regression models. $\chi$iplot can be run both locally and online in a web browser (keeping the data local). The plots and data can also easily be exported and shared. A modular structure also makes $\chi$iplot well suited to developing machine learning and new interaction methods.


Introduction and related work
This paper introduces χiplot (/ˈkaɪplɒt/), a modular system for interactive exploration of data and pre-trained machine learning models. χiplot can be run locally on the user's computer or installation-free in a web browser. Our motivation for writing χiplot was three-fold.
(i) First, we want a Python-based system to develop and test machine learning and dimensionality reduction methods, such as [1], a manifold visualisation method for explainable AI. For this purpose, we prefer a modular system that is easy to expand and modify to test new machine learning and visualisation methods and interaction ideas.
(ii) Second, we need a tool to facilitate collaboration with domain experts, primarily in quantum chemistry but also in other domains. Ideally, we want to avoid forcing our collaborators to install additional software. However, we also do not want to set up and maintain server infrastructure to host a web-accessible service.
(iii) Third, the system should be practical and usable for the end user, including physicists and chemists, despite being built for quick prototyping and painless implementation. We know of no prior system that satisfies all three of these requirements.
Many interactive visualisation tools are available; see, e.g., [7] for a recent survey and references. Much of our research collaboration targets quantum chemistry; hence the system must also be capable of visualising, e.g., molecular structures from SMILES strings [11]. ChemInformatics Model Explorer [5] (CIME) is another tool that explores explainable AI in small molecule research. However, CIME has only four fixed views, and full functionality requires a server. Another recent example is XSMILES [4], where users can examine individual molecules in 2D diagrams and visualise attribution scores for atoms and non-atom tokens.

Usage
The main idea of χiplot is to simultaneously show multiple plots and visualisations to compare and contrast diverse information. Since χiplot also targets non-technical end users, intuitive visual selection and configuration of the plots are required.
χiplot comes with six types of plots out of the box (scatterplots, histograms, heat maps, bar plots, data tables, and SMILES plots, which render molecules in a stick structure from a SMILES string [11]), but more can be added with χiplot's plugin system. Users can add and remove plots to create the layout that best suits their specific needs. End users can generate clusters by running a k-means algorithm or by lasso selection on a scatterplot. Unique colours distinguish the generated clusters. In addition, end users can generate a 2D embedding through Principal Component Analysis (PCA).
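The clustering step can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) on 2D points. This is not χiplot's implementation, which the text does not describe; the function name, the toy data, the colour names, and the deterministic farthest-point initialisation are all assumptions chosen to keep the example small and reproducible.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Cluster 2D points with Lloyd's algorithm; returns one label per point."""
    # Farthest-point initialisation (an assumption made here for determinism).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Toy data: two well-separated groups of points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = kmeans(points, k=2)

# Give each cluster its own colour, as a plot would (hypothetical palette).
palette = ["blue", "orange"]
colours = [palette[label] for label in labels]
```

The resulting labels can be stored as an extra column of the dataset, so that every plot can colour its points by cluster.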
To use χiplot, the user may install it with pip install xiplot. The xiplot console command is then available to host a local χiplot server. Alternatively, an installation-free WebAssembly (WASM) version can be used immediately at https://edahelsinki.fi/xiplot.
We demonstrate the main concepts with the QM9 molecular dataset [8,9], a collection of quantum chemical properties calculated for small organic molecules. Our machine-learning task is to estimate some quantum chemical properties of the molecules from their structural descriptions. We can use physics simulators with varying fidelity or regression models. In this example, we want to study how the structures in the dataset relate to the estimation task. We have precomputed a 2D Slisemap [1] embedding (revealing the structures relevant to a regression model) and attached the embedding to the dataset file we uploaded to χiplot.
Fig. 1 shows a view of the χiplot interface during our exploration. A chemist can explore the Slisemap embedding in a scatter plot on the left. There is a notable cluster structure, so we use χiplot to find the clusters and plot their distribution in the middle. If we compare the two clusters, we notice that the distributions of the functional groups differ. For example, we could manually draw an additional cluster in the scatter plot to further study the two subgroups in the rightmost cluster.
The behaviour of a molecule is determined not only by the functional groups but also by how they are structured. However, finding good summary statistics for structure is much more difficult. Therefore, we add a visualisation of individual molecules on the right of Fig. 1. A chemist can then rapidly inspect multiple molecules inside and between clusters by hovering over the points in the scatter plot; the molecule visualisation is automatically updated.

Description of the system
A key aspect of χiplot is interactivity, not just for a single plot but also between plots. For example, selecting a data item in one plot might show more information about it in another, as described above. To accomplish this interactivity, the plots of χiplot are implemented as independent modules, communicating through shared data storage. Furthermore, to support collaboration and sharing, the set of active plots, their configuration, and the data can be saved to and restored from a file. Since χiplot is an interactive system, time-consuming computations (e.g., learning the Slisemap embedding) should be done as part of data preprocessing.
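The idea of independent plot modules that communicate only through shared storage can be sketched in plain Python as follows. This is an illustration of the pattern, not χiplot's code (which builds on Dash): the classes SharedStore and MoleculeView, the key name hovered_row, and the JSON session format are all hypothetical.

```python
import json


class SharedStore:
    """Hypothetical shared state that independent plot modules read and write."""

    def __init__(self):
        self._state = {}
        self._listeners = {}

    def subscribe(self, key, callback):
        # Register a callback that runs whenever `key` is set.
        self._listeners.setdefault(key, []).append(callback)

    def set(self, key, value):
        self._state[key] = value
        for callback in self._listeners.get(key, []):
            callback(value)

    def dump(self):
        # Serialise the state so that a session can be saved to a file.
        return json.dumps(self._state)

    def load(self, text):
        # Restore a saved session and notify the listeners.
        for key, value in json.loads(text).items():
            self.set(key, value)


class MoleculeView:
    """Stand-in for a plot that shows details of the hovered data item."""

    def __init__(self, store, smiles):
        self.smiles = smiles
        self.shown = None
        store.subscribe("hovered_row", self.update)

    def update(self, row):
        self.shown = self.smiles[row]


smiles = ["C", "CCO", "c1ccccc1"]
store = SharedStore()
view = MoleculeView(store, smiles)
# A scatter plot would write the hovered row; it never references the view.
store.set("hovered_row", 1)

# Saving and restoring a session goes through the same store.
saved = store.dump()
restored = SharedStore()
restored_view = MoleculeView(restored, smiles)
restored.load(saved)
```

Because the writer and the reader only share a key in the store, a new plot type can react to existing plots without either side being modified.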
χiplot is implemented in Python using Plotly [6] for the plots and Dash for the interactivity. Usually, this would require the users to be able to install Python packages (see Section 2). However, we also provide a static server-less webpage version of χiplot that runs both the Dash backend and the Plotly frontend installation-free inside a browser using WebAssembly [10] (WASM). This also means no data leaves the user's computer in the WASM version.
In detail, the WASM version of χiplot uses Pyodide [3] to run Python in the browser. The communication between the frontend and the backend is intercepted and redirected to the in-WASM server, inspired by the WebDash prototype [2]. Crucially, neither the frontend nor the backend code needs to know that it runs inside a browser.
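One reason such a redirection is possible is that Dash builds on Flask, and a Flask application is a WSGI callable that can be invoked as an ordinary function without any network socket. The sketch below shows only this server-side idea with a toy WSGI app; it is not χiplot's or WebDash's actual mechanism, and the browser-side interception is not shown. The names app and call_in_process and the /update path are hypothetical.

```python
import io
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI app standing in for the backend (hypothetical)."""
    body = ("echo:" + environ["PATH_INFO"]).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def call_in_process(wsgi_app, path, method="GET", body=b""):
    """Deliver a request to a WSGI app by a plain function call, without sockets."""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    setup_testing_defaults(environ)  # fill in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record the status line and headers that the app reports.
        captured["status"] = status
        captured["headers"] = headers

    payload = b"".join(wsgi_app(environ, start_response))
    return captured["status"], payload


status, payload = call_in_process(app, "/update", method="POST")
```

The app itself is unchanged by this: it receives a standard WSGI environment either way, which matches the point that the backend code does not need to know where it runs.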
As Pyodide does not yet support all Python packages, we use dynamic import detection to enable certain features and fallbacks, such as additional data file formats. Deploying the WASM version requires bundling all frontend files, χiplot, and the scripts that bootstrap the web app in the WASM backend, all documented in the χiplot GitHub repository.
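Dynamic import detection of this kind can be sketched with the standard library alone. The helper names, the choice of pyarrow as the optional package, and the list of file extensions are assumptions for illustration and are not taken from χiplot.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def supported_extensions():
    """List data file formats, enabling extras only when their package exists."""
    extensions = [".csv", ".json"]  # assumed to be always available
    if has_module("pyarrow"):  # hypothetical optional dependency
        extensions += [".parquet", ".feather"]
    return extensions
```

Checking for the module without importing it keeps start-up cheap, and the same code path works whether or not the optional package is available in the current environment.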
To open up χiplot to even more use cases, χiplot has an API for creating plugins for, e.g., new visualisations and machine learning methods. It uses the "entry points" feature of Python to discover installed plugins, which also works in the WASM version. Due to the modular design with shared data, new plots can automatically interact with old ones.
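Entry-point discovery can be sketched with importlib.metadata from the standard library. The group name example.plots is a placeholder, not the group χiplot actually uses, and discover_plugins is a hypothetical helper; the real group names are documented in the χiplot repository.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Load every object registered under the entry-point `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.8 and 3.9 return a dict keyed by group
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


# "example.plots" is a hypothetical group name, not the one χiplot uses.
plugins = discover_plugins("example.plots")
```

A plugin package would declare its objects under the chosen group in its packaging metadata; after installation they appear in the returned dictionary without any change to the host application.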

Conclusions
We have already found χiplot helpful when collaborating with domain experts since it lets them configure interactive plots without programming or installing anything. The online version also enables easy sharing of results without exposing the data to any third party. For more technical users, χiplot is easy to maintain and expand due to the modular architecture. Finally, χiplot is available under the open-source MIT license from GitHub (which includes documentation, usage examples, and a demonstration video).