
1 Overview

IIIF (International Image Interoperability Framework) is a collection of open web standards broadly used for presenting and annotating high resolution images and audiovisual content [1]. These standards are adopted by hundreds of GLAM (galleries, libraries, archives, museums) organisations worldwide [2], and permit the reuse and recombination of digital assets across traditional digital institutional boundaries [3]. While these standards have been successful for 2D images and Audio/Video (AV) data, there is a growing interest in how they might be applied to 3D content. An example of this would be to combine shards of pottery from disparate collections into a single whole within a viewing experience.

The IIIF 3D Community Group [4] has been cataloguing user stories and needs [5], exploring various 3D workflows [6], and has initiated a viewer comparison project [7] with major 3D developers and researchers, alongside an ongoing dialogue with colleagues from the VIGIE2020/654 study [8]. The viewer project collaboratively considered common challenges and potential solutions in key areas, as part of forming the IIIF 3D Technical Specification Group [9], through which the 3D group is engaging even more widely with other specialists and representatives across user communities and standards bodies.

Some key questions have concerned the conceptual framework needed to address the use cases for 3D scenes or digital dioramas, including by adding depth to the current 2D IIIF canvas model, and by embedding one or more canvases within a 3D scene (e.g. multiple paintings, texts, or pieces of music associated with a cathedral, temple, or palace, held in separate collections, reunited with the original walls and interiors of a suitable building model). With growing user and institutional demands, technical developments, and examples of advanced research collection and integration of virtual resources (e.g. Sketchfab [10], MorphoSource [11], Exhibit [12], Mozilla Hubs [13]), there is a pressing need for a technical specification to ensure interoperability and longer-term sustainability. The VIGIE2020/654 study has more information about these needs.

The plan for the IIIF 3D Technical Specification Group is to continue a collaborative approach to clarifying and specifying interoperable frameworks for 3D data, including common ways to:

  1. annotate 3D media of various types into a shared canvas space, with commentary

  2. combine 3D media with images and AV content within a shared space

  3. specify the presentation (placement, orientation, and contextualization) of 3D media

The group will continue its work with other standards bodies and 3D image viewer developers, to collaboratively address the many challenges around this developing area. The combined and widespread expertise from the many 3D specialists will continue to guide the work of the IIIF 3D Technical Specification Group, as it outlines sustainable options for the interworking of existing open standards (e.g. VRML/X3D [14], WebXR [15]), established foundations (e.g. WebGL [16], Three.js [17], react-three-fiber [18]), and emerging proposals (e.g. <model> tag [19]), to provide recommendations for expansions to and modifications of IIIF APIs to better interoperate with the evolving digital ecosystem of online 3D content.

These 3D developments will complement the ongoing updates and continued widespread adoption of the IIIF technical specifications for 2D and AV data. Those specifications have enabled greater access to widespread resources – of even greater significance for teaching, learning, and research during the global pandemic – as well as richer presentations, close object inspection through deep zoom ranges, and shareable annotation of media using W3C’s Web Annotation Data Model [20].

IIIF specifications also enable the recombination of long-separated parts of an original whole (e.g. missing sections or leaves in a digitised medieval manuscript, viewed with missing pieces contained in another digitised collection). IIIF adoption continues to enable effective sharing and support for the preservation of cultural heritage resources, whether as individual items or as combinations of media from one or more collections, locally and globally.

These technical developments rely on regular meetings and input from the community. Planning for the IIIF 3D technical specification includes recorded monthly meetings, complementing the more general 3D Community Group presentations and discussions, and involving group problem-solving with regular input from 3D researchers and media and viewer developers, including representatives from the University of Cambridge, Duke University, the University of Edinburgh, and UC San Diego, as well as the Deutsches Museum, the IIIF Consortium, Mnemoscene, MorphoSource, the MPEG consortium, Sketchfab, the Smithsonian Institution, the Visual Computing Lab (CNR-ISTI, Italy), and the Web3D Consortium. Google has presented to and been part of our deliberations, and we plan to follow up with Apple to learn more about the WebKit team’s plan to propose a <model> HTML element for 3D. We also intend to maintain close engagement with existing web-based 3D standards such as X3D, with its VRML roots, and WebXR.

We regularly collaborate with 3D-related projects and funding proposals, and ask funding bodies to help engage others in further developing this collaborative community, ensuring that we interconnect with more communities, to further develop 3D standards which will be the most widely usable and adopted across current and future proposals and projects.

2 Key Use Cases and Prototypes

Drawing on more than a year engaging with various individuals, institutions, and professions through regular IIIF and other group meetings and conferences, the IIIF 3D Community Group identified a wide set of user stories [21]. With that input, and as part of the work of the group making preparations for the 3D TSG, the core practical use cases identified are:

  • Display a 3D model, specifying position, orientation, and scale

  • Display multiple 3D Models in a shared space

  • Display a 3D model alongside 2D images and AV content within a single viewing experience

  • Annotate displayed 3D models with commentary

  • Share camera position, orientation, and target

  • Display multiple 3D models that can be rotated and manipulated independently

To assess the viability of meeting these cases, research and demonstrations highlighted the availability of existing practical options, as a basis to proceed and as an indication of areas for further development. For instance, there was interest in an initial demonstration of two or more 3D models, stored in different formats, rendered in the same visual space. This is a basic requirement, similar to the common rendering of differently formatted 2D images (e.g. JPEGs and PNGs) together in the same viewer, and is essential to enable IIIF’s interoperability across the necessary formats.

While it has long been possible to manually load 3D models in a web browser, using something like the popular three.js cross-browser JavaScript library and the underlying WebGL application programming interface (API), there are well-developed viewers built on such frameworks which provide additional options useful for interacting with the models. As part of the IIIF 3D group efforts, even more advanced options have been shown in some technical proofs of concept that demonstrate multiple assets in different formats being rendered in the same virtual space, most using IIIF in some capacity, in particular:

  • The Infinite Canvas [22]: Combines a 3D asset, an audiovisual clip, and multiple still 2D images in a single navigable 3D space, all using IIIF manifests for each asset (Fig. 1).

    Fig. 1. The initial screen of The Infinite Canvas, a proof of concept application shown displaying 2D images, an audiovisual clip, and a 3D model.

  • Antikythera Mechanism [23]: Combines three distinctly different types of 3D files (glTF, CT (computed tomography), and point clouds); uses JSON, though not currently in IIIF manifest format (Fig. 2).

    Fig. 2. The Antikythera Mechanism, combining three different types of 3D files: glTF, CT (computed tomography), and point clouds.

  • Mozilla Hubs gallery demonstrating three 3D assets from different sources [24]: created for the IIIF June 2021 Annual Conference to demonstrate that it is possible to combine 3D assets from different data resources (in this case MorphoSource, Royal Pavilion & Museums Trust [25], and The British Library [26]). All assets are hosted via IIIF, and IIIF was used to locate them; however, IIIF manifests were not used directly at this point, as the actual models were downloaded from repositories and uploaded to Hubs. In Mozilla Hubs, the space is navigable on flat computer monitors, in mobile AR, or in a VR headset, and multiple users can interact with each other and with the objects in the space (Fig. 3).

    Fig. 3. A Mozilla Hubs gallery displaying 3D assets from three sources: MorphoSource, Royal Pavilion & Museums Trust, and The British Library; used with permission from Julie Winchester, Doug Boyer, Edward Silverton, and the respective collection holders.

3 Combining Content in 3D Environments

To focus on the operational use cases for combining media in a 3D environment, and to consider the best options for enabling the intersection of 3D and non-3D content in the same rendering environment in a web browser, the IIIF group discussions and follow-up research identified two main approaches for translating the traditional IIIF canvas into a 3D scene. These are high-level, use case oriented solutions for creating IIIF manifests: either adding a third dimension to extend the existing 2D canvas, or fitting the 2D canvas into a 3D space. While not necessarily mutually exclusive, these options require further consideration and experimentation to enable and encourage interoperable and sustainable development in the 3D context.

Canvases and Scenes

By default, the IIIF presentation specification makes use of a canvas concept which has some of the following basic properties:

  • Bounded dimensions (width and height)

  • Coordinate origin at top left

  • Fundamentally “flat,” intended for standard mobile/computer 2D use cases

  • Can place standard assets (images, video, audio) and 3D mesh models on the canvas

    • Standard assets have parameters to indicate size and position within the canvas

  • Can nest secondary (and subsequent) canvases inside a primary canvas

    • Placement of the secondary canvas is determined by parameters indicating size and position within the canvas
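For reference, the existing 2D model can be illustrated with a minimal IIIF Presentation API 3 canvas carrying a single "painting" annotation; the identifiers below are illustrative only:

```python
import json

# A minimal IIIF Presentation 3 canvas (identifiers are illustrative).
# The canvas has bounded dimensions, and a "painting" annotation places
# an image onto it, per the 2D properties listed above.
canvas = {
    "id": "https://example.org/iiif/canvas/1",
    "type": "Canvas",
    "width": 2000,
    "height": 1500,
    "items": [
        {
            "id": "https://example.org/iiif/page/1",
            "type": "AnnotationPage",
            "items": [
                {
                    "id": "https://example.org/iiif/annotation/1",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": "https://example.org/images/1/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": 2000,
                        "height": 1500,
                    },
                    # An xywh fragment indicates size and position within the canvas,
                    # measured from the top-left coordinate origin.
                    "target": "https://example.org/iiif/canvas/1#xywh=0,0,2000,1500",
                }
            ],
        }
    ],
}

print(json.dumps(canvas, indent=2))
```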

We have proposed a kind of straw man concept, to compare and contrast with the canvas, calling it a “scene”. Scenes are similar to canvases but have some important differences. The following basic properties apply to scenes:

  • Unbounded dimensions (across X, Y, and Z axes, or equivalent polar coordinates)

  • Origin at the centre of the coordinate space

  • Can place standard assets (images, video, audio), 3D mesh models, 3D point clouds, or 3D volumes in the scene

    • All assets have parameters to indicate scale (3D) or size (2D/AV), position, and rotation within the scene relative to the scene’s coordinate origin

  • Can nest secondary scenes within primary scenes

    • Placement of the secondary scene is determined by parameters indicating size, position, and rotation within the scene. Objects inside the secondary scene respect the coordinate origin of the secondary scene, but end up modified by the placement of the secondary scene inside the primary scene. Example: An overall “world” scene that has placed a single secondary “room” scene at position (0, 0, 10). The room has a lamp inside of it at the room’s origin (0, 0, 0). To observers, the lamp would be located at world coordinates (0, 0, 10).

  • Can nest secondary canvases within primary scenes

    • Placement of the secondary canvas is determined by parameters indicating size, position, and rotation within the scene.

The scene concept also requires one additional basic property to be added to canvases, namely that:

  • Scenes can be nested inside primary canvases

    • Placement of the secondary scene is determined by parameters indicating size and position within the canvas

Therefore, canvases and scenes can nest inside each other (canvases inside canvases, scenes inside scenes, etc.). It could be argued that canvases and scenes are really the same thing with different component-style properties (e.g., see the IIIF-ECS extension proposal [27]). These ideas could easily be expanded to treat canvases and scenes as specific manifestations of a more generic entity, which would make some of the operational use case examples even simpler to describe than they are here, at the expense of introducing more lower-level complexity.
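The nesting behaviour described above (the lamp in the room) can be sketched as a translation-only composition of placements; rotation and scale are omitted for brevity, and the class and function names are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """A scene places child scenes/objects relative to its own centred origin."""
    position: tuple = (0.0, 0.0, 0.0)   # placement within the parent scene
    children: dict = field(default_factory=dict)


def world_position(path, root):
    """Resolve an object's world coordinates by summing placements down the path."""
    x = y = z = 0.0
    node = root
    for name in path:
        node = node.children[name]
        px, py, pz = node.position
        x, y, z = x + px, y + py, z + pz
    return (x, y, z)


# The example from the text: a "room" scene placed at (0, 0, 10) in the world
# scene, containing a lamp at the room's own origin (0, 0, 0).
world = Scene()
world.children["room"] = Scene(position=(0, 0, 10))
world.children["room"].children["lamp"] = Scene(position=(0, 0, 0))

print(world_position(["room", "lamp"], world))  # (0.0, 0.0, 10.0)
```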

While there is a question of whether this proposal would entail committing to a 0,0 origin point in the top left corner of a canvas, with a z-axis extending outwards from there, this may be unnecessarily complicated to implement compared to the common configuration of an infinite scene with 0,0,0 at its centre. The work of the IIIF 3D Technical Specification Group has been supported by the understanding that the Shared Canvas Model can potentially be altered to permit “scene-like” functionality. For 3D to become a useful part of the IIIF specification, it will require the same level of changes, and the same resulting core status, as was achieved with the addition of the IIIF AV technical specification in the major update of the Presentation API in 2021.

Scenarios for Transforming 2D into 3D

Extend 2D Canvas I: I want to display a single 3D object on a flat webpage for rotation, zooming, panning of the object.

This is an operational use case that IIIF 3.0 does currently support for 3D meshes, via the “Model” or “PhysicalObject” annotation types, although the standard parameters for size (height and width) don’t translate naturally to a 3D object and there is no way to specify properties such as rotation.

Using the concepts described above, this use case would be achieved by a primary canvas nesting a single secondary scene, with the scene’s size dimensions equal to the size of the canvas. The 3D object would be placed within the scene at the origin with scale, rotation, and other properties explicitly specified.

The currently supported IIIF manifest that presents a 3D Model or PhysicalObject annotation as the only annotation on a canvas could be considered a convenient shorthand for the more complex setup described above. In other words, a IIIF manifest positing a canvas with a single PhysicalObject annotation could be read as a convenience method describing a Canvas -> Scene -> Physical Object (Scale 1, Rotation null, Position {0, 0, 0}) hierarchy. A potential strength of this solution is that it allows complex circumstances to be “collapsed” or “expanded” as needed.
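That convenience reading could be treated as a mechanical expansion step. The following sketch is purely hypothetical (the "Scene" type, function name, and default placement properties are not part of any published IIIF specification); it expands a canvas carrying a single model annotation into the explicit hierarchy:

```python
def expand_model_canvas(canvas):
    """Hypothetical expansion: rewrite a canvas whose only annotation paints a
    3D model into an explicit Canvas -> Scene -> object hierarchy with the
    default placement (scale 1, no rotation, positioned at the origin)."""
    annotation = canvas["items"][0]["items"][0]
    assert annotation["body"]["type"] in ("Model", "PhysicalObject")
    return {
        "type": "Canvas",
        "width": canvas["width"],
        "height": canvas["height"],
        "items": [
            {
                "type": "Scene",            # straw-man concept, not in IIIF 3.0
                "width": canvas["width"],   # scene sized to fill the canvas
                "height": canvas["height"],
                "items": [
                    {
                        "body": annotation["body"],
                        "scale": 1,
                        "rotation": None,
                        "position": (0, 0, 0),
                    }
                ],
            }
        ],
    }


# A compact manifest fragment: one canvas, one Model annotation.
compact = {
    "type": "Canvas",
    "width": 800,
    "height": 600,
    "items": [{"items": [{"body": {"type": "Model", "id": "https://example.org/vase.glb"}}]}],
}
expanded = expand_model_canvas(compact)
print(expanded["items"][0]["type"])  # Scene
```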

Extend 2D Canvas II: I want to display several 3D objects in separate views on a flat webpage so that viewers can compare and contrast multiple objects at once.

This could be achieved with a single primary canvas nesting multiple secondary scenes, with the gridding and size of each scene determined at the canvas level. Each secondary scene would place a single 3D object at the origin with standard scale and no rotation.

For reference, this approach is how the Aleph viewer [28] on MorphoSource uses IIIF manifests currently, although it obviously goes beyond the current IIIF specifications in order to display volumes as well as meshes.

Fit 2D Into 3D Space I: I want users to interact freely (in flat, VR, or AR contexts) with a virtual space. This virtual space should have a mesh and a point cloud, bounded by two “walls” displaying 2D content that are already accessible via two standard IIIF 3.0 manifests.

The primary entity in this operational use case is a scene rather than a canvas (or perhaps a canvas specifying a flat screen size that nests the world scene within it). The mesh and point cloud are nested within the scene at the desired positions, scales, rotations, etc. Finally, the two already-existing IIIF manifests, each with its own respective canvas, are nested within the 3D scene with their own position, scale, and rotation. The placement of the assets within each canvas is determined by the original manifests for those canvases, and they are simply displayed in the world scene.

Fit 2D Into 3D Space II: I want to fill a virtual gallery hall with individual virtual rooms. Each room has been designed by a separate group and they consist of a mix of 3D objects displayed on room floors and 2D content displayed on room walls.

Each room in this example is equivalent to the operational use case described above, so each room is a scene nesting combinations of assets and IIIF manifests describing canvases that serve as “walls.” Rooms are then nested as multiple secondary scenes inside a larger world scene, which provides the position/rotation/scale information necessary to connect the rooms, make them sit alongside each other, etc. Objects in each room are placed using the coordinate systems of the rooms (or in some cases the top-left origin coordinate system of canvases inside rooms!), but the world scene brings these rooms together in a way that makes sense for the overall purpose of the project.

For reference, this is the approach taken in Mozilla Hubs and the Infinite Canvas demonstrations noted above, with the latter also obviously going beyond the current IIIF specifications in order to display the many media types.

4 Requirements for Annotation in a 3D Environment

The operational use cases for annotation in the context of IIIF and a 3D environment have two fundamental purposes: to add textual and other forms of commentary to a 3D model, and to add multiple models from different sources into a combined scene. The essential and broad use of annotation in a IIIF context needs special consideration for 3D environments, as highlighted in the following details from the IIIF 3D Technical Specification Group.

Annotations for Commentary

The most basic form of 3D commentary annotation we can identify specifies a point on the surface of a model by its coordinates (x, y, z).

The point itself has no dimension, and serves as an anchor for a visual marker, often represented as a pin, or circular hotspot. This marker can either be displayed in screen space, overlaid on top of the 3D scene, or in world space, displayed within the 3D scene with its own geometry (such as a sphere).

Screen space annotations may always be visible regardless of position, given that they are overlaid on top of the 3D scene. However, this is not always desirable, and a content publisher may want them to appear only when on a surface facing the camera.

To achieve this, it is necessary for each annotation to store the surface normal vector of the mesh at the point where the annotation was created. This can be used to determine if the annotation is facing the camera and should be displayed, or if on the reverse side of a model and hidden.
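The facing test described above typically reduces to a dot product between the stored surface normal and the direction from the annotation point to the camera. A minimal sketch (this checks facing only, not occlusion by other geometry):

```python
import math


def is_facing_camera(point, normal, camera):
    """True if the annotation's stored surface normal points towards the camera.
    A positive dot product between the normal and the normalised point-to-camera
    direction means the surface faces the camera; zero or negative means the
    annotation sits on the far side of the model and should be hidden."""
    to_camera = tuple(c - p for c, p in zip(camera, point))
    length = math.sqrt(sum(v * v for v in to_camera))
    to_camera = tuple(v / length for v in to_camera)
    dot = sum(n * v for n, v in zip(normal, to_camera))
    return dot > 0.0


# Annotation on the front of a model (normal +Z), camera out along +Z: visible.
print(is_facing_camera((0, 0, 1), (0, 0, 1), (0, 0, 5)))    # True
# Same camera, annotation on the back of the model (normal -Z): hidden.
print(is_facing_camera((0, 0, -1), (0, 0, -1), (0, 0, 5)))  # False
```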

In the case of annotations located within the scene’s world space, it is not always possible to ensure that they are visible to the user just by defining their position, as the 3D model may occlude itself. For example, the arm of a statue might occlude an annotation on the torso, or if there are multiple objects within the scene one object might occlude another.

It is therefore useful to optionally include a camera position and orientation along with the annotation coordinates, so that a content publisher can ensure an annotation is visible to the user. If, for example, the user were paging through annotations with a simple previous/next interface, this would allow the camera view of the scene to be automatically adjusted for each annotation so that it is visible, without requiring the user to manually rotate the scene.

The need to specify camera position and orientation per annotation is not limited to world space annotations, however; it can also provide a satisfying user experience for screen space annotations.

Beyond the concepts of screen space and world space, there is also “object space”, i.e. the coordinate system relative to each 3D model’s individual origin. This is a useful concept when loading models from various sources which carry their own commentary annotations, in that the annotations are positioned relative to each model. This bypasses the need to translate each model’s annotations into world space and means the models can easily be remixed in various ways without losing their associated annotations.
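Resolving object-space annotations at display time can be sketched as a per-model transform. The sketch below applies translation and uniform scale only; a full implementation would also apply the model's rotation, and the function name is illustrative:

```python
def object_to_world(anno_point, model_position, model_scale=1.0):
    """Map an annotation stored in a model's object space into world space,
    given where the scene has placed (and scaled) that model.
    Rotation is omitted for brevity."""
    return tuple(model_scale * a + p for a, p in zip(anno_point, model_position))


# A shard carries a commentary annotation at (0.5, 0.25, 0.0) in its own
# object space. The scene places the shard at (5, 0, 2) at double scale,
# yet the annotation stays attached without editing the shard's data.
print(object_to_world((0.5, 0.25, 0.0), (5, 0, 2), model_scale=2.0))
# → (6.0, 0.5, 2.0)
```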

While there are other forms of annotation possible beyond points on a surface, such as regions of a surface, or volumes within an object, we believe that focussing on the simplest case first will provide insights into how to extend into other modalities.

Annotations for Scene Building

Annotations with a “painting” motivation are used to combine images within a “shared canvas” coordinate space in IIIF. This concept could also extend to 3D models, where the body of an annotation is a 3D model URL, and the target is a canvas.

In order to create a 3D scene such as by placing artworks from various sources within a gallery space, or combining disparate shards of pottery into a whole, we will need the ability to specify xyz coordinates for each individual model within the shared canvas (as with commentary annotations).

If for example each pottery shard was produced using a slightly different method, resulting in varying object scales, origins, and rotations, it will be necessary to override/normalise these properties per annotation in order to combine them effectively. This is similar to how the Web Annotation Data Model allows images to be painted onto a canvas using a given xywh parameter.
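By analogy with the 2D xywh case, a scene-building annotation might carry per-model placement overrides. The sketch below is purely illustrative: the placement property names ("position", "rotation", "scale" on the target) are hypothetical and not drawn from any published IIIF or Web Annotation specification:

```python
import json

# Hypothetical scene-building annotation: the body is a 3D model URL, the
# target is the shared scene, and placement properties normalise the shard's
# scale, position, and rotation. Property names here are illustrative only.
shard_annotation = {
    "type": "Annotation",
    "motivation": "painting",
    "body": {"id": "https://example.org/shards/3.glb", "type": "Model"},
    "target": {
        "id": "https://example.org/iiif/scene/pot",
        "position": [0.12, 0.0, -0.05],  # offset from the scene origin
        "rotation": [0, 90, 0],          # degrees about the X, Y, Z axes
        "scale": 0.01,                   # normalise a shard digitised in cm
    },
}

print(json.dumps(shard_annotation, indent=2))
```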

It is common practice in 3D viewers such as Sketchfab for users to specify an initial camera position and orientation for a 3D model or scene. This can satisfy an aesthetic or informative preference on behalf of the content publisher. This is similar to how commentary annotations require an associated camera position and orientation, and the same concept may apply to “painting” annotations. One could imagine a scene combining multiple 3D models, where clicking on each model animates the camera to a new position and orientation in order to best view that model.

5 Authentication and Search APIs for 3D Content

Given the ongoing developments within the Authentication and Content Search technical specification groups (TSGs), the related changes introduced in version 3.0 of the IIIF Presentation and Image APIs, and developments in the browser community and the evolving web landscape, the 3D Technical Specification Group expects similar challenges, as well as the need for collaboration with the other TSGs in these areas of mutual concern.

The 3D TSG will coordinate efforts with those groups to ensure that any authentication or search solutions pertaining to 3D content are either consistent with any updates from those groups, or can clarify and inform those groups about any changes which may be suggested by particular features of the broad base of 3D content forming part of the work of the 3D TSG.

That said, and given the IIIF bases upon which the related 3D Community efforts are already developing (e.g. with reference to Sketchfab.com, and the open source technologies used by MorphoSource.org), we anticipate no new difficulties in searching the text content of manifests and annotations associated with 3D data. Existing examples of annotations used with 3D data include key viewers used with Sketchfab, the Smithsonian Institution, MorphoSource, and X3D content. Some annotation models are closer to existing IIIF annotation approaches, and all annotation models will be key features of the 3D TSG’s further research and review of search-related concerns.

Search and authentication requirements for 3D data may at some point (perhaps in a subsequent version rather than initial specification) include extra differentiation between component parts of a composite object, or with reference to individual items in a scene comprised of combined data which may require login (e.g. for selected images and audio-visual items). These will be matters for future endeavours, following the initial phase of 3D technical specification.

6 IIIF 3D Technical Specification Group Charter

The effort to establish the IIIF 3D Technical Specification Group expanded over the course of more than a year as it involved more committed institutions and volunteers. The group includes research and cultural heritage institutions that are already publishing 3D content and are willing to experiment with various approaches to help improve the creation, storage, cataloguing, sharing, and sustaining of this media.

The initial IIIF 3D Technical Specification Group Charter, formally accepted by IIIF in December 2021, includes a list of those collaborating institutions and their representatives, along with more information about planned scope and roadmaps and other key details. For reference, it is included here.

IIIF 3D Technical Specification Group Charter [29]

Introduction

As IIIF has evolved from an initial focus on 2D images, encompassing new media modalities such as audiovisual content with time-based data, we see a plethora of unmet use cases throughout the cultural heritage field relating to the display of 3D media and related metadata. We recognise these requirements and the need for further developing a conceptual framework, which can complement and extend existing IIIF specifications.

The IIIF 3D Technical Specification Group will collaboratively clarify and specify common interoperable frameworks pertaining to the 3D data domain. This will include ways to:

  • annotate 3D media of various types into a shared canvas space

  • annotate 3D media with commentary

  • combine 3D media with images and AV content within a shared space

  • specify the presentation (placement, orientation, and contextualization) of 3D media

The group will work with other standards bodies and 3D image viewer developers, and will collaboratively address challenges around this dynamic area, which shows great potential for a IIIF resolution: practical options for media sharing and interchange, for which there is substantial demand and no demonstrably sustainable alternative. Guided by widespread expertise from the 3D Community committed to this purpose, the IIIF 3D TSG will outline sustainable options for the interworking of existing open standards, providing recommendations for expansions to and modifications of IIIF APIs to better interoperate with the evolving digital ecosystem of online 3D content.

Scope

It is the intention of this group to explore suitable IIIF extensions and identify any necessary changes to the core IIIF specifications (e.g. adding a third physical dimension) to support the display of 3D media using IIIF tools. After a period of time and suitable feedback from implementers, we expect to propose changes to the Presentation API to accommodate standalone media and combinations of media, whether 2D, A/V, or 3D.

This group will focus on the use cases identified, in versioned phases, including options to:

  • display a 3D model, specifying position, orientation, and scale

  • display a 3D model alongside a 2D image

  • display multiple 3D models in a shared space

  • annotate displayed 3D models with commentary

  • specify initial camera position, orientation, and target in 3D space

The IIIF 3D TSG looks forward to working with other IIIF groups, especially where there are shared areas of interest (e.g. museums, archives, maps), and welcomes contributions to the collection of user stories, ongoing community discussions, and specialist app development.

Deliverables

The expected initial deliverables are IIIF API extensions and a specification change to define interoperable methods to enable:

  • three-dimensionality via a third physical axis extending orthogonally from the traditional canvas model, with accommodation for the scene concept, and consideration for use cases and backward compatibility

  • viewing of 3D media, including combined media (multiple 3D assets and/or 3D and non-3D [2D or A/V] assets combined)

  • asset positioning, orientation and scaling

  • initial view, and shareable customised views

  • adding associated annotations and linked metadata

Roadmap

  • Group formation: December 2021 (see: IIIF 3D - TSG Preparation Checklist)

  • Research and secure funding for development work: through December 2022

  • Initial demo(s), testing, and feedback: March 2023

  • Broad set of prototype demos with testing and feedback: June 2023 (Annual Conference)

  • Draft specification change recommendations: June 2025 (Annual Conference)

  • Proof of concept specification implementation: December 2025 (Fall Working Meeting)

Communication Channels

Community Support

  • British Library (Adi Keinan-Schoonbaert)

  • Cyprus University of Technology / Digital Heritage Research Lab (Marinos Ioannides)

  • Deutsches Museum (Georg Hohmann)

  • Duke University (Doug Boyer)

  • ISTI - CNR (Federico Ponchio)

  • MorphoSource (Julie Winchester)

  • Mnemoscene (Ed Silverton)

  • sketchfab.com (Thomas Flynn)

  • The Smithsonian Institution (Jon Blundell, Jamie Cope, Vince Rossi)

  • University of Basel (Peter Fornaro)

  • University of Cambridge (Ronald Haynes)

  • University of Edinburgh (Mike Boyd)

  • University of Brighton (Karina Rodriguez Echavarria)

  • UC San Diego (Scott McAvoy)

  • University of Oxford (Kathryn Eccles, Nandy Millan)

Technical Editors

  • Rob Sanderson (Yale University)

  • Mike Appleby (Yale University)

  • Simeon Warner (Cornell University)

  • Dawn Childress (UCLA)

  • Tom Crane (Digirati)

Feedback: iiif-discuss@googlegroups.com