1 Introduction

Computer games with complex virtual worlds are in widespread use for entertainment, and recent years have seen the introduction of serious games, including games that support cultural heritage purposes, such as historical teaching and learning, or that enhance museum visits. At the same time, game development has been fuelled by dramatic advances in computer graphics hardware, themselves driven by the success of video games, which have raised the quality of real-time computer graphics and the realism of computer games. The successes of entertainment games that cross over into educational (or serious) gaming, such as the popular Civilization (although “abstract and ahistorical” (Apperley 2006)) and Total War series, as well as of games and virtual worlds developed specifically for educational purposes, such as Revolution (Francis 2006) and the Virtual Egyptian Temple (Jacobson and Holden 2005), all of which exist within a cultural heritage context, reveal the potential of these technologies to engage and motivate beyond leisure time activities.

The popularity of video games, especially among younger people, makes them an ideal medium for educational purposes (Malone and Lepper 1987). As a result, there has been a trend towards the development of more complex serious games, which are informed by both pedagogical considerations and game-like, fun elements. The term ‘serious games’ describes a relatively new concept: computer games that are not limited to the aim of providing entertainment, but that allow for the collaborative use of 3D spaces for learning and educational purposes in a number of application domains. Typical examples are game engines and online virtual environments that have been used to design and implement games for non-leisure purposes, e.g. in military and health training (Macedonia 2002; Zyda 2005), as well as cultural heritage (Fig. 1).

Fig. 1 ‘Roma Nova’: experiencing ‘Rome Reborn’ as a game

This report explores the wider research area of interactive games and related applications with a cultural heritage context, and the technologies used for their creation. Modern games technologies (and related optimisations (Chalmers and Debattista 2009)) allow the real-time interactive visualisation and simulation of realistic virtual heritage scenarios, such as reconstructions of ancient sites and monuments, on relatively basic consumer machines. Our aim is to provide an overview of the methods and techniques used in entertainment games that can potentially be deployed in cultural heritage contexts, as demonstrated by particular games and applications, thus making cultural heritage much more accessible.

Serious games can exist in the form of mobile applications, simple Web-based solutions, more complex ‘mashup’ applications (e.g. combinations of social software applications) or fully fledged computer games that employ modern games technologies to create virtual worlds for interactive experiences, which may include socially based interactions, as well as mixed reality games that combine real and virtual interactions; all of these can be used in cultural heritage applications. This state-of-the-art report focuses on the serious games technologies that can be found in modern computer games.

The report is divided into two main sections:

  • The first of these is concerned with the area of cultural heritage and serious games, which integrate the core technologies of computer games with principled pedagogical methodologies. This is explored in a range of characteristic case studies, which include entertainment games that can be used for non-leisure purposes as well as virtual museums and educationally focused and designed cultural heritage projects.

  • The second part investigates those computer games technologies that are potentially useful for the creation of cultural heritage games, such as real-time rendering techniques, mixed reality technologies and subdomains of (game) artificial intelligence. This literature review includes discussions of strengths and weaknesses of the most prominent methods, indicating potential uses for cultural heritage serious games and illustrating challenges in their application.

2 The state-of-the-art in serious games

The state-of-the-art in Serious Game technology is identical to the state-of-the-art in Entertainment Games technology. Both types of computer game share the same infrastructure, or as Zyda notes, “applying games and simulations technology to non-entertainment domains results in serious games” (Zyda 2005). The main strengths of serious gaming applications may be generalised as being in the areas of communication, visual expression of information, collaboration mechanisms, interactivity and entertainment.

Over the past decade, there have been tremendous advances in entertainment computing technology, and “today’s games are exponentially more powerful and sophisticated than those of just three or four years ago” (Sawyer 2002), which in turn is leading to very high consumer expectations. Real-time computer graphics can achieve near-photorealism and virtual game worlds are usually populated with considerable amounts of high quality content, creating a rich user experience. In this respect, Zyda (2005) argues that while pedagogy is an implicit component of a serious game, it should be secondary to entertainment, meaning that a serious game that is not ‘fun’ to play would be useless, independent of its pedagogical content or value. This view is not shared by all, and there exist design methodologies for the development of games incorporating pedagogic elements, such as the four-dimensional framework (de Freitas and Oliver 2006), which outlines the centrality of four elements that can be used as design and evaluation criteria for the creation of serious games. In any case, there is a need for the game developers and instructional designers to work together to develop engaging and motivating serious games for the future.

2.1 Online virtual environments

There is a great range of online virtual world applications: at least 80 existed in 2008, with another 100 planned for 2009. The field is extensive, not just in terms of potential use for education and training but also in terms of actual usage and uptake, as is amply illustrated by the online platform Second Life (Linden Labs), which currently has 13 million registered accounts worldwide. The use of Second Life to support seminar activities, lectures and other educational purposes has been documented in a number of recent reports, and a wide range of examples of Second Life use by UK universities has been catalogued (Kirriemuir 2008). Online virtual worlds provide excellent capabilities for creating effective distance and online learning opportunities through their unique support for distributed groups (online chat, the use of avatars, document sharing, etc.). This benefit has so far been most exploited in business, where such tools have been used to support distributed or location-independent working groups and communities (Jones 2005). Online virtual worlds thereby facilitate the development of new collaborative models for bringing together subject matter experts and tutors from around the world, and in terms of learning communities they open up opportunities for learning in international cohorts, in which students from more than one country or location can learn in mixed reality contexts that include both classroom and non-classroom based groups (https://lg3d-wonderland.dev.java.net). Online virtual worlds also offer real opportunities for training, rehearsal and role play.

2.2 Application to cultural heritage: case studies

This section provides an overview of some of the most characteristic case studies in cultural heritage. The case studies are categorised into three types of computer-game-like applications: prototypes and demonstrators, virtual museums, and commercial historical games.

2.2.1 Prototypes and demonstrators

The use of visualisation and virtual reconstruction of ancient historical sites is not new, and a number of projects have used this approach to study crowd modelling (Arnold et al. 2008; Maim et al. 2007). Several projects use virtual reconstructions to train and educate their users. Many of these systems have, however, never been released to the wider public and have only been used for academic studies. The most significant and promising of these are presented below.

2.2.1.1 Roma Nova

The Rome Reborn project is the world’s largest digitisation project and has been running for 15 years. The main aims of the project are to produce a high-resolution model of Rome as it was in AD 320 (Fig. 2), a lower resolution model for creating a ‘mashup’ application with ‘Google Earth’ (http://earth.google.com/rome/) and, finally, a collaborative version of the model for use with virtual world applications, aimed primarily at education (Frischer 2008).

Fig. 2 ‘Rome Reborn’ serious game

In order to investigate the efficacy of the Rome Reborn project for learning, exploration, re-enactment and research of cultural and architectural aspects of ancient Rome, the serious game ‘Roma Nova’ is currently under development. In particular, the project aims to investigate the suitability of this technology to support the archaeological exploration of historically accurate societal aspects of Roman life, with an emphasis on political, religious and artistic expression.

To achieve these objectives, the project will integrate four cutting-edge virtual world technologies with the Rome Reborn model, the most detailed three-dimensional model of Ancient Rome available. These technologies include:

  • the Quest3D visualisation engine (Godbersen 2008)

  • Instinct(maker) artificial life engine (Toulouse University) (Sanchez et al. 2004)

  • ATOM Spoken Dialogue System (http://www.agilingua.com)

  • High-resolution, motion-captured characters and objects from the period (Red Bedlam).

The use of the Instinct artificial life engine enables coherent crowd animation and thus the population of the city of Rome with behaviour-driven virtual characters. These virtual characters, with their different behaviours, can teach the player about different aspects of life in Rome (living conditions, politics, military) (Sanchez et al. 2004). Agilingua’s ATOM dialogue management algorithm determines how the system reacts: asking questions, making suggestions and/or confirming an answer.

This project aims to develop a researchers’ toolkit that allows archaeologists to test past and current hypotheses on architecture, crowd behaviour, social interactions, topography, and urban planning and development, using Virtual Rome as a test-bed for reconstructions. Using such a game, researchers will be able to analyse the impact of major events, such as grain distribution or the influx of people into the city. The experiences of residents and visitors as they pass through and interact with the ancient city can also be explored.

2.2.1.2 Ancient Pompeii

Pompeii was a Roman city, which was destroyed and completely buried in the first recorded eruption of the volcano Mount Vesuvius in 79 AD (Plinius 79a, Plinius 79b). For this project, a model of ancient Pompeii was constructed using procedural methods (Müller et al. 2005) and subsequently populated with avatars in order to simulate life in Pompeii in real time. The main goal of this project was to simulate a crowd of virtual Romans exhibiting realistic behaviours in a reconstructed district of Pompeii (Maim et al. 2007). The virtual entities can navigate freely in several buildings in the city model and interact with their environment (Arnold et al. 2008).

2.2.1.3 Parthenon Project

The Parthenon Project is a short computer animation that "visually reunites the Parthenon and its sculptural decorations" (Debevec 2005). The Parthenon itself is an ancient monument, completed in 437 BC, and stands in Athens, while many of its sculptural decorations reside in the collection of the British Museum, London (UK). The project goals were to create a virtual version of the Parthenon and its separated sculptural elements so that they could be reunited in a virtual representation.

The project involved capturing digital representations of the Parthenon structure and the separate sculptures, recombining them and then rendering the results. The structure was scanned using a commercial laser range scanner, while the sculptures were scanned using a custom 3D scanning system that the team developed specifically for the project (Tchou 2002). The project made heavy use of image-based lighting techniques, so that the structure could be relit under different illumination conditions within the virtual representation. A series of photographs were taken of the structure together with illumination measurements of the scene’s lighting. An inverse global illumination technique was then applied to effectively ‘remove’ the lighting. The resulting “lighting-independent model” (Debevec et al. 2004) could then be relit using any lighting scheme desired (Tchou et al. 2004; Debevec et al. 2004).

Although the Parthenon Project was originally an offline-rendered animation, it has since been converted to work in real-time (Sander and Mitchell 2006; Isidoro and Sander 2006). The original Parthenon geometry represented a large dataset consisting of 90 million polygons (after post-processing), which was reduced to 15 million for the real-time version and displayed using dynamic level-of-detail techniques. Texture data consisted of 300 MB and had to be actively managed and compressed, while 2.1 GB of compressed High-Dynamic Range (HDR) sky maps were reduced in a pre-processing step. The reduced HDR maps were used for lighting, and the extracted sun position was used to cast a shadow map.

2.2.2 Virtual museums

Modern interactive virtual museums using games technologies (Jones and Christal 2002; Lepouras and Vassilakis 2004) provide a means for presenting digital representations of cultural heritage sites (El-Hakim et al. 2006) that entertain and educate visitors (Hall et al. 2001) in a much more engaging manner than was possible only a decade ago. A survey examining the technologies and tools used in museums has recently been published (Sylaiou et al. 2009). Here, we present several examples of this type of cultural heritage serious game, including some virtual museums that can be visited in real-world museums.

2.2.2.1 Virtual Egyptian Temple

This game depicts a hypothetical Virtual Egyptian Temple (Jacobson and Holden 2005; Troche and Jacobson 2010), which has no real-world equivalent. The temple embodies all of the key features of a typical New Kingdom period Egyptian temple in a manner that an untrained audience can understand. This Ptolemaic temple is divided into four major areas, each one of which houses an instance of the High Priest, a pedagogical agent. Each area of this virtual environment represents a different feature from the architecture of that era.

The objective of the game ‘Gates of Horus’ (Jacobson et al. 2009) is to explore the model and gather enough information to answer the questions asked by the priest (the pedagogical agent). The system is based on the Unreal Engine 2 (Fig. 3) (Jacobson and Lewis 2005) and exists both as an Unreal Tournament 2004 game modification (Wallis 2007) for use at home and as a Cave Automatic Virtual Environment (CAVE; Cruz-Neira et al. 1992) installation in a real museum.

Fig. 3 New Kingdom Egyptian temple game

2.2.2.2 The Ancient Olympic Games

The Foundation of the Hellenic World has produced a number of gaming applications associated with the Olympic Games in ancient Greece (Gaitatzes et al. 2004). In the ‘Olympic Pottery Puzzle’ exhibit, for example, the user must re-assemble a number of ancient vases by putting together pot shards. The users are presented with a colour-coded skeleton of the vessels, with the different colours showing the correct position of the pieces. They then try to select one piece at a time from a heap and place it in the correct position on the vase.

Another game is the ‘Feidias Workshop’, a highly interactive virtual experience set at the construction site of the 15-m-tall gold-and-ivory statue of Zeus, one of the seven wonders of the ancient world. Visitors enter the two-storey-high workshop, where they come into sight of an accurate reconstruction of an unfinished version of the famous statue and walk among the sculptor’s tools, scaffolding, benches, materials and moulds used to construct it. Taking the role of the sculptor’s assistants, they actively help finish the creation of the huge statue by using virtual tools to apply the necessary materials onto the statue, process the ivory and gold plates, apply them onto the wooden supporting core and add the finishing touches. Interaction is achieved using the navigation wand of the Virtual Reality (VR) system, onto which the various virtual tools are attached. Using these tools, the user helps finish the work on the statue, learning about the procedures, materials and techniques applied in the creation of these marvellous statues.

The last example is the ‘Walk through Ancient Olympia’, in which the user, apart from visiting the historical site, learns about the ancient games themselves by interacting with athletes in the ancient game of pentathlon (Fig. 4). Visitors can wander around, visit the buildings and learn about their history and function: the Heraion, the oldest monumental building of the sanctuary, dedicated to the goddess Hera; the temple of Zeus, a model of a Doric peripteral temple with magnificent sculpted decoration; the Gymnasium, which was used for the training of javelin throwers, discus throwers and runners; the Palaestra, where the wrestlers, jumpers and boxers trained; the Leonidaion, where the official guests stayed; the Bouleuterion, where athletes, relatives and judges took a vow that they would uphold the rules of the Games; the Treasuries of various cities, where valuable offerings were kept; the Philippeion, dedicated by Philip II, king of Macedonia, after his victory in the battle of Chaeronea in 338 BC; and the Stadium, where most of the events took place. Instead of just observing the games, visitors take part in them: they can pick up the discus or the javelin and try their abilities at throwing them towards the far end of the stadium. Excited by the interaction, they ask when they will be able to interact with the wrestler one on one. A role-playing model of interaction with alternating roles was tried here with considerable success, as the visitors were truly immersed in the environment, wishing they could participate in more games (Gaitatzes et al. 2004).

Fig. 4 Walk through Ancient Olympia (Gaitatzes et al. 2004)

2.2.2.3 Virtual Priory Undercroft

Located in the heart of Coventry, UK, the Priory Undercrofts are the remains of Coventry’s original Benedictine monastery, dissolved by Henry VIII. Although archaeologists have revealed the architectural structure of the cathedral, the current site is not easily accessible to the public. Virtual Priory Undercroft offers a virtual exploration of the site in both online and offline configurations.

Furthermore, a first version of a serious game (Fig. 5) has been developed at Coventry University, using the Object-Oriented Graphics Rendering Engine (OGRE) (Wright and Madey 2008). The motivation is to raise children’s interest in the museum, as well as in cultural heritage in general. The aim of the game is to solve a puzzle by collecting medieval objects that used to be located in and around the Priory Undercroft. Each time a new object is found, the user is prompted to answer a question related to the history of the site. A typical interaction might take the form: “What did St. George slay? Hint: it is a mythical creature. Answer: the Dragon”, meaning that the user then has to find the Dragon.

Fig. 5 Priory Undercroft: a serious game

2.2.3 Commercial historical games

Commercial games with a cultural heritage theme usually belong to the ‘documentary game’ genre (Burton 2005), depicting real historical events (frequently wars and battles) in which the human player can partake. These games were created primarily for entertainment, but their historical accuracy allows them to be used in educational settings as well.

2.2.3.1 History Line: 1914–1918

An early representative of this type of game was History Line: 1914–1918 (Blue Byte 1992), a turn-based strategy game depicting the events of the First World War. The game was realised using the technology of the more prominent game Battle Isle, providing players with a 2D top-down view of the game world, divided into hexagons that could be occupied by military units, with gameplay very much resembling that of traditional board games.

The game’s historical context was introduced in a long (animated) introduction, depicting the geo-political situation of the period and the events leading up to the outbreak of war in 1914. In between battles the player is provided with additional information on concurrent events that shaped the course of the conflict, which is illustrated with animations and newspaper clippings from the period.

2.2.3.2 Great Battles of Rome

More recently, a similar approach was used by the History Channel’s Great Battles of Rome (Slitherine Strategies 2007), another ‘documentary game’, which mixes interactive 3D real-time tactical simulation of actual battles with documentary information (Fig. 6), including footage originally produced for TV documentaries, which places the battles in their historical context.

Fig. 6 Great Battles of Rome

2.2.3.3 Total War

The most successful representatives of this type of historical game are the games of the Creative Assembly’s Total War series, which provide a gameplay combination of turn-based strategy (for global events) and real-time tactics (for battles). Here, a historical setting is enriched with information about important events and developments that occurred during the timeframe experienced by the player. While the free-form campaigns allow the game’s players to change the course of history, the games also include several independent battle-scenarios with historical background information that depict real events and allow players to partake in moments of historical significance.

The use of up-to-date games technology for rendering, as well as the use of highly detailed game assets that are reasonably true to the historical context, enables a fairly realistic depiction of history. As a result, games from the Total War series have been used to great effect in the visualisation of armed conflicts in historical programmes produced for TV (Waring 2007).

The latest titles in the series, ‘Empire: Total War’ (released in 2009), depicting events from the start of the eighteenth century to the middle of the nineteenth century, and ‘Napoleon: Total War’ (released in 2010), depicting European history during the Napoleonic Wars, make use of some of the latest developments in computer games technology (Fig. 7). The games’ renderer is scalable to support different types of hardware, including older systems with graphics cards that only support the programmable Shader Model 2, but the highest visual fidelity is only achieved on recent systems with Shader Model 3 graphics hardware (Gardner 2009). If the hardware allows it, ambient shadowing for added realism in the virtual world is generated using Screen Space Ambient Occlusion (Mittring 2007; Bavoil and Sainz 2008), which makes use of the depth-buffer information already present in rendered frames. Furthermore, the virtual world of the game is provided with realistic vegetation generated by the popular middleware system SpeedTree (Interactive Data Visualization, Inc.), which “features realistic tree models and proves to be able to visualise literally thousands of trees in real-time” (Fritsch and Kada 2004). As a result, the human player is immersed in the historical setting, allowing the player to re-live history.

Fig. 7 Reliving the battle of Brandywine Creek (McGuire 2006) in ‘Empire: Total War’

3 The technology of cultural heritage serious games

Modern interactive virtual environments are usually implemented using game engines, which provide the core technology for the creation and control of the virtual world. A game engine is an open, extendable software system on which a computer game or a similar application can be built. It provides the generic infrastructure for game creation (Zyda 2005), i.e. I/O (input/output) and resource/asset management facilities. The possible components of game engines include, but are not limited to, the following: a rendering engine, an audio engine, a physics engine and an animation engine.

3.1 Virtual world system infrastructure

The shape that the infrastructure for a virtual environment takes is dictated by a number of components, defined by function rather than organisation, the exact selection of which determines the tasks for which the underlying engine is suitable. A game engine itself does not provide any data or functions that are specific to a particular game or other application built on it (Zerbst et al. 2003). Furthermore, a game engine is not just an API (Application Programming Interface), i.e. a set of reusable components that can be transferred between different games, but also provides a glue layer that connects its component parts. It is this glue layer that sets a game engine apart from an API, making it more than the sum of its components and sub-systems.
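
By way of illustration, the following minimal C++ sketch shows how such a glue layer might tie function-defined subsystems together in a single frame loop. All class and function names are hypothetical and do not correspond to any particular engine.

    #include <memory>
    #include <vector>

    class Subsystem {                      // common interface for engine components
    public:
        virtual ~Subsystem() = default;
        virtual void update(float dt) = 0; // advance this component by dt seconds
    };

    class RenderEngine  : public Subsystem { public: void update(float) override { /* draw frame */ } };
    class AudioEngine   : public Subsystem { public: void update(float) override { /* mix audio */ } };
    class PhysicsEngine : public Subsystem { public: void update(float) override { /* step simulation */ } };

    class GameEngine {                     // the 'glue layer': owns, orders and drives the subsystems
        std::vector<std::unique_ptr<Subsystem>> subsystems;
    public:
        GameEngine() {
            subsystems.emplace_back(new PhysicsEngine);
            subsystems.emplace_back(new RenderEngine);
            subsystems.emplace_back(new AudioEngine);
        }
        void runFrame(float dt) {          // one iteration of the game loop
            for (auto& s : subsystems) s->update(dt);
        }
    };

    int main() {
        GameEngine engine;
        for (int i = 0; i < 3; ++i) engine.runFrame(1.0f / 60.0f); // fixed 60 Hz timestep
    }

The point of the sketch is the coordination: the engine, not the game, decides in which order the subsystems run each frame, which is exactly the role that distinguishes it from a loose collection of APIs.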

Modern game engines constitute complex parallel systems that compete for limited computing resources (Blow 2004). They “provide superior platforms for rendering multiple views and coordinating real and simulated scenes as well as supporting multiuser interaction” (Lewis and Jacobson 2002), employing advanced graphics techniques to create virtual environments. Anderson et al. (2008) provide a discussion of several challenges and open problems regarding game engines, which include the precise definition of the role of content creation tools in the game development process and as part of game engines, as well as the identification of links between game genres and game engine architecture, both of which play a crucial role in the process of selecting an appropriate game engine for a given project.

Frequently, the technology used for the development of virtual environments, be they games for entertainment, serious games or simulations, is limited by the development budget. Modern entertainment computer games frequently require “a multimillion-dollar budget” (Overmars 2004) that can now rival the budgets of feature film productions, a significant proportion of which is used for asset creation (such as 3D models and animations). Some of these costs can be reduced through the use of procedural modelling techniques for the generation of assets, including terrain (Noghani et al. 2010), vegetation (Lintermann and Deussen 1999) or whole urban environments (Vanegas et al. 2009). Game developers are usually faced with the choice of developing a proprietary infrastructure, i.e. their own game engine, or using an existing engine for their virtual world application. Commercially developed game engines are usually expensive, and while there are affordable solutions, such as the Torque game engine, which is favoured by independent developers and has been successfully used in cultural heritage applications (Leavy et al. 2007; Mateevitsi et al. 2008), these generally provide fewer features, thus potentially limiting their usefulness. If one of the project’s requirements is the use of highly realistic graphics with a high degree of visual fidelity, this usually requires a recent high-end game engine, the most successful of which usually come with a very high licensing fee.

There are alternatives, however, as several older commercially developed engines have been released under Open Source licences, such as the Quake 3 engine (id Tech 3) (Smith and Trenholme 2008; Wright and Madey 2008), making them easily accessible, and while they do not provide the features found in more recently published games, they nevertheless match the feature sets of the cheaper commercial engines. Furthermore, there exist open source game engines such as the Nebula Device (Rémond and Mallard 2003), or engine components, such as OGRE (Rémond and Mallard 2003; Wright and Madey 2008) or ODE (Open Dynamics Engine) (Macagon and Wünsche 2003), which are either commercially developed or close to commercial quality, making them a viable platform for the development of virtual worlds, although they may lack the content creation tools that are frequently packaged with larger commercial engines.

Finally, there is the possibility of taking an existing game and modifying it for one’s own purposes, which many recent games allow users to do (Wallis 2007; Smith and Trenholme 2008). This has the benefit of small up-front costs, as the only requirement is the purchase of a copy of the relevant game, combined with access to high-spec modern game engines, as well as the content development tools that they contain. Examples for this are the use of the game Civilization III for the cultural heritage game The History Game Canada (http://historycanadagame.com) or the use of the Unreal Engine 2 (Smith and Trenholme 2008) for the development of an affordable CAVE (Jacobson and Lewis 2005), which has been used successfully in cultural heritage applications (Jacobson and Holden 2005).

3.2 Virtual world user interfaces

There are different types of interface that allow users to interact with virtual worlds. These fall into several different categories, such as VR and Augmented Reality (AR), several of which are especially useful for cultural heritage applications, and which are presented in this section.

3.2.1 Mixed reality technologies

In 1994, Milgram and Kishino (1994) attempted to depict the relationship between VR and AR. To illustrate it, they introduced two new terms: Mixed Reality (MR), which is related to VR but is a wider concept than AR (Tamura et al. 2001), and Augmented Virtuality (AV). On the left-hand side of their Reality-Virtuality continuum lies the representation of the real world, and on the right-hand side the ultimate synthetic environment. MR stretches out between these extremes and can be divided into two sub-categories: AR and AV (Milgram and Kishino 1994). AR extends towards the real world and is thus less synthetic than AV, which extends towards virtual environments. To address the problem from another perspective, a further distinction has been made between the objects that form an AR environment: real objects and virtual objects. Real objects exist regardless of external conditions, whereas a virtual object depends on external factors but mimics an object of reality. Some of the most interesting characteristics that distinguish virtual objects (which include holograms and mirror images) from real objects are discussed below (Milgram and Kishino 1994).

The most obvious difference is that a virtual object can only be viewed through a display device, after it has been generated and simulated. Real-world objects, by contrast, can be viewed directly and/or through a synthetic device. Another factor is the quality of the viewed images generated with state-of-the-art technologies: virtual information cannot be sampled directly but must be synthesised, so, depending on the chosen resolution, displayed objects may appear real, although their appearance does not guarantee that they are. Virtual and real information may also be distinguished by the luminosity of the location in which it appears. Images of real-world objects receive lighting information from the position at which they appear to be located, while virtual objects do not necessarily do so, unless the virtual scene is lit exactly like the real-world location in which the objects appear to be displayed. This holds for directly viewed real-world objects, as well as for displayed images of indirectly viewed objects.

3.2.2 Virtual reality

Ivan Sutherland introduced the first Virtual Reality (VR) system in the 1960s (Sutherland 1965). Nowadays, VR is moving from the research laboratories to the working environment, replacing ergonomically limited HMDs (Head-Mounted Displays) with projective displays (such as the well-known CAVE and Responsive Workbench) and extending to online VR communities. In a typical VR system, the user’s natural sensory information is completely replaced with digital information. The user’s experience of a computer-simulated environment is called immersion. As a result, VR systems can completely immerse a user inside a synthetic environment by blocking out all signals of the real world. In addition, a VR simulated world does not always have to obey all laws of nature. The most common problems of immersive VR systems are of an emotional and psychological nature, including motion sickness, nausea and other symptoms, which are caused by the high degree of immersion of the users.

Moreover, Internet technologies have tremendous potential for offering virtual visitors ubiquitous access to online virtual environments via the World Wide Web (WWW). Additionally, the increased capacity of Internet connections (i.e. ADSL/broadband) makes it possible to transmit the large media files relating to the artefacts of virtual museum exhibitions. The most popular technology for WWW visualisation is Web3D, which offers tools such as the Virtual Reality Modeling Language (VRML, http://www.web3d.org/x3d/vrml/) and its successor X3D (http://www.web3d.org/x3d/), which can be used for the creation of an interactive virtual museum. Many cultural heritage applications based on VRML have been developed for the Web (Gatermann 2000; Paquet et al. 2001; Sinclair and Martinez 2001). Another 3D graphics format is the COLLAborative Design Activity (COLLADA, https://collada.org), which defines an open-standard XML schema (http://www.w3.org/XML/Schema) for exchanging digital assets among graphics software applications that might otherwise store their assets in incompatible formats. One of the main advantages of COLLADA is that it includes more advanced physics functionality, such as collision detection and friction, which Web3D does not support.

In addition to these, there are more powerful technologies that have been used in museum environments, including the OpenSceneGraph (OSG) high-performance 3D graphics toolkit (http://www.openscenegraph.org/projects/osg) and a variety of 3D game engines. OSG is a freely available (open source) multi-platform toolkit used by museums (Calori et al. 2005; Looser et al. 2006) to generate more powerful VR applications, especially in terms of immersion and interactivity, since it supports the integration of text, video, audio and 3D scenes into a single 3D environment. An alternative to OpenSceneGraph is OpenSG, an open-source scene graph system used to create real-time VR applications (http://www.opensg.org/). 3D game engines, on the other hand, are also very powerful and provide superior visualisation and physics support. Compared to VRML and X3D, both technologies (OSG and 3D game engines) can provide very realistic and immersive museum environments, but they have two main drawbacks. First, they require advanced programming skills for the design and implementation of custom applications. Secondly, they do not support mobile devices such as PDAs and third-generation mobile phones.

3.2.3 Augmented reality

The concept of AR is the opposite of the closed world of virtual spaces (Tamura et al. 1999), since users can perceive both virtual and real information. Compared to VR systems, most AR systems use more complex software approaches, usually including some form of computer vision technique (Forsyth and Ponce 2002) for sensing the real world. The basic principle is to superimpose digital information directly onto the user’s sensory perception (Feiner 2002), rather than replacing it with a completely synthetic environment as VR systems do. An interesting point is that both technologies, AR and VR, may process and display the same digital information, and they often make use of identical dedicated hardware. Although AR systems are influenced by the same factors as VR systems, the degree of influence is much smaller, since only a portion of the environment is virtual. However, much research remains to be done in AR (Azuma 1997; Azuma et al. 2001; Livingston 2005) to measure its effects on humans accurately.

The requirements related to the development of AR applications in the cultural heritage field have been well documented (Brogni et al. 1999; Liarokapis et al. 2008; Sylaiou et al. 2009). One interactive concept is the Meta-Museum visualised guide system based on AR, which tries to establish scenarios and provide a communication environment bridging the real world and cyberspace (Mase et al. 1996). Another AR system that could serve as an automated tour guide in museums superimposes audio on the world based on the location of the user (Bederson 1995). There are many ways in which archaeological sources can be used in a mobile AR system, ranging from the initial collection of data to the eventual dissemination of information (Ryan 2000). MARVINS is an AR assembly, initially designed for mobile applications, that can provide orientation and navigation in areas such as science museums, art museums and other historical or cultural sites; augmented information such as video, audio and text is relayed from a server via a transmitter-receiver to a head-mounted display (Sanwal et al. 2000).

In addition, a number of EU projects have been undertaken in the field of virtual heritage. The SHAPE project (Hall et al. 2001) combined AR and archaeology to enhance the interaction of persons in public places like galleries and museums, educating visitors about artefacts and their history. The 3DMURALE project (Cosmas et al. 2001) developed 3D multimedia tools to record, reconstruct, encode and visualise archaeological ruins in virtual reality, using the ancient city of Sagalassos in Turkey as a test case. The Ename 974 project (Pletinckx et al. 2000) developed a non-intrusive interpretation system, called TimeScope-1, to convert archaeological sites into open-air museums, based on 3D computer technology originally developed by IBM, called TimeFrame. ARCHEOGUIDE (Stricker et al. 2001) provides an interactive AR guide for the visualisation of archaeological sites, based on mobile computing, networking and 3D visualisation, which presents users with a multi-modal interaction user interface. A similar project is LIFEPLUS (Papagiannakis et al. 2002), which explores the potential of AR for a high degree of realistic interactive immersion by rendering realistic 3D simulations of virtual flora and fauna (humans, animals and plants) in real-time.

AR technologies can be combined with existing game engine subsystems to create AR game engines (Lugrin and Cavazza 2010) for the development of AR games, and AR has been applied successfully to gaming in cultural heritage. One of the earliest examples is the Virtual Showcase (Bimber et al. 2001), an AR display device that has the same form factor as a real showcase traditionally used for museum exhibits and that can be used for gaming. The potential of AR interfaces in museum environments and other cultural heritage institutions (Liarokapis 2007), as well as at outdoor heritage sites (Vlahakis et al. 2002), has also been explored briefly with a view to educational applications. A more specific gaming example is provided by the MAGIC and TROC systems (Renevier et al. 2004), which were based on a study of the tasks of archaeological fieldwork, interviews and observations in Alexandria, and which take the form of a mobile game in which players discover archaeological objects while moving.

Another cultural heritage AR application is the serious game SUA, which was part of the BIDAIATZERA project (Linaza et al. 2007). The project takes the form of a play recreating the 1813 battle between the English and the French in San Sebastian. Researchers developed an interactive system based on AR and VR technologies for recreational and educational applications with tourist, cultural and socio-economic content; the prototype was presented at the Museo del Monte Urgull in San Sebastian.

3.3 Advanced rendering techniques

One of the most important elements of the creation of interactive virtual environments is the visual representation of these environments. Although serious games have design goals that are different from those of pure entertainment video games, they can still make use of the wide variety of graphical features and effects that have been developed in recent years. The state-of-the-art in this subject area is broad and, at times, it can be difficult to specify exactly where the ‘cutting edge’ of the development of an effect lies. A number of the techniques that are currently in use were originally developed for offline applications and have only recently become adopted for use in real-time applications through improvements in efficiency or hardware. Here, the ‘state-of-the-art’ for real-time lags several years behind that for offline—good examples of this would be raytracing or global illumination, which we shall briefly examine. A number of effects, however, are developed specifically for immediate deployment on current hardware and can make use of specific hardware features—these are often written by hardware providers themselves to demonstrate their use or, of course, by game developers. Other real-time graphical features and effects can be considered to follow a development cycle, where initially they are proven in concept demonstrations or prototypes, but are too computationally expensive to implement in a full application or game. Over time these techniques may then be progressively optimised for speed, or held back until the development of faster hardware allows their use in computer games.

The primary reason for the proliferation of real-time graphics effects has been due to advances in low-cost graphics hardware that can be used in standard PCs or games consoles. Modern graphics processing units (GPUs) are extremely powerful parallel processors and the graphics pipeline is becoming increasingly flexible. Through the use of programmable shaders, which are small programs that define and direct part of the rendering process, a wide variety of graphical effects are now possible for inclusion in games and virtual environments, while there also exist a range of effects that are currently possible but still too expensive for practical use beyond anything but the display of simple scenes.

The graphics pipeline used by modern graphics hardware renders geometry using rasterisation, where an object is drawn as triangles which undergo viewing transformations before they are converted directly into pixels. In contrast, ray-tracing generates a pixel by firing a corresponding ray into the scene and sampling whatever it may hit. While the former is generally faster, especially using the hardware acceleration on modern graphics cards, it is easier to achieve effects such as reflections using ray-tracing. Although the flexibility of modern GPUs can allow ray-tracing (Purcell et al. 2002) in real-time (Horn et al. 2007; Shirley 2006), as well as fast ray-tracing now becoming possible on processors used in games consoles (Benthin et al. 2006), rasterisation is currently still the standard technique for computer games.

Although the modern graphics pipeline is designed and optimised to rasterise polygonal geometry, it should be noted that other types of geometry exist. Surfaces may be defined using a mathematical representation, while volumes may be defined using ‘3D textures’ of voxels or, again, using a mathematical formula (Engel et al. 2006). The visualisation of volumetric ‘objects’, which are usually semi-opaque, is a common problem that includes features such as smoke, fog and clouds. A wide variety of options exist for rendering volumes (Engel et al. 2006; Cerezo et al. 2005), although these are generally very computationally expensive and it is common to emulate a volumetric effect using simpler methods. This often involves drawing one or more rectangular polygons to which a four-channel texture has been applied (where the fourth, alpha, channel represents transparency)—for example a cloud element or wisp of smoke. These may be aligned to always face the viewer as billboards (Akenine-Möller et al. 2008), a common game technique with a variety of uses (Watt and Policarpo 2005), or a series of these may be used to slice through a full volume at regular intervals. An alternative method for rendering full volumes is ray-marching, where a volume is sampled at regular intervals along a viewing ray, which can now be implemented in a shader (Crassin et al. 2009), or on processors that are now being used in games consoles (Kim and Jaja 2009).

It is sometimes necessary to render virtual worlds, or objects within worlds, that are so complex or detailed that they cannot fit into the graphics memory, or even the main memory, of the computer; this can be especially true when dealing with volume data. Assuming that the hardware cannot be further upgraded, a number of options exist for such rendering problems. If the scene consists of many complex objects at varying distances, it may be possible to adopt a level-of-detail approach (Engel et al. 2008) and use less complex geometry, or even impostors (Akenine-Möller et al. 2008), to approximate distant objects (Sander and Mitchell 2006). Alternatively, if only a small sub-section of the world or object is in sight at any one time, it may be possible to hold only these visible parts in memory and ‘stream’ in replacements as new parts come into view, which is usually achieved by applying some form of spatial partitioning (Crassin et al. 2009). This streaming approach can also be applied to textures that are too large to fit into graphics memory (Mittring and Crytek 2008). If too much is visible at one time for this to be possible, a cluster of computers may be used: the entire scene, often too large for a single computer to hold in memory, is distributed among the cluster, with the computers’ individual renders being accumulated and composited together (Humphreys et al. 2002) or each computer controlling part of a multi-screen tiled display (Yin et al. 2006).
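
At its simplest, a level-of-detail approach reduces to selecting a mesh resolution from the distance to the camera, as in the following sketch (the thresholds are arbitrary, illustrative values):

    // Minimal distance-based level-of-detail selection; a full system would
    // also hysteresis-filter the result to avoid popping at the boundaries.
    int selectLOD(float distanceToCamera)
    {
        if (distanceToCamera < 50.0f)  return 0;  // full-detail mesh
        if (distanceToCamera < 200.0f) return 1;  // simplified mesh
        if (distanceToCamera < 800.0f) return 2;  // coarse mesh
        return 3;                                 // impostor/billboard
    }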

3.3.1 Post-processing effects

One important category of graphical effect stems from the ability to render to an off-screen buffer, or even to multiple buffers simultaneously, which can then be used to form a feedback loop. A polygon may then be drawn (either to additional buffers or to the visible framebuffer) with the previously rendered texture(s) made available to the shader. This shader can then perform a variety of ‘post-processing’ effects.

Modern engines frequently include a selection of such effects (Feis 2007), which can include more traditional image processing, such as colour transformations (Burkersroda 2005; Bjorke 2004), glow (James and O’Rorke 2004), or edge-enhancement (Nienhaus and Döllner 2003), as well as techniques that require additional scene information such as depth of field (Gillham 2007; Zhou et al. 2007), motion blur (Rosado 2008) and others which will be mentioned in specific sections later.

The extreme of this type of technique is deferred shading, where the lighting calculations are performed entirely as a ‘post-process’. Here, the scene geometry is rendered into a set of intermediate buffers, collectively called the G-buffer, and the final shading is performed in image-space using the data from those buffers (Koonce 2008).

3.3.2 Transparency, reflection and refraction

The modern real-time graphics pipeline does not deal directly with the visual representation of transparency, reflection or refraction, and their emulation must be handled using special cases or tricks. Traditionally, transparency has been emulated using alpha blending (Akenine-Möller et al. 2008), a compositing technique in which a ‘transparent pixel’ is combined with the framebuffer according to its fourth colour component (alpha). The primary difficulty with this technique is that the results are order dependent, which requires the scene geometry to be sorted by depth before it is drawn; transparency can also present issues when using deferred shading (Filion and McNaughton 2008). A number of order-independent transparency techniques have been developed, however, such as depth-peeling (Everitt 2001; Nagy and Klein 2003).
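
For reference, alpha blending implements the standard ‘over’ compositing operation,

    C_{out} = \alpha_{src} C_{src} + (1 - \alpha_{src})\, C_{dst}

where C_{src} and \alpha_{src} are the colour and opacity of the incoming fragment and C_{dst} is the colour already in the framebuffer; because the result depends on C_{dst}, the outcome changes with drawing order.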

Mirrored background reflections may be achieved using an environment map (Blinn and Newell 1976; Watt and Policarpo 2005), which can be a simple but effective method of reflecting a static scene. If the scene is more dynamic, but relatively fast to render, reflections on a flat surface may be achieved by drawing the reflective surface as transparent and mirroring the entire scene geometry about the reflection surface, drawing the mirrored geometry behind it (Fig. 8) or, for more complex scenes, using reduced geometry methods such as impostors (Tatarchuk and Isidoro 2006). Alternatively, six cameras can be used to produce a dynamic environment map (Blythe 2006). Alternative methods have also been developed to address the lack of parallax, i.e. apparent motion offsets due to objects at different distances, which are missing in a fixed environment map (Yu et al. 2005).

Fig. 8 Achieving a mirror effect by rendering the geometry twice (Anderson and McLoughlin 2007)

Perhaps surprisingly at first, simple refraction effects can be achieved using techniques very similar to those used for reflection. The only differences are that the sample ray direction points inside the object and that it is bent due to the difference in the refractive indices of the two materials, in accordance with Snell’s Law (Akenine-Möller et al. 2008). Thus, environment mapping can be used for simple refractions in a static scene, which may be expanded to include chromatic dispersion (Fernando and Kilgard 2003). In some cases, refraction may also be achieved as a post-processing effect (Wyman 2007).
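
Snell’s Law relates the angles of incidence and refraction to the two refractive indices, n_1 \sin\theta_1 = n_2 \sin\theta_2, and leads to the vector form used by shading languages such as GLSL: with \eta = n_1/n_2, unit incident vector \mathbf{i}, surface normal \mathbf{n} and d = \mathbf{n}\cdot\mathbf{i},

    k = 1 - \eta^2 (1 - d^2), \qquad \mathbf{t} = \eta\,\mathbf{i} - (\eta d + \sqrt{k})\,\mathbf{n}

where \mathbf{t} is the refracted direction, and k < 0 indicates total internal reflection, in which case no refracted ray exists.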

3.3.3 Surface detail

The simplest method of adding apparent detail to a surface, without requiring additional geometry, is texture mapping. The advent of pixel shaders means that textures can now be used in more diverse ways to emulate surface detail (Rost 2006; Watt and Policarpo 2005; Akenine-Möller et al. 2008).

A variety of techniques exist for adding apparent high-resolution bump detail to a low-resolution mesh. In normal mapping (Blinn 1978) the texture map stores surface normals, which can then be used for lighting calculations. Parallax mapping (Kaneko et al. 2001) uses a surface height map and the camera direction to determine an offset for texture lookups. Relief texture mapping (Oliveira et al. 2000; Watt and Policarpo 2005) is a related technique which performs a more robust ray-tracing of the height map and can provide better quality results at the cost of performance.
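
The core of the parallax-mapping idea fits in a few lines of tangent-space code. The following sketch uses hypothetical names, takes the height-map sample as an input and omits the texture lookups themselves:

    struct Vec2 { float u, v; };
    struct Vec3 { float x, y, z; };

    // Offset a texture coordinate by the sampled surface height along the
    // view ray; viewTS is the normalised view direction in tangent space,
    // so dividing by its z component projects the offset onto the surface.
    Vec2 parallaxOffset(Vec2 uv, Vec3 viewTS, float height, float scale)
    {
        float h = height * scale;   // scaled height-map sample
        return { uv.u + h * viewTS.x / viewTS.z,
                 uv.v + h * viewTS.y / viewTS.z };
    }

Relief texture mapping replaces this single offset with a short ray-march through the height field, which is what makes it more robust at steep viewing angles.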

3.3.4 Lighting

The old fixed-function graphics pipeline supported a per-vertex Gouraud lighting model (OpenGL ARB), but programmable shaders now allow developers to implement their own lighting models (Rost 2006; Hoffman 2006). In general, though, the fixed-function lighting equation is split into: a diffuse component, where direct lighting is assumed to be scattered by micro-facets on the surface; a specular component, which appears as a highlight and is dependent on the angle between the viewer and the light; and an ambient component, an indirect ‘background’ lighting term due to light that has bounced off other objects in the scene (Akenine-Möller et al. 2008).
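
In simplified form (omitting attenuation, emission and the clamping of negative terms), this fixed-function style model can be written as

    I = k_a I_a + k_d (\mathbf{N}\cdot\mathbf{L})\, I_L + k_s (\mathbf{N}\cdot\mathbf{H})^{s}\, I_L

where \mathbf{N} is the surface normal, \mathbf{L} the direction to the light, \mathbf{H} the half-vector between \mathbf{L} and the view direction, s the shininess exponent, and k_a, k_d and k_s the ambient, diffuse and specular material coefficients.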

3.3.4.1 Shadows

Although the graphics pipeline did not originally support shadows, it does now provide hardware acceleration for texture samples of a basic shadow map (Akenine-Möller et al. 2008; Engel et al. 2008). However, this basic method suffers from aliasing issues, is typically low resolution and can only result in hard shadow edges. Except in certain conditions, the majority of shadows in the real world exhibit a soft penumbra, so there is a desire within computer graphics to achieve efficient soft shadows, for which a large number of solutions have been developed (Hasenfratz et al. 2003; Bavoil 2008). Shadowing complex objects such as volumes can also present issues, many of which have also been addressed (Lokovic and Veach 2000; Hadwiger et al. 2006; Ropinski et al. 2008).

3.3.4.2 High-Dynamic Range Lighting

HDR Lighting is a technique that has become very popular in modern games (Sherrod 2006; Engel et al. 2008). It stems from the fact that real world luminance has a very high dynamic range, which means that bright surface patches are several orders of magnitude brighter than dark surface patches—for example, the sun at noon “may be 100 million times brighter than starlight” (Reinhard et al. 2006). In general, this means that the 8-bit integers traditionally used in each component of the RGB triplet of pixels in the framebuffer, are woefully inadequate for representing real luminance ranges. Thankfully, modern hardware now allows a greater precision in data types, so that calculations may be performed in 16 or even 32-bit floating-point format, although it should be noted that a performance penalty usually occurs when using more precise formats.

One of the most striking visual effects associated with HDR lighting is bloom, where extremely bright patches appear to glow. Practically, this is usually applied as a post-process effect in a similar way to a glow effect, where bright patches are drawn into a separate buffer which is blurred and then combined with the original image (Kawase 2004; Kawase 2003). This can also be applied to low-dynamic range images, to make them appear HDR (Sousa 2005).
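
The bright-pass step of such a bloom effect can be sketched as a simple threshold over a linear HDR frame (the buffer layout, threshold and luminance weights shown are illustrative; a real implementation would run in a pixel shader on a downsampled buffer):

    #include <cstddef>
    #include <vector>

    struct RGB { float r, g, b; };

    // Copy only pixels above a luminance threshold into a secondary buffer,
    // which would then be blurred and added back onto the original frame.
    std::vector<RGB> brightPass(const std::vector<RGB>& frame, float threshold)
    {
        std::vector<RGB> bloom(frame.size(), RGB{0.0f, 0.0f, 0.0f});
        for (std::size_t i = 0; i < frame.size(); ++i) {
            float lum = 0.2126f*frame[i].r + 0.7152f*frame[i].g + 0.0722f*frame[i].b;
            if (lum > threshold) bloom[i] = frame[i];  // keep only the bright pixels
        }
        return bloom;
    }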

Modern displays still use the traditional 8-bit per colour component format (with a few exceptions (Seetzen et al. 2004)), so the HDR floating point results must be converted, which is the process of tonemapping (Reinhard et al. 2006). Some tonemapping methods allow the specification of a brightness, or exposure value as taken from a physical camera analogy. In an environment where the brightness is likely to change dramatically this exposure should be automatically adjusted—much like a real camera does today. Various methods are available to achieve this, such as by downsampling the entire image to obtain the average brightness (Kawase 2004), or by asynchronous queries to build a basic histogram of the brightness level to determine the required exposure (McTaggart et al. 2006; Sheuermann and Hensley 2007).
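
As a concrete sketch, the global operator of Reinhard et al. (2006) combines such an average-brightness exposure estimate with a simple compressive curve; here the exposure is derived from the log-average luminance of the frame (the buffer layout is hypothetical, and the ‘key’ value is typically around 0.18):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Map a buffer of linear HDR luminances to display values in [0, 1).
    std::vector<float> tonemapReinhard(const std::vector<float>& hdr, float key)
    {
        double logSum = 0.0;
        for (float L : hdr) logSum += std::log(1e-4 + L);               // avoid log(0)
        float avg = std::exp(static_cast<float>(logSum / hdr.size())); // log-average luminance

        std::vector<float> ldr(hdr.size());
        for (std::size_t i = 0; i < hdr.size(); ++i) {
            float Ls = key / avg * hdr[i];   // exposure scaling
            ldr[i] = Ls / (1.0f + Ls);       // compressive Reinhard curve
        }
        return ldr;
    }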

3.3.4.3 Indirect lighting: global illumination

Incident light on a surface can originate either directly from a light source, or indirectly from light reflected by another surface. Global illumination techniques account for both of these sources of light, although in such methods it is the indirect lighting component that is usually of most interest and the most difficult to achieve. The main difficulty is that in order to render a surface patch, the light that is reflected by all other surface patches in the scene must be known. This interdependence can be costly to compute, especially for dynamic scenes, and although indirect lighting accounts for a high proportion of real world illumination, the computational cost of simulating its effects has resulted in very limited use within real-time applications (Dutré et al. 2003).

The simplest inclusion of indirect lighting is through pre-computed and baked texture maps, which can store anything from direct shadows or ambient occlusion results to those from radiosity or photon mapping (Mittring 2007). However, this technique is only viable for completely static objects within a static scene. Another simple global illumination technique, which is commonly associated with HDR lighting, is image-based lighting (Reinhard et al. 2006). Here, an environment map stores both direct and indirect illumination as a simple HDR image, which is then used to light objects in the scene. The image may be captured from a real-world location, drawn by an artist as an art asset or generated in a pre-processing stage by sampling the virtual environment. Multiple samples can then be used to light a dynamic character as it moves through the (static) environment (Mitchell et al. 2006). Although the results can be very effective, image-based lighting cannot deal with fully dynamic scenes without having to recompute the environment maps, which may be costly.

Fully dynamic global illumination techniques generally work on reduced or abstracted geometry, such as using discs to approximate the geometry around each vertex for ambient occlusion (Shanmugam and Arikan 2007; Hoberock and Jia 2008). It is also possible to perform some operations as a post-process, such as ambient occlusion (Mittring 2007) and even approximations of single-bounce indirect lighting (Ritschel et al. 2009). The general-purpose use of the GPU has also allowed radiosity at near real-time rates for very small scenes (Coombe and Harris 2005) and fast, but not yet real-time, photon mapping (Purcell et al. 2003). The latter technique can also be used to simulate caustics, which are bright patches caused by convergent rays from a refractive object, in real-time on the GPU (Krüger et al. 2006), although other techniques for specifically rendering caustics are also possible (Wand and Straßer 2003), including as an image-space post-process effect (Wyman 2007), or by applying 'Caustic Cones' that utilise an intensity map generated from real photographic images (Kider et al. 2009).
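To give a flavour of how geometry can be abstracted for ambient occlusion, the following Python sketch accumulates an occlusion estimate from a set of discs using a form-factor-style weighting. It is an illustrative approximation in the spirit of the disc-based methods cited above, not the exact formula from those papers, and the function and parameter names are our own.

```python
import numpy as np

def disc_occlusion(p, n, discs):
    """Approximate ambient occlusion at point p with normal n, treating
    nearby geometry as discs given as (centre, normal, area) tuples."""
    occlusion = 0.0
    for centre, d_normal, area in discs:
        v = centre - p
        dist2 = v @ v
        v_hat = v / np.sqrt(dist2)
        # Weight by how squarely the receiver and the disc face each other.
        cos_r = max(n @ v_hat, 0.0)          # receiver facing the disc
        cos_e = max(d_normal @ -v_hat, 0.0)  # disc facing the receiver
        # Form-factor-style term: fraction of the hemisphere the disc covers.
        occlusion += (area * cos_r * cos_e) / (np.pi * dist2 + area)
    return min(occlusion, 1.0)

# Example: one small disc hovering above and facing the shaded point.
ambient = 1.0 - disc_occlusion(
    np.zeros(3), np.array([0.0, 1.0, 0.0]),
    [(np.array([0.0, 0.5, 0.3]), np.array([0.0, -1.0, 0.0]), 0.2)])
```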

3.4 Artificial intelligence

Another important aspect of creating the populated virtual environments used in cultural heritage applications is the generation of intelligent behaviour for the inhabitants of the virtual world, which is achieved using artificial intelligence (AI) techniques.

It is important to understand that what we refer to as the AI of virtual entities in virtual environments is not truly AI, at least not in the conventional sense (McCarthy 2007) of the term. The techniques applied to virtual worlds such as computer games are usually a mixture of AI-related methods whose main concern is the creation of a believable illusion of intelligence (Scott 2002), i.e. the behaviour of virtual entities only needs to be believable enough to convey the presence of intelligence and to immerse the human participant in the virtual world.

The main requirement for creating the illusion of intelligence is perception management, i.e. the organisation and evaluation of incoming data from the AI entity's environment. This mostly takes the form of acting upon sensor information, but also includes communication and coordination between AI entities in environments inhabited by multiple entities that may have to act co-operatively. The tasks that need to be solved in most modern virtual world applications such as computer games, and to which the intelligent actions of AI entities are usually restricted (by convention rather than technology), are (Anderson 2003):

  • decision making

  • path finding (planning)

  • steering (motion control)

The exact range of problems that AI entities within a computer game have to solve depends on the context in which they exist and the virtual environment in which the game takes place. Combs and Ardoint (2004) state that a popular method for the implementation of game AI is the use of an 'environment-based programming style', i.e. the creation of the virtual game world followed by the association of AI code with the game world and the entities that exist in it. This means that an AI entity's intelligence is built around, and intrinsically linked to, the virtual game environment. This type of entity intelligence can be created using 'traditional' methods for 'decision making', 'path finding' and 'steering'.
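Steering, the lowest-level of these tasks, typically amounts to simple vector arithmetic applied each frame. The following is a minimal sketch of a classic 'seek' behaviour in Python/NumPy; the speed and force limits are illustrative parameters rather than values from any cited system.

```python
import numpy as np

def seek(position, velocity, target, max_speed=2.0, max_force=0.5):
    """Classic 'seek' steering: return a force steering the entity towards
    the target at full speed, clamped to the entity's agility."""
    desired = target - position
    dist = np.linalg.norm(desired)
    if dist < 1e-6:
        return np.zeros_like(position)
    desired = desired / dist * max_speed   # desired velocity at max speed
    steering = desired - velocity          # steer = desired - current
    norm = np.linalg.norm(steering)
    if norm > max_force:                   # limit how sharply we can turn
        steering = steering / norm * max_force
    return steering

# Per-frame update (dt in seconds):
#   velocity += seek(position, velocity, target) * dt
#   position += velocity * dt
```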

Of the three common AI tasks named above, 'decision making' most strongly implies the use of intelligence. Finite state machines (FSMs) are the most commonly used technique for implementing decision making in games (Fu and Houlette 2004). They arrange the behaviour of an AI entity in logical states, defining one state per possible behaviour, of which only one, the entity's behaviour at that point in time, is active at any given moment. In game FSMs each state is usually associated with a specific behaviour, and an entity's actions are often implemented by linking behaviours with pre-defined animation cycles that allow the entity to enact the selected behaviour (Orkin 2006). It is relatively simple to program a very stable FSM that may not be very sophisticated but that "will get the job done". The main drawback of FSMs is that they can become very complex and hard to maintain, while the behaviour resulting from an overly simple FSM can easily become predictable. To overcome this problem, hierarchical FSMs are sometimes used, which break up complex states into sets of smaller sub-states that can be combined, allowing the creation of larger and more complex FSMs.
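As an illustration, here is a minimal, table-driven FSM sketch in Python. The states and events (patrol, attack, 'enemy_seen' and so on) are hypothetical examples; in a game, each state would additionally be linked to an animation cycle as described above.

```python
class FSM:
    """A minimal table-driven finite state machine."""
    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # {(state, event): next_state}

    def handle(self, event):
        # Only the active state is consulted; unknown events are ignored.
        self.state = self.transitions.get((self.state, event), self.state)

guard = FSM("patrol", {
    ("patrol", "enemy_seen"): "attack",
    ("attack", "low_health"): "flee",
    ("attack", "enemy_lost"): "patrol",
    ("flee",   "safe"):       "patrol",
})
guard.handle("enemy_seen")   # guard.state is now "attack"
```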

In recent years, there has been a move towards performing decision making using goal-directed techniques to enable the creation of nondeterministic behaviour. Dybsand describes this as a technique in which an AI entity "will execute a series of actions ... that attempt to accomplish a specific objective or goal" (Dybsand 2004). In its simplest form, goal-orientation can be implemented by determining a goal with an embedded action sequence for a given AI entity. This action sequence, the entity's plan, is then executed by the entity to satisfy the goal (Orkin 2004a). Solutions that allow for more diverse behaviour improve on this by selecting an appropriate plan from a pre-computed 'plan library' (Evans 2001) instead of using a built-in plan. More complex solutions compute plans dynamically, i.e. 'on the fly', as is the case with Goal-Oriented Action Planning (GOAP) (Orkin 2004a). In GOAP the sequence of actions that the system needs to perform to reach its end-state or goal is generated in real-time by applying a planning heuristic to a set of known values within the AI entity's domain knowledge. To achieve this in his implementation of GOAP, Orkin (2004b) separates actions and goals, implicitly integrating the preconditions and effects that define the planner's search space, thereby placing the decision-making process in the domain of the planner. This can be improved further by augmenting the representation of the search space with costs associated with the actions that can satisfy goals, effectively turning the AI entity's knowledge base into a weighted graph. Shortest-path graph algorithms can then be used as the planning algorithm for the entity's high-level behaviour (Orkin 2006), with the additional benefit of greater code re-use: if the representations of the search space are kept identical, the same planning code module can serve both high-level decision making and path planning (Orkin 2004b). The most popular path planning algorithm used in modern computer games is the A* (A-Star) algorithm (Stout 2000; Matthews 2002; Nareyek 2004), a generalisation of Dijkstra's algorithm (1959). Provided its heuristic is admissible, A* is optimal, i.e. proven to find the least costly path in a weighted graph whenever a path exists (Dechter and Pearl 1985).
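A minimal A* sketch in Python follows; the graph is supplied as callbacks and nodes are assumed to be hashable (e.g. grid coordinates). On a uniform grid, `neighbours` would yield the four adjacent cells and Manhattan distance would be an admissible heuristic.

```python
import heapq
from itertools import count

def a_star(start, goal, neighbours, cost, heuristic):
    """A* search over a weighted graph. With an admissible heuristic (one
    that never overestimates), it returns a least-cost path if one exists."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    frontier = [(heuristic(start, goal), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in neighbours(node):
            new_g = g + cost(node, nxt)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt, goal),
                                          next(tie), new_g, nxt, path + [nxt]))
    return None  # the goal is unreachable
```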

Challenges in game AI that are relevant to serious games include the construction of intelligent interfaces (Livingstone and Charles 2004), such as tutoring systems or virtual guides, and particularly real-time strategy game AI, part of which is concerned with the modelling of large numbers of virtual entities in large-scale virtual environments. Challenges there include spatial and temporal reasoning (Buro 2004), which can be addressed through the use of potential fields (Hagelbäck and Johansson 2008).
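To make the potential field idea concrete, the following toy sketch builds a grid-based field in Python/NumPy: the goal attracts, obstacles repel, and a unit simply descends the field. This is a simplified illustration of the general technique, not Hagelbäck and Johansson's implementation, and all parameters are illustrative.

```python
import numpy as np

def potential_field(shape, goal, obstacles, repulsion=5.0):
    """Build a grid potential field: low near the goal, high near obstacles."""
    ys, xs = np.indices(shape)
    # Attractive term: Euclidean distance to the goal cell.
    field = np.hypot(ys - goal[0], xs - goal[1])
    # Repulsive term: each obstacle adds a peak that falls off with distance.
    for oy, ox in obstacles:
        d = np.hypot(ys - oy, xs - ox)
        field += repulsion / (1.0 + d)
    return field

field = potential_field((20, 20), goal=(18, 18), obstacles=[(10, 10), (10, 11)])
# A unit at (y, x) moves each step to the neighbouring cell with the
# smallest field value, flowing around obstacles towards the goal.
```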

3.4.1 Crowd simulation

The AI techniques described in the previous section are important tools with which more complex systems can be constructed. A domain of great potential relevance to cultural heritage that is derived from such techniques is the simulation of crowds of humanoid characters. If one wishes to reconstruct and visualise places and events from the past, a crowd of real-time virtual characters, suitably attired and behaving appropriately, can add new depths of immersion and realism to ancient building reconstructions. These characters can feature merely as a backdrop (Ciechomski et al. 2004) to add life to a reconstruction, or can assume centre stage in more active roles, for example as virtual tour guides directing the spectator (DeLeon 1999). Indeed, the type of crowd or character behaviour to be simulated varies greatly with the type of scenario to be modelled. In this vein, Ulicny and Thalmann (2002) model the crowd behaviour of worshippers in a virtual mosque, while Maim et al. (2007) and Ryder et al. (2005) focus on the creation of more general pedestrian crowd behaviours, the former for populating a virtual reconstruction of a city resembling ancient Rome.

More general crowd synthesis and evaluation techniques are also directly applicable to crowd simulation in cultural heritage. A variety of different approaches have been taken, most notably the use of social force models (Helbing and Molnar 1995), path planning (Lamarche and Donikian 2004), behavioural models incorporating perception and learning (Shao and Terzopoulos 2005), sociological effects (Musse and Thalmann 1997) and hybrid models (Pelechano et al. 2007).
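As an illustration of the social force idea, here is a deliberately simplified single-step sketch in Python/NumPy: each pedestrian relaxes towards its desired velocity while being repelled by its neighbours. It omits walls and anisotropy, and the constants are illustrative rather than Helbing and Molnar's exact parameterisation.

```python
import numpy as np

def social_forces(pos, vel, goals, desired_speed=1.3, tau=0.5,
                  strength=2.0, radius=0.3):
    """One step of a simplified social force model.

    pos, vel, goals: (N, 2) arrays of pedestrian positions, velocities
    and goal positions. Returns the acceleration on each pedestrian.
    """
    n = len(pos)
    # Driving force: relax towards the desired velocity within time tau.
    to_goal = goals - pos
    dist = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
    desired_vel = to_goal / dist * desired_speed
    force = (desired_vel - vel) / tau
    # Pairwise repulsion, decaying exponentially with separation.
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = pos[i] - pos[j]
            d_len = np.linalg.norm(d) + 1e-9
            force[i] += strength * np.exp((2 * radius - d_len) / radius) * d / d_len
    return force

# Integration per frame: vel += social_forces(...) * dt; pos += vel * dt
```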

The study of real-world corpora has also been used as a basis for synthesising crowd behaviour in approaches that do not entail the definition of explicit behaviour models. Lerner et al. (2007) manually track pedestrians in input video containing real-world behaviour examples. They use this data to construct a database of pedestrian trajectories for different situations. At runtime, the database is queried for examples matching the situations of the simulated pedestrians: the closest matching example is selected as the resulting trajectory for each pedestrian, and the process is repeated.

Lee et al. (2007) simulate behaviours based on aerial-view video recordings of crowds in controlled environments. A mixture of manual annotation and semi-automated tracking provides information from video about individuals’ trajectories. These are provided as inputs to an agent movement model that can create crowd behaviours of a similar nature to those observed in the original video.

Human perception of the animation of crowds and characters has been increasingly recognised as an important factor in achieving more realistic simulations. Research has been conducted on the perception of the animation and motion of individuals (Reitsma and Pollard 2003; McDonnell et al. 2007), groups (Ennis et al. 2010a; McDonnell et al. 2009a) and crowds (Peters et al. 2008; Ennis et al. 2010b). For example, Peters et al. (2008) examined the perceptual plausibility of pedestrian orientations and found that participants were able to consistently distinguish between virtual scenes where the character orientations matched those of the humans in the corresponding real scenes and scenes where the character orientations were artificially generated according to a number of different rule types. The results of such perceptual studies can be linked to synthesis in order to create more credible animations (McDonnell et al. 2009b).

A key factor differentiating crowd control methods is where knowledge is stored in the system. One approach is to endow knowledge separately to individual characters; an extreme example would be autonomous agents with their own artificial perceptions, reasoning, memories, etc. with respect to the environment, as in the work of Lamarche and Donikian (2004). Another is to place knowledge in the environment itself, creating a shared or partially shared database accessible to characters. According to this smart object methodology (Peters et al. 2003), graphical objects are tagged with behavioural information and may inform, guide or even control characters. Such an approach is also applicable to crowd simulation in urban environments. For example, navigation aids, placed inside the environment description, may be added by the designer during the construction process; these have been referred to as annotations (Doyle and Hayes-Roth 1998). The resulting environment description (Farenc et al. 1999; Thomas and Donikian 2000; Peters and O'Sullivan 2009) contains additional geometric, semantic and spatial partitioning information for informing pedestrian behaviour, thus transferring a degree of the behavioural intelligence into the environment. Hostetler (2002), for example, defines skeletal splines aligned with walkways. These splines, called ribbons, provide explicit information for groups to use, such as the two major directions of travel on the walkway. In addition to environment annotation and mark-up, interfaces for managing the definition of crowd scenarios have also been investigated. Crowdbrush (Ulicny et al. 2004) provides an intuitive way for designers to add crowds of characters to an environment using tools analogous to those found in standard 2D painting packages. It allows designers to paint crowds and apply attributes and characteristics with a range of different tools in real-time, obtaining immediate feedback about the results.

3.4.2 Annotated entities and environments

A fairly recent method for enabling virtual entities to interact with one another as well as with their surroundings is the use of annotated worlds. The mechanism for this, which we refer to as 'Annotated Entities', has been described under various names, such as 'Smart Terrain' (Cass 2002), 'Smart Objects' (Peters et al. 2003; Orkin 2006) and 'Annotated Environments' (Doyle 2002), which are largely interchangeable, although slight differences in their exact interpretation remain. Common to all implementations of this mechanism is the indirect approach to the creation of believable intelligent entities.

The idea of annotated environments is a computer application of the theory of affordance (Cornwell et al. 2003), originally developed in the fields of psychology and visual perception. Affordance theory states that the makeup and shape of objects contain suggestions about their usage. Affordance itself is an abstract concept, the implementation of which is greatly simplified by annotations, which work like labels containing instructions that provide an explicit interpretation of affordances. Transferred into the context of a virtual world, this means that objects in the environment contain all of the information that an AI-controlled entity needs in order to use them, effectively making the environment 'smart'.

A beneficial side effect of this use of 'annotated' objects (Doyle 1999) is that the complexity of the entities is independent of the extent of the domain knowledge available for their use, i.e. the virtual entities themselves can not only be kept relatively simple, but need not be changed at all to make use of additional knowledge. This allows for the rapid development of game scenarios (Cornwell et al. 2003), and if all annotated objects use the same interface to provide knowledge to the world's entities, there is no limit to the scalability of the system, i.e. the abilities of AI-controlled entities can practically be extended indefinitely (Orkin 2002) with very little impact on the system's overall performance. Furthermore, this method provides an efficient solution to the 'anchoring problem' (Coradeschi and Saffiotti 1999) of matching sensor data to the symbolic representation of the virtual entity's knowledge, as the objects in the world themselves hold the knowledge of how other virtual entities can interact with them.

Annotations have been employed in several different types of applications in order to achieve different effects. They have proven popular for the animation of virtual actors in computer animation production, where they facilitate animation selection (Lee et al. 2006), i.e. the choice of appropriate animation sequences that fit the environment. Other uses of annotations include the storage of tactical information in the environment for war games and military simulations (Darken 2007), which is implemented as sensory annotations to direct the virtual entities’ perception of their environment. Probably the most common form of annotations found in real-time simulated virtual environments affects behaviour selection, usually in combination with animation selection (Orkin 2006), i.e. the virtual entity’s behaviour and its visual representation (animation) are directed by the annotated objects that it uses.

Virtual entities that inhabit these annotated worlds can be built using rule-based systems based on simple FSMs, combined with a knowledge interface based on a trigger system that allows the entities to 'use' knowledge (instructions) for handling the annotated objects. The interaction protocol employed to facilitate the communication between entity and 'smart' object needs to enable the object to 'advertise' its features to the entities and then allow them to request from the object the relevant instructions (annotations) on its usage (Macedonia 2000). The success of this technique is demonstrated by the best-selling computer game The Sims, where 'Smart Objects' were used for behaviour selection to great effect. Forbus and Wright (2001) state that in The Sims all game entities, objects as well as virtual characters, are implemented as scripts that are executed in their own threads within a multitasking virtual machine. A similar approach, based on a scripting language that can represent the behaviours of virtual entities as well as the objects that they can interact with, has been presented more recently by Anderson (2008). Such scripting-language-based approaches are the most likely to provide solutions for the creation of large-scale virtual environments, such as the serious game component of the Rome Reborn project, through the automatic generation of AI content (Nareyek 2007); in combination with techniques such as the procedural modelling of urban environments (Vanegas et al. 2009), this will require integrating the creation of complex annotations with the procedural generation of virtual worlds, automating the anchoring of virtual entities in their environment.
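The advertise/request protocol can be made concrete with a minimal Python sketch. The class names, action names and effect encoding below are hypothetical illustrations, loosely in the spirit of the smart objects described above, not the actual implementation of The Sims or of any cited system.

```python
class SmartObject:
    """An annotated object that advertises what it affords and supplies the
    instructions (annotations) an entity needs in order to use it."""
    def __init__(self, name, annotations):
        self.name = name
        self.annotations = annotations  # {action: (animation, effect)}

    def advertise(self):
        return list(self.annotations)   # e.g. ['sit', 'sleep']

    def instructions_for(self, action):
        return self.annotations[action]

class SimpleAgent:
    """A deliberately simple entity: all usage knowledge lives in the objects,
    so the agent need not change when new objects are added to the world."""
    def use(self, obj, desired_action):
        if desired_action in obj.advertise():
            animation, effect = obj.instructions_for(desired_action)
            print(f"playing '{animation}' on {obj.name}: {effect}")

bed = SmartObject("bed", {"sleep": ("lie_down_loop", "energy += 50")})
SimpleAgent().use(bed, "sleep")
```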

4 Conclusions

The success of computer games, fuelled among other factors by the great realism that can be attained using modern consumer hardware, and the key games technologies that have resulted from it, have led to new types of games, including serious games, and to related application areas such as virtual worlds, mixed reality, augmented reality and virtual reality. All of these types of application utilise core games technologies (e.g. 3D environments) as well as novel techniques derived from computer graphics, human computer interaction, computer vision and artificial intelligence, such as crowd modelling. Together these technologies have given rise to new sets of research questions, often following technologically driven approaches to increasing levels of fidelity, usability and interactivity.

Our aim in this state-of-the-art report has been to demonstrate the potential of games technology for cultural heritage applications and serious games, to outline key problems and to indicate areas of technology where solutions to the remaining challenges may be found. To illustrate this, we first presented characteristic case studies of the application of games methods and technologies in cultural heritage. Next, we provided an overview of the existing literature relevant to the domain, discussed the strengths and weaknesses of the described methods and pointed out unsolved problems and challenges. It is our firm belief that we are only at the beginning of the evolution of games technology and that there will be further improvements in the quality and sophistication of computer games, giving rise to serious heritage games of greater complexity and fidelity than is achievable today.