Encyclopedia of Color Science and Technology

2016 Edition
Editors: Ming Ronnier Luo

Compositing and Chroma Keying

Reference work entry
DOI: https://doi.org/10.1007/978-1-4419-8071-7_173



Compositing is the act of combining two images or video sequences, producing a new image or video sequence. There are several techniques to composite two images (or video frames): alpha matting, gradient-domain blending, deep compositing, blend modes such as multiply, overlay, and screen, etc.

Chroma keying is one such technique; it uses color hues (chroma) to guide the compositing process. A portion of the target image is selected (based on a color or a range of colors) and replaced with the image to be inserted. This technique is widely adopted in video editing and postproduction.

Origin of the Terms

Traditionally, both in TV and films, there have been four basic compositing techniques: matting, physical compositing, background projection, and multiple exposure.
  • Matting is currently the most widespread technique and corresponds to the general definition given above. Digital compositing relies entirely on variations of this technique: chroma keying is an example of matting.

  • Physical compositing: When capturing the background image, an object is physically introduced into the scene to act as the foreground image. For instance, a glass shot consists of recording a scene through a transparent pane of glass on which some elements (or most of the scene) are painted (some background buildings, for instance). The area of the frame where the action happens is left clear.

  • Multiple exposure: One of the earliest compositing techniques ever developed, achieved by recording multiple times on the same film but exposing a different part each time with the help of a mask over the lens. Georges Méliès, a pioneer of visual effects, used it to obtain multiple copies of himself in the film The One-Man Band (Fig. 1).

  • Background projection: This technique, now fallen into disuse, is based on projecting the desired video or image onto a background screen with the foreground elements (actors, objects) between the camera and this screen. The development of digital compositing, together with the technique's inherent complexities (synchronization issues, illumination constraints), rendered this method obsolete.

Compositing and Chroma Keying, Fig. 1

One of the first examples of compositing in film. A frame from Georges Méliès' The One-Man Band (1900)


This entry describes some of the standard techniques in image compositing, together with the advances of the last decade. For further exploration of digital compositing, visual effects, and specific examples from the film industry, please refer to the work of Ron Brinkmann [1].

Chroma Keying

This method renders a range of colors in the foreground image as transparent, revealing the image below. Blue and green are the most used colors in films, video games, and TV due to their hue distance from human skin tones. The main disadvantage of chroma keying is that it requires a lighting set with a (sometimes rather large) chroma screen, which has to be as evenly illuminated as possible in order to minimize the range of color variations (noise) in the background (Fig. 2). A secondary issue is that the object (or person) to be inserted cannot contain any of the hue values used for chroma keying (e.g., a green dress).
Compositing and Chroma Keying, Fig. 2

Example of chroma key compositing (green screen). The actress is overlaid with a synthetic background in real time on camera (Source: DeviantArt artist AngryDogDesigns)

Green is generally preferred over blue due to the lower energy required to produce an even illumination over the chroma screen. However, blue is sometimes used when there is a risk of green tones appearing in the foreground layer (e.g., outdoor scenes with vegetation, which produce green interreflections). In computer graphics, chroma keying is usually obtained as a function of RGB values which measures the difference from the range of colors used for chroma keying (analogous to finding the distance to a closed 3D surface in color space):
$$ \alpha(p) = f(R_0, G_0, B_0) = d(R_0, R_{ck}) + d(G_0, G_{ck}) + d(B_0, B_{ck}) $$
where d is a distance function (e.g., the Euclidean distance), (R0, G0, B0) are the red, green, and blue channel values of pixel p in the image, and (Rck, Gck, Bck) are those of the color used for chroma keying.
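As a minimal NumPy sketch of this function (the threshold and softness parameters, as well as the soft ramp, are illustrative assumptions rather than part of the formulation above):

```python
import numpy as np

def chroma_key_alpha(image, key_color, threshold=0.3, softness=0.1):
    # image: H x W x 3 float array in [0, 1]
    # key_color: the (Rck, Gck, Bck) triplet of the chroma screen
    key = np.asarray(key_color, dtype=np.float64)
    # Sum of per-channel distances to the key color, as in the equation above
    dist = np.abs(image - key).sum(axis=-1)
    # Soft ramp around the threshold: pixels near the key color become
    # transparent (alpha = 0), distant pixels stay opaque (alpha = 1)
    return np.clip((dist - threshold) / softness, 0.0, 1.0)
```

The resulting matte can then drive the composite, e.g., out = alpha[..., None] * frame + (1 - alpha[..., None]) * background.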

Alpha Blending

Compositing is based on the information stored in the alpha channel of the inserted image. First introduced in the paint program Paint3 in the late 1970s, the alpha channel [2, 3] stores a value between 0 and 1, where 0 denotes a completely transparent pixel and 1 a completely opaque one. The alpha channel of a 2D image is itself a grayscale image, called a matte, which can be visualized and edited. Multiple file formats support alpha-extended data (like RGBA): PNG, TIFF, TGA, SVG, and OpenEXR.
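The basic operation that consumes the matte is the "over" operator of Porter and Duff [3]. A minimal sketch with straight (non-premultiplied) alpha; production systems usually work with premultiplied colors, which simplifies the arithmetic:

```python
import numpy as np

def over(fg_rgb, fg_alpha, bg_rgb, bg_alpha):
    # fg_rgb, bg_rgb: H x W x 3 float arrays; alphas: H x W floats in [0, 1]
    a_f = fg_alpha[..., None]
    a_b = bg_alpha[..., None]
    # Resulting coverage: foreground plus whatever background shows through
    out_a = a_f + a_b * (1.0 - a_f)
    # Weighted color mix; the division converts back to straight alpha
    out_rgb = np.where(out_a > 0.0,
                       (fg_rgb * a_f + bg_rgb * a_b * (1.0 - a_f))
                       / np.maximum(out_a, 1e-8),
                       0.0)
    return out_rgb, out_a[..., 0]
```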

Although binary alpha masking (exclusively 0 or 1) is suitable for many compositing scenarios, complex visual phenomena such as transparency or translucency require a subtle gradation of alpha values (e.g., separating a dog from its background requires dealing with strands of hair, which capture and scatter the environment light).

Matting is still an open problem, and researchers have developed sophisticated methods to obtain these subtle mattes with minimal user input. For instance, Levin et al. [4] propose a method called spectral matting, which finds clusters of pixels with affinity properties (such as X, Y, and/or RGB distances) by relying on spectral analysis (via the matting Laplacian) to automatically evaluate the quality of a matte without explicitly estimating the foreground and background colors. This method requires less user input than most approaches (four strokes on average) in order to distinguish the foreground from the background.

Blend Modes

Most image compositing software includes multiple blending modes, that is, different functions to mix each pair of pixels from two overlaid images depending on their RGB values and the layering order. There are several well-known modes [5]: multiply, screen, overlay, soft light, hard light, dissolve, color dodge, addition, subtraction, darken, lighten, etc. Implementations and formulations may differ, but three of the most established modes are described here (a code sketch follows the list):
  • Multiply: Both pixel values are multiplied. The result tends to be darker than either of the original images: dark (black) values are preserved, while white values have no effect on the final composition. This mode is especially useful when combining black-and-white line drawings with color images.

  • Screen: The opposite of the previous mode. Both layers are inverted and multiplied, and the result is inverted again. It tends to produce images brighter than the originals, as bright values are preserved and black pixels have no effect. If a denotes the background pixel and b the foreground pixel, the following equation shows how the screen blend mode is applied:

    $$ f(a, b) = 1 - (1 - a)(1 - b) $$
  • Overlay: A hybrid mode combining multiply and screen, guided by the value of the background layer. Foreground colors overlay the background while preserving its highlights and shadows:

    $$ f(a, b) = \begin{cases} 2ab & \text{if } a \le 0.5 \\ 1 - 2(1 - a)(1 - b) & \text{if } a > 0.5 \end{cases} $$
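A direct NumPy transcription of the three modes above (a sketch assuming float images in [0, 1]):

```python
import numpy as np

# a is the background pixel value, b the foreground, both floats in [0, 1];
# the functions apply elementwise to whole images via NumPy broadcasting

def multiply(a, b):
    return a * b  # darkens: white (1) is neutral, black (0) dominates

def screen(a, b):
    return 1.0 - (1.0 - a) * (1.0 - b)  # brightens: black is neutral

def overlay(a, b):
    # Multiply in the shadows of the background, screen in its highlights
    return np.where(a <= 0.5,
                    2.0 * a * b,
                    1.0 - 2.0 * (1.0 - a) * (1.0 - b))
```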

Gradient-Domain Compositing

Gradient-domain techniques aim to merge two images while making the boundary between them imperceptible. They rely on image gradients (differences of pixel values instead of absolute values), looking for the composite that produces the smoothest fusion of the gradient fields (see Fig. 3). These approaches were introduced by Pérez et al. in 2003 [6] and are now standard in commercial software (like the healing brush tool in Adobe Photoshop).
Compositing and Chroma Keying, Fig. 3

Example of image compositing based on gradient domain and proximity matching (Photoshop). An area from the input image (left) is selected with a loose stroke (middle), from which a radius is derived to analyze gradient and color affinity. The result is shown in the right image. Some artifacts are still noticeable and additional user input might be required
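A didactic sketch of this idea, in the spirit of Poisson image editing [6] but not the paper's actual solver: a plain Jacobi iteration on the discrete Poisson equation (the iteration count is an illustrative assumption; real tools use fast sparse or multigrid solvers):

```python
import numpy as np

def poisson_blend(source, target, mask, iterations=2000):
    # source, target: H x W float arrays (apply per channel for RGB);
    # mask: H x W bool array, True inside the pasted region (the mask
    # should not touch the image border, since np.roll wraps around)
    f = target.astype(np.float64)
    inside = mask > 0
    # Divergence of the source's gradient field = its discrete Laplacian
    lap = (np.roll(source, 1, 0) + np.roll(source, -1, 0) +
           np.roll(source, 1, 1) + np.roll(source, -1, 1) - 4.0 * source)
    # Jacobi iterations: inside the mask, match the source's gradients;
    # pixels outside the mask stay fixed and act as boundary conditions
    for _ in range(iterations):
        neighbors = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                     np.roll(f, 1, 1) + np.roll(f, -1, 1))
        f[inside] = ((neighbors - lap) / 4.0)[inside]
    return f
```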

The latest research advances combine the gradient domain with multi-scale methods and visual transfer to produce compositions that mimic even the structural noise at different spatial scales [7].

Deep Compositing

Deep compositing techniques, in addition to the usual color and opacity channels, take into account depth information stored at each pixel along the z-axis, perpendicular to the image plane (see Fig. 4). In contrast to traditional compositing, which arranges 2D layers in 3D with a single depth value for each element, deep compositing stores a range of depth values for each object, extending the gamut of editing possibilities. For instance, if the compositing artist aims to integrate an actor with a 3D-rendered column of billowing smoke, without deep data in the smoke element the actor can only appear entirely in front of or behind the column. With varying density values along the z-axis, however, the actor can be placed inside the smoke and show both correct occlusion and partial visibility effects.
Compositing and Chroma Keying, Fig. 4

Example of deep compositing [8]. The fruit basket in the input image (left) is extracted from the background (right image, colored in green) by using the associated depth range data (middle image)
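Conceptually, each deep pixel holds a list of depth-sorted fragments, and flattening them is a front-to-back "over". A minimal sketch under simplifying assumptions (point samples with straight alpha; real deep formats also store depth ranges with volumetric falloff):

```python
def flatten_deep_pixel(samples):
    # samples: list of (depth, rgb, alpha) fragments stored at one pixel,
    # with straight alpha and (for simplicity) non-overlapping depths
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still reaching the camera
    for depth, rgb, alpha in sorted(samples, key=lambda s: s[0]):
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha  # each fragment dims what lies behind it
    return color, 1.0 - transmittance  # flattened rgb and accumulated alpha
```

Merging two deep images then reduces to concatenating their per-pixel fragment lists before flattening, which is what lets the actor sit inside the smoke.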

In order to mix and compose deep images, a proper depth buffer is required. This data is readily available for 3D-rendered graphics, as each pixel has an associated depth value (Z-buffer), whereas for footage captured with a camera, reliable depth values are available only at some arbitrary points, due to the limitations of the capturing device (stereo, time-of-flight cameras, etc.). The remaining pixels are then obtained through interpolation. For instance, Richardt et al. [8] attach a time-of-flight IR camera to a consumer-level video camera in order to capture a rough depth map, which is subsequently filtered to obtain a high-quality depth image by means of spatiotemporal denoising and an upsampling scheme (see Fig. 4).
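As a toy illustration of color-guided depth interpolation (a joint bilateral weighting in spirit, not the actual spatiotemporal scheme of [8]; the function name and parameter values are illustrative):

```python
import numpy as np

def fill_depth(depth, valid, rgb, radius=5, sigma_s=3.0, sigma_c=0.1):
    # depth: H x W floats (garbage where invalid); valid: H x W bools;
    # rgb: H x W x 3 float guide image in [0, 1]
    H, W = depth.shape
    out = depth.copy()
    for y, x in zip(*np.where(~valid)):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        win_valid = valid[y0:y1, x0:x1]
        if not win_valid.any():
            continue  # no measured depth nearby; leave the gap
        yy, xx = np.mgrid[y0:y1, x0:x1]
        # Spatial closeness times color similarity to the guide pixel
        w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
        w_c = np.exp(-((rgb[y0:y1, x0:x1] - rgb[y, x]) ** 2).sum(-1)
                     / (2 * sigma_c ** 2))
        w = w_s * w_c * win_valid
        out[y, x] = (w * depth[y0:y1, x0:x1]).sum() / w.sum()
    return out
```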

Although one of its first uses was Pixar RenderMan's deep shadow technique [9], this technology has been progressively adopted by the media industry with the upsurge of 3D cinema and TV. Nowadays, companies like Weta Digital or DreamWorks rely on deep compositing pipelines and tools such as Nuke (The Foundry) to create their final compositions.

Visual Transfer and Relighting

The concepts of visual transfer and relighting refer to a set of techniques that aim to transfer visual properties from the background to the image to be inserted. Illumination (shading, shadows, and highlights) is one of the main visual factors to consider when homogenizing a composition. When real objects have to be introduced into CGI environments with known illumination, this illumination is mimicked with actual light sources on a stage. In the early years of cinema, this technique was necessary when performing background projection (e.g., when recording an actor driving a car at night, the lights from cars or street lamps reproduced on the back screen were synchronized with actual lamps illuminating the actor in the studio). More sophisticated systems like the Lightstage [10] have been developed to capture an actor's performance.

By combining high-speed cameras and structured light sources from many directions, it is possible to create a database of views of the actor under different lighting environments, which can then be interpolated to re-render the performance under arbitrary lighting conditions for compositing into any background (see Fig. 5).
Compositing and Chroma Keying, Fig. 5

Example of image capture in a Lightstage [10]. The actor is illuminated with even omnidirectional light in order to capture the base reflectance of his face. In subsequent high-speed shots, the light sources are switched on and off individually to capture the interaction with each light source for later interpolation
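The key property exploited here is the linearity of light transport: the actor's appearance under any lighting environment is a weighted sum of the one-light-at-a-time basis images [10]. A minimal sketch (scalar weights per light are a simplification; practical pipelines derive RGB weights from HDR environment maps):

```python
import numpy as np

def relight(basis_images, light_weights):
    # basis_images: N x H x W x 3 array, one photograph per light source;
    # light_weights: length-N intensities sampled from the target environment
    weights = np.asarray(light_weights, dtype=np.float64)
    # Light transport is linear, so the relit image is just sum_i w_i * I_i
    return np.tensordot(weights, basis_images, axes=(0, 0))
```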



  1. Brinkmann, R.: The Art and Science of Digital Compositing. Morgan Kaufmann, San Francisco (1999)
  2. Smith, A.R.: Image Compositing Fundamentals. Microsoft Tech Memo 4 (1995)
  3. Porter, T., Duff, T.: Compositing digital images. In: SIGGRAPH '84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pp. 253–259 (1984)
  4. Levin, A., Rav-Acha, A., Lischinski, D.: Spectral matting. IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1699–1712 (2008)
  5. Grasso, A. (ed.): SVG Compositing Specification. W3C Working Draft (2011)
  6. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. ACM Trans. Graph. 22(3), 313–318 (2003)
  7. Sunkavalli, K., Johnson, M.K., Matusik, W., Pfister, H.: Multi-scale image harmonization. ACM Trans. Graph. 29(4), 125:1–125:10 (2010)
  8. Richardt, C., Stoll, C., Dodgson, N., Seidel, H.-P., Theobalt, C.: Coherent spatiotemporal filtering, upsampling and rendering of RGBZ videos. Comput. Graph. Forum 31(2) (Proc. of Eurographics). Eurographics Association, Cagliari (2012)
  9. Lokovic, T., Veach, E.: Deep shadow maps. In: SIGGRAPH 2000: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 385–392 (2000)
  10. Debevec, P.: Virtual cinematography: relighting through computation. IEEE Comput. 39(8), 57–65 (2006)

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. GMRV Group, Universidad Rey Juan Carlos, Móstoles, Madrid, Spain