Color retargeting: Interactive time-varying color image composition from time-lapse sequences

In this paper, we present an interactive static image composition approach, namely color retargeting, to flexibly represent time-varying color editing effects based on time-lapse video sequences. Instead of performing precise image matting or blending, our approach treats color composition as a pixel-level resampling problem. In order to both satisfy the user's editing requirements and avoid visual artifacts, we construct a globally optimized interpolation field. This field defines from which input video frames the output pixels should be resampled. Our proposed resampling solution ensures that (i) the global color transition in the output image is as smooth as possible, (ii) the desired colors/objects specified by the user from different video frames are well preserved, and (iii) additional local color transition directions in the image space assigned by the user are also satisfied. Various examples demonstrate that our efficient solution enables the user to easily create time-varying color image composition results.


Introduction
Time-lapse video sequences capture rich visual information about a scene, including not only the temporal motion of objects but also time-varying color evolution. With digital cameras now widespread, time-lapse videos can easily be produced by ordinary consumers. Creative applications that further exploit and process such videos attract many artists and researchers.
Editing and composing time-varying color information into a static image, as shown in this paper, is one such application. Various artistic works go in this direction. For instance, the components of an object may be captured at different time instants and combined into the final object (see the Four Seasons Tree Nexus in Fig. 1). Another example is the day-to-night transition in a scene [1]. However, such artistic works require very tedious interaction with existing image and video editing tools, calling for the complicated extraction of different color layers as well as their seamless composition.
In this paper, we introduce an efficient time-varying color image composition solution based on time-lapse videos. Inspired by image retargeting works [2,3], we name our time-varying color composition color retargeting. In contrast to image retargeting, where the image is spatially resampled in a content-dependent manner, our work is concerned with the color resampling of spatio-temporal pixels from the input time-lapse video cube. We suppose that the color variation is continuous in the temporal direction, and that the output image can be composed by copying (or taking a weighted sum of) pixels that share the same spatial position but may originate from different video frames. In order to match the user's requirements of both preserving specified colors/objects and satisfying desired local color transition directions in the image space, we construct a globally optimized interpolation field for resampling all pixels from the input time-lapse video. With the proposed interpolation field, our system achieves a globally smooth color transition over all pixels, and thus effectively avoids visual artifacts.
Our approach provides users with a novel yet user-friendly composition solution for color images. With our system the user can easily produce time-varying editing results (see Fig. 2 for an example). We note that achieving such results with existing editing tools would require much more tedious interaction. We believe that our efficient editing framework will motivate further interesting applications of time-lapse videos.

Related work
Our color retargeting approach draws from numerous techniques developed for image composition, image recoloring, and time-lapse video based editing. In the following, we focus on the contributions most related to the above-mentioned research domains.
Image composition. The pioneering work by Perez et al. [4] performs image composition by solving the Poisson equation, seeking a seamless filling of the selected content under given boundary conditions. Agarwala et al. [5] propose the well-known digital PhotoMontage framework, in which graph-cut optimization [6] and gradient-domain fusion [7] are combined to avoid visible seams. Pritch et al. [8] also introduce a graph-cut based shift-map optimization to minimize the gradient discontinuities of all pixels in the output image. In order to efficiently achieve seamless composition between the source and the target patches, Farbman et al. [9] propose a harmonic-like interpolation scheme using mean-value coordinates. Darabi et al. [10] introduce the image melding concept to further model potential geometric and photometric transformations when blending different images with inconsistent color and texture properties. We note that both continuous and discrete rescaling of image resolution, such as the famous seam carving [3,11], image retargeting [2,12], and other related applications [13,14], have been well investigated. In contrast, our color retargeting work focuses on continuous color rescaling for image composition.
Image recoloring. To avoid precise segmentation and to propagate the desired color to different video frames, Levin et al. [15] formulate a color intensity optimization model for space-time neighboring pixels. Such an approach can effectively achieve the recoloring effect by interactive editing. Another edge-preserving energy minimization method [16] is introduced for locally adjusting tonal values in digital photographs. Recoloring is also applied for editing the appearance of materials in all-pairs propagation [17] and its acceleration model [18].
A variety of solutions have been proposed by further considering the feature space of pixels, such as diffusion distance for the global distribution [19], locally linear embedding (LLE) for local manifold preservation [20], and sparse control samples for the propagation influence [21]. The literature also offers other recoloring-based applications, such as color correction for multiview videos [22,23].
Time-lapse video editing. In recent decades, researchers have also been interested in ways to further exploit and edit the visual information in time-lapse videos. The PhotoMontage approach [5] aims to easily compose interesting objects into a single image, but it is inapplicable to long-term videos with hundreds of frames. Sunkavalli et al. [24] focus on the modeling and editing of shadow, illumination, and reflectance components under clear-sky conditions. Another interesting application is motion-oriented composition. Bennett et al. [25] investigate video rearrangement by frame-level resampling. Dynamic narratives [26] enable the user to interactively represent motion against a static background. In Ref. [27], the desired content can also be reassembled by 4D min-cut based optimization. After extracting moving objects, Lu et al. [28] efficiently achieve condensing and other manipulations. From a single image, Shih et al. [29] generate the same scene at a different time of day using a database of time-lapse videos. Conversely, Estrada et al. [30] pursue a long-exposure effect in a static image by frame-level color resampling. Recent applications of time-lapse video editing also include decomposing paintings into different layers [31] and time-lapse generation from internet photos [32]. Our proposed approach differs from such moving-object based editing in that we compose the scene by considering the color consistency within a time-lapse image set or a video clip.

Problem formulation
Formally, suppose there are N video frames in the input time-lapse sequence, and denote the nth frame as $I_n$, where $0 \le n < N$. The composed color image is denoted G. Accordingly, pixels in the input and output images are $I_n(\mathbf{x})$ and $G(\mathbf{x})$ respectively, where $\mathbf{x}$ is the pixel coordinate in the image space. In principle, pixels in G are resampled from the co-located pixels (those at the same position) in the input sequence. That is, we need to construct a mapping function f such that
$$f : I_n(\mathbf{x}) \rightarrow G(\mathbf{x}), \quad 0 \le n < N. \tag{1}$$
The proposed color retargeting approach is detailed in the following section.

Overview
The input to our approach is a time-lapse video sequence, and the output is a single edited color image that optimally represents the time-varying information under the guidance of user interaction. The proposed framework is illustrated in Fig. 3. Firstly, the original video frames are preprocessed with frame-level color consistency correction and background panorama based scene alignment. Secondly, in the interaction phase, the user can (1) mark desired areas (or objects) of any input frame that should be preserved in the output image, and/or (2) specify local time-varying directions that the output color image should follow. Next, our global optimization algorithm automatically computes a smooth interpolation field that satisfies the user's expectations. Finally, according to this globally smooth interpolation field, the output image is generated by pixel-level resampling from the input time-lapse video cube.

Preprocessing
In our solution, we suppose the input time-lapse videos are captured by static cameras. Under this assumption, the input video can simply be aligned using homography parameters to match a reconstructed background panorama [28]. We note that time-lapse videos should maintain consistency of appearance in the temporal direction; that is, the colors of each frame should evolve continuously over time. However, individual photos or frames in time-lapse sequences easily suffer from differing exposure and lighting conditions. In order to ensure color consistency in the frame-level evolution, the colors of each frame are directly corrected using histogram matching, where the target histogram is obtained by cubic interpolation over a sliding window. Note that in our work we only perform such color correction for long-term video sequences (i.e., more than 100 frames), in which the frame-level color evolution is relatively slow. Alternatively, one could employ more sophisticated temporal color transfer models (e.g., Ref. [33] or our approach in Ref. [34]) to further improve the color consistency.
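As a rough illustration of the frame-level correction described above, the following Python sketch matches each frame's per-channel histogram to a temporally smoothed target. It is a minimal sketch under stated assumptions, not the authors' implementation: the target histogram is a plain average over a sliding window (standing in for the cubic interpolation mentioned in the text), frames are 8-bit, and the names `channel_histogram`, `match_to_histogram`, `correct_sequence`, and `half_window` are hypothetical.

```python
import numpy as np


def channel_histogram(channel, bins=256):
    """Normalized histogram of one uint8 color channel."""
    hist = np.bincount(channel.ravel(), minlength=bins).astype(np.float64)
    return hist / hist.sum()


def match_to_histogram(channel, target_hist):
    """Remap uint8 values so that their CDF follows the CDF of target_hist."""
    src_cdf = np.cumsum(channel_histogram(channel))
    tgt_cdf = np.cumsum(target_hist)
    # For each source level, pick the first target level whose CDF reaches it.
    lut = np.searchsorted(tgt_cdf, src_cdf, side="left")
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[channel]


def correct_sequence(frames, half_window=2):
    """frames: uint8 array of shape (N, H, W, C). Returns corrected frames.

    Assumption: the target histogram of frame i is the mean histogram of the
    frames in [i - half_window, i + half_window], not a cubic interpolation.
    """
    n, _, _, c = frames.shape
    hists = np.array(
        [[channel_histogram(frames[i, ..., k]) for k in range(c)] for i in range(n)]
    )  # shape (N, C, 256)
    out = np.empty_like(frames)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        target = hists[lo:hi].mean(axis=0)  # shape (C, 256)
        for k in range(c):
            out[i, ..., k] = match_to_histogram(frames[i, ..., k], target[k])
    return out
```

With this kind of correction, a frame that is noticeably brighter than its temporal neighbors is pulled back toward them, while frames that already agree with their neighbors are left almost unchanged.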

Optimized interpolation field
As mentioned before, our system needs to compute a good mapping between the input video and the output image. For convenience, we define an interpolation field L whose value at each pixel lies between 0 and N. By means of this interpolation field, each pixel in G receives a label indicating the input frame from which it should be copied (see the example in Fig. 4). We consider the following three constraints to construct a globally optimized interpolation field. Firstly, in order to keep a smooth transition over all pixels of the output image, we require that neighboring pixels be resampled only from the same or neighboring input frames. More precisely, we formulate this as a globally smooth gradient constraint, under which the Laplacian of L vanishes everywhere:
$$\Delta L = 0, \tag{2}$$
where $\Delta$ is the Laplace operator.
Secondly, for the areas covered by the user's strokes on specified frames, the labels in the output image should minimize an energy that keeps L close to $\mathcal{M}$ over those areas. Here $\mathcal{M}(\mathbf{x})$ is defined on a subset of the domain of L and represents the objects/areas to be preserved, $\omega_a$ is the weighting factor of this term, and $N_a$ is the total number of user strokes that specify areas or objects.
Thirdly, to satisfy the local color transition directions assigned by the user, a second energy should also be minimized in the output image, one that keeps L close to $\mathcal{N}$ along those directions. Here $\mathcal{N}(\mathbf{x})$ represents the desired frame number from which the output color is sampled at pixel $\mathbf{x}$. Note that all pixels in $\mathcal{N}$ lie on the splines fitted to the local color transition directions specified by the user, so $\mathcal{N}$, like $\mathcal{M}$, is defined on a subset of the domain of L. The weighting factor $\omega_d$ controls the contribution of this constraint to the optimization energy, and $N_d$ is the total number of local transition directions.
Considering the above three constraints, our color retargeting problem is to find the smooth interpolation field L given the two subsets $\mathcal{M}$ and $\mathcal{N}$. This problem can easily be reformulated as a standard linear system, whose globally optimal solution is the desired time-varying interpolation field for the color resampling of the time-lapse video.
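To make the linear-system formulation more concrete, here is a minimal Python sketch of how such an interpolation field could be solved. It rests on assumptions that go beyond the text: both kinds of user constraints enter as quadratic penalties with one common weight, smoothness is measured on the 4-connected pixel grid, and SciPy's sparse solver stands in for the Eigen solver used in the paper. The function names `grid_laplacian` and `solve_interpolation_field` are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def grid_laplacian(h, w):
    """Graph Laplacian of the 4-connected h x w pixel grid (row-major order)."""

    def diff(n):
        # Forward-difference operator of shape (n - 1, n).
        return sp.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))

    dy = sp.kron(diff(h), sp.identity(w))  # vertical neighbor differences
    dx = sp.kron(sp.identity(h), diff(w))  # horizontal neighbor differences
    return (dx.T @ dx + dy.T @ dy).tocsr()


def solve_interpolation_field(targets, mask, weight=100.0):
    """Solve for a smooth field that softly matches `targets` where `mask` is set.

    targets: (H, W) desired frame indices (only read where mask is True).
    mask:    (H, W) boolean, True on user-constrained pixels (at least one).
    weight:  penalty weight of the constraints (assumed shared by both kinds).

    Minimizes sum over neighbors (L_p - L_q)^2 + weight * sum over masked
    pixels (L_p - target_p)^2, whose normal equations are linear in L.
    """
    h, w = targets.shape
    m = mask.ravel().astype(np.float64)
    system = grid_laplacian(h, w) + sp.diags(weight * m)
    rhs = weight * m * targets.ravel().astype(np.float64)
    return spsolve(system.tocsc(), rhs).reshape(h, w)
```

For example, pinning the left image border to frame 0 and the right border to frame 10 yields a nearly linear left-to-right ramp of frame indices. At unconstrained pixels the solution satisfies the discrete analogue of Eq. (2).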

Postprocessing
Until now, we have described our color resampling algorithm as a one-to-one direct pixel copy between the input video and the output image. In practice, we found that such an output image may easily suffer from visual artifacts. This problem is mainly caused by (1) imperfect temporal color consistency and (2) frame-level discretization errors in the temporal color transition. Therefore, inspired by the exposure fusion processing in Ref. [30], after obtaining the optimized interpolation field L we generate each output pixel as a linear combination of multiple temporally neighboring pixels. That is, for the pixel at position $\mathbf{x}$, if its resampling index in the optimized interpolation field is $L(\mathbf{x}) = n$, the corresponding output color is computed as
$$G(\mathbf{x}) = \sum_{k=n-R}^{n+R} \mathcal{G}(n, k)\, I_k(\mathbf{x}),$$
where R is the resampling radius in the temporal direction and $\mathcal{G}(a, b)$ denotes the discretized value at position b of the normalized Gaussian function centered at position a.
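The Gaussian-weighted temporal resampling described above can be sketched in Python as follows. This is only an illustration under assumptions not stated in the text: the field is rounded to the nearest frame index, the Gaussian standard deviation is set to one third of the radius, and the weights are renormalized near the start and end of the sequence where fewer than 2R + 1 frames are available. The name `resample_with_gaussian` is hypothetical.

```python
import numpy as np


def resample_with_gaussian(frames, field, radius=10):
    """Blend temporally neighboring pixels around the frame chosen by `field`.

    frames: (N, H, W, C) float array, the aligned time-lapse video cube.
    field:  (H, W) array of frame indices (the optimized interpolation field).
    radius: temporal resampling radius R.
    """
    n = frames.shape[0]
    sigma = max(radius, 1) / 3.0  # assumption: the text does not give sigma
    center = np.clip(np.rint(field).astype(int), 0, n - 1)
    rows, cols = np.indices(center.shape)
    acc = np.zeros(frames.shape[1:], dtype=np.float64)
    wsum = np.zeros(center.shape, dtype=np.float64)
    for k in range(-radius, radius + 1):
        idx = center + k
        valid = (idx >= 0) & (idx < n)  # ignore frames outside the sequence
        wk = np.exp(-0.5 * (k / sigma) ** 2) * valid
        # Gather, for every pixel, the co-located pixel of frame idx.
        acc += wk[..., None] * frames[np.clip(idx, 0, n - 1), rows, cols]
        wsum += wk
    return acc / wsum[..., None]  # normalize the Gaussian weights per pixel
```

With `radius=0` the function reduces to the direct pixel copy, which makes it easy to compare both variants on the same interpolation field.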

User interface
We also developed an interface that enables quick user interaction. Firstly, the user can draw strokes directly on any frame to select desired areas or objects whose content should be well preserved in the output image. Secondly, the system lets the user specify local color transition directions. To this end, we employ cubic spline interpolation to fit the transition directions drawn by the user, and record the pixels on each spline as the desired resampling values. After that, we run our optimization to complete the color image composition. Note that the user can easily refine the optimized interpolation field through further interaction, and thus the edited results can also be refined within our optimization framework.
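One possible way to turn a drawn transition direction into per-pixel constraints is sketched below in Python. The text does not say how frame numbers are assigned along the spline, so the linear ramp from a start frame to an end frame is purely an assumption, as are the chord-length parametrization, the sampling density, and the name `stroke_constraints`.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def stroke_constraints(points, start_frame, end_frame, samples=200):
    """Fit a cubic spline to a stroke and assign a frame index to its pixels.

    points: sequence of distinct (x, y) stroke points in drawing order.
    Returns (xy, frames): integer pixel coordinates on the spline and the
    desired frame index at each of them (assumed to vary linearly).
    """
    pts = np.asarray(points, dtype=np.float64)
    # Chord-length parameter in [0, 1] along the stroke.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    t /= t[-1]
    spline = CubicSpline(t, pts, axis=0)
    u = np.linspace(0.0, 1.0, samples)
    xy = np.rint(spline(u)).astype(int)
    frames = start_frame + u * (end_frame - start_frame)
    # Keep only the first sample that lands on each pixel, in stroke order.
    _, first = np.unique(xy, axis=0, return_index=True)
    first.sort()
    return xy[first], frames[first]
```

The returned pixels and frame indices could then be written into a target image and a mask and passed to the optimization as soft constraints.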

Implementation details
The proposed approach was implemented in C++ with Microsoft Visual Studio 2010 on a high-performance laptop with a 2.3 GHz quad-core Intel i7 CPU and 8 GB of memory. The proposed system, including the user interaction and the interpolation field optimization, runs at interactive speed, so the composition result is generated almost instantly. For the parameter setting, we empirically set both $\omega_a$ and $\omega_d$ to 100, which means that the target colors in the areas specified by the user are given higher priority than the globally smooth gradient constraint in the interpolation field. The temporal sampling radius is set to R = 10 in our implementation. These settings gave good results in our experiments. We use Eigen [35] to efficiently solve the proposed linear system. In order to accelerate this step, we first compute the optimized interpolation field at a resolution reduced by a factor of 4 in each spatial direction, and then recover the full-resolution field by bicubic interpolation. This ensures real-time interactivity in our released implementation. The efficiency of our system is only marginally affected by the number of user strokes. As can be seen in the video demo in the Electronic Supplementary Material (ESM), which was recorded in Debug mode, our system still generates a satisfactory result immediately when the user draws more strokes.
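The coarse-to-fine acceleration can be illustrated with a short Python sketch of the upsampling step. It assumes that the field has already been solved on a grid reduced by a factor of 4, uses the cubic spline interpolation of `scipy.ndimage.zoom` as a stand-in for bicubic interpolation, and clips the result because cubic interpolation can overshoot the valid range of frame indices. The name `upsample_field` is hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom


def upsample_field(coarse, full_shape, n_frames):
    """Recover a full-resolution interpolation field from a coarse one.

    coarse:     (h, w) field solved at reduced resolution.
    full_shape: (H, W) size of the output image.
    n_frames:   number of input frames N; indices are clipped to [0, N - 1].
    """
    fy = full_shape[0] / coarse.shape[0]
    fx = full_shape[1] / coarse.shape[1]
    # order=3 selects cubic spline interpolation (assumed close to bicubic).
    fine = zoom(coarse, (fy, fx), order=3, mode="nearest")
    return np.clip(fine, 0, n_frames - 1)
```

Since the interpolation field is smooth by construction, solving it on a grid with 16 times fewer unknowns and interpolating afterwards should lose little detail while greatly reducing the size of the linear system.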

Examples
We have applied our approach to a variety of time-lapse video sequences. For instance, in Fig. 2 we present a color transition result changing from spring to winter. The left two columns are composed of the 1st, 40th, 100th, and 200th input frames. To depict the time transition in one single image, we simply draw several strokes (see the sub-figures in the third column). In particular, we aim to preserve the color of the sun in the 40th frame and the green tree in the 200th frame, and also specify several other local horizontal and vertical color transition directions. Accordingly, the proposed global optimization solution computes the smooth interpolation field (see middle column, bottom part), and the composed image is obtained as shown in the right sub-figure of Fig. 2. One can observe that both the expected sun and the green tree are well preserved, while the ground and other areas also change smoothly from spring to winter, following the user's expectation.
In Fig. 5 we show the London example. As can be seen from the left two columns, the lighting varies continuously from afternoon to night. With several interactive strokes, the shining textures of the tall building, captured during the daytime, are well preserved in the output image. Moreover, following the local color transition directions drawn by the user, the expected time variance (i.e., color transition) from far to near also takes place smoothly.
To further investigate the effectiveness of our resampling algorithm on extremely short video clips, we also apply our approach to only four input images for the Jungle example in Fig. 8. In this case the bridge, the trees, and the water clearly differ among the input frames. Interestingly, the various specified scene areas/objects are elegantly blended following the desired local color transition directions. This demonstrates that the globally smooth gradient constraint in our optimization model works well to eliminate color gaps even for extremely short videos in which the recorded colors are obviously different.
Finally, more results are shown in Fig. 9 for the Jungle and Philadelphia sequences. Again, with our color retargeting approach, the user just needs to draw several strokes and the proposed system can easily produce different composition results for the same scene.

Limitations
Our color retargeting approach has several drawbacks.
Firstly, as mentioned earlier, our solution is based on the assumption that the color evolution in the input time-lapse videos is consistent with the temporal change. In other words, our color resampling approach represents the temporal variance by colors chosen from the corresponding frames. Thus, our solution cannot handle cases in which the colors are inconsistent along the time direction. Also, if the original video contains too many moving objects or the background changes too frequently, it is difficult for our approach to generate a satisfying composition result. Moreover, our system gives the user flexibility in specifying the color of objects or local transition directions in any input video frame. However, when the user's strokes are unreasonably placed or even conflicting, the proposed algorithm may fail to perform continuous resampling, which can introduce visual artifacts in the composed image. In Fig. 10 for example, one can see the color distortion in the sky area (close to the lamp). In this case, based on the user's strokes, the resampling is performed on frames whose indices are not consecutive. Currently our Gaussian-weighted resampling is based on the target frame of the interpolation field and its temporally neighboring frames in the input video. More sophisticated adjustments, such as incorporating the neighboring resampling frames on the surface of the optimized interpolation field, may improve the composition effect.

Conclusions and future work
In this paper we propose an interactive color retargeting approach to efficiently compose time-varying color transitions from time-lapse videos. We formulate color composition as a pixel-level resampling problem instead of performing image matting or blending. By constructing a globally optimized interpolation field, the resampling solution not only matches the user's editing requirements of preserving specified colors and satisfying local color transitions, but also effectively avoids visual artifacts in the composed image. Examples demonstrate that our efficient solution enables the user to easily create various time-varying color image composition results. In the future we would like to extend our solution to motion-object oriented time-varying video composition and other related applications.