Monday 28 May 2012

Steve, Otto and the green-screen

This post is not about MTF Mapper or lens sharpness. This story is about Steve.

And about chroma keying, commonly called "green screen" or "blue screen" effects. This technique is mostly needed by Hollywood. You cannot have an actor drive a car while having a spirited argument with another actor, all the time whilst driving in actual traffic. Nor can you quickly pop out to outer space to get a few shots of the starship Enterprise.

The typical solution is to capture the footage of your foreground objects, typically the actors or your model starship, against a uniformly coloured green background. Then you assign a transparency value (called an alpha channel) to each pixel in your image so that where the green background is showing through, you have full transparency. Hopefully the pixels that comprise the actor will have zero transparency, but you have to allow for partial transparency in the actor's hair, for example.

The problem can thus be stated as follows:
The green screen problem: Given a shot of the foreground object (actor or whatnot) against a uniform green (or blue) background, compute the correct transparency value for each pixel in the image.

Of course, you have to ensure that your foreground object does not have any parts that are close to the background colour, or these parts will become transparent.

James Blinn (and co-author A.R. Smith, in a paper titled "Blue screen matting") have shown that the green screen problem is in fact an underdetermined problem, meaning that you have more variables to determine than you measurements. In the simplest case, you know the background colour, in RGBA format, (Rb, Gb, Bb, 1.0). Note that the background is fully opaque, so its alpha is exactly 1.0. The foreground object's colour is (Rf, Gf, Bf, Af), where it is assumed that the RGBA values have already been pre-multiplied by the alpha (opacity) value. This means that the colour of a pixel in our captured image is simply

(Ri, Gi, Bi, 0.0) = (Rf, Gf, Bf, Af) + (1-Af)*(Rb, Gb, Bb, 1.0)

We can break this down to give

Ri = Rf + (1-Af)*Rb

and similarly for the other two colours. This gives us three equations with four unknown values (Rf, Gf, Bf, Af), which cannot be solved uniquely.

So, despite the fact that the maths shows you that green-screen techniques cannot work, Hollywood insists on using it! Of course, if you are willing to add some assumptions (constraints) to the above equations, you can find some unique solutions. One of these constraints is that the foreground colour is distinct from the background colour, e.g., no green actors. Even under such limiting assumptions we often find that the results are not really all that great; just think of the flying-on-the-broomstick scene from Harry Potter and the Order of the Phoenix movie.

Blinn and Smith have shown that unique solutions can be obtained by capturing the foreground object against two different backgrounds. You also have to capture an image of each background without any foreground objects. Although this is somewhat of a bother, it allows you to obtain much better results in practice. Remember that the foreground object opacity, Af, is same in the two images against different backgrounds, but that the apparent foreground object colour (Rf, Gf, Bf) will not necessarily be the same if the object is not fully opaque. Combining the equations from both images thus gives us six equations with six unknowns, which fortunately will have a unique solution, provided that the two backgrounds really are different for each pixel.

The solution suggested by Blinn and Smith goes as follows:
Let D = (Ri,1 - Ri,2, Gi,1 - Gi,2, Bi,1 - Bi,2)
where  Ri,1 denotes the red value of our pixel of interest in image 1, and Ri,2 denotes the red value of the same pixel in image 2. In other words, D is the difference between the two input images containing the foreground objects against the two different backgrounds.
Similarly, let E = (Rb,1 - Rb,2, Gb,1 - Gb,2, Bb,1 - Bb,2)
denote the difference between the two background-only images, without the foreground objects, i.e., the pair of background-only shots you have to capture as part of this method.
The alpha value for a pixel is thus given by
Af = 1 - (D · E) / (E · E)
where  (D · E) denotes the dot product between the two vectors.

And that is all there is to this method; you can reconstruct the foreground object's red component as
Rf = (Ri,1 - (1.0 - Af)*Rb,1)
and similarly for the other two colours.
Now you can recombine the foreground objects with a new background as
Rn = Rf + (1.0 - Af)*Rk
where Rk denotes the red component of the new background image. Just apply the same pattern for blue and green components, and you are done. Remember, Rf is pre-multiplied with Af, which explains why the re-blending with a new background appears in this form.

How well does this work? I called in the help of Steve and Otto to demonstrate. Here is the first shot against background 1 (click for a larger version; applies to all images in this post):
Input image 1, shot against background 1
This image contains several interesting things. Note that there is quite a bit of green in the foreground objects (duck's head, for example). The champagne flute is also a bit tricky, because we can see most of the background right through it. Otto's fur would have presented endless trouble if you tried to remove the background manually in Gimp, for example. Yes, the dog is called Otto (Steve is the lime).

And the second shot:
Input image 2, shot against background 2
Lastly, a shot of each of the backgrounds, after physically removing the foreground objects from the scene.
background 1

background 2

I pushed these four images through a quick-and-dirty implementation of Blinn and Smith's method that I cobbled together in C++ using OpenCV. Here is the resulting alpha mask produced by the program without any manual intervention:
Alpha mask produced using Blinn and Smith's method
Note how solid Steve's alpha mask is --- the fact the he's green, and one of the backgrounds happened to be green, made no difference whatsoever. Otto's interior is also completely opaque, but on the edges we see some fine details like these:

100% crop of the alpha mask near Otto's head







Notice how the fine fibres are partially transparent (top right), and how the interior gaps in the fur is also partially transparent (bottom left-ish).

The champagne flute has also been extracted quite nicely.

Now for the final test: to recombine the foreground objects with a new background. Here is the background image:
PovRay's benchmark scene should do nicely as a new background
And here is the blended result:
Recombining the extracted foreground with a new background
The new composition is essentially flawless. There are no tell-tale fringes or other signs that are commonly seen with green-screen techniques.

Here is a close-up of Otto's head after blending:
100% crop of Otto's head after blending

Of course, if you concentrate a bit, you will see something is amiss. Although the champagne flute has been composited perfectly, it is quite clear that Steve (the lime) is correctly refracted through the right-hand side of the glass, but that the PovRay logo in the background suffered no distortion on the left side of the glass. This is obviously a shortcoming of all such recomposition techniques, so I guess the only solution is avoid strongly-refractive foreground objects.

Limitations

Obviously it is somewhat inconvenient to have to shoot the subject against two different backgrounds. It will be downright impossible to use this technique with a toddler, for example. An adult might be able to hold still enough, provided you use something like a television to display your background like I did above.

This technique is fine for inanimate objects, though.

Oh yes, of course you have to keep the camera very, very still during the whole capture process. I used a sturdy tripod + mirror lock-up + IR shutter release. In theory, you could use image-to-image registration to correct for camera movement, but I have not tried it yet for this problem.

I recently saw another paper (published in the proceedings of SIGGRAPH'06) by McGuire and Matusik where they extend this technique to live video. The secret is to use a polarised background material, which will obviously present different images through different polarisations of the light. A special camera uses a prism to send the incoming light to two different sensors according to its polarisation. Although this means the technique falls into the category of "requires special tools", it is still pretty cool. And it produces perfect results, just like the method above. So why didn't they use this for the dreaded broomstick scene in Harry Potter and the Order of the Phoenix?

Notes:

No animals were harmed during the production of this article. Steve, however, did not make it. That was some good lemonade, though!

No comments:

Post a Comment