Tuesday, 1 August 2017

Image interpolation: Fighting the fade, part 1

Over the last six months or so I kept on bumping into a particularly vexing problem related to image interpolation: the contrast in an interpolated image drops to zero in the worst case of interpolating at an exact half-pixel shift relative to the original image.

Consider the case where you are trying to co-register (align) two images, such as two images captured by the same camera, but with a small translation of the camera between the two shots. If we translate the camera purely in the horizontal direction, then the shift between the two images will be h pixels, where h can be any real number. The integer part of h will not cause us any trouble for reasonable values of h, such that the two images still overlap, of course. The trouble really lies in the fractional part of h, since this forces us to interpolate pixel values from the moving image if we want it to line up correctly with the fixed image.

The worst case scenario, as mentioned above, is if the fractional part of h is exactly 0.5 pixels, since this implies that the value of a pixel in the interpolated moving image will be the mean of the two closest pixels from the original moving image. Figure 1 illustrates what such an interpolated moving image will look like for a half-pixel shift; the edges are annotated with their MTF50 values.
Figure 1: Scaled (Nearest Neighbour interpolation) view of an interpolated moving image that experienced a half-pixel shift. The numbers are the measured MTF50 values, in cycles/pixel.
Looking closely at the vertical edges of the gray square, we can see that there are some visible interpolation artifacts manifesting as overshoot and undershoot. This image was interpolated using OMOMS cubic spline interpolation [2], which is best method that I am aware of. Linear interpolation would produce much more blurring (but no overshoot/undershoot). And of course we see a marked drop in MTF50 on the vertical edges!

At any rate, the MTF curves for the edges are illustrated in Figure 2. The blue curve corresponds to the horizontal edge (i.e., the direction that experienced no interpolation), and the orange curve corresponds to the vertical edge (an 0.5-pixel horizontal shift interpolation). The green curve was obtained from another simulation where the moving image was shifted by 0.25 pixels.
Figure 2: MTF curves of interpolated moving images corresponding to fractional horizontal shifts of zero (blue), 0.25 pixels (green), and 0.5 pixels (orange).

Certainly the most striking feature of the orange curve is how the contrast drops to exactly zero at the Nyquist frequency (0.5 cycles/pixel). The smaller 0.25-pixel shift (green curve) shows a dip in contrast around Nyquist, but this would probably not be noticeable in most images.

In Figure 3 we can see that this loss of contrast around Nyquist follows a smooth progression as we approach a fractional shift of 0.5 pixels.
Figure 3: MTF curves of interpolated moving images corresponding to fractional horizontal shifts of 0.25 pixels (blue), 0.333 pixels (green), and 0.425 pixels (orange).
The conclusion from this experiment is that we really want to avoid interpolating an image with a fractional shift of 0.5 pixels (in either/both horizontal and vertical directions), since this will produce a very noticeable loss of contrast at higher frequencies, i.e., we will lose all the fine details in the interpolated image.

Radial distortion lens correction

An applied example of where this interpolation problem crops up is when we apply a radial distortion correction model to improve the geometry of images captured by a lens exhibiting some distortion (think barrel or pincushion distortion). I aim to write a more thorough article on this topic soon, but for now it suffices to say that our radial distortion correction model specifies for each pixel (x, y) in our corrected image where we have to go and sample the distorted image.

I prefer to use the division model [1], which implies that for a pixel (x, y) in the corrected image, we go and sample the pixel at
    x' = (x - xc) / (1 + k1r2 + k2r4) + xc
    r = sqrt((x - xc)2 + (y - yc)2)
and (xc, yc) denotes the centre of distortion (which could be the centre of the image, for example).
The value of y' is calculated the same way. The actual distortion correction is then simply a matter of visiting each pixel (x, y) in our undistorted image, and setting its value to the interpolated value extracted from (x', y') in the distorted image.

The important part to remember here is that the value (x', y') can assume any fractional pixel value, including the dreaded half-pixel shift.

An example of mild pincushion distortion

In order to illustrate the effects of radial distortion correction, I thought it best to start with synthetic images with known properties. Figure 4 illustrates a 100% crop near the top-left corner of the reference image, i.e., what we would have obtained if the lens did not have any distortion.
Figure 4: The pure, undistorted reference image. Note that the closely-spaced black lines blur into gray bars because of the simulated Gaussian Point Spread Function (PSF) with an MTF50 of 0.35 c/p. If you squint hard enough, you can see some traces of the original black bars. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view)

I simulated a very mild pincushion distortion with k1 = 0.025 and k2 = 0, which produces an SMIA lens distortion figure of about -1.62%. This distortion was applied to the polygon geometry, which was again rendered with a Gaussian PSF with an MTF50 of 0.35 c/p. The result is shown in Figure 5. Keep in mind that you cannot really see the pincusion distortion at this scale, since we are only looking at the top-left corner of a much larger image.
Figure 5: Similar to Figure 4, but with about 1.62% pincushion distortion applied to the polygon geometry. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view)

We can see the first signs of trouble in Figure 5: Notice how the black/white bars appear to "fade out" at regular intervals. The straight lines of Figure 4 are no longer perfectly straight, nor are they aligned with the image rows and columns. The lines thus cross from one row (or column) to the next, and the gray patches correspond to the regions where the lines fell halfway between two rows (or columns), leading to the apparent loss of contrast.

It is important to understand at this point that the fading in Figure 5 is not a processing artifact; this is exactly what would happen if you were to photograph similar thin bars that are not aligned with the image rows/columns.

Finally, we arrive at the radial distortion correction phase. Figure 6 illustrates what the corrected image would look like if we used standard cubic interpolation to resample the image.
Figure 6: The undistorted version of Figure 5. Resampling was performed using standard cubic interpolation. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view).
We see some additional fading that appears in Figure 6. If you flick between Figures 5 and 6 (after clicking for 100% view) you will notice that an extra set of fading patches appear in between the original fading patches. These extra fades are the manifestation of the phenomenon illustrated in Figure 2: the contrast drops to zero as the interpolation sample position approaches a fractional pixel offset of 0.5. The interesting thing about these additional fades is that they are not recoverable using sharpening --- once the contrast reaches zero, no amount of sharpening will be able to recover it.

A potential workaround

The aim of radial distortion correction is to remove the long range (or large scale) distortion, since the curving of supposedly straight lines (e.g., building walls) is only really visible once the distortion produces a shift of more than one pixel. Unfortunately we cannot simply ignore the fractional pixel shifts --- this would be equivalent to using nearest-neighbour interpolation, with its associated artifacts.

Perhaps we can cheat a little: what if we pushed out interpolation coordinates away from a fractional pixel shift of 0.5? Let x' be the real-valued x component of our interpolation coordinate obtained from the radial distortion correction model above. Further, let xf be the largest integer less than x' (the floor of x'). If x' - xf < 0.5, then let d = x' - xf. (We can deal with the d > 0.5 case by symmetry).

Now, if d > 0.375, we compress the value of d linearly such that 0.375 <= d' <= 0.425. We can obtain the new value of x', which we can call x", such that x" = xf + (x' - xf ) * 0.4 + 0.225. Looking back at Figure 3, we see that a fractional pixel shift of 0.425 seems to leave us with at least a little bit of contrast; this is where the magic numbers and thresholds were divined from.

Does this work? Well, Figure 7 shows the result of the above manipulation of the interpolation coordinates, followed by the same cubic interpolation method used in Figure 6.
Figure 7: The undistorted version of Figure 5. Resampling was performed using the modified interpolation coordinates followed by cubic interpolation. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view).
Careful squinting reveals that the additional fading patches observed in Figure 6 have been reduced noticeably. This looks promising. Of course, one might argue that I have just added some more aliasing to the image. Which might be the case.

Further testing will be necessary, especially on more natural looking scenes. I might be able to coax sufficient distortion from one of my lenses to perform some real-world experiments.

Further possibilities

Using the forced geometric error method proposed above, we can now extract at least some contrast at the frequencies near Nyquist. We also know what the fractional pixel shift was in both x and y, so we know what the worst-case loss-of-contrast would be. By combining these two bits of information we can sharpen the image adaptively, where the sharpening strength is adjusted according to the expected loss of contrast.

Stay tuned for part two, where I plan to investigate this further.


  1. Fitzgibbon, A.W.: Simultaneous linear estimation of multiple view geometry and lens distortion. In: Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 125–132 (2001).
  2. Thevenaz, P., Blu T. and Unser, M.: Interpolation revisited, IEEE Transactions on medical imaging, 19(7), pp. 39–758, 2000.

Tuesday, 18 July 2017

Windows binaries are now 64-bit

I figured that by 2017 most Windows users will probably be running a 64-bit version of Windows, so it should be reasonably safe to switch to distributing 64-bit binaries from version 0.6 onward.

The practical benefit of this move is that larger images can now be processed safely; late in the 0.5 series of MTF Mapper you could cause it to crash by feeding it a 100 MP image. While it is possible to rework some of MTF Mapper's code to use substantially less memory (e.g., some of the algorithms can be run in a sliding-window fashion rather than whole-image-at-a-time, and I could add some on-the-fly compression in other places), it just seemed like much less work to switch to 64-bit Windows binaries.

That being said, if there is sufficient demand, I am willing to build 32-bit binaries occasionally.

There are quite a few new things in the 0.6 series of MTF Mapper (check the Settings dialog of the GUI):

  1. Fully automatic radial distortion correction using only one image (preferably of an MTF Mapper test chart, but anything with black trapezoidal targets on a white background will work). Enabling this feature slows down the processing quite a bit, so I do not recommend using this by default. More on this in an upcoming blog article.
  2. Correction of equiangular (f-theta) fisheye lens images.
  3. Correction of stereographic fisheye lens images.
I plan on posting an article or two on these new features, so stay tuned!

Wednesday, 3 May 2017

New --single-roi input mode

My original vision was for MTF Mapper to be fully automated; all you had to do was provide it with an image of one of the MTF Mapper test charts. The implementation was centered on the idea that detecting a dark, roughly rectangular target on a white background was a much more tractable problem than detecting arbitrary edges (hopefully representing slanted edges) in arbitrary input images. Figure 1 illustrates what a suitable MTF Mapper input image looks like.
Figure 1: A single target (black rectangle) on a white background. MTF mapper can detect any number of such shapes in your input image; the target objects need not be perfectly rectangular either, as some deviation from perfect 90-degree corners is allowed.
This approach did pay off, and still does, allowing users to design their own test charts that just work with MTF Mapper without requiring specific support for each custom test chart design.

As it turns out, many users have a very different workflow which does not allow them to specify their own chart. Examples of this include Jack Hogan's analysis of the DP Review test chart images, or Jim Kasson's razor-blade focus rail experiments. This type of workflow produces a rectangular Region Of Interest (ROI) that contains only a single edge. Figure 2 illustrates what a typical input image from this use case looks like.
Figure 2: A rectangular ROI containing a single slanted edge.
In the past, MTF Mapper could only process images that look like Figure 2 by specifying the -b option, which would add a white border around the image, thereby transforming it to look more like the expected input convention illustrated in Figure 1. This was a bit of a hack, and has some severe drawbacks. The most prominent disadvantage of the -b option is that the automatic dark target detection code in MTF Mapper could fail to detect the target if the edge contrast was poor, or if the edge was extremely blurry. Fussing with the detection threshold (-t option) sometimes helped, but this just highlighted the fact that the -b option was a hack.

From MTF Mapper version 0.5.21 onwards, there is a new option, --single-roi, which is intended to replace the use of the -b option when the input images look like Figure 2. 
The --single-roi input mode completely bypasses the automatic thresholding and target detection code, and instead assumes that the input image contains only a single edge. The ROI does not have to be centered perfectly on the edge, but I recommend that your ROI must include at least 30 pixels on each side of the edge. MTF Mapper will automatically restrict the analysis to the region of the image that falls within a distance of 28 pixels from the actual edge, so it does not hurt to have a few extra pixels on the sides of the edge (meaning the left and/or right side of an edge oriented as shown in Figure 2).

A typical invocation would look like this:
mtf_mapper.exe --single-roi -q image.png output_dir
which would produce two files (edge_mtf_values.txt, edge_sfr_values.txt) in output_dir. The second and third columns of edge_mtf_values.txt give you the image coordinates of the centre of the detected edge (not really that useful in combination with --single-roi), and the fourth column gives you the measured MTF50 value. To learn the mysteries of the format of the edge_sfr_values.txt file you must first signal the secret MTF Mapper handshake.

Note that it is also possible to use the --single-roi mode in conjunction with the MTF Mapper GUI, provided that your images have already been cropped to look like Figure 2. Just add the string "--single-roi" to the "Arguments" field of the Settings dialog; now you can view the SFR curve of your edge as described in this post

You can still use the --bayer red option with the --single-roi option to process only the red channel (for example) from an un-demosaiced Bayer image, such as produced by dcraw -4 -d;
just be careful that your ROI is cropped such that the starting row/column of the Bayer pattern is RGGB (the only format currently supported by MTF Mapper).

Saturday, 15 April 2017

View MTF (SFR) curves in the GUI

The Easter bunny has delivered. That most elusive of egg-laying mammals has brought you a new GUI feature, which finally completes the feature set I originally envisioned for MTF Mapper.

To visualize the MTF curve (or SFR curve, if you prefer), load up any suitable image in MTF Mapper using the menu option "File -> Open". Select the "Annotated image" output mode, like so:
Make sure "Annotated image" is selected in the desired output types box
You may select any of the other output types (e.g., "Grid") concurrently, except the "Focus position" output type, which is not currently compatible with the other output types.

Click the "Open" button, and wait for the outputs to start appearing in the "Data set" tree-view panel. Expand the entry in the tree-view to expose the "annotated" entry, and click on it:
Note the cyan-coloured text superimposed on the edges of your black target squares. These values are the MTF50 values of the edges, expressed in cycles per pixel.
Those cyan-coloured text labels serve two purposes: a) they tell you the MTF50 (cycles per pixel) of the edge on top of which it is drawn, and b) they are clickable (left mouse button) targets to bring up the MTF curve display for the selected edge.

A brief digression: If the text label is displayed in yellow (rather than cyan), then it indicates that MTF Mapper has deemed the edge to be of "medium" quality. This usually means that the edge orientation is poor, that is, close to one of the critical angles that could mean the displayed MTF50 value is less accurate than the ideal. The edge will also be displayed in yellow if the edge length is sub-optimal (meaning too few pixels along the edge), which also degrades the MTF accuracy. Normally the values displayed in yellow are still usable, but be careful.

Occasionally you will see some of the MTF50 labels displayed in red, like in this example:
The leftmost red label coincides with an edge with a ~25 degree orientation, which is within 2 degrees of the critical angle at 26.565 degrees
It is possible that an edge labelled in red is still usable, but there is no way to know for certain. I would rather recommend that you try to re-align your camera so that no edges end up with red labels, or that you ignore the edges with red labels for any serious analysis.

Back to the main story: clicking on an edge label pops up the MTF curve display window,
The result of a single left-button click on an MTF50 label
I hope the plot is self-explanatory. It corresponds to the MTF curve plots found in other slanted edge tools (Imatest, QuickMTF, ImageJ slanted edge plugin, etc.), with the x-axis of the plot indicating spatial frequency in cycles per pixel, and the y-axis indicating the contrast at the indicated frequency. In the top-right corner you can see a tag "MTF50=0.087" in a shade of blue that matches the plotted curve; this indicates the MTF50 value of the edge you just clicked on, as you might have expected.

The vertical gray bar is a cursor that follows the mouse, which will read off the actual contrast value corresponding to the MTF curve at the indicated spatial frequency; the read-out of this cursor is displayed just below the plot ("frequency: 0.098 contrast: 0.403" in the example). Again, the colour of the "contrast: <xyz>" readout matches that of the plotted curve.

While the MTF curve window is open, you may left-click on any other edge in the "annotated" image to replace the contents of the MTF curve window with the data corresponding to the newly selected edge. This includes clicking on edges in any other "annotated" output available in the "Data set" tree-view.

If you would like to compare the MTF curves of two edges, select the first edge as above, but hold down <shift> while left-clicking on the second edge. This adds the second edge to the plot:
Note the addition of the green curve, and the two green text labels, corresponding to the newly added edge's MTF
The read-out below the plot tracks and displays the MTF for both curves at the current spatial frequency, making it easy to read off accurate values for comparison purposes. Note that you can again select the second edge from any other "annoted" output in the "Data set" tree-view, making it easy to compare curves from different lenses or cameras.

Lastly, you can add a third curve to the plot using the same <shitf>+left click method.
Adding a third curve behaves as expected

If you already have three curves plotted, the last curve is replaced by this action. If you left click on a new edge without holding down shift, the plot reverts back to displaying only a single MTF curve (using the newly selected edge's MTF).

Update (2017/04/16)

I have since added the two save buttons, grab a copy of version 0.5.19 or later to give it a try.

Future improvements

It might be useful to drop a pin, or some other visual marker onto the "annotated" image to indicate which edge was selected. It might be even more useful to show the actual ROI used by MTF Mapper, but that information is not currently available in the outputs.

Let me know if you have any other suggestions for useful features to add to the MTF curve display function.


This feature is available from MTF Mapper 0.5.18 onwards, available from SourceForge.

Tuesday, 4 April 2017

Focus peak measurement with MTF Mapper: Description and Validation

It is a truth universally acknowledged that a single man in possession of a large aperture lens must capture images with as shallow a depth of field as he can manage. (Jane, please forgive me ...).

All kidding aside, the downside to employing a shallow depth of field is the way in which it accentuates even the smallest focus error. By focus error I mean that the apparent position of the focus plane is not where the photographer intended. And of course there is no such thing as a focus plane, since in reality it is a curved surface, but for convenience I will use the term focus plane here.

Even if we accept the convenient notion of a focus plane, we still have not really explained clearly what a focus plane is. One way of describing the focus plane would be to say that it is the distance at which the circle of confusion is minimized (as projected onto the image sensor). Personally, I am not a fan of using the circle of confusion to measure focus (or defocus, to be more precise), mostly because it is hard to measure the circle of confusion. The other difficulty with the notion of the circle of confusion is that it conjures up the image of these perfect little circular discs being formed on the image sensor, which is a rather crude simplification that does not take into account the actual point spread function (PSF) of the imaging system.

A much more convenient (to me, at least) way of defining a focus plane is to do so in terms of MTF, since this explicitly acknowledges the full PSF. This idea has been proposed recently by Jim Kasson (example from his blogsample discussion from DPR forum), using MTF50 as the final criterion. It only takes a little bit of thought to see that circle of confusion diameter and MTF50 are both approximations of the degree of sharpness of an image; note that using MTF50 might discard some of the useful information that we could extract from the full MTF curve, but it is convenient to have only a single value to express our measure of "sharpness" (I am deliberately avoiding the term "resolution" here, since the slanted edge method measures MTF, not resolution).

We can plot the "sharpness" measure of our choice (MTF50) as a function of distance from the camera to produce a curve like this one:
Figure 1: An example of MTF50 at a fixed position on a hypothetical image sensor, plotted as a function of the distance between the sensor and the slanted edge target (it happens to be a 50 mm f/1.8 lens at f/2.8)

Figure 1 illustrates the MTF50 that we would measure as we move our slanted edge target relative to the camera. Now that we have something to visualize, it is easy to explain what a focus plane is: the hypothetical plane that is parallel to our image sensor, located at the distance that maximizes MTF50 (e.g., the peak of the curve, as indicated by the green line in Figure 1). Similarly, we could define depth of field (DOF) as the length of the interval between the two dashed gray lines, where the dashed gray lines correspond to the distances from the camera at which MTF50 = 0.15 cycles per pixel. Note that this is an arbitrary re-definition of DOF only for illustration of the concept as it applies to the curve shown in Figure 1.

In this article, I will use the terms "focus peak distance", "focus distance", and "focus plane position" interchangeably to refer to the distance corresponding to the green line in Figure 1. As you can probably deduce from the title, this article deals with the measurement of this focus distance value using MTF Mapper.

A new test chart

The "classic" MTF Mapper test charts proved to be inadequate when it came to accurate measurement of the focus peak distance. Firstly, none of the older charts provided the required density of slanted edges to obtain a robust measurement. Secondly, the older charts did not allow MTF Mapper to convert image space (pixel) coordinates to real-world coordinates (in mm). The new chart is illustrated in Figure 2:
Figure 2: The new "focus" MTF Mapper chart type
This chart is a 45-degree slanted chart design. The camera should be pointed towards the centre of the chart; the centre of the chart falls on the dashed line,  halfway between the two central fiducials (the large black dots). The chart should be tilted at 45 degrees around the axis illustrated with the dashed (horizontal) line. Notice that the large black bars down the centre of the chart decrease in size towards the bottom of the chart --- that end of the chart should be closer to the camera (so that perspective ends up distorting the bars to have roughly the same size in the final image, although this is not critical). Figures 3 (a) and (b)  illustrate two possible chart orientations relative to the camera.
Figure 3a: One possible set-up, with the chart tilting top-to-bottom at 45 degrees
Figure 3b: An alternative set-up, with the chart tilting left-to-right at 45 degrees

The chart does not have to be positioned in a portrait orientation; landscape orientation works just fine (compare Figure 3(a) to 3(b)). The camera can also be used in either landscape or portrait orientation, as long as you can fit in most of the chart in the image. If you happen to have a sub-optimal combination of focal length, chart size and distance from the chart, then you may crop the chart a little if you must. It is important that the 45-degree tilt of the chart is around the correct axis (running through the dashed line of the chart shown in Figure 2), and that the other two axes must be close to being square.

It is critical to print the chart at the correct size (without "fit to page" scaling). MTF Mapper relies on the fact that distances on the chart are correct --- note the four "+" markers near the corners of the chart, which you can use to verify that your print came out at the right scale. Of course, if you do print with some page scaling, or you print the A3 chart on an A4 page, then MTF Mapper will still work, but the distances that it reports will no longer be accurate. Lastly, note that the fiducials are coded to allow MTF Mapper to identify the correct chart size, so if you pay close attention, you will see the differently sized charts are not just scaled copies of a base chart size.

This is a manual-focus chart only.  The camera should preferably be focused (manually) on the centre of the chart, i.e., roughly at the point halfway between the two central fiducials (black dots). The chart features around this point are not suitable for auto-focus use because there is no way to tell what part of the chart (in the general region around the centre) the camera chooses to focus on. Just to clarify: This chart should not be used to perform PDAF micro-adjust / fine tuning with.

A minor digression: Although this chart is not suitable for use with auto-focus, nothing prevents you from using a removable overlay target. You could, for example, use a second printed page containing only a large black rectangle (like the central rectangle in the MTF Mapper "perspective" chart type) to perform the auto-focus operation. Lock the focus, remove the overlay, and capture the image of this new chart. If you plan ahead, you could use fridge magnets to make the process of adding/removing the auto-focus overlay target more convenient. Or you could wait for the eventual release of a new auto-focus MTF Mapper chart I plan on introducing.

A new output type

To process images of the new "focus" chart just introduced, a new output type has been added to MTF Mapper. As of MTF Mapper version 0.5.16, this output type is not compatible with other typical output types produced by MTF Mapper, i.e., when you choose the "focus position" output type, then you should not enable any other output types (they will not produce usable output). This is a temporary inconvenience, and I aim to fix this sometime. The corresponding command-line switch for this new output type is "--focus"; it produces a file called "focus_peak.png".

An example of this output type is illustrated in Figure 4:
Figure 4: An example of the "--focus" MTF Mapper output. Note that the chart was oriented as shown in Figure 3b
The curve illustrated in Figure 1 is overlaid on top of the captured image by back-projecting the curve onto the image using the estimated camera perspective transformation, to produce the green curve in Figure 4. The "height" of this curve is simply scaled to fill the image of the chart, so the peak of the curve will always be on the midline of the black slanted edge bars.
The dark blue line illustrates the intersection of the hypothetical focus plane with the surface of the chart. In the centre of the image illustrated in Figure 4 we see an orange-ish coordinate origin marker (four outwards pointing arrows), representing the physical center of the chart. The red reticule (with its four inwards pointing arrows) indicate the centre of the captured image; together these two features provide feedback for centering the camera to the chart.

Right under the peak of the green curve we see two values reported in cyan-coloured text. The first line is the MTF50 value measured at the peak, and the second is the focus plane position relative to the centre of the chart. In other words, MTF Mapper subtracts the estimated position (distance from the camera) of the centre of the chart from the focus peak distance to compute the value displayed in this second line below the green curve.

Lastly, it may be worth reading my article on chart orientation estimation, since the underlying method of extracting the camera pose parameters is the same one used by the "--chart-orientation" output mode. If you are using the command line version of MTF Mapper, take note that you may have to specify the focal ratio of your camera + lens combination in order for the camera pose parameters to be correct. For example, a 105 mm lens mounted on a Nikon APS-C body (23.6 mm sensor width) would require the command "--focal-ratio 4.45" to improve the accuracy of the "Estimated chart distance" value reported at the bottom of the "focus_peak.png" output image. The value 4.45 is derived from (lens focal length)/(sensor width), i.e., 105/23.6 ~ 4.45. Because the "focus peak depth" value reported in the output (the -24.7 mm in Figure 4) is a relative measurement, it is expected that an incorrect "Estimated chart distance" value will not have a large impact, but more testing has to be performed to confirm this.

The principle

The curve presented in Figure 1 seems to imply that we have a single slanted edge that we measure as we move it away from the camera, starting at a distance closer than the focus plane distance. This is an entirely valid way of obtaining the measurements required to produce Figure 1, and Jim Kasson has done exactly that. It does require a good linear rail, preferably a computer controlled one to automate the capture of a large number of images from our desired range of distances (from the camera).

We can obtain a fairly decent approximation if we use a 45-degree chart with a large number of slanted edges. The tilt in the chart naturally ensures that these slanted edges appear at different distances from the camera. All that MTF Mapper has to do is extract slanted edge MTF values, and reconstruct the MTF50 vs distance curve.

That sounds straightforward, but there is one fairly large caveat: if our edge is slanted (as required for the slanted edge method to work), then that edge will pass through a range of distances, e.g, the starting tip of the edge is closer to the camera, and the endpoint of the edge is further from the camera. If the MTF50 value varies as a function of distance (as illustrated in Figure 1), then strictly speaking the PSF of the image formation process also varies along this edge. This violates the central assumption of the slanted edge method, which implicitly assumes that we can measure the MTF at a single location in the field by examining a small region around that location. In practice, the MTF we measure with the slanted edge method is a blend of the MTFs at the various distances the edge passes through.

There is not much we can do about it, but it helps to oversample. Each of the long edges of the slanted edge bars in the "focus" test chart (see Figure 2) is processed with a sliding window that uses only a small section of the edge to apply the slanted edge method to. This approach increases our sampling density whilst minimising the range of depth values over which each slanted edge MTF calculation is performed, i.e., we only assume that the true MTF is constant over a very small section of the edge. It is a well-known fact that the slanted edge method produces estimates with a smaller standard deviation if the length of the edge that it is applied to is increased (and the true MTF remains constant); conversely, we expect that each of our individual slanted edge measurements performed with the sliding window method will result in a large standard deviation in the estimated MTF50 value. Fortunately, we can safely assume that our desired MTF50 vs distance curve must be smooth, thus we can fit a smooth model to our multiple noise-contaminated MTF50 measurements. It turns out that a rational polynomial function of order (4, 2) seems to fit rather nicely in all the cases I have examined so far, so that is what MTF Mapper uses internally.

This strategy violates any number of model-fitting assumptions (e.g., my noisy samples are bound to be correlated, and the noise might be correlated too), but it seems to work in practice.

One last observation: Why are the slanted edge bars oriented so that they run left-to-right if the chart is tilted at 45-degrees top-to-bottom (assuming portrait orientation, as shown in Figure 2)? What would happen if we had only a single edge running top-to-bottom, and we applied the sliding window approach to that edge? It turns out that this top-to-bottom method is viable, but because each short edge segment passes through a larger range of distance (from the camera) values, compared to the left-to-right edges, the sensitivity of the detection of the peak of the MTF50 vs distance curve is compromised. So it works, just not as well as the edge orientation of Figure 2.

Accuracy assessment: set-up

So does it work? This turns out to be a fairly hard question to answer. One approach would be to validate the single-image-45-degree-chart method against a computer-controlled focusing rail (i.e., physically moving the edge like Jim Kasson does), but I do not have one of those handy.

I settled on a rather different approach that relies on the observation that a lens fitted on an extension tube can no longer focus at infinity. From what I could gather, a lens set to focus at infinity will focus at a distance d = f*(f/e + 2) + e, where f is the focal length, and e is the extension length (update: see Appendix A below for a discussion of this formula). I happen to have a Micro-Nikkor 105 mm f/4 Ai lens with a hard stop at infinity. After a bit of iterative experimentation (translation: building something, then going back to the drawing board, then salvaging the hardware) I found that an extension of about 6.4 mm will cause the 105 mm lens to focus at a distance of about 1939 mm. At this distance, the lens covers an object size just a tad smaller than an A3 test chart.

Of course, there are some practical problems. Firstly, it is rather difficult to measure a distance of 1939 mm with good accuracy using my available tools. More importantly, I am not quite sure where to measure this distance from (update: As mentioned in Appendix A, this is the total lens conjugate distance, i.e., the distance between the image plane and the focus plane. Since I do not know the principal plane separation distance of my lens, I still cannot use the total lens conjugate distance directly). Even if I could solve the measurement problem, I would still only end up with a single measurement, and no experimental variables to vary.

My solution was to build a variable-length extension tube. The idea was that I could preset the effective length of the extension tube with good accuracy if I used shim stock (or feeler gauges) --- all I had to do is build an extension tube from scratch, since none of the commercially available ones appear to go below 8 mm. Here is a photo of my custom extension tube mounted between my D7000 and the 105 mm Nikkor lens:
Figure 5: The bronze-coloured ring is part of my extension tube
Here is what the front of my extension tube looks like with the lens removed:
Figure 6: extension tube with the lens removed
As you can see from Figure 6, it was a bit of a tight fit to build an adaptor that was wide enough to allow adjustment without removing the lens, but still small enough to physically fit below the prism housing.
Here is what the extension tube looks like with some shims installed:
Figure 7: extension tube with some shims installed, front view
Figure 8: extension tube with some shims installed, rear view
As can be seen in Figure 7, the front part of the extension tube comprises two parts: the front flange (visible as the large ring with the black pen markings), and a Nikon F-mount female bayonet mount. The inner four screws fix the female mount the the outer flange. 

The rear flange has an integrated Nikon F-mount male bayonet. I discovered that the male bayonet is a lot easier to manufacture than the female F-mount bayonet --- that probably explains why Nikon sells them :) Figure 8 also shows how the shims are installed between the front and rear flanges. Careful lapping of the flanges (an a bit of shimming with aluminium foil) ensured that the front surface of the female bayonet mount was parallel to the rear surface of the male bayonet mount to within 5 micron.

The front of the rear flange looks like this when we open up the extension tube:
Figure 9: front face of rear flange seen in the foreground
And lastly, we can see the rear face of the front flange:
Figure 10: rear face of front flange.
Notice the dowel pins in Figure 10: these acted as the registration mechanism so that the flanges always line up correctly without any rotation or tilt.

The length of the extension tube can be adjusted by installing three shims between the front and rear flanges. Measurements with a micrometer show that the repeatability of this process was around 5 micron. I also learned that bargain-store feeler gauges are not necessarily manufactured down to the tolerances that this experiment demanded --- I found some evidence that the feeler gauge thickness varied a little bit across their surfaces. I compensated as much as possible by labeling the position at which a particular shim should be installed, and I measured the effective extension tube length rather than relying on the nominal shim thickness.

I ended up with the following (effective) shim thicknesses: 130, 94, 72, 58, and 45 micron.

Accuracy assessment: the results

The basic experiment involves setting up the chart so that the apparent focus plane position was just slightly in front of the chart center when the 130 micron shims were installed. For each shim set, I then captured 10 images to yield 50 images in total. I repeated the whole experiment a second time to check for repeatability. Figure 11 presents the resulting box-and-whisker plot.
Figure 11: Focus position shift measured by MTF Mapper as a function of shim thickness
The y-axis denotes the "focus peak depth" value reported by MTF Mapper; all the values are positive indicating that the measured focus plane position was slightly in front of the chart centre in all cases.
Other than the large variability (across 10 images) of Set A with 45 micron shims, it would appear that the individual measurements were quite robust. Typical standard deviation within a particular batch of 10 images was below 0.3 mm.

Using the formula presented above, we can compute the expected focus plane position for each of the shim sets, however, we still have no idea how to measure these absolute distances (and the principal plane separation distance of the lens is unknown). Instead, we can subtract the focus distance obtained from the formula with the 45 micron shim; doing the same for the focus peak depth values reported by MTF Mapper allows us to perform a relative comparison. The results are presented in Figure 12:
Figure 12: Summary of results. All values reported in millimeters

The second column contains the "focus peak depth" value reported by MTF Mapper. The second last column lists the relative focus peak depth value; these values should be compared to the relative values derived from the formula appearing in the last column.

Overall we see a reasonable agreement between the relative values derived from the formula, and the relative values as measured by MTF Mapper. There are some outliers (set B, 58 micron shim), but the difference between expected and measured values are typically below 1 mm. Keep in mind that a 5 micron change in shim thickness produces a change of 1.3 mm in focus plane position using the formula, i.e., the measured values appear to be within the mechanical repeatability of the shimming process itself.

The smallest change in shim thickness tested here was 13 micron (45 micron shim set swapped out with 58 micron shim set), followed closely by the 72 vs 58 micron shim combination with a difference of 14 micron. In both those cases it is clear (see Figure 11) that, using the "focus peak depth" values reported by MTF Mapper, one can easily discern the change in focus plane position induced by a 13 micron change in shim thickness.

Why would we want to do this? One application would be the calibration of a camera system where we have to shim the flange distance (distance from sensor to lens mounting flange front surface) to ensure that the image formed on the sensor is in focus when a reference lens is mounted. This is particularly useful for systems with hard infinity focus stops.

Of course, one would have to consider things like actual image magnification relative to sensor resolution when considering this "smallest discernible change in flange distance" measurement, because MTF Mapper performs the analysis of images at the pixel level. More testing!


[Burke2012]: Burke, Michael W, Image acquisition: handbook of machine vision engineering, Springer Science & Business Media, 2012.

Appendix A

The formula used to calculate the focus distance of the lens with focal length f and extension e is f*(f/e + 2) + e. This formula is taken from [Burke2012, p311], where d is stated to be the total lens conjugate distance. The total lens conjugate distance is the sum of the object-to-lens-centre and image-to-lens-centre distances when looking at the thin lens model. Burke notes that the derivation of this equation depends on the lens being symmetric, which allows us to assume that d = do + 2+ di, where do is the object-to-focal-point distance, and di is the image-to-focal-point distance.

I strongly doubt that my Micro-Nikkor 105 mm f/4 Ai is really a symmetric lens, so I just assume that this formula still gives reasonable results. Burke's formula only applies to a thin lens, which I am fairly certain my lens is not (being a compound lens). The implication of this, from my understanding, is that there is an additional distance dp that separates the two principal planes which must be added to the total lens conjugate distance, which implies that d = do + 2ddp. Using my convention above, where di is called e (denoting extension), we see that the thick lens version of this equation should be f*(f/e + 2) + dp.

Unfortunately I have no idea what the value of dp would be for my lens. Serendipitously, I only end up using the difference between values computed using different values of e, meaning that the subtraction removes the dp term from the difference, so I can get away with using the thin lens version of the formula.

Friday, 10 February 2017

Automatic chart orientation estimation: validation experiment

In my previous post I mentioned that it is rather important to ensure that your MTF Mapper test chart is parallel to your sensor (or that the chart is perpendicular to the camera's optical axis, which is almost the same thing) to ensure that you do not confuse chart misalignment with a tilted lens element. I have added the functionality to automatically estimate the orientation of the MTF Mapper test chart relative to the camera using circular fiducials embedded in the test chart. Here is an early sample of the output, which nicely demonstrates what I am talking about:
Figure 1: Sample output of chart orientation estimation
Figure 1 shows an example of the MTF Mapper "lensprofile" chart type, with the new embedded circular fiducials (they are a bit like 2D circular bar codes). Notice that the actual photo of the chart is rendered in black-and-white; everything that appears in colour was drawn in by MTF Mapper.
There is an orange plus-shaped coordinate origin marker (in the centre of the chart), as well as a reticle (the red circle with the four triangles) to indicate where the camera is aimed at. Lastly, we have the three orientation indicators in red, green and blue, showing us the three Tait-Bryan angles: Roll, Pitch and Yaw.

But how do I know that the angles reported by MTF Mapper are accurate?

The set-up

I do not have access to any actual optics lab hardware, but I do have some machinist tools. Fortunately, being able to ensure that things are flat, parallel or perpendicular is a fairly important part of machining, so this might just work. First I have to ensure that I have a sturdy device for mounting my camera; in Figure 2 you can see the hefty steel block that serves as the base of my camera mount.
Figure 2: Overview of my set-up
I machined the steel block on a lathe to produce a "true" block, meaning that the two large faces of the large shiny steel block are parallel, and that those two large faces are also perpendicular to the rear face on which the steel block is standing in the photo. The large black block in Figure 2 is a granite surface plate; this one is flat to something ridiculous like 3.5 micron maximum deviation over its entire surface. The instrument with the clock face is a dial test indicator; this one has a resolution of 2 micron per division. It is used to accurately measure small relative displacements through the pivoting action of the lever you can see in contact with the lens mount flange of the camera body. 

Using this dial test indicator, surface plate and surface gauge, I first checked that the two large faces of the steel block were parallel: they were parallel to within about 4 micron. Next, I stood up the block on its rear face (bottom face in Figure 2), and measured the perpendicularity. The description of that method is a bit outside of the the scope of this post, but the answer is what matters: near the top of the steel block the deviation from perpendicularity was also about 4 micron. The result of all this fussing with parallelism and perpendicularity is that I know (because I measured it) that my camera mounting block can be flipped through 90 degrees by either placing it on the large face with the camera pointing horizontally, or stood up with the camera pointing to the ceiling.

That was the easiest part of the job. Now I had to align my camera mount so that the actual mounting flange was parallel to the granite surface plate. 
Figure 3: Still busy tweaking the mounting flange parallel to the surface plate
The idea is that you keep on adjusting the camera (bumping it with the tripod screw partially tightened, or adding shims) until the dial test indicator reads almost zero at four points, as illustrated between Figures 2 and 3. Eventually I got it parallel to the surface plate to within 10 micron, and called it good.

This means that when I flip the steel block into its horizontal position (see Figure 4) the lens mount flange is perpendicular to the surface plate with a reasonably high degree of accuracy. Eventually, I will arrange my test chart in a similar fashion, but bear with me while I go through the process.
Figure 4: Using a precision level to ensure my two reference surfaces are parallel
In Figure 4 you can see more of my set-up. The camera is close to its final position, and you can see a precision level placed on the granite surface plate just in front of the camera itself. That spirit level measures down to a one-division movement of the bubble for each 20 micron height change at a distance of one metre, or 0.0011459 decimal degrees if you prefer. I leveled the granite surface plate in both directions. Next, I placed a rotary table about 1 metre from the camera --- you can see it to the left in Figure 4. The rotary table is fairly heavy (always a good thing), quite flat, and will later be used to rotate the test chart. The rotary table was shimmed until it too was level in both directions.

The logic is as follows: I cannot directly measure if the rotary table's surface is parallel with the granite surface plate, but I can ensure that both of them are level, which is going to ensure that their surfaces are parallel to within the tolerances that I am working to here. This means that I know that my camera lens mount is perpendicular to the rotary table's surface. All I now have to do is place my test chart so that it is perpendicular to the rotary table's surface, and I can be certain that my test chart is parallel to my camera's mounting flange. I aligned and shimmed my test chart until it was perpendicular to the rotary table top, using a precision square, resulting in the set-up shown in Figure 5.
Figure 5: overview of the final set-up. Note the obvious change in colour temperature relative to Figure 4. Yes, it took that long to get two surfaces shimmed level.

One tiny little detail (or make that two)

Astute readers may have picked up on two important details:
  1. I am assuming that my camera's lens mounting flange is parallel to the sensor. In theory, I could stick the dial test indicator into the camera and drag the stylus over the sensor itself to check, but I do actually use my camera to take photographs occasionally, so no sense in ruining it just yet. Not even in the name of science.
  2. The entire process above only ensures that I have two planes (the test chart, and the camera's sensor) standing perpendicularly on a common plane. From the camera's point of view, this means there is no up/down tilt, but there may be any amount of left/right tilt between the sensor and the chart. This is not the end of the world, since my initial test will only involve the measurement of pitch (as illustrated in Figure 1).

The first measurements

Note: Results updated on 13/02/2017 to reflect improvements in MTF Mapper code. New results are a bit more robust, i.e., lower standard deviations.

From the set-up above, I know that my expected pitch angle should be zero. Or at least small. MTF Mapper appears to agree: the first measurement yielded a pitch angle of -0.163148 degrees, which is promising. Of course, if your software gives you the expected answer on the first try, you may not be quite done yet. More testing!

I decided to shim the base of the plywood board that the test chart was mounted on. The board is 20 mm thick, so the 180 micron shim (0.18 mm) that I happened to have handy should give me a tilt of about 0.52 degrees. I also had a 350 micron (0.35 mm) shim nearby, which yields a 1 degree tilt. That gives me three test cases (~zero degrees, ~zero degrees plus 0.52 degree relative tilt, and ~zero degrees plus 1 degree relative tilt). I captured 10 shots at each setting, which produced the following results:
  1. Expected = 0 degrees. Measurements ranged from -0.163 degrees to -0.153 degrees, for a mean measurement of  -0.1597 degrees and a standard deviation of  0.00286 degrees.
  2. Expected = 0.52 degrees. Measurements ranged from 0.377 to  0.394 degrees, for a mean measurement of 0.3910 degrees with a standard deviation of  0.00509 degrees. Given that our zero measurement started at -0.16 degrees, relative angle between the two test cases comes down to  0.5507 degrees (compared to the expected 0.52 degrees).
  3. Expected = 1.00 degrees. Measurements ranged from 0.814 to 0.828, for a mean measurement of 0.8210 degrees with a standard deviation of  0.00423 degrees. The tilt relative to the starting point is 0.9806 degrees (compared to the expected 1.00 degrees).
I am calling that good enough for government work. It seems that there may have been a small residual error in my set-up, leading to the initial "zero" measurement coming in at -0.16 degrees instead, or perhaps there is another source of bias that I have not considered.

Compound angles

Having established that the pitch angle measurement appears to be fairly close to the expected absolute angle, I set out to test the relative accuracy of yaw angle measurements. Since my set-up above does not establish an absolute zero for the yaw angle, I cheated a bit: I used MTF Mapper to bring the yaw angle close to zero by nudging the chart a bit, so I started from an estimated yaw angle of 0.67 degrees. At this setting, I zeroed my rotary table, which as you can see from Figure 5 above, will rotate the test chart approximately around the vertical (y) axis to produce a desired (relative) yaw angle. At this point I got a bit lazy, and only captured 5 shots per setting, but I did rotate the chart to produce the sequence of relative yaw rotations in 0.5 degree increments. The mean values measured over each set of 5 shots were 0.673, 1.189, 1.685, 2.211, 2.717, and 3.157. If we subtract the initial 0.67 degrees (which represents our zero for relative measurements), the we get 0.000, 0.5165, 1.012, 1.538, 2.044,  and 2.484, which seems pretty close to the expected multiples of 0.5.

In the final position, I introduced the 0.18 mm shim to produce a pitch angle of 0.5 degrees. Over 5 shots a mean yaw angle of 3.132 degrees was measured (or 2.459 if we subtract out zero-angle of 0.67). I should have captured a few more shots, since at such small sample sizes it is hard to tell if the added yaw angle has changed the pitch angle, or not. It is entirely possible that I moved the chart while inserting the shim. That is what you get with a shoddy experimental procedure, I guess. Next time I will have to machine a more positive mechanism for adjusting the chart position.


Note that MTF Mapper could only extract the chart orientation correctly if I provided the focal length of the lens explicitly. My previous post demonstrated why it appears to be impossible to estimate the focal length automatically when the test chart is so close to being parallel with the sensor. This is unfortunate, because it means that there is no way that MTF Mapper can estimate the chart orientation completely automatically --- some user-provided input is required.

The good news is that it seems that MTF Mapper can actually estimate the chart orientation with sufficient accuracy to aid the alignment of the test chart. Both repeatability (worst-case spread) and relative error appears to be better than 0.05 degrees, or about three minutes of arc, which compares favourably with the claimed accuracy of Hasselblad's linear mirror unit. Keep in mind that I tested under reasonably good conditions (ISO 100, 1/200 s shutter speed, f/2.8), so my accuracy figures do not represent the worst-case scenario. Lastly, because of the limitations of my set-up, my absolute error was around 0.16 degrees, or 10 minutes of arc; it is possible that actual accuracy was better than this.

How does this angular accuracy relate to the DOF of the set-up? To put some numbers up: I used a 50 mm lens on an APS-C size sensor at a focus distance of about 1 metre. If we take the above results, and simplify it to say that MTF Mapper can probably get us to within 0.1 degrees under these conditions, then we can calculate the depth error at the extreme edges of the test chart. I used an A3 chart, so our chart width is 420 mm. If the chart has a yaw angle of 0.1 degrees (and we are shooting for 0 degrees), then the right edge of our chart will be 0.37 mm further away than expected, or our total depth error from the left edge of the chart to the right edge will be twice that, about 0.73 mm. If I run the numbers through vwdof.exe, the "critical" DOF criterion (CoC of 0.01 mm) yields a DOF of 8.95 mm. So our total depth error will be around 8% of our DOF. Will that be enough to cause us to think our lens is tilted when we look at a full-field MTF map? 

Only one way to find out. More testing!

Wednesday, 8 February 2017

Limitations of using single-shot planar targets to perform automatic camera calibration

When you are trying to measure the performance of your system across the entire field, it is rather important to ensure that your test chart is parallel to your sensor. If you are not careful, then a slight tilt in your test chart could look very much like a tilted lens element if you are looking at the MTF values, i.e., two opposite corners of your MTF image would appear to be soft: is your lens titled along the diagonal, or is the chart tilted along the same diagonal?

My solution to this problem is to directly estimate the camera pose from the MTF test chart. I have embedded fiducial markers in the latest MTF Mapper test charts which will allow me to measure the angle between your sensor and your test chart. This post details a particular difficulty I encountered while implementing the camera pose estimation method as part of MTF Mapper.

The classical approach

Classical planar calibration target methods like Tsai [Tsai1987] or Zhang [Zhang2000] prescribe that you capture several images of your planar calibration target, while ensuring that there is sufficient translation and rotation between the individually captured images. From each of the images you can extract a set of correspondences, e.g., the location of a prominent image feature (corner of a square, for example) and the corresponding real-world coordinates of that feature.

This sounds tricky, until you realize that you are allowed to express the real-world coordinates in a special coordinate system attached to your planar calibration target. This implies that you can put all the reference features at z=0 in your world coordinate system (their other two coordinates are known through measurement with a ruler, for example), meaning that even if you moved the calibration object (rather than the camera) to capture your multiple calibration images, the model assumes that the calibration object was fixed and the camera moved around it.

A set of four such correspondences are sufficient to estimate a 3x3 homography matrix up to a scale factor, since four correspondences yields 8 equations to solve for the 8 free parameters of the matrix. A homography is a linear transformation that can map one plane onto another, such as mapping our planar calibration target onto the image sensor. For each of our captured calibration images we can solve these equations to obtain a different homography matrix. The key insight is that this homography matrix can be decomposed to separate the intrinsic camera parameters from the extrinsic camera parameters. We can use a top-down approach to understand how the homography matrix is composed.

To keep things a bit simpler, we can assume that the principal point of the system is fixed at the centre of the captured image. We can thus normalize our image coordinates so that the principal point maps to (0,0) in normalized image coordinates, and while we are at it we can divide the result by the width of the image so that x coordinates run from -0.5 to 0.5 in normalized image coordinates. This centering and rescaling generaly improves the numerical stability of the camera parameter estimation process. This gives us the intrinsic camera matrix K, such that
where f denotes the focal length of the camera. Note that I am forcing square pixels without skew. This appears to be a reasonable starting point for interchangeable lens cameras. We can combine the intrinsic camera parameters and the extrinsic camera parameters into a single 3x4 matrix P, such that
where the 3x3 matrix R represents a rotation matrix, and the vector t represents a translation vector. The extrinsic camera parameters R and t is often referred to as the camera pose, and represents the transformation required to transform from world coordinates (i.e., our calibration target local coordinates) to homogeneous camera coordinates. If we have multiple calibration images, then we obtain a different R and t for each image, but the intrinsic camera matrix K must be common to all views of the chart.

The process of estimating K and the set of  Ri and ti over all the images i is called bundle adjustment [Triggs1999]. Typically we will use all the available point correspondences (hopefully more than four) from each view to minimized the backprojection error, i.e., we take our known chart-local world coordinates from each correspondence, transform it with the appropriate P matrix, divide by the third (z) coordinate to convert homogeneous coordinates to normalized image coordinates, and calculate the Euclidean distance between this back-projected image point and the measured image coordinates (e.g., output of a corner-finding algorithm) of the corresponding point in the captured image. The usual recommendation is to use a Levenberg-Marquardt algorithm to solve this non-linear optimization problem to minimize the sum of the squared backprojection errors.

Strictly speaking, we usually include a radial distortion coefficient or two in the camera model to arrive at a more realistic camera model than the pinhole model presented here, but I am going to ignore radial distortion here to simplify the discussion.

Single-view calibration using a planar target

From the definition of the camera matrix P above we can see that even if we only have a single view of the planar calibration target, we can still estimate both our intrinsic and extrinsic camera parameters using the usual bundle adjustment algorithms. Zhang observed that when a planar calibration target is employed, we can estimate a 3x3 homography matrix H such that
where the vectors  r1  and  r2  define the first two basis vectors of the world coordinate frame in camera coordinates, and t is a translation vector. Since we require r1  and  r2  to be orthonormal, the third basis vector of the world coordinate frame is just the cross product of r1  and  r2. This little detail explains how the 8 free parameters of the homograph H are able to represent all the required degrees of freedom we expect in our full camera matrix P.

In the previous section we restricted our intrinsic camera parameters to a single unknown f, since both  Px  and  Py  are already know because we assume the principal point coincides with the image centre. With a little bit of algebraic manipulation we can see that Zhang's orthonormality constraints allows us to estimate the focal length f directly from the homography matrix H (see Appendix A below).

So this leaves me with a burning question: if we can estimate all the required camera parameters using only a single view of a planar calibration target, why do all the classical methods require multiple views (with different camera poses)?

Limitations of single-view calibration using planar targets

To answer that question, we simply have to find an example of where the single-view case would fail to estimate the camera parameters correctly. The simplest case would be to assume that our rotation matrix R is the 3x3 identity matrix (camera axis is perpendicular to planar calibration target), and that our translation vector is of the form [0 0 d] where d represents the distance of the calibration target from the camera's centre of projection. This scenario reduces our camera matrix P to
 A given point [x y 0] in world coordinates is thus transformed to [fx fy d] in homogeneous camera coordinates. We can divide out the homogeneous coordinate to obtain our desired normalized image coordinates as [fx/d fy/d].
And there we see the problem: the normalized image coordinates depend only on the ratio f/d, which implies that we do not have sufficient constraints to estimate both f and d from this single view. The intuitive interpretation is simple to understand: you can always increase d, i.e., move further away from the calibration target while adjusting the focal length f (zooming in) to keep f/d constant without affecting the image captured by the camera.
This happens because there is no variation in the depth of the calibration target correspondence points expressed in camera coordinates, thus the depth-dependent properties of a perspective projection are entirely absent.

We can try to apply the formula in Appendix A to estimate the focal length directly from the homography corresponding to the matrix P above, but we quickly run into a divide-by-zero problem. This should give us a hint. If we choose to ignore the hint, we can apply a bundle adjustment algorithm to estimate both the intrinsic and extrinsic camera parameters from correspondences generated using the matrix P. All that this will achieve is that we will find an arbitrary pair of  f and d values that satisfy the constant ratio f/d imposed by P.

The middle road

What happens if we have a slightly less pathological scenario? Let us assume that there is a small tilt between the calibration target plane and the sensor. For simplicity, we can just choose a rotation around the y axis so that
We know that for a small angle θ, sin(θ) ≈ 0, so our matrix P will be very similar to the sensor-parallel-to-chart case above. The corresponding homography H should be
We can apply the formula in Appendix A to H, which simplifies to f2 = f2, which is a relief. The question is: how accurately can we estimate the homography H using actual correspondences extracted from the captured images?

I know from simulations using MTF Mapper that the position of my circular fiducials can readily be estimated to an accuracy of 0.1 pixels under fairly heavy simulated noise. The objective now is to measure the impact of this uncertainty on the accuracy of the homography estimated using OpenCV's findHomography function. I start out with a camera matrix P like the one above with only a rotation around the y axis. A set of 25 points are generated on my virtual calibration target, serving as the world coordinates (with the same real-world dimensions as the actual A3 chart used by MTF Mapper). These are transformed using P to obtain the `perfect' simulated corresponding image coordinates representing the position of the fiducials. I perturb these perfect coordinates by adding Gaussian noise with a standard deviation of about 0.000020210 units, which corresponds to an error of 0.1 pixels, but expressed in normalized image coordinates (divided by 4948, the width of a D7000 raw image). Now I can systematically measure the uncertainty in the focal length estimated with the formula of Appendix A as a function of the angle between the chart and the sensor, θ. I ran 100000 iterations at a selection of angles, and calculated the difference between the 75th and 50th percentile of the estimated focal length as a measure of spread.
Figure 1
In Figure 1 we see that the spread of the focal length estimates increases dramatically once the angle θ drops below about 2 degrees. For the purpose of using the estimated camera pose to measure if you have aligned your chart parallel to your camera sensor, this is really terrible news: essentially, we cannot estimate the focal length of the camera reliably if the chart is close to being correctly aligned.
Figure 2
Figure 2 shows that the focal length estimate is relatively unbiased for angles above about 1 degree, but once the angle becomes small enough, we overestimate the focal length dramatically.

This experiment demonstrated that small errors in the estimated position of features (e.g., corners or centre of circular targets) leads to dramatic errors in focal length estimation. Intuitively, this makes sense, since the relative magnitude of perspective effects decreases the closer we approach a parallel alignment between the sensor and the calibration target. Since perspective effects depend on the distance from the chart, and the estimated distance from the chart is effectively controlled by the estimated focal length (assume the same framing), this seems reasonable.

I have tried using bundle adjustment, rather than homography estimation as an intermediate step, but clearly the problem lies with the unfavourable viewing geometry and the resulting subtlety of the perspective effects, not with the algorithm used to estimate the focal length. At least, as far as I can tell.

Hobson's choice

If we take the focal length of the camera as a given parameter, then the ambiguity is resolved, and we can obtain a valid, unique estimate of the calibration target distance d. This is not entirely surprising, since our assumed constrained intrinsic camera parameters depend only of the focal length f, i.e., K is known, thus the pose of the camera can be estimated for any given view, even the degenerate case where the calibration target is parallel to the sensor.

In other words, I see no way other than requiring the user to specify the focal length as an input to MTF Mapper. I will try to extract this information from the EXIF data when the MTF Mapper GUI is used, but it seems that not all cameras report this information. Fortunately, it seems that a user-provided focal length need not be 100% accurate in order to obtain a reasonable estimate of the chart orientation relative to the camera. 


  • [Zhang2000], Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), pp. 1330-1334, 2000.
  • [Tsai1987], R. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses, IEEE Journal on Robotics and Automation, 3(4), pp. 323-344, 1987.
  • [Triggs1999], B. Triggs, P. McLauchlan, R. Hartley, A. Fitzgibbon, Bundle Adjustment — A Modern Synthesis, ICCV '99: Proceedings of the International Workshop on Vision Algorithms, Springer-Verlag, pp. 298-372, 1999.

Appendix A

If we have a homography H between our normalized image coordinate plane and our planar calibration target, such that
where h33 is an arbitrary scale factor, then the focal length of the camera can be estimated assuming square pixels, zero skew and a principal point of (0,0) in normalized image coordinates, using the formula
Note that this is only one possibility, derived from the constraint that r1 is a unit vector.