Photometric stereo for defect detection

This article is about photometric stereo, a technique used to estimate an object's surface normal vectors from pictures, and about an alternative technique useful in the context of defect detection.

Photometric stereo

Having already worked several times with various photogrammetry techniques, I was happy to recently get the occasion to implement another one: photometric stereo. Unlike techniques such as structure from motion or shape from silhouette, which estimate the geometry of an object's surface from several 2D pictures taken from various points of view, photometric stereo estimates the normal vectors of the surface from several 2D pictures taken under different lighting conditions.

The photometric stereo method works as follows. Assuming that:

  1. we have several greyscale pictures of a static object from a still camera,
  2. the object is illuminated by a point light,
  3. the vectors from the object (approximated to a point at its center) to the lights are known,
  4. the camera is far enough from the object to have almost parallel view rays,
  5. the lights are far enough from the object to have almost parallel light rays for each light,
  6. the object has a uniform Lambertian surface,

then the intensity \(i\) of light reflected by the surface at the position corresponding to a pixel in an image is proportional to the dot product of the light ray direction \(\vec{l}\) for that image and the normal \(\vec{n}\) of the surface at that position, up to a factor \(k\) accounting for the albedo: \(i=k(\vec{l}.\vec{n})\).

For a given pixel/position on the object surface, \(k\vec{n}\) can then be calculated by solving a system of linear equations using the Moore-Penrose pseudoinverse: \(k\vec{n}=(L^TL)^{-1}L^T\vec{I}\) , where \(L\) is the matrix of light direction vectors for each image and \(\vec{I}\) is the vector of light intensity at that pixel for each image. Finally \(k\) and \(\vec{n}\) can be separated as we know that \(\vec{n}\) must have unit length: \(k=||k\vec{n}||\) and \(\vec{n}=(k\vec{n})/k\).
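The per-pixel solve described above can be sketched in a few lines of Python with NumPy. This is a minimal sketch of the method, not LibCapy's actual implementation; the function name and array shapes are my own choices.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Estimate per-pixel unit normals and albedo factor k.

    images: (m, h, w) stack of greyscale pictures.
    lights: (m, 3) light direction vectors, one per picture.
    """
    m, h, w = images.shape
    I = images.reshape(m, -1)                  # one column per pixel
    # Least-squares solve of L @ (k n) = I for every pixel at once,
    # equivalent to the Moore-Penrose pseudoinverse formula.
    kn, *_ = np.linalg.lstsq(lights, I, rcond=None)
    kn = kn.T.reshape(h, w, 3)                 # k * n per pixel
    k = np.linalg.norm(kn, axis=2)             # k = ||k n||
    n = kn / np.maximum(k, 1e-12)[..., None]   # n = (k n) / k
    return n, k
```

At least three non-coplanar light directions are needed for the system to be solvable; more images make the estimate more robust to noise.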

These assumptions seem impossible to satisfy in reality: a light is never an exact point, rays are never perfectly parallel, and perfect Lambertian surfaces don't exist in the real world. Nonetheless, in practice the normals approximated by this method prove to be good enough to be useful.

Example of photometric stereo output

Let's have a look at what photometric stereo produces using my implementation in LibCapy. As usual I'm using POV-Ray to create a synthetic dataset. And we can never have enough of the Stanford Bunny, right?

Normals are encoded in false color for visualisation, and there is not much more to say, except that the base of the ear gives a clear example of how cast shadows affect the normal map.

For other examples and implementation in various programming languages, take a look at this page (Matlab implementation), this repo and this page (Python implementation), and this repo (Python implementation of a more robust version mitigating shadows and specular highlights).

Usage of photometric stereo

Now we have a normal map. Nice, but what is it useful for? One example use case is rendering new images with lights at positions not in the initial dataset of 2D images: we were using \(\vec{l}\) to calculate \(\vec{n}\) from \(i\); now that we have \(\vec{n}\), we can calculate \(i\) for any \(\vec{l}\). One would have to address the problem of cast shadows though.

Another example is to estimate a 3D model of the object by integrating the variation of these normals over the surface with the Frankot-Chellappa algorithm. This is called "shape from shading". However, as the results in the links above show, the fidelity to the original surface is limited. The assumptions of the method lead to inaccuracies in the normals, which get amplified by the integration process.
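As an illustration of the integration step, here is a minimal sketch of the Frankot-Chellappa algorithm, assuming exact and periodic gradient fields (real normal maps would need padding or windowing to limit boundary artefacts). The gradients are obtained from the normals as \(p=-n_x/n_z\) and \(q=-n_y/n_z\).

```python
import numpy as np

def frankot_chellappa(p, q):
    """Integrate the gradient fields p = dz/dx, q = dz/dy into a height map z.

    The least-squares integrable surface is computed in the Fourier
    domain, which is the core idea of the Frankot-Chellappa algorithm.
    """
    h, w = p.shape
    u = 2.0 * np.pi * np.fft.fftfreq(w)   # angular frequencies along x
    v = 2.0 * np.pi * np.fft.fftfreq(h)   # angular frequencies along y
    U, V = np.meshgrid(u, v)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = U ** 2 + V ** 2
    denom[0, 0] = 1.0                     # avoid division by zero at DC
    Z = (-1j * U * P - 1j * V * Q) / denom
    Z[0, 0] = 0.0                         # the mean height is arbitrary
    return np.real(np.fft.ifft2(Z))
```

The height map is recovered only up to an additive constant, which is why the DC component is simply set to zero.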

If the normal map alone is not enough to get a good 3D reconstruction, it's still useful information that can be exploited in combination with other methods, as explained in this paper. Photometric stereo is efficient at retrieving high-frequency information (the small details of the surface), while techniques like structure from motion are more efficient for low-frequency information (the global shape of the surface). Used together, you get the best of both worlds.

Defect detection

That advantage with small details hints at another use case: small defect detection (i.e. abnormal scratches, dents, cracks, protrusions, etc. on the surface of an object). This is the use case I was interested in.

Nowadays this type of detection is generally done using a neural network, but there are many options when it comes to choosing the features to use as input to those networks. This paper gives a review of possible features, neural network architectures and training datasets. This paper shows how the output of photometric stereo can be used as such a feature for defect detection.

The point of using photometric stereo for defect detection is that even if small details do not create useful features in an image under ambient lighting, they do (in some cases) under point light illumination at various angles. I like to think of it as the well-known method for finding a lost screw on the floor: turn off the light and scan the floor at a low angle with a flashlight, and the shadow of the screw will stand out against the floor background.

One may wonder why not simply use one image under point light illumination. The answer is that we don't know in advance which angle will reveal the defect, so we need to take several images at various angles. Then, instead of feeding several images to the detection algorithm, a normal map nicely encodes all the information into one single image.

Also, as the normal map describes the shape of the surface, it acts as a filtering step: if the surface has a colored texture interfering with the detection algorithm, the normal map filters it out (or at least helps reduce it). The normal map can also easily be postprocessed to accentuate the curvature, leading to potentially more convenient features for the detection algorithm.
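As an example of such postprocessing (a hypothetical sketch of my own, not taken from the papers above), the divergence of the normal map's x and y components approximates the surface's mean curvature, which makes dents and ridges stand out against flat areas:

```python
import numpy as np

def curvature_map(n):
    """Approximate a curvature map from a (h, w, 3) unit normal map.

    The divergence of the projected normal field is close to the mean
    curvature of the surface: flat areas give values near 0, while
    dents and bumps give clearly non-zero values.
    """
    dnx_dx = np.gradient(n[..., 0], axis=1)
    dny_dy = np.gradient(n[..., 1], axis=0)
    return dnx_dx + dny_dy
```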

Examples of photometric stereo for defect detection

Let's see on some synthetic examples how photometric stereo performs for defect detection. Using again POV-Ray and the Stanford bunny, I've generated a dataset of "Bunny coins" with three textures: matte, metal and wood. The bunny is flattened to the extreme, and a slight dent is made on the coin surface near its back.

On the matte example, the details of the bunny and the defect are almost invisible under ambient light, but the normal map, and even more its enhanced version, makes both clearly visible.

On the metal example, the shiny texture makes the details and the defect clearly visible even under ambient light. This example is more interesting in showing that even though the shape is exactly the same as the matte coin's, the normal maps are quite different. It shows the impact of the Lambertian surface assumption.

On the wood example, the texture makes the details and defect completely invisible under ambient light, but the normal map, and even more its enhanced version, makes both clearly visible. It also shows how the transformation to a normal map completely hides the texture.

Although these examples illustrate how the photometric stereo technique can indeed efficiently reveal small details and defects, their synthetic nature limits the evaluation of its actual efficiency. The light and texture models are simplifications of reality, the coin surface is way too perfect, and the mesh of the bunny shows up if you look carefully. I wish I could show you the tests I did with real photographs of metal, paper and plastic objects. They are really impressive most of the time, and I can assure you it's indeed a very interesting technique. As I can't share these results here, you will have to try it yourself!

An alternative method

The photometric stereo method has a practical drawback: the need to know the light directions. This led me to search for another solution of my own. Probably influenced by my other recent work on focus measurement, I thought of using the variance of pixel intensity.

If the light rotates around the camera in a plane perpendicular to the camera rays then, under the same assumptions as before, a surface facing the camera will show a constant intensity, while a surface leaning in another direction will show an intensity oscillating with an amplitude proportional to the component of \(\vec{n}\) perpendicular to the camera ray direction \(\vec{c}\).

We can't calculate the normal anymore, but from the perspective of defect detection we don't really care. The (normalised) variance map alone gives a representation of the surface where details and defects are much easier to see. The following examples (on the same dataset as previously) show it clearly.
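The method boils down to a per-pixel variance over the image stack. Here is a minimal sketch; the function name and the min-max normalisation are my own choices.

```python
import numpy as np

def variance_map(images):
    """Normalised per-pixel intensity variance of a (m, h, w) stack of
    greyscale pictures taken while the light rotates around the camera."""
    v = np.var(images, axis=0)
    # min-max normalise to [0, 1] for visualisation
    return (v - v.min()) / max(v.max() - v.min(), 1e-12)
```

A pixel on a surface facing the camera gets a value near 0, while a pixel on a leaning surface (such as the slope of a dent or a scratch) gets a higher value.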

The variance method is less effective at hiding the wood texture, but returns results as interesting as photometric stereo's. So, given that it doesn't need the light positions, it's definitely an interesting technique too. I've also tried it on real objects with a real camera, and here again the conclusion is the same.

Conclusion

In conclusion, if you're interested in defect detection and didn't know about these methods (in particular mine, which I haven't found in the literature), I hope this article will motivate you to give them a try. It's not magic and it doesn't work all the time, but as I said, the results I've obtained in real tests were generally impressive.

2026-02-27

Copyright 2021-2026 Baillehache Pascal