# Chromatic transformation "DNG style"

#### Doug Kerr

##### Well-known member
Of interest is how we start with the three sensor outputs for each pixel after demosaicing of the sensor raw outputs and get to a color representation (maybe sRGB) for the pixel.

A certain way to describe the first stanza of this process is built into the DNG (digital negative) doctrine. I will try to give a concise description of it here.

BACKGROUND

Notation

Typically the three kinds of photodetectors in a CFA sensor are called "red", "green", and "blue". This seems quite reasonable, since the spectral responses of these three types are concentrated in parts of the spectrum we can reasonably call "red", "green", and "blue".

A "natural" next step is to abbreviate these as "R", "G", and "B", but this is the beginning of a slippery slope to a problem.

The next step is to identify the outputs of these three groups of photodetectors as "R", "G", and "B", and at this stage of the process we have set up the prospect of serious misunderstanding. The reason is that, in this same area of work, "R", "G", and "B" are the designations of the three coordinates of an RGB color space. And so it is easy to think that the three outputs of the set of photodetectors are those coordinates, or are in some way equivalent. But they aren't, for at least two reasons:

• "R", "G", and "B" are nonlinear forms of the coordinates, but the three sensor outputs are linear.

• The three sensor outputs are not the linear coordinates of any RGB color space, and are in fact not the coordinates of any color space - even a totally-parochial one (as I will discuss further shortly).

Accordingly, to avoid any misunderstandings, I use "d", "e", and "f" as the generic designations of the three kinds of sensor outputs.

The implications of a non-colorimetric sensor

The camera sensors with which we are concerned are non-colorimetric. That means that the three outputs do not "consistently" indicate the color of light on an area of the sensor.

We know that the spectrum of an instance of light dictates its color, but that there can be many spectrums that have the same color (a phenomenon called metamerism). A colorimetric sensor (that means "measures color") gives an output that is the same for any spectrum that has a certain color.

But our sensors in general give different outputs for different spectrums that have the same color, and conversely, may give the same output for spectrums having different colors.

Thus, we cannot consider the three sensor outputs as being the coordinates of any color space, since a color space is a scheme for describing (precisely) the color of light, and this doesn't do that.
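A toy sketch may make the point concrete. Everything in it is invented for illustration: the spectrum is reduced to four bands, and `OBSERVER` and `SENSOR` are made-up sensitivity tables, not the CIE functions or any real camera's responses. It shows two spectrums that are metamers (same "XYZ") yet give different sensor outputs.

```python
# Toy illustration only: 4 spectral bands and made-up sensitivities,
# not real color-matching functions or real camera data.

def respond(sensitivities, spectrum):
    """Weight a spectrum by each sensitivity curve (one dot product per channel)."""
    return [sum(s * p for s, p in zip(row, spectrum)) for row in sensitivities]

# Hypothetical "observer" standing in for the functions behind X, Y, Z.
OBSERVER = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hypothetical camera channels d, e, f. They are NOT linear combinations
# of the OBSERVER rows, which is what makes the sensor non-colorimetric.
SENSOR = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 0.1],
]

spectrum_1 = [2.0, 2.0, 2.0, 2.0]
spectrum_2 = [3.0, 3.0, 3.0, 1.0]  # chosen to be a metamer of spectrum_1

xyz_1 = respond(OBSERVER, spectrum_1)
xyz_2 = respond(OBSERVER, spectrum_2)
def_1 = respond(SENSOR, spectrum_1)
def_2 = respond(SENSOR, spectrum_2)

print("same color for the observer:", xyz_1 == xyz_2)
print("same sensor outputs:        ", def_1 == def_2)
```

If the `SENSOR` rows were linear combinations of the `OBSERVER` rows, the two sensor outputs would always agree whenever the colors agree, and the sensor would be colorimetric.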

Transformation of the sensor outputs

So the question is: "How can we take the three output values from the sensor for a certain pixel and transform them to some standard representation (perhaps at first with the CIE coordinates X, Y, and Z) of the color of the light on that pixel?" Well, we can't. We can only arrange for a systematic transformation that produces the "least overall color error". Of course to plan that we have to answer several questions, most critically:

• What do we mean "overall"? Do we mean, perhaps, the average of the color error experienced as the sensor is exposed to some collection of standard spectrums? Maybe.

• How do we quantify "color error"? Do we base it on the "delta E" metric? Maybe.

So you can imagine that any number of PhD dissertations have been generated pondering these dilemmas.

In any case, one straightforward way to implement a transformation that we hope will "do the best possible for this general approach" is to use a 3 × 3 matrix. We take the three sensor output values for a pixel and, treating that as a 3 × 1 matrix, multiply it by our "transformation matrix", getting the values X, Y, and Z, which describe "our best estimate of the actual color of the light at that pixel".
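As a minimal sketch of that multiplication: the matrix `DEF_TO_XYZ` and the pixel values below are placeholders invented for the example, not figures for any real camera, and `apply_matrix` is just a helper name chosen here.

```python
def apply_matrix(matrix, vector):
    """Multiply a 3 x 3 matrix by a 3 x 1 column (given as a flat list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical "def->XYZ" matrix; the values are illustrative only.
DEF_TO_XYZ = [
    [0.60, 0.30, 0.10],
    [0.25, 0.70, 0.05],
    [0.02, 0.12, 0.90],
]

pixel_def = [0.40, 0.50, 0.30]            # made-up d, e, f for one pixel
pixel_xyz = apply_matrix(DEF_TO_XYZ, pixel_def)
print(pixel_xyz)                          # our estimate of X, Y, Z
```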

We can then move from that color representation to the representation on which our image output will be based, perhaps sRGB.
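For the sRGB case, that last move could be sketched as follows. This assumes the XYZ values are already expressed relative to D65 (the sRGB white) with Y = 1 for white; XYZ values referred to some other white would first need chromatic adaptation. The matrix and the encoding constants are the commonly published sRGB ones; the function names are just choices made for this example. Note the nonlinear step at the end, which is what separates "R", "G", and "B" from linear quantities.

```python
# Commonly published matrix from XYZ (D65-referenced) to linear sRGB.
XYZ_TO_LINEAR_SRGB = [
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
]

def encode_srgb(linear):
    """Clip one linear value to 0..1, then apply the sRGB nonlinearity."""
    c = min(max(linear, 0.0), 1.0)
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055

def xyz_to_srgb(xyz):
    """XYZ (D65-referenced) -> nonlinear sRGB values in the range 0..1."""
    linear = [sum(m * v for m, v in zip(row, xyz)) for row in XYZ_TO_LINEAR_SRGB]
    return [encode_srgb(c) for c in linear]

d65_white = [0.9505, 1.0000, 1.0890]      # XYZ of the D65 white point
srgb_white = xyz_to_srgb(d65_white)       # should come out at (or very near) 1, 1, 1
srgb_black = xyz_to_srgb([0.0, 0.0, 0.0])
```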

THE DETAILS

Introduction

The specification for the DNG file (a standardized "raw" data file developed by Adobe) provides a rigorous description of a regimen for presenting the "transformation matrix" which the camera manufacturer has adopted for use in the procedure I described above. Or maybe the matrix that Adobe "suggests" for the particular camera model.

In many cases, this matrix is explicitly embedded in the DNG file. In fact, there will usually be two forms included.

It turns out that because of the nature of the color error phenomenon, the "best" transformation matrix ("least overall color error") will depend on the assumed spectrum of the light that illuminated the scene. Most often, matrixes are included in the DNG file for two CIE standard illuminants (each of which has a defined spectrum), typically illuminant "A" (which is representative of incandescent lighting) and illuminant "D65", which essentially represents noon daylight. For situations between those two, there are provisions for "interpolating" between the two matrixes.
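One way such an interpolation could look is sketched below. It assumes the weighting is linear in the reciprocal of the correlated color temperature, which is my understanding of the approach the DNG specification describes; treat that, the function names, and the nominal temperatures used for illuminants A and D65 (about 2856 K and 6504 K) as assumptions of this sketch.

```python
def low_temp_weight(temp_k, temp_low_k=2856.0, temp_high_k=6504.0):
    """Weight given to the low-temperature matrix: 1 at temp_low_k, 0 at temp_high_k.

    The weight varies linearly with 1/temperature and is clamped outside the range.
    """
    if temp_k <= temp_low_k:
        return 1.0
    if temp_k >= temp_high_k:
        return 0.0
    inv = 1.0 / temp_k
    inv_low = 1.0 / temp_low_k
    inv_high = 1.0 / temp_high_k
    return (inv - inv_high) / (inv_low - inv_high)

def interpolate_matrices(m_low, m_high, weight_low):
    """Element-by-element blend of two matrices of the same shape."""
    return [
        [weight_low * a + (1.0 - weight_low) * b for a, b in zip(row_low, row_high)]
        for row_low, row_high in zip(m_low, m_high)
    ]

# Placeholder matrixes (illustrative values only, not from any camera).
M_ILLUMINANT_A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M_D65 = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

w = low_temp_weight(4000.0)               # scene illuminant assumed to be 4000 K
m_scene = interpolate_matrices(M_ILLUMINANT_A, M_D65, w)
```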

Application

This regimen of defining and performing the transformation of the camera outputs to the CIE XYZ coordinates, which is described in the context of the DNG specification, is, for example, also used in dcraw, Dave Coffin's iconic "raw development engine", the core of many raw development packages (often used where a DNG file is not even involved).

The matrix itself

The matrixes, however, are presented in a somewhat surprising form. We might expect that the matrix would be set up to perform (by matrix multiplication) a transformation from the camera sensor outputs (which I call d, e, f) to the CIE coordinates X, Y, Z. My shorthand for a matrix intended to work that way is "def->XYZ".

But in fact the matrices that are stored in a DNG file and are used in the "DNG" transformation procedure are what I would symbolize as "XYZ->def". That is, they are set up to transform a set of XYZ values into the camera outputs (def) that would lead to them when transformed. This rather startling situation makes the matrix better fit into the slightly-circuitous mathematics involved in the overall transformation process.
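Getting from an "XYZ->def" matrix to a "def->XYZ" matrix is a matter of matrix inversion. A minimal sketch, with a made-up `XYZ_TO_DEF` (not data for any real camera) and a small hand-written 3 × 3 inverse so that no library is needed:

```python
def mat_vec(m, v):
    """Multiply a 3 x 3 matrix by a 3-element column."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def invert_3x3(m):
    """Inverse of a 3 x 3 matrix via the adjugate; raises if (nearly) singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular or nearly so")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

# Hypothetical "XYZ->def" matrix in the direction the DNG file stores it.
XYZ_TO_DEF = [
    [ 0.9, -0.2, -0.1],
    [-0.4,  1.2,  0.2],
    [ 0.0,  0.1,  0.6],
]

DEF_TO_XYZ = invert_3x3(XYZ_TO_DEF)

xyz_in = [0.40, 0.50, 0.30]
sensor_def = mat_vec(XYZ_TO_DEF, xyz_in)      # what the camera "would" output
xyz_back = mat_vec(DEF_TO_XYZ, sensor_def)    # recovered XYZ (round trip)
```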

White Balance

An operation that is often insinuated into the overall process is what is described as "white balancing of the camera outputs". That might be done in one of two ways:

• At some stage of the process, the three kinds of sensor outputs are multiplied by three fixed constants (which depend on the particular camera, and are fixed for any assumed illuminant on the scene). The purpose of this is that these "scaled" outputs (which I designate d', e', and f') would have equal values for any region of the sensor where the light in fact has the spectrum of the presumed illuminant. (This is often described in terms of the light reflected, from such illumination, by a "reflectively spectrally neutral" object in the scene. But that is by definition light with the spectrum of that illuminant!)

• Before the sensor analog outputs are digitized, the three kinds of outputs are given additional analog gain so that the above situation is attained.
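The digital form of this scaling can be sketched as below. The neutral-patch readings are invented, and normalizing so that the "e" channel keeps a gain of 1 is only a convention assumed for this example; any common scale factor would serve equally well.

```python
def white_balance_gains(neutral_def):
    """Gains that make the d, e, f readings of a neutral patch equal.

    The 'e' channel is left at gain 1 (an arbitrary choice for this sketch).
    """
    d, e, f = neutral_def
    return [e / d, 1.0, e / f]

def apply_gains(gains, pixel_def):
    """Scale d, e, f by their gains to obtain d', e', f'."""
    return [g * v for g, v in zip(gains, pixel_def)]

# Made-up sensor readings for light having the spectrum of the assumed illuminant.
neutral_def = [0.50, 1.00, 0.80]

gains = white_balance_gains(neutral_def)
neutral_scaled = apply_gains(gains, neutral_def)      # the three values now match
pixel_scaled = apply_gains(gains, [0.20, 0.30, 0.10]) # any other pixel, same gains
```

Written as a matrix, these gains form a diagonal 3 × 3 matrix, which is the "def->d'e'f'" matrix that figures in the chain of operations.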

Now is this a necessity for proper transformation of the sensor outputs to XYZ? No. That "best" matrix can be designed so that it runs from d, e, and f or from d', e', and f'. But this "white balancing" scheme fits into the way that various camera manufacturers decided early on to handle the matter of setting the camera to produce the proper white value for any of several kinds of illuminant (for example, the ones intimated by the various "white balance presets" on a Canon digital camera). So the "DNG" transform doctrine has provisions for playing along with that approach.

Which XYZ is that, anyway?

I said that the transformation matrix as defined in the DNG file was essentially an "XYZ->def" matrix. But the "XYZ" color space that is referenced is one whose white point is the chromaticity of the illuminant being considered.

But in the whole scheme of things, the image representation in terms of XYZ is by convention based on a form of the XYZ color space in which the white point is the chromaticity of illuminant D50. And so some "chromatic adaptation" will need to be done along the way.

Note that "chromatic adaptation" does not just mean "shifting the chromaticities so that the chromaticity of the assumed illumination is shifted to the chromaticity of D50 by the time we get to the XYZ form." Rather, it refers to the more subtle issue of mapping all chromaticities to compensate for the human visual responses under two different illuminants.
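One widely used linear model for this is the Bradford transform: convert XYZ to a set of "cone-like" responses, scale each response by the ratio of its values for the two whites, and convert back. The sketch below uses it as an assumed example of a chromatic adaptation matrix, with the commonly published Bradford coefficients and white point values; the function names are choices made here.

```python
# Commonly published Bradford cone-response matrix.
BRADFORD = [
    [ 0.8951,  0.2664, -0.1614],
    [-0.7502,  1.7135,  0.0367],
    [ 0.0389, -0.0685,  1.0296],
]

D65_WHITE = [0.95047, 1.00000, 1.08883]   # XYZ of D65, Y normalized to 1
D50_WHITE = [0.96422, 1.00000, 0.82521]   # XYZ of D50, Y normalized to 1

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def invert_3x3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def adaptation_matrix(src_white, dst_white):
    """Matrix taking XYZ referred to src_white into XYZ referred to dst_white."""
    src_cone = mat_vec(BRADFORD, src_white)
    dst_cone = mat_vec(BRADFORD, dst_white)
    scale = [[dst_cone[r] / src_cone[r] if r == c else 0.0 for c in range(3)]
             for r in range(3)]
    # Rightmost factor acts first: to cone space, scale, back to XYZ.
    return mat_mul(invert_3x3(BRADFORD), mat_mul(scale, BRADFORD))

D65_TO_D50 = adaptation_matrix(D65_WHITE, D50_WHITE)
adapted_white = mat_vec(D65_TO_D50, D65_WHITE)           # lands on the D50 white
adapted_color = mat_vec(D65_TO_D50, [0.30, 0.40, 0.20])  # other colors move too
```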

The mathematical chain

Now that we have all this stuff in hand, let's look at the actual regimen defined in the DNG file for setting up to transform the camera outputs (and that can mean "d', e', f'" if "camera output white balancing" has been done to the raw data before this process can "get hold of it").

1. We take the "matrix as given" (which is "XYZ->def") and multiply it by the matrix that defines the camera output white balancing, with the white balancing matrix as the left-hand factor. That gives us a matrix we can think of as "XYZ->d'e'f'".

2. We take the inverse of that matrix, giving us a matrix we can think of as "d'e'f'->XYZ" (and XYZ here means "the coordinates of the color in a CIE XYZ color space whose white point is the chromaticity of the illuminant in use").

3. We multiply that matrix by the "chromatic adaptation matrix" (as the left-hand factor), which adjusts from an "illuminant in use" frame of reference to a "D50" frame of reference. We can think of the resulting matrix as "d'e'f'->XYZ_D50".

Now, in actual operation, we take the d'e'f' values for each pixel and multiply them by this "d'e'f'->XYZ_D50" matrix, to get the XYZ coordinates we will consider to represent the actual color of the light at that pixel in a "D50" frame of reference.
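The three steps and the per-pixel multiplication can be sketched end to end as follows. Every matrix here is a placeholder invented for the example (including the stand-in chromatic adaptation matrix), and names such as `BALANCE` and `ADAPT_TO_D50` are chosen for this sketch rather than taken from the DNG specification.

```python
def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def invert_3x3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

# Placeholder inputs; none of these values describe a real camera.
XYZ_TO_DEF = [[0.9, -0.2, -0.1], [-0.4, 1.2, 0.2], [0.0, 0.1, 0.6]]  # "matrix as given"
BALANCE = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.5]]       # "def->d'e'f'"
ADAPT_TO_D50 = [[1.05, 0.02, -0.05], [0.03, 0.99, -0.02], [-0.01, 0.01, 0.75]]

# Step 1: "XYZ->d'e'f'" (the white balancing matrix goes on the left).
xyz_to_balanced = mat_mul(BALANCE, XYZ_TO_DEF)
# Step 2: invert to get "d'e'f'->XYZ" (XYZ referred to the illuminant in use).
balanced_to_xyz = invert_3x3(xyz_to_balanced)
# Step 3: "d'e'f'->XYZ_D50" (the adaptation matrix goes on the left).
balanced_to_xyz_d50 = mat_mul(ADAPT_TO_D50, balanced_to_xyz)

# In operation: one matrix-by-column multiplication per pixel.
pixel_balanced = [0.80, 0.60, 0.50]       # made-up d', e', f' values
pixel_xyz_d50 = mat_vec(balanced_to_xyz_d50, pixel_balanced)
```

Because the three steps are folded into a single matrix ahead of time, the per-pixel work is the same as for any other 3 × 3 transformation.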

************

Was that fun or what?

Best regards,

Doug

#### Doug Kerr

##### Well-known member
It may not at first seem intuitively correct that if we start with an "XYZ->def" matrix, then to get the "XYZ->d'e'f' " matrix we would multiply the "XYZ->def" matrix by the "def->d'e'f' " matrix. It almost seems that we need to in fact "divide out" the "def->d'e'f' ", not multiply by it.

Of course there is no such thing as matrix division, so perhaps we would do that by multiplying the "XYZ->def" matrix by the inverse of the "def->d'e'f'" matrix.

But in fact, we need to do just what was originally stated.

The key is this. In this illustration, I will use some hypothetical matrixes that convert between various matrix variables I will call "A", "B", and "C".

Suppose we have the matrix that can be used to multiply A to get B. I will call that matrix "A->B".

And suppose we have the matrix that can be used to multiply B to get C. I will call that matrix "B->C".

Then suppose we want to know how to compute the matrix that can be used to multiply A to get C. I will call that matrix "A->C".

The matrix algebra for calculating that is this:
A->C = B->C * A->B

I use the asterisk for clarity to mean matrix multiplication; in regular matrix algebra (not "programming language") no symbol is used to mean multiplication (not even a dot), just as when in ordinary algebra we write "JK" to mean "J multiplied by K".

The way to remember this is that as we "progress along the chain" from one variable through an intermediate one to the final one, we add factors to the left. "B->C" comes farther along the overall chain from A to C than "A->B", so it goes to the left.

Recall that matrix multiplication is not commutative, so we cannot just interchange those two factors; the sequence is critical.
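A small numerical check of the ordering rule, using made-up 2 × 2 matrixes for brevity (the names `A_TO_B` and `B_TO_C` simply mirror the notation above):

```python
def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A_TO_B = [[1.0, 2.0], [0.0, 1.0]]   # illustrative values only
B_TO_C = [[3.0, 0.0], [1.0, 1.0]]

A_TO_C = mat_mul(B_TO_C, A_TO_B)    # later step goes on the left
SWAPPED = mat_mul(A_TO_B, B_TO_C)   # factors interchanged: a different matrix

a = [1.0, 1.0]
two_steps = mat_vec(B_TO_C, mat_vec(A_TO_B, a))  # A -> B, then B -> C
one_step = mat_vec(A_TO_C, a)                    # A -> C directly
swapped_result = mat_vec(SWAPPED, a)

print(two_steps == one_step)        # the composed matrix matches the two-step route
print(swapped_result == two_steps)  # the interchanged product does not
```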

So to get the matrix that is used to convert XYZ to d'e'f', we do:

XYZ->d'e'f' = def->d'e'f' * XYZ->def

(Yes, we really want to know d'e'f'->XYZ, but we will get that next by inverting the previous result.)

Best regards,

Doug