Emil Martinec
Noise in digital signal recording places an upper bound on how finely one may usefully digitize a noisy analog signal. One example of this is 12-bit vs 14-bit tonal depth -- current DSLRs with 14-bit capability have noise of more than four raw levels, so the last two bits of the digital encoding are just random noise; the image could have been recorded at 12-bit tonal depth without any loss of image quality.
The presence of noise masks tonal transitions -- one can't detect subtle changes of tonality of plus or minus one raw level when the recorded signal plus noise is randomly jumping around by plus or minus four levels. A smooth gradient can't be smoother than the random background.
NEF "lossy" compression appears to use this fact to our advantage. A uniformly illuminated patch of sensor will have a photon count which is roughly the same for each pixel. There are inherent fluctuations in the photon counts of the pixels, however, that are characteristically of order the square root of the number of photons. That is, if the average photon count is 10000, there will be fluctuations from pixel to pixel of as much as sqrt[10000]=100 photons in the sample. Suppose each increase by one in the raw level corresponds to counting ten more photons; then noise for this signal is 100/10=10 raw levels. The linear encoding of the raw signal wastes most of the raw levels.
In shadows, it's a different story. Suppose our average signal is 100 photons; then the photon fluctuations are sqrt[100]=10 photons, which translates to +/- one raw level. At low signal level, none of the raw levels are "wasted" in digitizing the noise.
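To make the arithmetic above concrete, here is a small Python sketch of the two cases (the gain of 10 photons per raw level is the same illustrative figure as in the example, not a measured camera value):

```python
import math

GAIN = 10.0  # illustrative: photons counted per raw level, as in the example above

def noise_in_raw_levels(mean_photons: float, gain: float = GAIN) -> float:
    """Photon (shot) noise, expressed in raw levels, for a given mean photon count."""
    noise_photons = math.sqrt(mean_photons)  # Poisson fluctuation ~ sqrt(N)
    return noise_photons / gain              # convert from photons to raw levels

print(noise_in_raw_levels(10000))  # highlight: 10.0 raw levels of noise
print(noise_in_raw_levels(100))    # shadow: 1.0 raw level of noise
```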
Ideally, one would want an algorithm that thins the levels (widening the level spacing) at high signal while keeping them intact at low signal, all the while keeping the level spacing below the noise for any given signal. NEF "lossy" compression uses a lookup table to do just that, mapping raw levels 0-4095 (for 12-bit raw) into compressed values in such a way that there is no compression in the shadows, but increasing thinning of levels in the highlights, following the square-root relation between photon noise and signal. Here is a plot of the lookup table values (this particular table has 683 compressed levels; the compression varies from camera to camera depending on the relation between raw levels and photon counts):
The horizontal axis is the compressed value in the "lossy" NEF; the vertical axis is the raw level. The blue curve is the compression lookup table: a given compressed value on the horizontal axis corresponds to the 12-bit raw level plotted. "In-between" raw levels are rounded to the nearest one in the lookup table. The plot is linear at the low end because photon fluctuations at low signal aren't big enough to permit thinning of raw levels; it then bends upward starting at about compressed level 285, and the curve steepens as more and more levels are thinned out.
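For illustration, here is a minimal Python sketch of a decode table with that shape -- 1:1 up to about level 285, then a square-law rise to 4095 over 683 entries, with the constants pinned down by continuity at the cutoff and full scale at the top entry. That construction is my own assumption for the sketch, not Nikon's actual table:

```python
import math

RAW_MAX = 4095     # top of the 12-bit raw range
NUM_ENTRIES = 683  # number of compressed code values (from the plot)
CUTOFF = 285       # approximate end of the 1:1 linear region (from the plot)

def build_decode_table() -> list[int]:
    """Compressed code -> raw level: 1:1 in the shadows, square-law above.

    A and B are chosen so that raw = A*(c - B)^2 meets the linear part at
    CUTOFF and reaches RAW_MAX at the last code.  Illustrative only.
    """
    c_max = NUM_ENTRIES - 1
    r = math.sqrt(RAW_MAX / CUTOFF)      # ratio of square roots across the curved part
    B = (c_max - r * CUTOFF) / (1 - r)   # solves the two boundary conditions
    A = CUTOFF / (CUTOFF - B) ** 2
    table = []
    for c in range(NUM_ENTRIES):
        raw = c if c <= CUTOFF else A * (c - B) ** 2
        table.append(min(round(raw), RAW_MAX))
    return table

table = build_decode_table()
print(table[100], table[285], table[682])  # 100 285 4095
```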
The red curve is a fit to the photon noise model of how much compression one should be able to get away with. It is the best fit of the data to
raw level = A x (NEF compressed level - B)^2
where the constant A is determined by the sensor "gain" (its efficiency in capturing photons), and the constant B is an offset to account for the linear part of the curve where no compression is being done. The model is very nearly a perfect match to the lookup table data.
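Inverting that relation gives the direction the camera would apply when writing the file (valid above the linear, uncompressed region). A rough sketch, with constants of roughly the right size rather than Nikon's actual fit values:

```python
import math

A, B = 0.0141, 143.0  # illustrative constants, roughly the shape of the fit described above

def encode(raw_level: float) -> int:
    """12-bit raw level -> compressed code, inverting raw = A*(code - B)^2."""
    return round(B + math.sqrt(raw_level / A))

def decode(code: int) -> float:
    """Compressed code -> reconstructed raw level."""
    return A * (code - B) ** 2

raw = 3000
code = encode(raw)
print(code, round(decode(code)))  # round-trip error of a few levels, below the noise there
```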
The agreement strongly indicates that Nikon's engineers are using the properties of light in a clever way: thinning the number of levels used to record the raw data to only as many as are needed to prevent posterization, and letting the noise inherent in the light signal dither the tonal transitions across the increasingly large gaps at higher luminance. The gaps will be undetectable because of the noise, and the retained levels encode the image data with maximum efficiency.
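That claim is easy to check numerically with the same illustrative constants: compare the gap between adjacent retained levels against the photon noise at that signal.

```python
import math

A, B = 0.0141, 143.0  # same illustrative fit constants as above
GAIN = 10.0           # illustrative photons counted per raw level

def gap(code: int) -> float:
    """Spacing between adjacent retained raw levels at a given compressed code."""
    return A * (code + 1 - B) ** 2 - A * (code - B) ** 2

def photon_noise(code: int) -> float:
    """Photon noise, in raw levels, at the signal this code represents."""
    raw = A * (code - B) ** 2
    return math.sqrt(raw * GAIN) / GAIN

for c in (300, 450, 600, 682):
    print(c, round(gap(c), 1), round(photon_noise(c), 1))
# The gaps grow toward the highlights but stay below the noise at every signal level.
```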
Rather clever IMO.