
Canon-log - a new compression algorithm

Doug Kerr

Well-known member
Canon has (have, for you Brits) introduced a new "compression" algorithm, called "Canon-log", as an alternative to the familiar "gamma precompensation" algorithm, in connection with their new line of professional digital cinema cameras. I suspect that before long, something like it will come into use in the realm of digital still photography.

I will give a brief description of it here, but first, I will give some extensive background, after which the description of the new algorithm will be almost trivial.

First, to avert misunderstanding, I need to point out that the term "compression" is used in two wholly different ways in the field of signal processing (including as it pertains to the digital representation of an image).

1. We take a suite of data, perhaps representing an audio waveform or a digital image, comprising a certain number of bits, and recode it into a form having a smaller number of bits, the objective being economy of storage or transmission, in such a way that at a later time or distant point the original suite of data can be reconstructed from the "compressed" data, either precisely or at least such that the reconstructed suite of data will describe (for example) an audio waveform or image that is, to the human listener/viewer, essentially indistinguishable from the original waveform or image.

2. We take a value having a certain range (in terms of its precision) and recode it so that it occupies a smaller range but still holds its original precision on a relative basis. (By that latter I mean, for example, what we have when we say of a certain instrument, "its precision is 2% of the reading".) The objective is that fewer bits per "sample" of the value are required than if the values were encoded directly, while still maintaining a (relative) precision consistent with the precision of the original value.

Our interest here is wholly with meaning 2.

One area in which compression of this sort came into use is in the digital representation of speech, first introduced into telephone transmission. Here, the algorithm used is a logarithmic one, which we can simplistically state as:

y = log(1+kx)/log(1+k)

where x is the input (the actual value of the variable) and y is the output (the value as stored or transmitted). The constant k is a parameter that controls the specific "curve" represented by the equation.

This simplified equation is only valid for positive values of x; in the real world, there are provisions in the equation to make it work for both positive and negative values of x.

If we encode y to a constant precision (perhaps in a fixed number of bits), then the precision of the x that it represents will be constant on a relative basis.

This is nicely consistent with human perception of sound or light, which is essentially logarithmic. That is, the smallest difference in value that the ear or eye can distinguish is essentially proportional to the value.

Thus, the logarithmic recoding algorithm gives us the "best bang for the bit" perceptually.

By the way, in the American and international versions of this algorithm, the parameter I show as k is actually designated "mu" and "A", respectively. There are other subtle differences in the details of the two algorithms. They are usually distinguished as the "mu law" and "A-law" algorithms, drawing on the symbol arbitrarily used for the parameter in each.
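To make the relative-precision property concrete, here is a minimal Python sketch of the logarithmic curve above, using k = 255 (the value of "mu" in the North American standard). The function names are mine; the real mu-law codec also handles negative values and quantizes the result to 8 bits, which this sketch omits:

```python
import math

def log_compress(x, k=255.0):
    """Logarithmic compression of a value x in [0, 1]: y = log(1+kx)/log(1+k)."""
    return math.log(1 + k * x) / math.log(1 + k)

def log_expand(y, k=255.0):
    """Inverse of log_compress: recover x from the compressed value y."""
    return (math.exp(y * math.log(1 + k)) - 1) / k

# A fixed step in y corresponds to a roughly constant *relative* step in x,
# which is the "best bang for the bit" property discussed above:
for x in (0.01, 0.1, 0.5):
    y = log_compress(x)
    dx = log_expand(y + 1 / 256) - x   # one 8-bit quantization step in y
    print(f"x = {x:5.2f}  ->  relative step dx/x = {dx / x:.3f}")
```

Running the loop shows the relative step staying within a narrow band across a 50:1 range of x, whereas linear encoding would make it 50 times worse at the bottom than at the top.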

But compression of a value in this sense was actually introduced earlier, in the original US scheme for the encoding of the video waveform on an analog basis. The main driver for doing so, however, was not the one we suggested in connection with the encoding of audio waveforms. Rather, it was the decision to precompensate, in the transmitted signal, for the nonlinear response of the display mechanism, then always a cathode-ray tube (CRT). Doing so eliminated the need for a nonlinear transfer circuit (a slightly complicated thing to make in those days) in the television receiver (where low cost was an imperative for wide acceptance of the new system).

The response of the CRT was essentially described this way:

L = e^γ

where e was the control grid voltage, L was the resulting luminance on the screen, and γ (the Greek letter gamma) was a constant. This general form is what we call a "power function", since we take a fixed power of the input to determine the output.

As a result, the transform used at the transmitter (essentially the inverse of the equation just above) came to be described as the "gamma precompensation function".

Now, did this function have the same beneficial property as the "logarithmic" transform used in digital audio, in terms of giving us the biggest perceptual bang for what we spend on precision of the transmitted signal? Not exactly. The perception of the viewer follows (approximately) a logarithmic, not a power, function. But we do get the benefit somewhat, at least compared to transmitting the luminance signal in a linear way.

Now, when digital photography emerged, and standards were created for the digital representation of the image, it was concluded that it would be desirable to use a nonlinear transform of the camera's luminance "variable" in the recoded image data. Would this be logarithmic, to best match human perception? No, it would be a power function, to closely follow the TV signal convention and to allow the received signal to be directly applied to a CRT display without requiring any nonlinear transform processing at the "receiving" end.
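As a quick illustration of the power-function convention, here is a hedged Python sketch. The gamma value of 2.2 is purely illustrative; real standards (sRGB, Rec. 709) use slightly different exponents and add a short linear segment near black:

```python
def gamma_encode(L, gamma=2.2):
    """Gamma precompensation: encode linear luminance L in [0, 1] with a
    power function, per the TV-derived convention (gamma = 2.2 is
    illustrative only)."""
    return L ** (1 / gamma)

def gamma_decode(V, gamma=2.2):
    """The display-side power function (the CRT's native response)."""
    return V ** gamma

# Encoding lifts dark values, spending more code values where the eye is
# most sensitive; the display's power response then undoes the transform:
encoded = gamma_encode(0.18)          # "18% gray" comes out well above 0.18
restored = gamma_decode(encoded)      # back to (approximately) 0.18
print(encoded, restored)
```

End to end, the power and its inverse cancel, so the displayed luminance tracks the scene luminance — exactly the point made above about avoiding nonlinear circuitry in the receiver.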

But the Canon cinema engineers recognized that a logarithmic, rather than power-function, algorithm for "compressing" the signal values would be more beneficial in getting the best perceptual quality given a certain bit "quota", considering the effects of noise as well. So they crafted a new compression algorithm, "Canon-log", based on a logarithmic compression function. (There is now a new and improved version, Canon-log 2.)

The details of the algorithm, and a thorough discussion of the rationale for using it in Canon's professional cinema cameras, are given in the Canon white paper, "Canon-Log Cine Optoelectronic Transfer Function", available here:

https://www.google.com/url?sa=t&rct...FikLJ0rKFdNIH42ZQ&sig2=fKQv_W42frvJAOkjo-e7yA

A brief discussion is available here:

http://learn.usa.canon.com/resources/articles/2011/understand_log_gamma.shtml

Best regards,

Doug
 

Jerome Marot

Well-known member
The response of the CRT was essentially described this way:

L = e^γ

where e was the control grid voltage, L was the resulting luminance on the screen, and γ (the Greek letter gamma) was a constant. This general form is what we call a "power function", since we take a fixed power of the input to determine the output.

As a result, the transform used at the transmitter (essentially the inverse of the equation just above) came to be described as the "gamma precompensation function".

Now, did this function have the same beneficial property as the "logarithmic" transform used in digital audio, in terms of giving us the biggest perceptual bang for what we spend on precision of the transmitted signal? Not exactly.

Not exactly? I thought that ln( exp(x) ) = x.
 

Doug Kerr

Well-known member
Hi, Jerome,

Not exactly? I thought that ln( exp(x) ) = x.

That is so. Not sure what that has to do with what I described.

The response of the CRT is approximated as (to now use "v" rather than "e" as the control grid voltage, L' as the generated luminance and k and c as constant parameters):

L' = kv^c

The precompensation function at the transmitter can be stated as (now using "V" as the transmitted signal, which becomes "mV" at the receiver, L for the scene luminance being encoded, and f and c as constant parameters, c being the same as above):

V = fL^(1/c)

Thus, end-to-end:

L' = k(mf)^c L

That is, the generated luminance is proportional to the encoded scene luminance.

None of the functions above are logarithmic, or exponential, functions (my point, in fact). They are power functions.

Perhaps my error was in using "e" for the control grid voltage (valid notation, but perhaps too evocative of the Napierian base).

Thanks for your interest.

Best regards,

Doug
 

Jerome Marot

Well-known member
Indeed I was mistaken. The idea that human vision follows a logarithmic scale is disputed, however, and it is argued that over a wider range it is better approximated by a power law.

I also read the Canon white paper. It seems that the video standards recommend a gamma-encoded curve (i.e., a power law); see ITU Recommendation 709. But some tweaking of the highlights was common (and indeed I have a video camera with a "Knee adjustment" setting). Canon's approach is to fit a logarithmic curve to that part of the curve; it is applied only to the highest bits of the data range and maps highlight levels which would normally be out of range to the highest available bits.
 

Doug Kerr

Well-known member
Hi, Jerome,

Indeed I was mistaken. The idea that human vision follows a logarithmic scale is disputed, however, and it is argued that over a wider range it is better approximated by a power law.

Indeed, and in my papers I actually mention that, and should probably have done so in this note.

I also read the Canon white paper. It seems that the video standards recommend a gamma-encoded curve (i.e., a power law); see ITU Recommendation 709. But some tweaking of the highlights was common (and indeed I have a video camera with a "Knee adjustment" setting). Canon's approach is to fit a logarithmic curve to that part of the curve; it is applied only to the highest bits of the data range and maps highlight levels which would normally be out of range to the highest available bits.

You have actually looked into that paper more than I have so far! Thanks for that insight into the details.

Best regards,

Doug
 

Doug Kerr

Well-known member
The actual transfer characteristic of the original Canon-log system is (I have truncated the values of the constants for conciseness):

y = 0.529 log10(10.16x + 1) + 0.073

where x is the linear-light output of the sensor and y is the "video output" value.

Note that this function is "logarithmic all the way". There is, for example, no portion of the range where the function is a power law, or is linear.

x can run from -0.045 to 8.01.
(I will later relate values of x to various reference photometric exposure values. This involves the magical "18% reflectance", and I need to give the matter further careful thought before I try to describe it.)

y can thus run from -0.068 to 1.09, which can be thought of as from -6.8 to 109 on the IRE video scale, which (simplistically) runs from 0 for black to 100 for maximum white.

This is the traditional vertical scale of, for example, video waveform monitor oscilloscopes.
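For those who want to play with the curve, here is a small Python rendering of the transfer function as given above. With the truncated constants, the computed endpoints land within rounding distance of the quoted values:

```python
import math

def canon_log(x):
    """Original Canon-log transfer function, using the truncated constants
    quoted in this thread (the full-precision constants differ slightly)."""
    return 0.529 * math.log10(10.16 * x + 1) + 0.073

# Check the endpoints and reference points quoted above:
print(canon_log(-0.045))   # close to the quoted -0.068 (truncated constants)
print(canon_log(8.01))     # ≈ 1.09, i.e. about 109 IRE
print(canon_log(1.0))      # 90% white maps to ≈ 0.627, about 63 IRE
print(canon_log(0.2))      # "18% gray" reference
```

Note that, unlike a gamma curve, this function has no power-law or linear segment anywhere in its range, as remarked above.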
In the overall Canon system, if y is captured in 10-bit form (both 10- and 12-bit representations are used), the digital values run from 4 through 1016. There is an offset; a y value of 0 is encoded as a digital value of 64.

Now off to prepare for breakfast, and then to prepare breakfast. Carla is in Durango, Colorado, at a Red Hat Society shindig, and so I am "batching it".

Best regards,

Doug
 

Doug Kerr

Well-known member
Well, breakfast went quite nicely. It was not as nice as Carla does it, but it was still tasty and nourishing.

Now back to Canon-log.

The discussion of certain issues is hurt (and understanding impeded) by some carelessness in the use of terminology, especially "dynamic range". In some places it is used as the label on the scale of what is called "x" in the transform equation, essentially the sensor output. Other places it is used as a synonym for "exposure latitude".

The "native sensitivity" of the camera (consistent with the video production context of these cameras, that described as with the relative gain of the analog amplifiers being 0 dB) is ISO 640. Changes from that are described in terms of a change of gain, in dB (although of course for each such change there is a resulting implication in terms of ISO speed).

The "photometric trail" for this camera system is predicated on a white object of reflectance 89.9% (often stated as "90%" in charts and the like for conciseness). At the "native" sensitivity, with the aperture and shutter speed set for "optimum exposure" given the scene luminance, the sensor output, x, for that object would be 1.0. For the infamous "18% reflectance" object, the sensor output, x, would be 0.20 (there is a bit of rounding done for convenience).

In terms of dynamic range, as we understand the term, at native sensitivity, the maximum luminance at which the sensor can resolve detail is said to correspond to x = 6, presumably resulting from 30 times the luminance of the "18% reflectance" test object. That is described as a 4.9-stop exposure latitude (actually, said to be "4.9 T-stops", given the arena in which we are working).

For this same exposure situation, again at native sensitivity the camera is said to be able to resolve "shadow detail" at a luminance of 7.1 stops below that of the "18% gray" object. The signal-to-noise ratio there (for luma) would be 54 dB.

We might then be tempted to say that the dynamic range of the camera (as we often use the term) is 12 stops (4.9 + 7.1).
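The arithmetic behind those stop figures is just base-2 logarithms of luminance ratios; a quick Python check:

```python
import math

# Stops are base-2 logarithms of luminance ratios. With x = 0.2 for the
# 18% reference and clipping at x = 6, the highlight headroom is:
headroom = math.log2(6 / 0.2)      # log2(30) ≈ 4.9 stops
print(round(headroom, 1))

# Adding the quoted 7.1 stops of resolvable shadow range below
# reference gray gives the overall figure:
print(round(headroom + 7.1, 1))    # ≈ 12 stops total
```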

All very interesting.

Best regards,

Doug
 

Doug Kerr

Well-known member
The Canon EOS C300 II and Canon Log2

The Canon EOS C300 II professional cinema camera offers a new "compression" algorithm, Canon Log2 (sometimes written "Canon Log 2").

It differs somewhat from the original Canon-log transfer function. I have not yet seen a mathematical expression of the new curve, but overall it seems to be at least quasi-logarithmic.

The major differences in the shape of the new curve are said to occur at the lower levels (below an output of 5 IRE units - recall that 100 IRE units is essentially "full white"), in order to better resolve detail at lower light levels, a matter that becomes of importance given the greater "dynamic range" (in the meaning familiar to us).

The labeling of some of the curves in the white paper is "peculiar", so it takes a while to figure out just what the author means. Perhaps his red pencil was out of town on Red Hat Society business or some such.​

For this new camera, the "native" sensitivity is ISO 800.

The exposure latitude is said to be 6.3 stops above the "18% reflectance" reference luminance and 8.7 stops below, for an overall "dynamic range" of 15 stops.


Best regards,

Doug
 

Doug Kerr

Well-known member
In my discussion of the Canon Log2 compression curve, I said, essentially quoting from the White Paper on the C300 II, that the new curve especially differed from the old one in the region below IRE 5. That struck me as an odd way to describe it.

Considering the mislabeling of the axes on two curves in the paper, I now suspect that what was meant was that the changes were most significant in the region below x = 5% (the x scale is always given in percent in the papers, even though when I cited points on it I gave the actual number, as it would go into the equation).

As Carla says, "Once you get screwed up, it is hard to get unscrewed."

Best regards,

Doug
 

Doug Kerr

Well-known member
Hi, Asher,
......does this mean that we can sample low accumulations of photons with a higher "accuracy" now?

Well, the precision at which the Canon sensor chain delivers its output is 14 bits. As to accuracy, I have no idea, except to the extent that the accuracy of any given measurement is randomly damaged by noise.

For the C300 II, at an ISO sensitivity of ISO 102,400, at the bottom of the "dynamic range" (8.7 stops below "reference gray"), the s/n ratio is said to be 47 dB, 20 dB worse than what it would be at this relative exposure point at the "native" sensitivity of ISO 800 (67 dB). That is 10 times greater noise amplitude than at the "native" sensitivity.
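For anyone rusty on the decibel arithmetic: s/n here is a ratio of amplitudes, so a difference in dB converts to an amplitude ratio as 10^(dB/20). A one-line Python check confirms the factor of 10:

```python
# 67 dB -> 47 dB is a 20 dB drop in s/n; as an amplitude ratio:
noise_ratio = 10 ** ((67 - 47) / 20)
print(noise_ratio)   # 10.0, i.e. 10 times the noise amplitude
```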

Best regards,

Doug
 

Asher Kelman

OPF Owner/Editor-in-Chief
Hi, Asher,


Well, the precision at which the Canon sensor chain delivers its output is 14 bits. As to accuracy, I have no idea, except to the extent that the accuracy of any given measurement is randomly damaged by noise.

For the C300 II, at an ISO sensitivity of ISO 102,400, at the bottom of the "dynamic range" (8.7 stops below "reference gray"), the s/n ratio is said to be 47 dB, 20 dB worse than what it would be at this relative exposure point at the "native" sensitivity of ISO 800 (67 dB). That is 10 times greater noise amplitude than at the "native" sensitivity.

Doug,

........and will this be better than had Canon used their previous algorithm?

Asher
 

Doug Kerr

Well-known member
Hi, Asher,

Doug,

........and will this be better than had Canon used their previous algorithm?

Well, presumably something will be better! But I haven't yet got my head around where the advantage shows up. (So far I have been concentrating on the "what" rather than the "why".) Maybe later tonight (have to get ready now to go to the theater for a rehearsal).

Part of it has to do with the whole very elaborate chain of post-processing (and ultimately, distribution) in the serious cinema business. But I don't yet understand how.

Best regards,

Doug
 

Doug Kerr

Well-known member
Just so we get an idea what sort of machine we are dealing with here, this is a more-or-less full-blown EOS C300 II:

[Image: EOS-C300-Mark-II-FSL-24-105-f4L-LCD-Monitor-Up-635x403.jpg]

This is not something you would hand to a stranger in an airport concourse and say, "Excuse me, but could you take a shot of me and my wife?"

Best regards,

Doug
 