Doug Kerr
Several members have noticed that Photoshop encodes the copyright symbol ("©") in all metadata (Exif, IPTC IIM, and IPTC XMP) as the two-byte sequence C2h A9h. [In Windows Code Page 1252, the usual "extended ASCII" encoding used inside Windows, that would be interpreted as "Â©".]
Here's the story.
IPTC XMP metadata
IPTC XMP metadata is encoded in UTF-8. In UTF-8, the character "©" is not encoded as the single byte A9h (as it is in Windows Code Page 1252). Rather, it is encoded as the two-byte sequence C2h A9h. [Only ASCII characters get single-byte representations in UTF-8.]
Thus, the encoding used by Photoshop for "©" in IPTC XMP metadata (C2h A9h) is appropriate.
Any XMP interpreting program should render this on screen as "©".
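To make the byte values concrete, here is a small Python sketch (my illustration, not anything Photoshop itself runs). It encodes "©" both ways and then shows what happens when the UTF-8 bytes are read as if they were Windows Code Page 1252. The variable names are arbitrary.

```python
# The copyright sign is Unicode code point U+00A9.
copyright_sign = "\u00a9"

utf8_bytes = copyright_sign.encode("utf-8")     # two bytes: C2h A9h
cp1252_bytes = copyright_sign.encode("cp1252")  # one byte: A9h

def as_hex(data):
    """Format bytes as space-separated uppercase hex, e.g. 'C2 A9'."""
    return " ".join(f"{b:02X}" for b in data)

print(as_hex(utf8_bytes))    # C2 A9
print(as_hex(cp1252_bytes))  # A9

# Reading the UTF-8 bytes as Code Page 1252 yields two characters:
# C2h is "Â" and A9h is "©".
misread = utf8_bytes.decode("cp1252")
print(misread)  # Â©
```

The last line is the "Â©" that members have been seeing.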
IPTC IIM metadata
IPTC IIM metadata ("legacy" IPTC metadata) can use several encodings. The encoding used should be indicated by a data item, CodedCharacterSet.
IPTC IIM metadata generated by Photoshop indicates the encoding as UTF-8. Thus, the encoding used by Photoshop for "©" in IPTC IIM metadata (C2h A9h) is appropriate.
Fully-observant IPTC IIM metadata interpreting programs should render this on screen as "©".
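As a rough sketch of what a "fully-observant" reader might do: as I understand the IIM specification, UTF-8 is announced in CodedCharacterSet by the escape sequence ESC % G (bytes 1Bh 25h 47h). The function name decode_iim_text and the fallback to Code Page 1252 when no UTF-8 indication is present are my own assumptions for illustration, not something the specification prescribes.

```python
# Escape sequence ESC % G, which (as I understand the IIM specification)
# announces UTF-8 in the CodedCharacterSet data item.
UTF8_ESCAPE = b"\x1b%G"

def decode_iim_text(raw, coded_character_set=None, fallback="cp1252"):
    """Decode an IIM text value (hypothetical helper).

    raw                 -- the bytes of the text data item
    coded_character_set -- the bytes of CodedCharacterSet, or None if absent
    fallback            -- assumed encoding when UTF-8 is not indicated
                           (an assumption of this sketch only)
    """
    if coded_character_set == UTF8_ESCAPE:
        return raw.decode("utf-8")
    # Unknown or missing indication: fall back, replacing undecodable bytes.
    return raw.decode(fallback, errors="replace")

# With the UTF-8 indication honored, C2h A9h comes out as the copyright sign.
print(decode_iim_text(b"\xc2\xa9 Doug Kerr", UTF8_ESCAPE))

# A reader that ignores the indication shows the two-character form instead.
print(decode_iim_text(b"\xc2\xa9 Doug Kerr"))
```

A reader that skips CodedCharacterSet altogether behaves like the second call, which is one way the two-character form can turn up even for IIM data.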
Exif metadata
According to the Exif specification, the Exif metadata item Copyright should be encoded in ASCII. The character "©" does not exist in the ASCII character set.
Sometimes, to deal with this, characters that lie beyond ASCII but are included in Windows Code Page 1252 (such as "©") are encoded in Exif metadata in Windows Code Page 1252 form (A9h). Many Exif metadata-reading applications assume that text strings are in Windows Code Page 1252. Others apparently are prepared to recognize whether characters beyond the ASCII character set are encoded in UTF-8 form or Windows Code Page 1252 form.
Photoshop encodes the character "©" in UTF-8 form in Exif metadata. This cannot be said to be either correct or incorrect under the Exif specification, given that the character "©" is not really allowed in Exif metadata. [But see my recommendation below.]
• Receiving Exif metadata applications that strictly follow the Exif specification will not display the sequence C2h A9h at all (those code values do not represent ASCII characters). [A substitute character - perhaps "?" - may be displayed for each byte.]
• Receiving Exif metadata applications that assume Windows Code Page 1252 encoding of characters beyond the ASCII character set will display the sequence C2h A9h as "Â©". [This is what has been reported as an anomaly.]
• Receiving Exif metadata applications that are prepared to recognize whether Windows Code Page 1252 or UTF-8 encoding is being used for text strings will display the sequence C2h A9h as "©".
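The different reader behaviors can be imitated in a few lines of Python. This is only a sketch under my own assumptions: the three function names are hypothetical, the strict reader is modeled as substituting a replacement character for each non-ASCII byte, and the "recognizing" reader is modeled as trying UTF-8 first and falling back to Code Page 1252, which is one plausible heuristic and not a documented algorithm of any particular application.

```python
def read_strict_ascii(raw):
    """Strict reader: every non-ASCII byte becomes a substitute character."""
    return raw.decode("ascii", errors="replace")

def read_as_cp1252(raw):
    """Reader that assumes Windows Code Page 1252 for all text strings."""
    return raw.decode("cp1252", errors="replace")

def read_autodetect(raw):
    """Reader that tries UTF-8 first, then falls back to Code Page 1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")

photoshop_style = b"\xc2\xa9"  # "©" as written by Photoshop (UTF-8 form)
cp1252_style = b"\xa9"         # "©" in Windows Code Page 1252 form

for reader in (read_strict_ascii, read_as_cp1252, read_autodetect):
    print(reader.__name__, repr(reader(photoshop_style)), repr(reader(cp1252_style)))
```

In this model, the single byte A9h reads as "©" under both the Code Page 1252 reader and the autodetecting reader, while C2h A9h reads as "©" only under the autodetecting one.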
Conclusion
It is my opinion that it is inappropriate for Photoshop to encode the character "©" into Exif metadata in UTF-8 encoding. It would be more prudent for it to encode the character "©" into Exif metadata in Windows Code Page 1252 form (as the byte A9h).
It is my opinion that it is perfectly appropriate for Photoshop to encode the character "©" into IPTC IIM and IPTC XMP metadata in UTF-8 form (as it does now).