Re: Group transfers


 

Marlin,

As I see it now, what I did was a double space.

Just a double space between words, and the next character was a
quotation mark.
It is fairly common for rich-text systems, such as HTML editors, to turn successive spaces into non-breaking spaces. HTML rendering typically collapses a run of ordinary spaces into a single space, so the conversion to non-breaking spaces preserves the original count of spaces.
http://en.wikipedia.org/wiki/Non-breaking_space

The unfortunate thing, in this case, is that the non-breaking space isn't part of the base 7-bit ASCII set that is common to most character sets, so the bytes used to encode it differ from one character set to another. The result is that it gets misinterpreted when a system uses a different character set than the original.
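To make that concrete, here is a small Python sketch of what can happen to a non-breaking space (U+00A0) followed by a quotation mark. The sample text and the particular pair of character sets (UTF-8 and Windows-1252) are my own assumptions for illustration; I don't know which character sets were actually involved in the message in question.

```python
# Sketch only: the sample text and the choice of character sets are assumptions.
NBSP = "\u00a0"  # non-breaking space, not present in 7-bit ASCII

# An ordinary space, then a non-breaking space, then a quotation mark.
text = "word " + NBSP + '"quoted"'

# The same character is encoded as different bytes in different character sets.
utf8_bytes = text.encode("utf-8")         # NBSP becomes the two bytes C2 A0
latin1_bytes = text.encode("iso-8859-1")  # NBSP becomes the single byte A0

# Case 1: UTF-8 bytes read as Windows-1252.
# The byte C2 shows up as a stray accented letter in front of the NBSP.
garbled = utf8_bytes.decode("cp1252")

# Case 2: ISO-8859-1 bytes read as UTF-8.
# A lone A0 byte is not valid UTF-8, so it becomes the replacement character.
replaced = latin1_bytes.decode("utf-8", errors="replace")

print(repr(garbled))
print(repr(replaced))
```

Either way the reader ends up with a character the writer never typed, even though every ASCII character around it comes through untouched.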

The lesson for Groups.io is that messages posted by email have a specified character set, and that should be preserved in messages passed through. But as displayed in the archive, and as built into digests, careful handling is required. Perhaps the best approach is to convert the text to UTF-8 encoding, but that has its own issues.
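As a rough illustration of "messages posted by email have a specified character set", the sketch below reads the declared character set from a message using Python's standard email package. The sample message, including its address and body, is made up for the example, and this says nothing about how Groups.io actually processes mail.

```python
from email import message_from_bytes, policy

# Hypothetical incoming message; the address and body are invented.
raw_message = (
    b"From: sender@example.com\r\n"
    b"Subject: Re: Group transfers\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b'two \xa0"spaces"\r\n'
)

msg = message_from_bytes(raw_message, policy=policy.default)

# The character set the sender declared in the Content-Type header.
declared_charset = msg.get_content_charset()

# The body as bytes, after undoing any transfer encoding.
body_bytes = msg.get_payload(decode=True)

# Decoding with the declared character set recovers the non-breaking space.
body_text = body_bytes.decode(declared_charset)

print(declared_charset)
print(repr(body_text))
```

The point is only that the declaration travels with the message, so a system passing the message along can keep the two together.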

My recommendation for the Archives would be to preserve the original message text unaltered, and capture the character set encoding as metadata. Then convert to UTF-8 on the fly for display. If a message gets edited, update the metadata to reflect the character set used in the editing - presumably UTF-8. Of course that means the metadata must be versioned along with the text.
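A minimal sketch of that recommendation, in Python. The class and method names (MessageVersion, ArchivedMessage, edit, display_utf8) are hypothetical, chosen only for this example, and the sketch assumes the editor saves in UTF-8; it is not a description of how the Groups.io archive is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageVersion:
    raw: bytes    # text bytes exactly as received, or as saved by the editor
    charset: str  # character set that applies to this version's bytes


class ArchivedMessage:
    """Keeps every version of a message together with its character set."""

    def __init__(self, raw: bytes, charset: str) -> None:
        # Version 0 is the original message, stored unaltered.
        self.versions = [MessageVersion(raw, charset)]

    def edit(self, new_text: str) -> None:
        # Assumption: the editor works in UTF-8, so the new version
        # records UTF-8 as its character set. Older versions are untouched.
        self.versions.append(MessageVersion(new_text.encode("utf-8"), "utf-8"))

    def display_utf8(self, version: int = -1) -> bytes:
        # Convert on the fly for display; the stored bytes never change.
        v = self.versions[version]
        return v.raw.decode(v.charset, errors="replace").encode("utf-8")


# Usage: an ISO-8859-1 message containing a non-breaking space (byte A0).
archived = ArchivedMessage(b'two \xa0"spaces"', "iso-8859-1")
original_view = archived.display_utf8()

archived.edit('two "spaces"')
edited_view = archived.display_utf8()
first_version_view = archived.display_utf8(0)
```

Because each version carries its own character set, showing an older version after an edit still decodes it correctly.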

It really had me all kinds of upset, enough so that I shut the computer
down and walked away.
Nothing to be upset about, just computers being incompatible as standards change over time.

-- Shal
