Re: Group transfers


On Sun, Jan 4, 2015 at 1:41 PM, Shal Farley <shal@...> wrote:

> The lesson for Groups.io is that messages posted by email have a specified character set, and that should be preserved in messages passed through. But as displayed in the archive, and as built into digests, careful handling is required. Perhaps the best approach is to convert the text to UTF-8 encoding, but that has its own issues.

We do pay attention to character set encodings, along with content-transfer-encodings, and we do preserve the encodings. All Groups.io web pages are UTF-8, so when displaying archives, we convert whatever encoding the original message is in to UTF-8. When a message is edited through the web site, we convert the UTF-8 encoded edited message back to the original encoding of the email (along with preserving the original content-transfer-encoding).
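
For anyone following along, here is a rough Python sketch of that round trip. It is not the Groups.io code; the function names `decode_body` and `encode_body` are made up for illustration, and it assumes the charset and content-transfer-encoding have already been read from the message headers. It also does not cover the case where an edit introduces characters the original charset cannot represent (in this sketch `text.encode` would raise `UnicodeEncodeError`).

```python
import base64
import quopri


def decode_body(raw: bytes, charset: str, cte: str) -> str:
    """Undo the content-transfer-encoding, then decode the charset to text."""
    cte = cte.lower()
    if cte == "base64":
        payload = base64.b64decode(raw)
    elif cte == "quoted-printable":
        payload = quopri.decodestring(raw)
    else:  # 7bit, 8bit, binary: the bytes are already the payload
        payload = raw
    return payload.decode(charset)


def encode_body(text: str, charset: str, cte: str) -> bytes:
    """Re-encode edited text with the message's original charset and CTE."""
    payload = text.encode(charset)
    cte = cte.lower()
    if cte == "base64":
        return base64.encodebytes(payload)
    if cte == "quoted-printable":
        return quopri.encodestring(payload)
    return payload


# An ISO-8859-1, quoted-printable body becomes ordinary text for a UTF-8 page,
# and an edited version goes back out in the original encodings.
shown = decode_body(b"Gr=FC=DFe", "iso-8859-1", "quoted-printable")
stored = encode_body(shown + " aus K\u00f6ln", "iso-8859-1", "quoted-printable")
```

The point of keeping both parameters around is that the stored message never changes charset or transfer encoding, only the text in between.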

Every message does get converted to UTF-8 when we first receive it, so that we can look for and remove any Groups.io footers in replies. But we convert it right back to the original encoding before sending it out and saving it in the database.
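
As an illustration only, the decode, strip, re-encode step could look something like the Python below. The footer text, `FOOTER_LINES`, `strip_footers` and `clean_reply` are all hypothetical (the actual footer wording and matching rules are not described in this thread), and the sketch assumes the body has already had its content-transfer-encoding removed.

```python
import re

# Hypothetical footer line; the real footer text is not given here.
FOOTER_LINES = {
    "To unsubscribe, send an email to example+unsubscribe@example.invalid",
}

# Matches one or more levels of "> " quoting at the start of a line.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def strip_footers(text: str) -> str:
    """Drop any line that is a known footer line, quoted or not."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if QUOTE_PREFIX.sub("", line).strip() not in FOOTER_LINES
    ]
    return "".join(kept)


def clean_reply(raw: bytes, charset: str) -> bytes:
    """Decode to text, remove footers, then return to the original charset."""
    text = raw.decode(charset)
    return strip_footers(text).encode(charset)
```

Doing the matching on decoded text means one footer pattern works regardless of which charset the reply arrived in; the result is then re-encoded so the message is stored and sent in its original encoding.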

All of that adds a "fun" layer of complexity to things, but I'm unaware of any bugs with how we handle things. We take care to ignore any badly encoded bytes, which we do unfortunately encounter; we try to decode as much as possible.
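
To make "ignore any badly encoded bytes" concrete, this is roughly what lenient decoding looks like in Python. It is a sketch of the general technique, not a description of the Groups.io implementation, and `decode_lenient` is a made-up name.

```python
def decode_lenient(raw: bytes, charset: str) -> str:
    """Decode as much as possible, silently dropping bytes that are invalid
    in the declared charset."""
    return raw.decode(charset, errors="ignore")


# A stray 0xFF byte is not valid UTF-8; it is dropped and the rest survives.
text = decode_lenient(b"caf\xc3\xa9 \xff ok", "utf-8")
```

The trade-off is that the dropped bytes are lost for display purposes, which is why the original bytes are worth keeping alongside the decoded text.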

I haven't had a chance yet to look at what happened with Marlin's email. 

