From Newsgroup: alt.comp.os.windows-11
Several kind posters have helped me track down why my replies sometimes get
corrupted when responding to posts from Winston, so I wanted to summarize the
findings so the whole picture is clear for those who care.
While anything I say below can be wrong, it's what I "think" is happening...
1. Winston types his display name using Windows Alt-codes.
These produce raw Windows-1252 bytes:
A1 = ¡
F1 = ñ
A7 = §
B1 = ±
A4 = ¤
His full display name is literally:
...w¡ñ§±¤ñ
2. These bytes are legal in Windows-1252, but not legal in a Usenet header.
Usenet headers must be 7-bit ASCII unless they use a MIME encoded-word.
Winston’s header contains raw 8-bit bytes, not ASCII and not UTF-8.
3. Thunderbird displays those bytes as-is.
Thunderbird neither sanitizes nor re-encodes the header on send.
In the message viewer, Thunderbird shows:
...w¡ñ§±¤ñ <winstonmvp@gmail.com>
When viewing the raw source, Thunderbird shows a MIME-encoded version,
but that is Thunderbird's internal representation, not what was sent.
4. My own workflow is strict ASCII.
I enforce 7-bit output. When I quote Winston, his raw 8-bit bytes get
copied into my attribution line. That can create a mojibake mismatch
between declared charsets and the actual bytes in my outgoing post.
5. Some NNTP servers may (apparently) try to 'repair' that mismatch.
Different NNTP servers may handle illegal bytes differently. Some
may rewrite the charset, some might re-encode the body, and some
may simply corrupt the article into mojibake scrambled eggs.
I think that is why my replies sometimes get mangled on the way out.
6. By experiment, ASCII mode works better for me than UTF-8 mode.
When I declare US-ASCII and strip all non-ASCII before posting, the
article is internally consistent and servers do not seem to interfere.
When I declare UTF-8, servers appear to validate the bytes and try to
fix whatever is not valid UTF-8, which may lead to unpredictable results.
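For anyone who wants to check the byte-level claims in points 1, 2 and 6, here is a minimal Python sketch. It assumes the display-name bytes are exactly the six listed in point 1 (the leading "...w" is left out), and the variable names are mine. It only shows how the bytes behave under different decoders, not what any particular NNTP server actually does.

```python
from email.header import Header

# The six bytes from point 1, in display-name order (A1 F1 A7 B1 A4 F1).
raw = bytes([0xA1, 0xF1, 0xA7, 0xB1, 0xA4, 0xF1])

# Decoded as Windows-1252, they give the characters the message viewer shows.
name = raw.decode("cp1252")

# Not 7-bit ASCII: every one of these bytes has the high bit set.
is_ascii = all(b < 0x80 for b in raw)

# Not valid UTF-8 either: 0xA1 is a continuation byte and cannot start a sequence.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# A lenient decoder does not fail, but the result is not the original name.
# Here the middle bytes F1 A7 B1 A4 happen to form one valid 4-byte sequence,
# so they collapse into a single unrelated code point.
repaired = raw.decode("utf-8", errors="replace")

# One header-safe form of the same name: an RFC 2047 encoded-word.
encoded_word = Header(name, "utf-8").encode()

print(is_ascii, is_utf8)
print(ascii(repaired))
print(encoded_word)
```

The lenient decode is the interesting part: a "repair" that guesses UTF-8 can silently produce characters that have nothing to do with what was typed.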
Since this is one of the perils of writing your own newsreader, I am adding a
normalization step in my shortcuts.xml so that any non-ASCII bytes in the
attribution line are removed or replaced before posting. This keeps my
outgoing articles 7-bit clean and should keep NNTP servers from rewriting
them. Modern newsreaders already do this automatically, so this likely
affects mainly older strict-ASCII workflows like mine.
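To make the normalization idea concrete, here is a minimal Python sketch. This is not the shortcuts.xml step itself, just the same idea in another form. The function name to_ascii_attribution, the "?" replacement character, and the " wrote:" suffix are all assumptions for illustration.

```python
import unicodedata

def to_ascii_attribution(line: str, replacement: str = "?") -> str:
    """Return the line with every non-ASCII character made 7-bit clean."""
    out = []
    for ch in line:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        # Keep a plain ASCII base letter where one exists (n-tilde -> n)
        # by decomposing the character and dropping the non-ASCII parts.
        base = (
            unicodedata.normalize("NFKD", ch)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Characters with no ASCII base are replaced (hypothetical choice: "?").
        out.append(base if base else replacement)
    return "".join(out)

# The display-name bytes from point 1, decoded as Windows-1252.
name = bytes([0xA1, 0xF1, 0xA7, 0xB1, 0xA4, 0xF1]).decode("cp1252")
attribution = to_ascii_attribution(name + " wrote:")
print(attribution)
```

With these defaults the two n-tilde characters come out as a plain n and the other four symbols become "?". A blunter option is line.encode("ascii", "replace").decode("ascii"), which turns every non-ASCII character into "?".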
Thanks to everyone who helped test this from the recipient's side. The
problem is now better understood & the workaround on my end is ongoing.
--
There are 2 types of posters on Usenet, only half of which can add value.
--- Synchronet 3.21d-Linux NewsLink 1.2