Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames

From: Paul Eggert <eggert@CS.UCLA.EDU>
Date: 2005-10-14 10:16:57

Linus Torvalds <torvalds@osdl.org> writes:

> So I repeat: 
>  - escape as little as possible
>  - make the _viewer_ decide how to view it.

Under my most recent proposal, the only bytes one must escape are ",
\, and LF.  Doesn't that satisfy these two main criteria?
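
To make the "escape as little as possible" point concrete, here is a
minimal sketch of such a quoting rule.  The message above names only
the three bytes that must be escaped; the C-style backslash syntax
inside double quotes, and the helper names quote_path and unquote_path,
are assumptions made for illustration, not the format either tool
settled on.

```python
def quote_path(path: bytes) -> bytes:
    """Quote a pathname, escaping only '"', backslash and LF.

    Assumption: C-style backslash escapes inside double quotes.
    Every other byte (TAB, CR, 8-bit bytes) passes through unchanged.
    """
    escapes = {0x22: b'\\"', 0x5C: b"\\\\", 0x0A: b"\\n"}
    body = b"".join(escapes.get(b, bytes([b])) for b in path)
    return b'"' + body + b'"'


def unquote_path(quoted: bytes) -> bytes:
    """Inverse of quote_path for the three escapes above."""
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError("not a quoted pathname")
    unescapes = {0x22: 0x22, 0x5C: 0x5C, ord("n"): 0x0A}
    inner = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(inner):
        b = inner[i]
        if b == 0x5C:  # backslash introduces an escape
            i += 1
            if i >= len(inner) or inner[i] not in unescapes:
                raise ValueError("bad escape")
            out.append(unescapes[inner[i]])
        else:
            out.append(b)
        i += 1
    return bytes(out)
```

Because the quoted form never contains a raw LF, a header line built
from it stays on one line, and the viewer remains free to decide how to
display every other byte.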


> If GNU emacs does locale translations rather than just do a binary
> transfer of the data, then that's a sign that GNU emacs is being
> really stupid.

Perhaps so, but it has a lot of company.  I have even worse problems
with Mozilla Thunderbird.  And as we observed, Pine also has problems
sending properly-formatted email containing arbitrary binary data.

I suspect the vast majority of email clients will screw up in
relatively common cases involving unusual characters in file names.
Using attachments avoids many of the problems, but lots of patches are
emailed inline and I'd rather not force people to use attachments to
send diffs.


> I find that email is very robust - it's basically 8-bit clean. No 
> character encoding, no crap. Just a byte stream. It really _is_ the most 
> reliable format.

Hmm.  To test that theory, I just now sent plain-text email to myself,
containing a carriage-return (CR) byte in the middle of a line.

The CR byte was transliterated into a LF.  Oops.

This was the very first (and only) test I tried, which isn't a good
sign for reliability.  If you're curious, I tracked the problem down
to Exim, a popular mail transfer agent that is running on my personal
Debian GNU/Linux (stable) box.  As to why Exim munges email, please see
<http://www.exim.org/exim-html-4.40/doc/html/spec_44.html#SECT44.1>.
(And I didn't know about the Exim glitch before trying my test.
I'm normally a Sendmail man myself.)
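
The kind of check I ran can be sketched as follows.  This is only a
model of the effect I observed (a bare CR arriving as LF), not Exim's
actual code; the function names normalize_bare_cr and survives are
hypothetical.

```python
import re


def normalize_bare_cr(message: bytes) -> bytes:
    """Hypothetical stand-in for a mail agent that rewrites line endings.

    Every CR that is not followed by LF becomes LF; CRLF pairs are kept.
    """
    return re.sub(rb"\r(?!\n)", b"\n", message)


def survives(message: bytes, transport=normalize_bare_cr) -> bool:
    """Return True if the transport delivers the message byte-for-byte."""
    return transport(message) == message


# A diff header naming a file with an embedded CR does not survive:
header = b"+++ b/odd\rname\n"
delivered = normalize_bare_cr(header)  # the header is now split in two lines
```

An unquoted CR in a file name therefore turns one header line into two
by the time the patch arrives, which is exactly the kind of botch that
quoting is meant to resist.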

More generally, I suspect inline patches with weird bytes will suffer
greatly from encoding and recoding by mail agents.


> What matters is not what it looks like, but what it _saves_ as. If
> you save the email message, it should come out as the same reliable
> 8-bit byte stream

Unfortunately this isn't true for Emacs, and I suspect other mailers
will have similar problems.  For example, with Emacs I can easily save
either the exact byte-for-byte message body that my mail transfer
agent gave me; or I can have Emacs decode the message into its
constituent characters, reencode the result as UTF-8, and put that
into a file.  In neither case, though, am I saving the original byte
stream that you presented to your mail user agent.  Even if I save the
byte-for-byte message body, it is often in quoted-printable format so
I'll have to decode strings like "=EF" to recover the original bytes.
This is doable, yes, but it's inconvenient in practice, at least with
the mail user agents I'm familiar with.  And even if I do it, I don't
necessarily have the same byte stream you gave your mail user agent; I
merely have the byte stream that your MUA gave to your MTA, and these
may not be the same thing (they certainly aren't always the same thing
with Emacs).
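
For what it's worth, the decoding step itself is simple to script.
Here is a small sketch using the quopri module from the Python
standard library; the sample body and its file name are made up for
illustration.

```python
import quopri

# Hypothetical saved message body, still in quoted-printable form.
raw_body = b"--- a/na=EFve.txt\n+++ b/na=EFve.txt\n"

# Decode "=EF" (and any other =XX sequences) back into raw bytes.
decoded = quopri.decodestring(raw_body)
```

Note that this recovers only the bytes the sender's MUA handed to the
MTA; it cannot undo any recoding the MUA did before that point.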


The simplest fix for git may be to say "Don't use inline patches; use
attachments if you must email anything with strange characters in it."
That's fine.  But I prefer a format that also allows GNU diff, if it
chooses, to generate output that resists common inline-email botches.
Received on Fri Oct 14 10:18:39 2005