Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames

From: Junio C Hamano <junkio@cox.net>
Date: 2005-10-11 17:37:57
Paul Eggert <eggert@CS.UCLA.EDU> writes:

> The convention I had been thinking of adding is to have GNU diff
> use shell-quoting style, e.g.,
>
> 'three
> o'\''clock'
>
> to represent a file name with a newline and an apostrophe in it.
> This sort of file name can be cut and pasted into the shell.
> The quoting could be used with any file name containing a
> troublesome character.
>
> Perhaps another quoting style would be better.

A patch header (both "diff --git" line and ---/+++ lines) I've
been considering, and have in the proposed updates branch, looks
something like this:

    diff --git a/def\nghi/pqr b/dee/pqr
    similarity index 72%
    rename from def\nghi/pqr
    rename to dee/pqr
    index 9ee055c..243fbbc 100644
    --- a/def\nghi/pqr
    +++ b/dee/pqr
    @@ -1 +1,3 @@
     Fri Oct  7 23:19:04 PDT 2005
    +foo
    +foo

If we can keep things on one line, that would help parsing the
stuff very simple, but more importantly, it is easier to see
what's happening.  The pattern is the same whether you have
funny pathnames or not, and that helps the human consumer.

Adjusting the "git diff" output to the style the GNU diff with
your shell quoting style would produce something like this:

    diff --git 'a/def
    ghi/pqr' b/dee/pqr
    similarity index 72%
    rename from 'def
    ghi/pqr'
    rename to dee/pqr
    index 9ee055c..243fbbc 100644
    --- 'a/def
    ghi/pqr'
    +++ b/dee/pqr
    @@ -1 +1,3 @@
     Fri Oct  7 23:19:04 PDT 2005
    +foo
    +foo

Which, while it is possible to make tools parse them, is very
distracting for humans to read and review.  Yes, LF is quoted,
but it still breaks the line, disrupting the pattern we are used
to see.  If you are talking about a funny file, whose name is
"a\ndiff --git a/b/c", your diff would look like this:

    diff --git 'a/
    diff --git a/b/c' 'b/
    diff --git a/b/c'
    index 9ee055c..243fbbc 100644
    --- 'a/
    diff --git a/b/c'
    +++ 'b/
    diff --git a/b/c'
    @@ -1 +1,3 @@
     Fri Oct  7 23:19:04 PDT 2005
    +foo
    +foo

We are used to tell the "less" command to do "/^diff --git .*"
while reviewing patches.  The shell quoting, while I admit I
learned its beauty from you, is a disaster for human consumption.

For diff output quoting purposes, LF is the only thing that
matters, as you mentioned in another message to me.  Our parsing
side ("GNU patch" counterpart) checks two pathnames on "diff
--git" line and makes sure what follows a/ and b/ are consistent
(that is, they should be identical, or each are the same as
"rename from" and "rename to"), so there is no ambiguity.  But
again for human consumption purposes, we cannot easily tell SP
and TAB apart by just reading, and a TAB is so unusual character
to have in pathname (as opposed to SP which is not that
uncommon), we may be better off making them visible.

Quoting TAB incidentally has an added benefit, which you as GNU
diff/patch person would probably not care too much about.  Our
other tools sometimes need to show two paths in one record, and
TAB is used as the field separator between two paths (LF is the
record separator).  The tools do have '-z' mode to let us use
anything but NUL in the pathname, and carefully written scripts
tend to run them with '-z' flag and use Perl or Python to parse
paths out, but it would be nicer if we did not always have to.

For example, the 'git commit' command prepares the log editor
with the status information about changes being committed, and
needs to mention paths.  This is purely for human consumption,
and showing something like:

	# Type commit message to this file.  Lines that start
        # with '#' are ignored.
        #
        # Updated but not checked in:
        #   (will commit)
        #
        #	new file: ab\n\tc/mno
        #	modified: abc/mno
        #	renamed: def\nghi/pqr -> dee/pqr
        ...

is perfectly readable for human users, and can be done without
running the tool in '-z' mode, if the tool output is quoted with
'\n' and '\t' convention -- the parsing and formatting side can
just split the field with TAB and show them, without worrying
about an embedded LF making the rest of the pathname spilling
over to the next line.  And once we start teaching the user we
represent funny characters in their paths this way, it becomes
nicer to be consistent in the diff output as well.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Tue Oct 11 17:41:57 2005

This archive was generated by hypermail 2.1.8 : 2005-10-11 17:42:01 EST