Re: How should I handle binary file with GIT

From: Junio C Hamano <junkio@cox.net>
Date: 2006-04-06 04:34:55

merlyn@stonehenge.com (Randal L. Schwartz) writes:

> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

First of all, binary files are handled by cherry-pick and merge
without needing to involve "diff"+"patch" (which is not so
useful for binary files anyway).  They use the 3-way read-tree
merge, which compares the object names and leaves the index
unmerged if there are conflicting changes, so you should be able
to sort it out by running up to three "git-cat-file blob $sha1".
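
To illustrate why no "diff" is needed here, the following is a
rough Python sketch of a 3-way decision made purely on object
names.  It is not the actual read-tree code; the function name
and the return values are made up for illustration, and None
stands for "the path does not exist on that side".

```python
from typing import Optional, Tuple


def merge_by_name(base: Optional[str],
                  ours: Optional[str],
                  theirs: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide a 3-way merge for one path by comparing object names only.

    Returns ("merged", name) when the result is unambiguous, or
    ("unmerged", None) when both sides changed the path differently.
    The file contents are never looked at, so binary blobs are fine.
    """
    if ours == theirs:
        # Both sides agree (including "both made the same change").
        return ("merged", ours)
    if base == ours:
        # Only their side changed the path; take theirs.
        return ("merged", theirs)
    if base == theirs:
        # Only our side changed the path; keep ours.
        return ("merged", ours)
    # Both sides changed it in different ways; leave it to the user.
    return ("unmerged", None)
```

In the "unmerged" case the user is left with up to three object
names (base, ours, theirs) to inspect by hand.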

What involves "diff"+"patch" are rebases and processing mailed-in
patches as in the example by the original poster.

In our diff output, we record the blob object name of preimage
and postimage, along with filemode, on the "index" line.
git-apply does not do anything with it by default, but if:

 - the --binary flag is given,

 - the postimage blob is already available locally, and,

 - the file the patch is being applied to is the same as the
   recorded preimage,

then the file is _replaced_ with the postimage.
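
The three conditions above can be sketched in Python as follows.
This is only an illustration of the rule, not how git-apply is
implemented: apply_binary(), its parameters and the dictionary
standing in for the local object store are hypothetical, and full
40-hex object names are assumed for simplicity.  The blob_name()
helper uses the usual blob naming, SHA-1 over a "blob <size>\0"
header followed by the contents.

```python
import hashlib
from typing import Dict, Optional


def blob_name(data: bytes) -> str:
    """Object name of a blob: SHA-1 of "blob <size>\\0" plus the contents."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def apply_binary(current: bytes,
                 preimage_name: str,
                 postimage_name: str,
                 local_blobs: Dict[str, bytes],
                 binary_flag: bool) -> Optional[bytes]:
    """Return the new file contents, or None if the rule does not apply.

    local_blobs is a hypothetical stand-in for the local object store,
    mapping object names to blob contents.
    """
    if not binary_flag:
        return None  # --binary was not given
    if postimage_name not in local_blobs:
        return None  # we do not have the postimage locally
    if blob_name(current) != preimage_name:
        return None  # file being patched is not the recorded preimage
    # No delta is applied; the file is replaced wholesale.
    return local_blobs[postimage_name]
```

Note that nothing is ever "patched" here: either the whole file
is swapped for the postimage or it is left alone.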

This is good enough for git-rebase (which uses format-patch
piped to am) and is safe (we do not "apply delta" -- only
replace when the file "being patched" matches the recorded
preimage).  It does not do any good for transferring a postimage
that the person who applies the patch does not yet have.

I think "applying delta" to a binary file is not a very useful
thing to do.  Depending on the nature of the file being patched,
it may produce a perfectly good result, but having the end user
verify that the result makes sense, and hand-fix it if it does
not, which can be done for text files, is nearly impossible for
binary files.  The "replace with postimage only when you are
applying to the same preimage" rule would be the only practical,
sane thing.

If we wanted to use diff+patch (i.e. the "format-patch,
send-email, and then am" workflow) to transfer new versions of
binary files to a recipient, which I think is useful in some
projects, the sanest way to handle this is probably to add
Nico's delta, going from preimage to postimage, encoded for
safer transport, to our diff output.  For safety and sanity, we
would not "apply" the patch unless the patched file exactly
matches the preimage that is recorded in the diff.  As long
as the recipient has the preimage, such a patch would be able to
reproduce the postimage, and hopefully be smaller than
transferring the whole thing.

We've been trying to keep our diff output reversible (e.g. we
show what the filemode of the preimage is), so if we take the
above route, it probably should record deltas for both going
from preimage to postimage _and_ going the other way (unless
xdelta can be applied in reverse, which I do not think is the
case).
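
A minimal Python sketch of such a reversible binary patch might
look like the following.  Everything in it is an assumption made
for illustration: the delta is a toy "common prefix/suffix"
format standing in for a real xdelta-style delta, the transport
encoding is zlib plus base64, and the record layout and function
names are hypothetical.  It does show why two deltas are needed:
the forward delta does not carry the bytes it removes.

```python
import base64
import hashlib
import zlib


def blob_name(data: bytes) -> str:
    """Object name of a blob: SHA-1 of "blob <size>\\0" plus the contents."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def make_delta(src: bytes, dst: bytes) -> str:
    """Toy delta: keep the common prefix/suffix of src, carry the rest."""
    limit = min(len(src), len(dst))
    p = 0
    while p < limit and src[p] == dst[p]:
        p += 1
    s = 0
    while s < limit - p and src[len(src) - 1 - s] == dst[len(dst) - 1 - s]:
        s += 1
    middle = dst[p:len(dst) - s]
    # Compress, then encode as text so it survives e-mail transport.
    text = base64.b64encode(zlib.compress(middle)).decode("ascii")
    return "%d %d %s" % (p, s, text)


def apply_delta(src: bytes, delta: str) -> bytes:
    p_text, s_text, text = delta.split(" ", 2)
    p, s = int(p_text), int(s_text)
    middle = zlib.decompress(base64.b64decode(text))
    return src[:p] + middle + src[len(src) - s:]


def make_patch(pre: bytes, post: bytes) -> dict:
    """Record both object names and a delta in each direction."""
    return {
        "pre": blob_name(pre),
        "post": blob_name(post),
        "forward": make_delta(pre, post),
        "reverse": make_delta(post, pre),
    }


def apply_patch(current: bytes, patch: dict, reverse: bool = False) -> bytes:
    src, dst, key = ("post", "pre", "reverse") if reverse else \
                    ("pre", "post", "forward")
    # Refuse unless the file exactly matches the recorded source image.
    if blob_name(current) != patch[src]:
        raise ValueError("file does not match the recorded image")
    result = apply_delta(current, patch[key])
    # The recorded name also lets us verify what we reproduced.
    if blob_name(result) != patch[dst]:
        raise ValueError("delta did not reproduce the recorded image")
    return result
```

With this, apply_patch(pre, make_patch(pre, post)) gives back
post, and passing reverse=True to the same record applied to post
gives back pre; anything else is refused.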

Of course, to be _completely_ generic, you could include both
compressed then uuencoded preimage and postimage, and let the
recipient sort it out.  An advantage of that approach is that
the applicability of such a "patch" improves as the tools to
apply it improve, after the patch was originally generated.  I
however think that is only a theoretical advantage, not a very
practical one.
