Re: maildir / read-tree trivial merging getting in the way?

From: Linus Torvalds <torvalds@osdl.org>
Date: 2006-02-14 13:32:35
On Tue, 14 Feb 2006, Ben Clifford wrote:
> 
> I've spent a few hours playing round with maildir-aware merging.
> 
> The basic idea I'm trying to implement is to flip the index round so that
> instead of looking at how the content has changed for a particular filename,
> I'm looking at how the filenames have changed for a particular content.
> 
> So I'm using git read-tree -m to populate the index with entries for the
> branches to merge so that I can then diddle round with those.
> 
> But the read-tree trivial merge logic seems to be getting in the way a bit.

You are much better off working with "git-ls-tree", or perhaps 
"git-diff-tree".

The latter in particular will show you what got added and what got 
deleted, but will quickly ignore any common entries (which is probably 
exactly what you want).
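Here is a minimal sketch of the "flip the index round" idea built on git-ls-tree instead of the index. It assumes Python, and assumes the output of `git-ls-tree -r <tree>` has already been captured as text. Each line has the form `<mode> SP <type> SP <sha> TAB <path>`. The helper name `paths_by_content` is hypothetical, and paths that git would quote are not handled.

```python
from collections import defaultdict


def paths_by_content(ls_tree_output: str) -> dict[str, list[str]]:
    """Map each blob SHA to the list of paths that hold that content.

    Expects lines in the `git ls-tree -r` format:
        <mode> SP <type> SP <sha> TAB <path>
    Assumes simple, unquoted paths (no -z handling).
    """
    by_sha: dict[str, list[str]] = defaultdict(list)
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        _mode, obj_type, sha = meta.split(" ")
        if obj_type == "blob":  # skip anything that is not file content
            by_sha[sha].append(path)
    return dict(by_sha)
```

If one SHA has a different path list in two trees, that is the "same content, different filename" case a maildir merge cares about.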

> So basically my question is: should I feel dirty about doing this and diddle
> read-tree so that there's a flag to not do the trivial merges automatically?

You should try to avoid git-read-tree entirely, I suspect.

All the things git-read-tree does are wrong for you. Notably, it very much 
on purpose will match things up name-by-name, and it does a lot of extra 
work to create a sorted version of the index to do the trivial merges 
quickly. The thing is, it doesn't even do that the smart way.

Now, git-read-tree actually does a _great_ job - don't get me wrong. It's 
just that the job it does isn't really suitable for your usage, and it's 
doing some things the "simple and stupid" way instead of being very smart 
about them, just because they aren't that important under normal loads.

For example, in a three-way merge (with an index), it will basically have 
four sorted inputs that it needs to interleave. Now, there's a _smart_ way 
to interleave sorted input, and there's a stupid one. The smart way is to 
read all the sources in parallel, always take whichever next entry sorts 
first, and so interleave them in a single pass.

That's not what git-read-tree does.
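The "smart" interleaving is a k-way merge. Below is a minimal Python sketch, not git's C code. It assumes each source is a list of paths that is already sorted. `heapq.merge` keeps one pending entry per source on a heap, so n entries from k sources take O(n log k) comparisons. Each entry is tagged with the index of its source so the caller can still tell which inputs contain a given path. The function name `interleave` is hypothetical.

```python
import heapq
from itertools import repeat
from typing import Iterator


def interleave(sources: list[list[str]]) -> Iterator[tuple[str, int]]:
    """Yield (path, source_index) pairs in sorted path order.

    Every input list must already be sorted by path.
    """
    # zip(..., repeat(i)) tags each entry with its source index; the index
    # is bound immediately, so every source keeps its own tag.
    tagged = [zip(src, repeat(i)) for i, src in enumerate(sources)]
    # heapq.merge holds one pending entry per source on a small heap, so
    # each output step costs O(log k) for k sources.
    return heapq.merge(*tagged)
```

Entries for the same path come out next to each other, ordered by source index. That is the point where a merge can look at all versions of one path at once.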

git-read-tree will read them one by one, and use "insertion sort" to 
maintain the result in sorted order. Now, insertion sort isn't totally 
idiotic (it's not doing a bogo-sort, at least), but it _is_ pretty damn 
silly when all the sources are already sorted and known ahead of time.
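For contrast, here is a sketch of the one-by-one pattern described above, again in Python, using the hypothetical name `insert_one_by_one`. `bisect.insort` finds each position with a binary search. However, inserting into the middle of a list shifts every later element, so the build can cost O(n^2) in the worst case even though every input was already sorted.

```python
import bisect


def insert_one_by_one(sources: list[list[str]]) -> list[tuple[str, int]]:
    """Build the combined sorted list by inserting entries one at a time.

    Binary search finds each slot in O(log n), but each insertion into a
    Python list moves all later elements, so the total work can grow
    quadratically with the number of entries.
    """
    result: list[tuple[str, int]] = []
    for i, src in enumerate(sources):  # read the sources one by one
        for path in src:
            bisect.insort(result, (path, i))
    return result
```

The result is the same ordering a k-way merge gives. The difference only shows up as the total number of entries gets large.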

So git-read-tree does some stupid things, and scales badly with really big 
trees. The good news is that we can fix it - the bad news is that my 
motivation for it is pretty low, since "really big" means "much bigger 
than the kernel" ;)

In contrast, "git-diff-tree -r a b" does the _smart_ thing, and scales 
linearly with tree size _and_ can take advantage of subdirectories not 
changing (the latter is apparently not an issue for you, but can be one in 
other circumstances).
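The linear behaviour comes from walking two sorted listings side by side. Below is a minimal sketch. It assumes each tree has already been flattened (as with `-r`) into a list of `(path, sha)` pairs, both sorted the same way. The name `diff_sorted` is hypothetical. The sketch does not show the subdirectory shortcut, which amounts to skipping a whole subtree when its tree SHA is unchanged.

```python
from typing import Iterator, Optional, Tuple

# (status, path, old_sha, new_sha)
Change = Tuple[str, str, Optional[str], Optional[str]]


def diff_sorted(a: list[tuple[str, str]],
                b: list[tuple[str, str]]) -> Iterator[Change]:
    """Compare two path-sorted (path, sha) listings in a single pass.

    Paths present in both with the same SHA produce no output, so common
    entries are ignored cheaply.
    """
    i = j = 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i][0] < b[j][0]):
            yield ("D", a[i][0], a[i][1], None)  # only in the old tree
            i += 1
        elif i == len(a) or b[j][0] < a[i][0]:
            yield ("A", b[j][0], None, b[j][1])  # only in the new tree
            j += 1
        else:
            if a[i][1] != b[j][1]:
                yield ("M", a[i][0], a[i][1], b[j][1])  # same path, new content
            i += 1
            j += 1
```

Each entry of either list is visited exactly once, which is where the linear scaling comes from.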

(The "raw output" from git-diff-tree is also very easy to parse. Don't use 
the "-p" (patch) form; the raw "this is how the SHA1s changed" output sounds 
like exactly what you want, assuming you're interested in renames 
with no content change.)
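Below is a minimal sketch of using the raw output for the maildir case. It assumes the text of `git-diff-tree -r a b` (without `-z`) has been captured, and that paths need no quoting. Each raw line has the form `:<old_mode> <new_mode> <old_sha> <new_sha> <status> TAB <path>`. A `D` line and an `A` line carrying the same blob SHA are a rename with no content change. The name `pure_renames` is hypothetical.

```python
from collections import defaultdict


def pure_renames(raw: str) -> list[tuple[str, str]]:
    """Return (old_path, new_path) pairs whose blob SHA did not change.

    Expects raw diff lines of the form
        :<old_mode> <new_mode> <old_sha> <new_sha> <status> TAB <path>
    Only deletions (D) and additions (A) are considered here.
    """
    deleted: dict[str, list[str]] = defaultdict(list)  # sha -> removed paths
    added: list[tuple[str, str]] = []                  # (sha, added path)
    for line in raw.splitlines():
        if not line.startswith(":"):
            continue  # ignore anything that is not a raw diff line
        meta, path = line.split("\t", 1)
        _old_mode, _new_mode, old_sha, new_sha, status = meta[1:].split(" ")
        if status == "D":
            deleted[old_sha].append(path)
        elif status == "A":
            added.append((new_sha, path))

    renames: list[tuple[str, str]] = []
    for sha, new_path in added:
        if deleted.get(sha):
            # Pair with the first unmatched deletion of the same content.
            renames.append((deleted[sha].pop(0), new_path))
    return renames
```

If the same content appears several times (say, a duplicated message), the pairing between its deleted and added paths is arbitrary.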

		Linus
Received on Tue Feb 14 13:33:49 2006

This archive was generated by hypermail 2.1.8 : 2006-02-14 13:34:00 EST