Re: impure renames / history tracking

From: Paul Jakma <paul@clubi.ie>
Date: 2006-03-02 05:50:21
Hi Linus,

On Wed, 1 Mar 2006, Linus Torvalds wrote:

> The thing is, it does better than anything that _tries_ to be 
> "reliable".
>
> I can pretty much _guarantee_ that you can't do it better.

I'm willing to take that argument to the 'project' concerned, I just 
need to be pretty sure of it.

> Tracking "inodes" - aka file identities - (which is what BK does, 
> and I assume what SVN does) is fundamentally problematic. I 
> particular, it's a horrible problem when two inodes "meet" under 
> the same name. You now have two identities for the same file, and 
> you're fundamentally screwed.

Yes, in that model it is. This interestingly, is not the BK model, I 
suspect (see below).

> It doesn't even need renames to be a problem. JUST THE FACT THAT 
> YOU TRY TO TRACK FILE "IDENTITY" HISTORY IS BROKEN.

If it's "file identity" globally across the lifetime of the project, 
I agree 100% per cent. The 'traditional' SCM concerned does this.

That's not what a solution I'd want to explore either, I'm only 
interested in the identity of files for any one /one/ commit. In 
saying that, I recognise it's pointless to try annotate file-change 
information in multi-parent commits (merges).

> For example, take CVS, which doesn't actually try to do renames, 
> but _does_ try to track the identity of a file, since all the 
> history is tied into that identity: think about what happens in 
> Attic when a file is deleted. Completely broken model.

ACK, {Attic,deleted_files}/ is just horrid.

> And that's really fundamental. CVS doesn't show the problems so 
> much, because CVS actively tries to make it hard to do these 
> things.

ACK.

> With renames-tracking-file-identities, it's _really_ easy to get 
> some major confusion going. What happens when one branch creates a 
> file, and another one renames a file to that same name, and they 
> merge?

Well, the conflict has to be resolved somehow, even today.

> Don't tell me it doesn't happen. It happened under BK. The way BK 
> "solved" it was to keep the two separate identities: one of them 
> got resolved to the new filename, the other one went into the 
> "deleted" directory.

Right. That's what the 'traditional workflow' SCM I'm thinking of 
does - not BK funnily enough, but an SCM predating BK which also 
happens to use SCCS files, and with some of the same high-level 
push/pull constructs as BK (interestingly).

It also tracks name history globally using a deleted_files/ history, 
which is maintained, but I don't think it does this for name merges 
like the above.

In the one I'm thinking of, it does (I /think/, I'm not an expert in 
it) the following:

Given two files, say:

'old:

1.1---1.2---1.3

new:

1.1

- constructs a 'fake' base SCCS revision, empty
- adds the top 'old' version as a branch
- adds the top new version as a new delta

    1.1.1.1
   /
1.1---------1.2

Where in the merged file:

 	1.1: empty
 	1.1.1.1: was 1.3 from 'old'
 	1.2: is 1.1 from 'new'

However, it does /not/ create a deleted_files entry for the 'old' 
file. (AFAICT - I may not have a sufficiently full understanding of 
this SCM)

> Guess what happens when the side that got merged into "deleted" 
> continues to edit the file? That's right - their edits happen on 
> the deleted file, and never show up in the real tree in a 
> subsequent merge ever again.

Indeed - horrid.

> And as far as I can tell, BK really did the best you can do. 
> Following file identities really _is_ fundamentally broken. It 
> sounds like a nice idea, but while you migth solve a few problems, 
> you create a whole raft of much more fundamental problems.

For tracking identity across more than one commit - I fully agree.

That's not what quite I'm thinking of though. Is it worth going on 
with the discussion on a:

 	 'track identities *only* from context of /the/ parent to
           this commit'

> So next time you think about a merge that migt have been improved 
> by tracking renames, please also think about a merge where one of 
> the filenames came from two or more different sources through an 
> earlier merge, and thank your benevolent Gods that they instructed 
> me to make git be based purely on file contents.

Oh, I agree muchely here.

I wouldn't change git. I only wonder if it give its rename-heuristics 
an additional advisory-only hint? (for single-parent commits at least 
- never merges - and only on a per-commit basis).

I probably should first explore how git deals with rename clashes..

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
I'm glad I was not born before tea.
 		-- Sidney Smith (1771-1845)
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Received on Thu Mar 02 05:52:41 2006

This archive was generated by hypermail 2.1.8 : 2006-03-02 05:52:52 EST