Re: 'git status' is not read-only fs friendly

From: Shawn O. Pearce <spearce@spearce.org>
Date: 2007-02-11 18:23:58

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> It's not even a "technical issue". It's a fundamental optimization. Sure, 
> you can call optimizations just "technical issues", but the fact is, it's 
> one of the things that makes git so _usable_ on large archives. At some 
> point, an "optimization" is no longer just about making things slightly 
> faster, it's about something much bigger, and has real semantic meaning.
> 
> So the fact is, "git status" _needs_ to refresh the index. Because if it 
> doesn't, you'll see every file that doesn't match the index as "dirty", 
> and that is not just a "technical issue".

Indeed.  Except that `git-update-index --refresh` is itself not
very fast on Cygwin+NTFS and large projects (about the size of
the kernel).  So git-status is a real slouch there.  Not running
`git update-index --refresh` saves at least a couple of seconds.

This is why git-gui lets you disable the refresh, and is part of
the reason why it computes the status on its own by diff-index,
diff-files and ls-files --others.
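
As a rough illustration of computing status from those three plumbing
commands without refreshing first, here is a minimal Python sketch.
It is not git-gui's code: the flag choices (`--cached`, `--name-only`,
`-z`, `--exclude-standard`), the `PLUMBING` table and both helper
names are assumptions made for the example, and `diff-index HEAD`
assumes the repository already has a commit.

```python
import subprocess
from typing import Dict, List

# The three plumbing commands named in the message. The exact flags are
# this sketch's choice, not something the message specifies.
PLUMBING: Dict[str, List[str]] = {
    "staged": ["git", "diff-index", "--cached", "--name-only", "-z", "HEAD"],
    "unstaged": ["git", "diff-files", "--name-only", "-z"],
    "untracked": ["git", "ls-files", "--others", "--exclude-standard", "-z"],
}


def split_nul(raw: bytes) -> List[str]:
    """Split NUL-terminated `-z` output into a list of paths."""
    return [p.decode("utf-8", "replace") for p in raw.split(b"\0") if p]


def status_without_refresh(repo: str) -> Dict[str, List[str]]:
    """Run each plumbing command in `repo` and collect the reported paths.

    No `update-index --refresh` is run, so "unstaged" may include files
    whose stat data changed even though their content did not.
    """
    result: Dict[str, List[str]] = {}
    for label, argv in PLUMBING.items():
        out = subprocess.run(
            argv, cwd=repo, check=True, stdout=subprocess.PIPE
        ).stdout
        result[label] = split_nul(out)
    return result
```

Calling `status_without_refresh(".")` inside a work tree returns the
three path lists; the price of skipping the refresh is the occasional
false "modified" entry described further down.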
 
> THIS IS NOT "JUST A TECHNICAL ISSUE". 
> 
> When the difference is 40 seconds vs 4 (uncached), or 2 seconds vs 0.06, 
> it's not about "just an optimization" any more. At that point, it's about 
> "unusable vs usable".
> 
> And yeah, waiting 40 seconds for a global "diff" for a big project may be 
> something that a person coming from CVS considers to be just par for the 
> course. Maybe I'm just unreasonable. But I think it's a _bug_ if I can't 
> get a small diff in about a tenth of a second. It needs to be so fast that 
> I never even _think_ about it.

Yes.  Which is why if git-gui finds a file that has an empty diff,
but that was reported as modified by diff-files, it tells the user
it's about to go waste a few seconds running `update-index --refresh`,
then does so.

In practice I've found it rare that a file is dirty in the index,
but is not actually modified.  The typical culprit appears to
actually be the virus scanner on a Windows system.  For some reason
it feels a need to modify some random XML 'source' files that are
tracked by Git.  Out of 30,000 files it likes to modify about 100.
*sigh* At least I have Git to tell me it didn't change any content.
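
The "dirty in the index but not actually modified" case can be
reproduced with a small Python sketch. It only models the idea:
`CachedEntry` is a hypothetical stand-in for an index entry, only size
and mtime are compared, and a plain SHA-1 of the file bytes replaces
Git's real blob hash.

```python
import hashlib
import os
import tempfile
from typing import NamedTuple, Optional, Tuple


class CachedEntry(NamedTuple):
    """Hypothetical, simplified model of what an index entry remembers."""
    size: int
    mtime_ns: int
    sha1: str  # plain SHA-1 of the bytes; Git hashes a blob header too


def _digest(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def snapshot(path: str) -> CachedEntry:
    """Record current stat data and content hash for `path`."""
    st = os.stat(path)
    return CachedEntry(st.st_size, st.st_mtime_ns, _digest(path))


def stat_dirty(path: str, entry: CachedEntry) -> bool:
    """Cheap check: does the stat data differ from the cached entry?"""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns) != (entry.size, entry.mtime_ns)


def refresh(path: str, entry: CachedEntry) -> Optional[CachedEntry]:
    """Expensive check: re-read the file. Return an entry with fresh stat
    data if the content is unchanged, or None if it really was modified."""
    if _digest(path) != entry.sha1:
        return None
    return snapshot(path)


def demo() -> Tuple[bool, bool, bool]:
    """Touch a file the way a scanner might, then refresh the cached entry."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "config.xml")
        with open(path, "wb") as f:
            f.write(b"<root/>\n")
        entry = snapshot(path)

        # Move mtime forward 10 seconds without changing the content.
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))

        dirty_before = stat_dirty(path, entry)
        refreshed = refresh(path, entry)
        really_modified = refreshed is None
        dirty_after = True if refreshed is None else stat_dirty(path, refreshed)
        return dirty_before, really_modified, dirty_after
```

Until the refreshed entry is written back, every cheap stat check keeps
flagging the file, which is why the refresh has to be saved somewhere.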
 
> I think it would be much better if "git status" always wrote the refreshed 
> index file. It could then choose to ignore any errors if they happen, 
> because if you have a broken setup like the NTFS read-only thing, then 
> tough, it's broken, but git can't do anything about it. But people should 
> be aware that yes, "git status" absolutely _needs_ to write the index 
> file. 

Not only that, but I think we can do much better with git-runstatus
than we do now.  If we scan the working directory (to search for
untracked files), and we walk the index in parallel, we can update
the index with new stat data if necessary.
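
One way to picture that parallel walk is a merge over two sorted path
lists, sketched below in Python. This is only an illustration of the
idea, not how git-runstatus is implemented: the function name, the
dict-based inputs and the use of a single mtime integer as "stat data"
are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def parallel_walk(
    index: Dict[str, int], worktree: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Merge-walk the index and the working directory in sorted order.

    `index` maps path -> cached mtime, `worktree` maps path -> current
    mtime (both hypothetical simplifications of real stat data).

    Returns (untracked, missing, restat):
      untracked - on disk but not in the index
      missing   - in the index but not on disk
      restat    - in both, but stat data differs, so the content must be
                  compared before the index entry can be updated
    """
    idx_paths = sorted(index)
    wt_paths = sorted(worktree)
    untracked: List[str] = []
    missing: List[str] = []
    restat: List[str] = []

    i = j = 0
    while i < len(idx_paths) and j < len(wt_paths):
        a, b = idx_paths[i], wt_paths[j]
        if a == b:
            if index[a] != worktree[b]:
                restat.append(a)
            i += 1
            j += 1
        elif a < b:
            missing.append(a)  # index entry with no file on disk
            i += 1
        else:
            untracked.append(b)  # file on disk with no index entry
            j += 1

    # Whatever is left on either side has no counterpart on the other.
    missing.extend(idx_paths[i:])
    untracked.extend(wt_paths[j:])
    return untracked, missing, restat
```

A single pass yields both the untracked files and the entries whose
stat data is stale, so the directory scan that status already pays for
can double as the refresh.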

Of course that doesn't matter much on Linux; its VFS operations
don't take hours.

-- 
Shawn.
Received on Sun Feb 11 18:28:12 2007
