Re: Caching directories

From: Junio C Hamano <junkio@cox.net>
Date: 2006-01-25 16:52:21

Pavel Roskin <proski@gnu.org> writes:

> Maybe it's time to start caching directories in git?  I mean,
> directories corresponding to tree objects could have their stats
> recorded in the cache.  This would allow to distinguish between tracked
> and untracked directories without scanning them recursively.

I do not understand the above logic.  

Given a directory path, finding out if the directory has
something tracked in it is an O(log n) operation in the current
index that does not "cache directory".  Your message implies
that you feel we could use the index file to list "untracked
directories" without recursively scanning the directory tree,
but to me, the only way to do that is to record a new directory
in the index file every time somebody (either Makefile or the
user) creates a junk directory.  That does not make much sense
to me, so I am probably misreading what you really meant.
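
To illustrate the O(log n) claim, here is a minimal sketch in Python.
It assumes the index can be modeled as a sorted list of path strings;
the function name and that representation are made up for this
illustration and are not git's actual data structures.  Because every
path under "dir/" sorts at or after the string "dir/", one binary
search is enough to tell whether anything tracked lives there.

```python
import bisect

def has_tracked_under(index_paths, directory):
    """Return True if any path in the sorted list lies under `directory`.

    index_paths: list of tracked paths, sorted (hypothetical stand-in
    for the sorted entries of the index).
    """
    prefix = directory.rstrip("/") + "/"
    # First position whose path is >= "dir/"; if anything lives under
    # the directory, it must be the entry found here.
    i = bisect.bisect_left(index_paths, prefix)
    return i < len(index_paths) and index_paths[i].startswith(prefix)

paths = sorted(["Makefile", "src/a.c", "src/lib/b.c", "src-old/x"])
print(has_tracked_under(paths, "src"))    # a tracked directory
print(has_tracked_under(paths, "build"))  # a junk directory
```

Note that this only answers the question for a directory path you
already have in hand; it does not enumerate untracked directories,
which still requires looking at the working tree.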

I've been meaning to explore the possibility of recording 0^{40}
SHA1 in the index file to mean "I do not want to place anything
on this path when I write the index out to a tree yet, but keep
an eye on the path in the working tree for me".

You can consider this as an "intent to add"; for example, with
such an index file, you could do something like this:

	$ git update-index --intent-to-add foo

This would record a 0^{40} SHA1 with mode 0 in the index at
"foo".  Then:

        $ git diff-files -p
        diff --git a/foo b/foo
        new file mode 100644
        index 0000000..6690023
        --- /dev/null
        +++ b/foo
        @@ -0,0 +1,24 @@
        +...
        +....
        ...

The index has heard about it, but does not actually have it, so
it reports an addition.  Since we currently have no such
entries, after a "git add" the index has not just heard about
the path but actually has it, and as a consequence there is no
way to get "new file" out of diff-files.

	$ git diff-index --cached HEAD ;# nothing

The index has heard about it, but does not have it.  If the HEAD
commit did not have it, diff-index --cached would report
nothing.

        $ git diff-index HEAD
        diff --git a/foo b/foo
        new file mode 100644
        index 0000000..6690023
        --- /dev/null
        +++ b/foo
        @@ -0,0 +1,24 @@
        +...
        +....
        ...

The index has heard about it, and without --cached it uses the
working tree file, so if HEAD did not have it you would see "new
file" out of diff-index.  If the comparison were with a tree
that has "foo" in it, diff-index using an index that does not
have "foo" would not say anything in the current system, but
with "intent to add", it would say "Oh, your index knows about
it so let me look in the working tree; ah, you have something
there.  Let me compare it with the version in the tree in
question".
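
The three comparisons above can be modeled with a toy sketch in
Python.  This is not how git implements any of them; it assumes the
index maps a path to (mode, sha1), that trees and the working tree
map a path to a content hash, and that deletions can be ignored.
All function names are hypothetical.

```python
NULL_SHA1 = "0" * 40  # the all-zero object name written as 0^{40}

def is_intent_to_add(entry):
    mode, sha1 = entry
    return mode == 0 and sha1 == NULL_SHA1

def diff_files(index, worktree):
    """Index vs. working tree."""
    out = []
    for path in sorted(index):
        if path not in worktree:
            continue  # deletions are left out of this toy model
        if is_intent_to_add(index[path]):
            out.append(("A", path))  # heard about it, do not have it
        elif index[path][1] != worktree[path]:
            out.append(("M", path))
    return out

def diff_index_cached(tree, index):
    """Tree vs. index only; intent-to-add entries count as absent."""
    out = []
    for path in sorted(index):
        if is_intent_to_add(index[path]):
            continue
        if path not in tree:
            out.append(("A", path))
        elif tree[path] != index[path][1]:
            out.append(("M", path))
    return out

def diff_index(tree, index, worktree):
    """Tree vs. working tree, for the paths the index knows about."""
    out = []
    for path in sorted(index):
        if path not in worktree:
            continue
        if path not in tree:
            out.append(("A", path))
        elif tree[path] != worktree[path]:
            out.append(("M", path))
    return out

index = {"foo": (0, NULL_SHA1)}
worktree = {"foo": "6690023"}
print(diff_files(index, worktree))        # new file
print(diff_index_cached({}, index))       # nothing
print(diff_index({}, index, worktree))    # new file
```

With a tree that already has "foo" at a different hash, diff_index()
in this model reports a modification instead of staying silent,
which is the behaviour change described above.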

One interesting thing the "intent to add" entries would do is
this:

        $ git diff-files --abbrev foo
        :000000 100644 0000... 0000...

Note that two "0^{40}" mean quite different things.  The one on
the LHS means "we've heard about it but we do not have it".  On
the other hand, the one on the RHS means "we do not cache the
SHA1 --- go look at the working tree file".
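
A small sketch of where the two zero fields come from, assuming the
same toy model of an index entry as a (mode, sha1) pair.  The
function name, the `abbrev` parameter and the four-digit abbreviation
are made up to mirror the example line above; they are not git's
actual formatting code.

```python
NULL_SHA1 = "0" * 40

def raw_diff_files_line(index_entry, worktree_mode, abbrev=4):
    index_mode, index_sha1 = index_entry
    # LHS: taken from the index.  All zeros here means "we have heard
    # about the path but do not have it".
    lhs = index_sha1
    # RHS: a placeholder.  The working tree file is not hashed, so all
    # zeros here means "go look at the working tree file".
    rhs = NULL_SHA1

    def short(sha1):
        return sha1[:abbrev] + "..."

    return ":%06o %06o %s %s" % (index_mode, worktree_mode,
                                 short(lhs), short(rhs))

print(raw_diff_files_line((0, NULL_SHA1), 0o100644))
```

The two fields print identically, but only the left one is read from
the index entry; the right one is filled in unconditionally.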

We might want to represent the existence of a tree that does not
have anything under it using 0^{40} as well.  Or it might be better
kept out of the main index entries list, and become extra data
just like we have been discussing how to store "bind" entries in
the "Subprojects" thread.  I dunno.

I have no idea what 'clean' does, so would not comment on that
part of your message.

Received on Wed Jan 25 16:55:19 2006
