Re: [PATCH] rev-list: add "--full-objects" flag.

From: Eric W. Biederman <ebiederm@xmission.com>
Date: 2005-07-10 07:09:02
Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 7 Jul 2005, Junio C Hamano wrote:
>> 
>> However it does not automatically mean that the avenue I have
>> been pursuing would not work; the server side preparation needs
>> to be a bit more careful than what I sent, which unconditionally
>> runs "prune-packed".  It instead should leave the files that
>> "--whole-trees" would have packed as plain SHA1 files, so that
>> the bulk is obtained by statically generated packs and the rest
>> can be handled in the commit-chain walker as before.

> The "fetch one object, parse it, fetch the next one, parse that.." 
> approach is just horrible.

Agreed.  That approach does nothing to hide latency, and depending
on the parsing cost it can even leave your network connection idle
for noticeable stretches of time.

> I ended up preferring the "rsync" thing even though rsync sucked badly on
> big object stores too, if only because when rsync got working, it at least
> nicely pipelined the transfers, and would transfer things ten times faster
> than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it
> really felt that way).

This feels to me like an implementation issue (no pipelining) rather
than a design issue (pipelining is impossible).

> And the thing is, if you purely follow one tree (which is likely the
> common case for a lot of users), then you are actually always likely
> better off with the "mirror it" model. Which is _not_ a good model for
> developers (for example, me rsync'ing from Jeff's kernel repository always
> got me hundreds of useless objects), but it's fine for somebody who
> actually just wants to track somebody else.

I assume the problem with the mirror-it model was simply that there
were too many objects?

> And then you really can use just rsync or wget or ncftpget or anything
> else that has a "fetch recursively, optimizing existing objects" mode.

Sane.  But with an intelligent fetcher and a little extra
information, even a dumb server should let us avoid fetching
branches we care nothing about.  I think that extra information is
simply the commit object graph and which packs those commit objects
live in.  I expect the commit graph information to be fairly modest
in size.
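
For concreteness (this is purely hypothetical; nothing like it
exists in git today), a small index file served next to the objects
could map branch tips to the packs that hold their objects:

    # branch   tip commit (sha1)   pack with the objects the tip adds
    master     3f786850e387...     pack-f0ad2e19a4....pack
    master     89e6c98d9288...     pack-62bdcf2873....pack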

Once you have that extra information you can generate incremental
packs whenever you upload to the server, and you can make the
incremental packs per branch.
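
A minimal sketch of that preparation step, in Python for brevity
(the git-rev-list and git-pack-objects invocations are real; the
helper, its arguments, and the branch-packs index file are made up
for illustration):

    import subprocess

    def make_branch_pack(repo, branch, old_tip, new_tip):
        # List every object reachable from new_tip but not old_tip.
        objects = subprocess.run(
            ["git", "rev-list", "--objects", new_tip, "^" + old_tip],
            cwd=repo, capture_output=True, text=True, check=True).stdout
        # Feed that list to pack-objects; it writes pack-<sha1>.pack
        # and .idx, and prints the <sha1> that names the pack.
        sha1 = subprocess.run(
            ["git", "pack-objects", ".git/objects/pack/pack"],
            cwd=repo, input=objects,
            capture_output=True, text=True, check=True).stdout.strip()
        # Record which pack serves this increment (hypothetical file).
        with open(repo + "/.git/objects/info/branch-packs", "a") as idx:
            idx.write("%s %s pack-%s.pack\n" % (branch, new_tip, sha1))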

That should allow a dumb fetcher to look at the list of commits and
fetch just the packs it cares about, and since it only has to look
in one place first, it should be fairly sane.
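
On the client side that means just one extra round trip before the
fetcher can start streaming packs; a sketch over plain HTTP, reusing
the hypothetical branch-packs index from above:

    import os
    import urllib.request

    def dumb_fetch(base_url, branch, pack_dir):
        # One small fetch tells us everything we need to decide.
        index = urllib.request.urlopen(
            base_url + "/objects/info/branch-packs").read().decode()
        for line in index.splitlines():
            if line.startswith("#") or not line.strip():
                continue
            br, tip, pack = line.split()
            if br != branch:
                continue        # a branch we care nothing about
            dest = os.path.join(pack_dir, pack)
            if os.path.exists(dest):
                continue        # an increment we already have
            with urllib.request.urlopen(
                    base_url + "/objects/pack/" + pack) as resp:
                with open(dest, "wb") as out:
                    out.write(resp.read())
            # The matching .idx can be fetched the same way, or
            # rebuilt locally with git-index-pack.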

The core idea is that if the dumb-server preparation can anticipate
common access patterns (mirroring a branch) and publish enough
information that those patterns can be served cheaply and in a
pipelined fashion, I don't expect it to be much worse than an
intelligent fetcher.

The current intelligent fetch has the problem that it cannot be used
to bootstrap a repository: if you don't have an ancestor of what you
are fetching, you can't fetch it.

Eric


