Re: git-fetching from a big repository is slow

From: Andreas Ericsson <ae@op5.se>
Date: 2006-12-15 02:06:05

Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 14 Dec 2006, Andreas Ericsson wrote:
> 
>> Andy Parkins wrote:
>>> Hello,
>>>
>>> I've got a big repository.  I've got two computers.  One has the repository
>>> up-to-date (164M after repack); one is behind (30M ish).
>>>
>>> I used git-fetch to try and update; and the sync took HOURS.  I zipped the
>>> .git directory and transferred that and it took about 15 minutes to
>>> transfer.
>>>
>>> Am I doing something wrong?  The git-fetch was done with a git+ssh:// URL.
>>> The zip transfer with scp (so ssh shouldn't be a factor).
>>>
>> This seems to happen if your repository consists of many large binary files,
>> especially many large binary files of several versions that do not deltify
>> well against each other. Perhaps it's worth adding gzip compression detection
>> to git? I imagine more people than me are tracking gzipped/bzip2'ed content
>> that pretty much never deltifies well against anything else.
> 
> Or we add something like the heuristics we discovered in another thread, 
> where rename detection (which is related to delta candidate searching) is 
> not started if the sizes differ drastically.
> 

That heuristic wouldn't help in this particular case, though. In our
distribution repository we have ~300 bzip2-compressed tarballs with an
average size of 3 MiB. 240 of those are between 2.5 and 4 MiB, so their
sizes don't differ drastically, but they don't delta well against each
other either.
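
To put numbers on that, here is a rough sketch of a size-ratio check. The
function name and the 2x cutoff are made up for illustration; they are not
taken from git or from the other thread. It only shows that tarballs between
2.5 and 4 MiB are at most a factor of 1.6 apart, so a cutoff of that kind
would not prune any of those pairs.

```python
MIB = 1024 * 1024

def sizes_differ_drastically(size_a, size_b, max_ratio=2.0):
    """Hypothetical pruning rule: skip a pair when one file is more than
    max_ratio times the size of the other. The 2.0 default is an assumption."""
    small, large = sorted((size_a, size_b))
    if small == 0:
        # An empty file only "matches" another empty file.
        return large > 0
    return large / small > max_ratio

# The two extremes of the 240 tarballs mentioned above.
smallest = int(2.5 * MIB)
largest = 4 * MIB

print(largest / smallest)                           # 1.6
print(sizes_differ_drastically(smallest, largest))  # False
```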

One option would be to add a config option that skips delta attempts for
files with certain suffixes. That way we could just tell it to ignore
*.gz, *.tgz and *.bz2, and everything would work just as it does today,
only a lot faster.
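
Something along these lines is what I have in mind, as a minimal sketch only.
The names `NO_DELTA_PATTERNS` and `should_attempt_delta` are hypothetical and
do not correspond to anything in git; the patterns are just the suffixes
listed above.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration value: glob patterns for content that is
# already compressed and therefore unlikely to deltify well.
NO_DELTA_PATTERNS = ("*.gz", "*.tgz", "*.bz2")

def should_attempt_delta(path, no_delta_patterns=NO_DELTA_PATTERNS):
    """Return False when the file name matches one of the skip patterns."""
    name = path.rsplit("/", 1)[-1]  # match on the file name, not the directory
    return not any(fnmatchcase(name, pattern) for pattern in no_delta_patterns)

paths = ["src/main.c", "dist/foo-1.0.tar.bz2", "dist/bar.tgz", "README"]
delta_candidates = [p for p in paths if should_attempt_delta(p)]
print(delta_candidates)  # ['src/main.c', 'README']
```

Objects that fail the check would still be stored and transferred as usual;
only the search for a delta base would be skipped for them.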

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
Received on Fri Dec 15 02:07:05 2006
