Re: CAREFUL! No more delta object support!

From: Linus Torvalds <torvalds@osdl.org>
Date: 2005-06-29 03:36:15
On Tue, 28 Jun 2005, Daniel Barkalow wrote:
> 
> Actually, the ideal thing would be to move the packing code into an object
> file that git-ssh-push can include; that way it can write directly to the
> socket instead of going through disk

It doesn't work very easily that way because the index file (which
contains the object list and the offsets into the pack file) cannot be
created until after the pack file has been created (and we don't want to
build the whole pack in memory, since it can be quite big).

Now, what we could do is to stream out the pack file first to stdout, and
write the index file afterwards. But since we don't know how big the pack
file will be when we start packing, and the pack-file can contain
basically arbitrary patterns, that requires that the receiver actually 
parse the pack-file as it comes in.

The format of the pack-file is a fairly trivial data stream of

 - rinse and repeat for each object:

     - one character giving the object type (C, T, B, G, D for "commit",
       "tree", "blob", "tag" or "delta" respectively)

     - four bytes of network-order unpacked data length

     - [ if delta: 20 bytes of delta object ID ]

     - zlib-packed data (length unknown, except we know how much we want 
       it to unpack to)

 - Finally at the end: 20 bytes of SHA1 of the pack-file contents (i.e. of
   everything preceding the SHA1 itself)
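A minimal Python sketch of a parser for the record layout listed above. It assumes the whole stream is already sitting in a bytes buffer, whereas a real receiver would feed it in incrementally. The names parse_entry, parse_pack and build_pack are hypothetical, and this is not git code. The zlib data length isn't stored, so the sketch lets zlib.decompressobj find the end of each deflate stream and resumes at its unused_data. The trailer follows the "if the unpack failed, it must be the SHA1" idea described further down.

```python
import hashlib
import struct
import zlib

# Type characters as listed in the message; 'D' marks a delta record.
TYPES = {b"C": "commit", b"T": "tree", b"B": "blob", b"G": "tag", b"D": "delta"}

def parse_entry(buf, pos):
    """Parse one record at buf[pos]; raise ValueError if it isn't one."""
    kind = TYPES.get(buf[pos:pos + 1])
    if kind is None or pos + 5 > len(buf):
        raise ValueError("not an object record")
    (size,) = struct.unpack(">I", buf[pos + 1:pos + 5])  # network byte order
    pos += 5
    base = None
    if kind == "delta":
        base = buf[pos:pos + 20]  # 20-byte ID of the object it's a delta against
        if len(base) != 20:
            raise ValueError("truncated delta base ID")
        pos += 20
    d = zlib.decompressobj()
    try:
        data = d.decompress(buf[pos:])
    except zlib.error as exc:
        raise ValueError(str(exc)) from exc
    # The compressed length is unknown; zlib tells us where the stream ended,
    # and we know how much it must unpack to.
    if not d.eof or len(data) != size:
        raise ValueError("bad or truncated zlib data")
    return (kind, base, data), len(buf) - len(d.unused_data)

def parse_pack(buf):
    objects, pos = [], 0
    while True:
        try:
            entry, pos = parse_entry(buf, pos)
        except ValueError:
            # The unpack failed, so this should be the trailing SHA1: verify it.
            trailer = buf[pos:]
            if len(trailer) != 20 or hashlib.sha1(buf[:pos]).digest() != trailer:
                raise ValueError("corrupt pack: bad record or checksum")
            return objects
        objects.append(entry)

def build_pack(records):
    """Hypothetical helper: records are (type_char, data, base_id_or_None)."""
    body = b""
    for code, data, base in records:
        body += code + struct.pack(">I", len(data))
        if code == b"D":
            body += base
        body += zlib.compress(data)
    return body + hashlib.sha1(body).digest()
```

Round-tripping build_pack through parse_pack yields (type, base ID, data) tuples. Flipping a bit in the trailer makes parse_pack raise ValueError.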

so it's actually possible to pick up the objects as they come off the 
stream, since the SHA1 name is defined by the contents and you don't need 
the index file unless you want to look things up.
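Objects can be picked up straight off the stream because git names each object by the SHA1 of its contents. Here is a short Python sketch, assuming git's object-naming rule: the SHA1 of the type name, a space, the decimal length, a NUL byte, then the raw data. object_name is a hypothetical helper, not a git function.

```python
import hashlib

def object_name(kind, data):
    """Return the 20-byte SHA1 name of an object of type `kind`
    ("commit", "tree", "blob" or "tag") with contents `data`."""
    header = f"{kind} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).digest()
```

For example, object_name("blob", b"").hex() is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, git's well-known empty-blob ID. A delta record only gets a name once it has been applied to its base, since the name belongs to the reconstructed object.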

So the receiver side could try this algorithm:

 - unpack each object in memory on the receiving side

	If the unpack failed, it must have been the SHA1 at the end, so 
	verify it!

 - if it's a delta object and you haven't seen the object it's a delta 
   against, keep it in memory.

 - if it's a non-delta object, just write it to the object store, and try 
   to resolve any delta objects you have pending that this new object 
   satisfies. That in turn creates other objects that may have more deltas 
   they satisfy etc.
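The receive-side loop described above, sketched in Python. The sketch makes two assumptions: the 20-byte ID in a delta record names the base object, and a resolved delta has the same type as its base. The delta encoding itself isn't described in the message, so apply_delta is a hypothetical callable supplied by the caller. A dict stands in for the object store.

```python
import hashlib
from collections import defaultdict

def object_name(kind, data):
    # Assumed naming rule: sha1 of "<type> <size>\0" followed by the data.
    header = f"{kind} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).digest()

class Receiver:
    def __init__(self, apply_delta):
        self.apply_delta = apply_delta    # hypothetical: (base_data, delta) -> data
        self.store = {}                   # stand-in object store: name -> (kind, data)
        self.pending = defaultdict(list)  # base name -> deltas waiting for it

    def receive(self, kind, base, data):
        if kind != "delta":
            self._write(kind, data)
        elif base in self.store:
            # Base already seen: resolve the delta right away.
            base_kind, base_data = self.store[base]
            self._write(base_kind, self.apply_delta(base_data, data))
        else:
            # Base not seen yet: keep the (small) delta in memory.
            self.pending[base].append(data)

    def _write(self, kind, data):
        # Writing an object may satisfy pending deltas, whose results
        # may in turn satisfy more deltas, so use a worklist.
        work = [(kind, data)]
        while work:
            kind, data = work.pop()
            name = object_name(kind, data)
            self.store[name] = (kind, data)
            for delta in self.pending.pop(name, []):
                work.append((kind, self.apply_delta(data, delta)))
```

With a toy apply_delta such as `lambda base, delta: base + delta`, a delta fed in before its base stays in pending, and the base's later arrival resolves the whole chain. Anything still in pending after the trailer has been verified refers to a base that never arrived.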

which looks quite doable. The delta objects are small, so keeping them in 
memory shouldn't be a problem (especially since we _tend_ to write deltas 
after the object they depend on).

I can certainly add an option to git-pack-file that disables writing of
the index file, and just writes the pack-file to stdout. I'm not sure I
want to write the "parse incoming pack-file" thing, but git-unpack-objects
comes _reasonably_ close (but right now it seeks around using the index
file to resolve deltas, instead of keeping them in memory and resolving
them when possible). But I can make the infrastructure ready for it.

Sounds like a plan.

			Linus
Received on Wed Jun 29 03:35:49 2005

This archive was generated by hypermail 2.1.8 : 2005-06-29 03:35:53 EST