On Tue, 28 Jun 2005, Daniel Barkalow wrote:
>
> Actually, the ideal thing would be to move the packing code into an object
> file that git-ssh-push can include; that way it can write directly to the
> socket instead of going through disk

It doesn't work very easily that way because the index file (which contains
the object list and the offsets into the pack file) cannot be created until
after the pack file has been created (and we don't want to evaluate that one
in memory, since it can be quite big).

Now, what we could do is to stream out the pack file first to stdout, and
write the index file afterwards. But since we don't know how big the pack
file will be when we start packing, and the pack-file can contain basically
arbitrary patterns, that requires that the receiver actually parse the
pack-file as it comes in.

The format of the pack-file is a fairly trivial data stream of

 - rinse and repeat for each object:
    - one character of type of file (C, T, B, G, D for "commit", "tree",
      "blob", "tag" or "delta" respectively)
    - four bytes of network-order unpacked data length
    - [ if delta: 20 bytes of delta object ID ]
    - zlib-packed data (length unknown, except we know how much we want
      it to unpack to)
 - Finally at the end: 20 bytes of SHA1 of the pack-file contents (up to
   the SHA1)

so it's actually possible to pick up the objects as they come off the
stream, since the SHA1 name is defined by the contents and you don't need
the index file unless you want to look things up.

So the receiver side could try this algorithm:

 - unpack each object in memory on the receiving side

   If the unpack failed, it must have been the SHA1 at the end, so verify
   it!

 - if it's a delta object and you haven't seen the object it's a delta
   against, keep it in memory.

 - if it's a non-delta object, just write it to the object store, and try
   to resolve any delta objects you have pending that this new object
   satisfies. That in turn creates other objects that may have more deltas
   they satisfy etc.

which looks quite doable. The delta objects are small, so keeping them in
memory shouldn't be a problem (especially since we _tend_ to write deltas
after the object they depend on).

I can certainly add an option to git-pack-file that disables writing of the
index file, and just writes the pack-file to stdout.

I'm not sure I want to write the "parse incoming pack-file" thing, but
git-unpack-objects comes _reasonably_ close (but right now it seeks around
using the index file to resolve deltas, instead of keeping them in memory
and resolving them when possible).

But I can make the infrastructure ready for it. Sounds like a plan.

		Linus

Received on Wed Jun 29 03:35:49 2005
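
The receiver-side algorithm described in the message can be sketched roughly as follows. This is an illustration in Python, not code from git, and it rests on several assumptions: the stream layout is exactly the one listed in the message (which is not necessarily what any released pack format looks like), the whole stream is already in a buffer so the trailing 20 bytes can simply be split off instead of being detected by a failed unpack, and the object store is a plain dict. The names `read_records`, `receive`, `write_record` and `apply_delta` are hypothetical, and `apply_delta` is a toy stand-in (it appends bytes) rather than git's real delta encoding.

```python
import hashlib
import struct
import zlib

# Type characters from the message; "D" (delta) is handled separately.
TYPES = {b"C": b"commit", b"T": b"tree", b"B": b"blob", b"G": b"tag"}


def object_id(kind, data):
    """Git-style name: SHA-1 over a "<type> <size>" header, a NUL, then the contents."""
    return hashlib.sha1(kind + b" %d\0" % len(data) + data).digest()


def apply_delta(base, delta):
    # ASSUMPTION: placeholder only, NOT git's delta format. Appends delta to base.
    return base + delta


def write_record(kind, data, base_id=b""):
    """Encode one record; base_id is passed only for delta ("D") records."""
    return kind + struct.pack(">I", len(data)) + base_id + zlib.compress(data)


def read_records(stream):
    """Yield (type_char, base_id_or_None, data) for each record in a full stream."""
    body, trailer = stream[:-20], stream[-20:]
    if hashlib.sha1(body).digest() != trailer:
        raise ValueError("pack checksum mismatch")
    pos = 0
    while pos < len(body):
        kind = body[pos:pos + 1]
        if kind != b"D" and kind not in TYPES:
            raise ValueError("unknown record type")
        (size,) = struct.unpack(">I", body[pos + 1:pos + 5])
        pos += 5
        base_id = None
        if kind == b"D":
            base_id = body[pos:pos + 20]
            pos += 20
        # The compressed length is unknown: inflate until zlib reports the end
        # of its stream, then see how many input bytes were left over.
        inflater = zlib.decompressobj()
        data = inflater.decompress(body[pos:])
        if not inflater.eof or len(data) != size:
            raise ValueError("corrupt record")
        pos = len(body) - len(inflater.unused_data)
        yield kind, base_id, data


def receive(stream, store):
    """Fill store (object id -> (type, data)); return deltas still missing a base."""
    pending = {}  # base id -> delta payloads waiting for that base

    def add(kind, data):
        work = [(kind, data)]
        while work:  # a new object may unblock deltas, which unblock more
            kind, data = work.pop()
            oid = object_id(kind, data)
            store[oid] = (kind, data)
            for delta in pending.pop(oid, []):
                work.append((kind, apply_delta(data, delta)))

    for kind, base_id, data in read_records(stream):
        if kind != b"D":
            add(TYPES[kind], data)
        elif base_id in store:
            base_kind, base = store[base_id]
            add(base_kind, apply_delta(base, data))
        else:
            pending.setdefault(base_id, []).append(data)
    return pending
```

To try it, build a stream with `write_record`, append `hashlib.sha1(body).digest()`, and pass it to `receive` with an empty dict. A delta written before its base stays in `pending` until the base arrives, which is the case the message expects to be rare. A real receiver would read from the socket incrementally and write to the object store on disk instead of a dict.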