On Fri, 22 Apr 2005, Chris Mason wrote:
>
> The problem I see for git is that once you have enough data, it should degrade
> over and over again somewhat quickly.

I really doubt that.

There's a more or less constant amount of new data added all the time: the
number of changes does _not_ grow with history. The number of changes grows
with the amount of changes going on in the tree, and while that isn't
exactly constant, it definitely is not something that grows very fast.

Btw, this is how git is able to be so fast in the first place. Git is fast
because it knows that the "size of the change" is a lot smaller than the
"size of the repository", so it fundamentally at all points tries to make
sure that it only ever bothers with stuff that has changed. Stuff that
hasn't changed, it ignores very _very_ efficiently.

That's really the whole point of the index file: it's a way to quickly
ignore the stuff that hasn't changed - both for simple operations like
"show-diff", but also for complex operations like "merge these three
trees".

And it works exactly because the number of changes does _not_ grow at all
linearly with the history of the project. In fact, in most projects, the
rate of change goes _down_ when the project grows, because the project
matures and generally gets more complicated and thus harder to change.

(The kernel _really_ is pretty special. I am willing to bet that there are
not a lot of big projects that have been able to continue to take changes
at the kind of pace that the kernel does. But we've had to work at it a
lot, including obviously using SCM tools that are very much geared towards
scaling. Why do you think the kernel puts more pressure on SCMs than other
projects? It's exactly because we're trying to scale our change acceptance
to bigger numbers.)

So when you say "once you have enough data, it will degrade quickly", that
ignores the fact that the rate of change (the "second derivative of the
size of the project in time") really isn't that high.
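
As a rough illustration of the index idea above: a cache of per-file stat
data lets a tool decide that a file is unchanged without ever opening it,
so the work done scales with what changed rather than with the size of the
tree's contents. This is only a sketch under that assumption. It is not
git's code or its on-disk index format, and the names `snapshot` and
`changed_paths` are made up for the example.

```python
import os


def snapshot(root):
    """Build a toy 'index': relative path -> (size, mtime in ns).

    Hypothetical helper; a real index records more fields than this.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            index[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return index


def changed_paths(root, index):
    """Return indexed paths whose stat data no longer matches the index.

    Only os.stat() is called, so unchanged files are skipped without
    reading their contents. Files added after the snapshot are not
    reported, because this sketch only walks the index entries.
    """
    changed = []
    for rel, cached in index.items():
        try:
            st = os.stat(os.path.join(root, rel))
        except FileNotFoundError:
            changed.append(rel)  # deleted since the snapshot
            continue
        if (st.st_size, st.st_mtime_ns) != cached:
            changed.append(rel)
    return sorted(changed)
```

Calling `snapshot()` once and then `changed_paths()` after editing a single
file returns just that file's path; every other entry is dismissed on the
strength of one stat call each.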

> I grabbed Ingo's tarball of 28,000 patches since 2.4.0 and applied them all
> into git on ext3 (htree). It only took ~2.5 hrs to apply.

Ok, I'd actually wish it took even less, but that's still a pretty
impressive average of three patches a second.

> Anyway, I ended up with a 2.6GB .git directory. Then I:
>
> rm .git/index
> umount ; mount again
> time read-tree `tree-id` (24.45s)
> time checkout-cache --prefix=../checkout/ -a -f (4m30s)
>
> --prefix is neat ;)

That sounds pretty acceptable. Four minutes is a long time, but I assume
that the whole point of the exercise was to try to test worst-case
behaviour. We can certainly make sure that real usage gets lower numbers
than that (in particular, my "real usage" ends up being 100% in the disk
cache ;)

		Linus

Received on Sat Apr 23 05:41:41 2005