Re: More precise tag following

From: Simon 'corecode' Schubert <corecode@fs.ei.tum.de>
Date: 2007-01-27 20:04:52
Shawn O. Pearce wrote:
> This triplet would probably be encoded with descendant-commit using
> OBJ_REF, ancestor-commit being an OBJ_OFS style back reference within
> the pack (or OBJ_REF if not in this pack) and ancestor-object would
> also be an OBJ_REF.  So a triplet probably would wind up costing
> ~60 bytes.

I'd propose the following:

Have a "shadow" tree which is storing DAG information per entry of the original tree.  keep this shadow tree sorted in the same order the original tree object is.  for simplicity i will now add the new information to the tree examples, but of course this new data cannot be stored in the tree itself and can't be hashed.

tree
<filename> <mode> <object> (<new: ancestor commit> <new: ancestor object>)*

the ancestor object isn't necessary, it would just speed up annotations.  considering that we want to look only at the path, it is superfluous.  if we want to traverse copies/moves as well, this could be helpful.

now, this information allows us to build a object-level (read: path identifies object) DAG, by starting out on the current tree and retrieving the associated information.  using this it is possible to jump to the next commit/tree which changed the object and start over.

expense:  8 or 40 bytes per parent, per object, per commit, for each tree modified.

per parent:  we of course store information about all parents
per object:  as this is a "shadow" tree, we need to annotate all entries and not just changed ones
for each tree modified:  (sub)trees not being modified of course do not need annotation *again*

but:  this shadow tree can be deltified quite tightly, i'd say.  possibly with a different, specialized tree delta method.  then this boils down to 8 or 40 bytes per *changed* object, plus delta overhead.

> This of course penalizes objects which don't ever change, as we'd
> have to walk back a good chunk of the DAG before we find a matching
> triplet.  But I would suspect that files which never change are
> also not given to log/blame very often either.  And once we do find
> a triplet, we can skip through the DAG in time proportional to the
> rate of change for the path, rather than to the entire repository.

using my proposal this penalty does not exist.  i think it would be really awkward to have the annotation of a never-changed Makefile to take way longer than the operation on a recently/often changed file.

we'd have to compare the space requirements of both approaches.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Received on Sat Jan 27 20:06:10 2007

This archive was generated by hypermail 2.1.8 : 2007-01-27 20:07:35 EST