On Wed, 2005-04-27 at 14:23 -0400, Chris Mason wrote:
> Thanks. I originally called diff-tree without the file list so that I could
> do the regexp matching, but this is probably one of those features that will
> never get used.

When I added this functionality to diff-tree I didn't want to add regexp
support, but I did make sure it could handle the simple case of "changes
within directory xxx/yyy". It can also take _multiple_ names.

At the same time, I also posted a primitive script which attempted to do
something similar to what you're doing. The output of rev-tree is useless,
as Linus pointed out. Chronological sorting is counterproductive in all
cases and should be avoided _everywhere_.

My script is based on the original 'gitlog.sh' script, which walks the
commit tree from the head to its parents. It lists only those commits
where the file(s) in question actually changed, giving the commit ID and
the changes.

There's one problem with that, already documented in my (attached) mail
-- we don't print merge changesets where the file in the child is
identical to the file in all the parents, even though the changeset in
question _is_ relevant to the history because it's merging two branches
on which the file _independently_ changed.

The other problem is that we still don't have enough information to piece
together the full tree. With each commit we print, we're also printing
the last _relevant_ child (see $lastprinted in the script). That allows
us to piece together most of the graph, but when we eventually reach a
commit which has already been processed (but not necessarily _printed_),
we just stop. So we don't have useful parent information for the oldest
change in each branch, and can't tie it back to the point at which it
branched. We know the _immediate_ parent, but that parent isn't
necessarily one of the commits we actually printed.

I suspect the best way to do this is to start with a copy of rev-tree and
do something like this:

 1. Add a 'struct commit_list *children' to 'struct commit'.

 2. Make process_commit() set it correctly:

    @@ wherever @@ process_commit
         while (parents) {
             process_commit(parents->item->object.sha1);
    +        commit_list_insert(obj, &parents->item->children);
             parents = parents->next;
         }

 3. Check each 'interesting' commit to see if it affects the file(s) in
    question.

 4. Prune the tree. For each commit which isn't a merge and which doesn't
    touch the file(s), just dump it from the tree, changing the child
    pointer of its parent and the parent pointer of its child accordingly
    to maintain the tree. For each merge where there are no changes to the
    file(s) between the merge point and the point at which the branch was
    taken, drop that too.

 5. Print the remaining commits.

--
dwmw2
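For illustration only, here is a rough C sketch of steps 1, 2 and the first half of step 4 (splicing out non-merge commits that don't touch the file) on a toy in-memory graph. It is not git code: `struct node`, `touches_file`, `add_parent()`, `prune()` and the fixed `MAX_EDGES` bound are all hypothetical, and it deliberately leaves out the harder case of dropping uninteresting merges.

```c
#include <assert.h>

/* Hypothetical stand-in for git's 'struct commit'; a fixed edge bound
 * keeps the sketch short. This is not git's real data layout. */
#define MAX_EDGES 8

struct node {
	int id;
	int touches_file;   /* step 3: does this commit change the file(s)? */
	int removed;        /* set once the node has been pruned (step 4) */
	int nparents, nchildren;
	struct node *parents[MAX_EDGES];
	struct node *children[MAX_EDGES];
};

/* Steps 1-2: record the child link at the same time as the parent link. */
static void add_parent(struct node *child, struct node *parent)
{
	assert(child->nparents < MAX_EDGES && parent->nchildren < MAX_EDGES);
	child->parents[child->nparents++] = parent;
	parent->children[parent->nchildren++] = child;
}

/* Remove one occurrence of 'victim' from arr (order is not preserved). */
static void remove_edge(struct node **arr, int *n, struct node *victim)
{
	for (int i = 0; i < *n; i++) {
		if (arr[i] == victim) {
			(*n)--;
			arr[i] = arr[*n];
			return;
		}
	}
}

/* Replace every occurrence of 'old' in arr with 'repl'. */
static void replace_edge(struct node **arr, int n, struct node *old,
			 struct node *repl)
{
	for (int i = 0; i < n; i++)
		if (arr[i] == old)
			arr[i] = repl;
}

/* Step 4, first half: splice out a non-merge commit that doesn't touch
 * the file(s), so its parent and its children point at each other.
 * Merges and root commits are left alone in this sketch. */
static void prune(struct node *n)
{
	if (n->touches_file || n->nparents != 1)
		return;
	struct node *parent = n->parents[0];

	remove_edge(parent->children, &parent->nchildren, n);
	for (int i = 0; i < n->nchildren; i++) {
		struct node *child = n->children[i];

		replace_edge(child->parents, child->nparents, n, parent);
		assert(parent->nchildren < MAX_EDGES);
		parent->children[parent->nchildren++] = child;
	}
	n->removed = 1;
}
```

Each splice preserves ancestry among the surviving commits, so in this sketch the nodes can be pruned in any order; the merge half of step 4 would additionally need a check of the changes between the merge point and the branch point.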
attached mail follows:
On Wed, 2005-04-13 at 14:57 +0100, David Woodhouse wrote:
> The plan is that this will also form the basis of a tool which will report the
> revision tree for a given file, which is why I really want to avoid the
> unnecessary recursion rather than just post-processing the output.
Script attached. Its output is something like this:
commit 97c9a63e76bf667c21f24a5cfa8172aff0dd1294 child
*100664->100644 blob 6e4064e920792d5b0219b9f8f55a38ab4a1af856->c1091cd15e2ed1be65b50eaa910f7b45c08d93ac rev-tree.c
--------------------------
commit 13b6f29ac1686955e15f0250f796362460b4992e child 97c9a63e76bf667c21f24a5cfa8172aff0dd1294
*100644->100644 blob 5b3090780d49cc610339a19f070a5954dce9a8bc->c1091cd15e2ed1be65b50eaa910f7b45c08d93ac rev-tree.c
--------------------------
commit 6420f0732f695269c0e3f28e62ed4b9aa6578d9f child 13b6f29ac1686955e15f0250f796362460b4992e
*100644->100644 blob 7429b9c4d0aab2e4a494eb4b65129a59da138106->5b3090780d49cc610339a19f070a5954dce9a8bc rev-tree.c
*100664->100644 blob 28a980482bf2053e022409cc3e50b2ad8adafd55->5b3090780d49cc610339a19f070a5954dce9a8bc rev-tree.c
<...>
As we walk the tree from the HEAD to its parents, we print only those
commits which modify the file(s) in question. We remember the last
commit we printed as we recurse, so that we can generate a complete
graph. The SHA-1s of the blobs themselves aren't good enough on their own
because they're not guaranteed to be unique -- if the same change
happens on two different branches, the SHA-1 will be the same, and we
won't know how it fits together.
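A minimal C sketch of that walk, assuming a toy in-memory graph rather than real git objects (the attached script itself isn't reproduced in this archive). `struct commit_node`, `touches_file` and `walk()` are hypothetical, and small integers stand in for SHA-1s. Each commit that touches the file is reported together with the last printed commit as its child, and the walk stops at commits it has already processed.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PARENTS 4
#define MAX_OUT 32

/* Hypothetical toy commit: an integer ID stands in for the SHA-1. */
struct commit_node {
	int id;
	int touches_file;   /* does this commit modify the file(s)? */
	int seen;           /* already processed by the walk */
	int nparents;
	struct commit_node *parents[MAX_PARENTS];
};

/* What one output line carries: the commit and its last printed child
 * (0 means "no child", i.e. the first commit printed). */
struct printed {
	int id;
	int child;
};

static struct printed out[MAX_OUT];
static int nout;

/* Walk from a commit towards its parents, printing only commits that
 * touch the file and passing down the last one printed. */
static void walk(struct commit_node *c, int lastprinted)
{
	if (c->seen)
		return;         /* already processed: stop here */
	c->seen = 1;

	if (c->touches_file) {
		printf("commit %d child %d\n", c->id, lastprinted);
		assert(nout < MAX_OUT);
		out[nout].id = c->id;
		out[nout].child = lastprinted;
		nout++;
		lastprinted = c->id;
	}
	for (int i = 0; i < c->nparents; i++)
		walk(c->parents[i], lastprinted);
}
```

Because `seen` is shared across branches, a second branch that reaches an already-walked commit returns without printing anything, which is where the missing parent information comes from.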
As it is, it's not quite perfect because I'm still omitting merge
commits where the resulting file is identical to the same file in _all_
of the parents. So if we have the following tree (for the _file):
     ----- (AB) ----,
    /                \
(A) ------ (AB) ----- (AB) --,
    \                         \
     ----- (AC) --------------(ABC)
(Where the delta A->AB is a trivial one-line fix which two people make
independently, and they then merge their trees together.)
.. the point where the two independent instances of (AB) are merged
together won't be shown in the output of the attached script. The output
would show only this:
     ----- (AB) ----,
    /                \
(A) ------ (AB) ----- (ABC)
    \                /
     ----- (AC) ----'
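The rule that loses that merge can be written as a small predicate. This C sketch is hypothetical rather than taken from the script: `is_printed()` is an illustrative name, strings stand in for blob SHA-1s, and treating a root commit as always printed is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical predicate for the current behaviour: a commit is printed
 * only if its copy of the file differs from the copy in at least one
 * parent. 'blob' and 'parent_blobs' are strings standing in for SHA-1s. */
static int is_printed(const char *blob, const char *parent_blobs[],
		      int nparents)
{
	if (nparents == 0)
		return 1;       /* assumption: a root commit is always shown */
	for (int i = 0; i < nparents; i++)
		if (strcmp(blob, parent_blobs[i]) != 0)
			return 1;
	return 0;               /* identical to every parent: skipped */
}
```

For the merge of the two independent (AB) branches, the result is "AB" and both parents are "AB", so the predicate returns 0 and the merge vanishes; the final (ABC) merge differs from both parents and is still printed.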
Do we care about this? Or is it good enough? I don't really want to emit
output for _every_ merge commit we traverse, just in _case_ it happens
to be relevant later. Should I just give in to the voices in my head which
are telling me I should throw the damn thing away and rewrite it in C?
Given this output, it should be possible to display a pretty graph of
the history of the file, and easily find both diffs and whole files.
Creating a graphical tool which does this is left as an exercise for the
reader.
--
dwmw2
This archive was generated by hypermail 2.1.8 : 2005-04-28 23:03:44 EST