← Back to team overview

dulwich-users team mailing list archive

Re: [PATCH 00/34] Framework for walking commits

 

Here we go. Times in seconds. Benchmark script and full output attached.

Current revision_history:
git.git: 4.36
linux-2.6.git: 39.85

New revision_history:
git.git: 4.60 (+5.4%)
linux-2.6.git: 42.60 (+6.9%)

This should be unsurprising, because both approaches (after Timo's
improvements :) are queue-based, and walk.py has a bunch more code. There's
certainly some room for optimization, but it's not high on my list.

If we think it's beneficial to keep a noticeably faster implementation in
Repo.revision_history(), I don't mind leaving the old implementation in for
now. IMHO Walker is still useful on its own.

On Thu, Jul 28, 2011 at 10:16, <dborowitz@xxxxxxxxxx> wrote:

> Repo.revision_history is nice and simple, but doesn't expose a whole lot of
> options. This series builds up a framework for walking commits efficiently
> and
> correctly with respect to C git's implementation.
>
> I did have some benchmarks of this vs. the previous implementation, but
> there
> were some improvements between then and now, so I need to find and re-run
> them.
>
> This one has fewer incompatible API changes and more NEWS entries :)
>
> c0ce323 tests/utils: Function for quickly building commit graphs.
> ffb71f2 tests/utils: Build commit graphs with custom trees.
> e6c6233 tests/utils: Build graphs of commits with arbitrary attributes.
> b7e0347 diff_tree: Make rename/rewrite threshold constants public.
> acdedc3 objects: Make ShaFile.__eq__ work when other is not a ShaFile.
> b2d01e8 Add a simple, extensible framework for commit walking.
> f08b421 walk: Add option to limit the number of commits returned.
> c27aa61 walk: Encapsulate walk results in an object.
> b3230a8 objects: Add lookup_path method to Tree.
> 6862389 diff_tree: Add function for getting tree changes for merges.
> 54cd823 walk: Add WalkEntry method to get tree changes.
> c724a65 walk: Return only commits matching specified paths.
> af48e9f diff_tree: Construct RenameDetectors without passing tree SHAs.
> d932a45 diff_tree: Add want_unchanged to changes_with_renames.
> f3e58be diff_tree: Add rename_detector to tree_changes.
> 287dd49 diff_tree: Add rename_detector to tree_changes_for_merge.
> 5a73d58 walk: Add option for rename detection.
> 609cff5 walk: Add option to follow paths across rename/copy.
> f177ce6 walk: Raise MissingCommitError instead of KeyError.
> cc6b61f repo: Implement revision_history with a Walker.
> e0a0c53 walk: Add options to limit by commit time.
> a8b3be4 walk: Handle a small, fixed number of out-of-date-order commits.
> 3afc0a1 walk: Simplify WalkEntry constructor.
> b7a54b1 walk: Extract commit-time pqueue into its own class.
> ef41aaa walk: Separate reordering logic from iteration logic.
> f7b9736 test_walk: Simplify commit_time attr assignment.
> d8b2b02 walk: Allow topological ordering.
> eb06ea5 Add walk module to tests/__init__.py.
> a80b4f2 walk: Walk a fixed number of extra commits before returning.
> b56bc6b walk: Exclude parents more aggressively.
> 4548e33 walk: Reorganize 'since' boundary code.
> af3611b walk: Record last commit yielded by the pq.
> b356947 _compat: Add implementation of all().
> b203909 walk: Propagate excluded out-of-order commits.
>
>  NEWS                            |   19 ++
>  dulwich/_compat.py              |   14 ++
>  dulwich/diff_tree.py            |  136 +++++++++++--
>  dulwich/errors.py               |    1 +
>  dulwich/object_store.py         |   18 +--
>  dulwich/objects.py              |   30 +++-
>  dulwich/repo.py                 |   52 +----
>  dulwich/tests/__init__.py       |    1 +
>  dulwich/tests/test_diff_tree.py |  201 ++++++++++++++++++-
>  dulwich/tests/test_utils.py     |   84 ++++++++
>  dulwich/tests/test_walk.py      |  411
> +++++++++++++++++++++++++++++++++++++++
>  dulwich/tests/utils.py          |   80 ++++++++
>  dulwich/walk.py                 |  364 ++++++++++++++++++++++++++++++++++
>  13 files changed, 1330 insertions(+), 81 deletions(-)
>
>
Current:
@w:~$ for i in {1..10}; do python dulwich_walk_benchmark.py d/git; done
Walked 25712 commits in 4.38s
Walked 25712 commits in 4.37s
Walked 25712 commits in 4.36s
Walked 25712 commits in 4.36s
Walked 25712 commits in 4.34s
Walked 25712 commits in 4.37s
Walked 25712 commits in 4.38s
Walked 25712 commits in 4.40s
Walked 25712 commits in 4.41s
Walked 25712 commits in 4.35s
@w:~$ for i in {1..10}; do python dulwich_walk_benchmark.py d/linux-2.6; done
Walked 254922 commits in 39.82s
Walked 254922 commits in 39.93s
Walked 254922 commits in 40.35s
Walked 254922 commits in 39.73s
Walked 254922 commits in 39.76s
Walked 254922 commits in 40.01s
Walked 254922 commits in 39.92s
Walked 254922 commits in 39.66s
Walked 254922 commits in 39.73s
Walked 254922 commits in 39.58s

New:
@w:~$ for i in {1..10}; do python dulwich_walk_benchmark.py d/git; done
Walked 25712 commits in 4.57s
Walked 25712 commits in 4.61s
Walked 25712 commits in 4.59s
Walked 25712 commits in 4.58s
Walked 25712 commits in 4.61s
Walked 25712 commits in 4.63s
Walked 25712 commits in 4.63s
Walked 25712 commits in 4.63s
Walked 25712 commits in 4.57s
Walked 25712 commits in 4.56s
@w:~$ for i in {1..10}; do python dulwich_walk_benchmark.py d/linux-2.6; done
Walked 254922 commits in 42.71s
Walked 254922 commits in 42.82s
Walked 254922 commits in 42.35s
Walked 254922 commits in 42.49s
Walked 254922 commits in 42.55s
Walked 254922 commits in 42.77s
Walked 254922 commits in 42.56s
Walked 254922 commits in 42.55s
Walked 254922 commits in 42.62s
Walked 254922 commits in 42.61s
#!/usr/bin/python
#
# Copyright 2010 Google Inc. All Rights Reserved.

import sys
import time

from dulwich import repo

def main():
    path = sys.argv[1]

    r = repo.Repo(path)
    start_time = time.time()
    commits = r.revision_history(r.head())
    end_time = time.time()
    print 'Walked %i commits in %.2fs' % (len(commits), end_time - start_time)


if __name__ == '__main__':
    main()

References