dulwich-users team mailing list archive

[PATCH 00/28] Tree diffing and rename detection.

 

This long-promised series adds infrastructure for content-based rename detection. Where possible, I've tried to match C git and/or JGit semantics. I've also optimized where I could: rough testing on git.git shows rename detection running at about 100ms/commit on average with a hot disk cache.
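
For readers unfamiliar with the idea, content-based rename detection comes down to scoring how much content two blobs share. The sketch below is only an illustration of that idea, not the code in this series: the function names, the line-based blocks, and the 0-100 scale are all assumptions made for the example.

```python
from collections import Counter


def count_blocks(data: bytes) -> Counter:
    """Map each line-sized block to the number of bytes it contributes.

    Hypothetical helper: blocks are whole lines here, which is an
    assumption for illustration only.
    """
    counts = Counter()
    for line in data.splitlines(keepends=True):
        counts[line] += len(line)
    return counts


def similarity_score(old: bytes, new: bytes, max_score: int = 100) -> int:
    """Return a score in [0, max_score] for how much content is shared.

    The score is the number of bytes found in both blobs, relative to
    the size of the larger blob.
    """
    old_blocks = count_blocks(old)
    new_blocks = count_blocks(new)
    # Counter returns 0 for blocks missing from the other blob.
    common = sum(min(size, new_blocks[block])
                 for block, size in old_blocks.items())
    largest = max(len(old), len(new))
    if largest == 0:
        # Two empty blobs are treated as identical.
        return max_score
    return common * max_score // largest
```

A detector built on this would pair each deleted path with the added path that scores highest against it, and report a rename only when the score clears some threshold (say 60, an arbitrary value chosen for illustration).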

Sorry for the delay on this (and the next series). It's mostly because Shawn Pearce, who has been doing my reviews for C git/JGit compatibility, went on leave right at the start of it.

Dave

960cd2e objects: Allow tree entries to be sorted by name.
72b1eae Add tree-diffing functionality in a 'diff' module.
f51f70b object_store: Allow MemoryObjectStore object deletion for tests.
49d730c diff: Prune identical subtrees during tree_changes.
e864ddb Use diff.tree_changes to implement BaseObjectStore.tree_changes.
f39e268 test_patch: Fix a tree-adding typo.
70487f5 Use diff.walk_trees for BaseObjectStore.iter_tree_contents.
548096b diff: Add function to count blocks in an object.
de898b6 diff: Add a function to get the similarity score between objects.
1fafae0 diff: Add TreeEntry.add() and delete() factory functions.
ec452cc test_diff: Extract a base TestCase class.
cadfd6c test_diff: Simplify commit_tree helper method.
a7966ca Move permutations from test_objects to misc.
d60c828 diff: Add key for sorting TreeChanges.
e9fe35d diff: RenameDetector with exact rename detection.
6cfd6f8 test_diff: Allow passing SHAs to commit_tree.
d1d9373 diff: Simple content-based rename detection.
d38cd34 diff: Add optional max_files limit for content rename detection.
eb34ccb diff: Factor out _is_tree function for TreeEntry objects.
1b4e642 diff: C implementation of _is_tree.
48f1b74 diff: C implementation of _merge_entries.
5e38157 diff: Consider dissimilar modifies to be delete/add pairs.
fce97c8 diff: Optimize _count_blocks inner loop.
a6f4d4f diff: Use hashcodes as block keys instead of strings.
170f7da tests/utils: Add builder functions to make C extension tests.
03ff2e6 Use ext_functest_builder for existing extension tests.
df15325 diff: C implementation of count_blocks.
344ac04 diff: Add find_copies_harder to consider unmodified files.

 dulwich/_diff.c                    |  444 ++++++++++++++++++++++++
 dulwich/_objects.c                 |   24 +-
 dulwich/diff.py                    |  493 ++++++++++++++++++++++++++
 dulwich/misc.py                    |   75 ++++
 dulwich/object_store.py            |   76 +---
 dulwich/objects.py                 |   25 +-
 dulwich/tests/test_diff.py         |  670 ++++++++++++++++++++++++++++++++++++
 dulwich/tests/test_object_store.py |   34 ++-
 dulwich/tests/test_objects.py      |   77 ++---
 dulwich/tests/test_patch.py        |   24 +-
 dulwich/tests/utils.py             |   42 +++
 setup.py                           |    2 +
 12 files changed, 1843 insertions(+), 143 deletions(-)