dulwich-users team mailing list archive
Message #00278
[PATCH 00/28] Tree diffing and rename detection.
This long-promised series adds infrastructure for content-based rename detection. Where possible, I've tried to match C git and/or JGit semantics. I've also optimized where I could: rough testing on git.git shows that rename detection runs at ~100ms/commit on average with a hot disk cache.
Sorry for the delay on this (and the next series). It's mostly due to Shawn Pearce going on leave right at the start of it, since he's been doing my reviews for C/JGit compatibility.
Dave
960cd2e objects: Allow tree entries to be sorted by name.
72b1eae Add tree-diffing functionality in a 'diff' module.
f51f70b object_store: Allow MemoryObjectStore object deletion for tests.
49d730c diff: Prune identical subtrees during tree_changes.
e864ddb Use diff.tree_changes to implement BaseObjectStore.tree_changes.
f39e268 test_patch: Fix a tree-adding typo.
70487f5 Use diff.walk_trees for BaseObjectStore.iter_tree_contents.
548096b diff: Add function to count blocks in an object.
de898b6 diff: Add a function to get the similarity score between objects.
1fafae0 diff: Add TreeEntry.add() and delete() factory functions.
ec452cc test_diff: Extract a base TestCase class.
cadfd6c test_diff: Simplify commit_tree helper method.
a7966ca Move permutations from test_objects to misc.
d60c828 diff: Add key for sorting TreeChanges.
e9fe35d diff: RenameDetector with exact rename detection.
6cfd6f8 test_diff: Allow passing SHAs to commit_tree.
d1d9373 diff: Simple content-based rename detection.
d38cd34 diff: Add optional max_files limit for content rename detection.
eb34ccb diff: Factor out _is_tree function for TreeEntry objects.
1b4e642 diff: C implementation of _is_tree.
48f1b74 diff: C implementation of _merge_entries.
5e38157 diff: Consider dissimilar modifies to be delete/add pairs.
fce97c8 diff: Optimize _count_blocks inner loop.
a6f4d4f diff: Use hashcodes as block keys instead of strings.
170f7da tests/utils: Add builder functions to make C extension tests.
03ff2e6 Use ext_functest_builder for existing extension tests.
df15325 diff: C implementation of count_blocks.
344ac04 diff: Add find_copies_harder to consider unmodified files.
 dulwich/_diff.c                    |  444 ++++++++++++++++++++++++
 dulwich/_objects.c                 |   24 +-
 dulwich/diff.py                    |  493 ++++++++++++++++++++++++++
 dulwich/misc.py                    |   75 ++++
 dulwich/object_store.py            |   76 +---
 dulwich/objects.py                 |   25 +-
 dulwich/tests/test_diff.py         |  670 ++++++++++++++++++++++++++++++++++++
 dulwich/tests/test_object_store.py |   34 ++-
 dulwich/tests/test_objects.py      |   77 ++---
 dulwich/tests/test_patch.py        |   24 +-
 dulwich/tests/utils.py             |   42 +++
 setup.py                           |    2 +
 12 files changed, 1843 insertions(+), 143 deletions(-)
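For readers unfamiliar with the approach, here is a rough, self-contained sketch of the general idea behind the block-counting, similarity-score, and rename-detection patches in this series. It is not the code from the patches: the names count_blocks, similarity_score, and detect_renames, the line-based block splitting, the plain SHA-1 of the file contents, and the threshold of 60 are all assumptions chosen for illustration. Exact renames are paired first by identical content hash, then the remaining files are paired greedily by how many bytes of identical lines they share.

```python
import hashlib
from collections import Counter


def count_blocks(data: bytes) -> Counter:
    """Map each distinct line (a "block") to the total bytes it occupies."""
    counts = Counter()
    for line in data.splitlines(keepends=True):
        counts[line] += len(line)
    return counts


def similarity_score(old: bytes, new: bytes, max_score: int = 100) -> int:
    """Shared block bytes divided by the larger size, scaled to max_score."""
    old_blocks = count_blocks(old)
    new_blocks = count_blocks(new)
    common = sum(min(n, new_blocks[block]) for block, n in old_blocks.items())
    largest = max(len(old), len(new))
    if largest == 0:
        return max_score  # two empty files are treated as identical
    return common * max_score // largest


def detect_renames(deleted, added, threshold=60):
    """Pair deleted paths with added paths.

    deleted and added map path -> file contents (bytes). The threshold is
    an arbitrary illustrative value. Returns (old_path, new_path, score).
    """
    deleted, added = dict(deleted), dict(added)
    renames = []

    # Pass 1: exact renames, matched by identical content hash.
    by_hash = {}
    for path, data in sorted(deleted.items()):
        by_hash.setdefault(hashlib.sha1(data).hexdigest(), []).append(path)
    for new_path, data in sorted(added.items()):
        matches = by_hash.get(hashlib.sha1(data).hexdigest())
        if matches:
            old_path = matches.pop(0)
            renames.append((old_path, new_path, 100))
            del deleted[old_path]
            del added[new_path]

    # Pass 2: content-based renames, best-scoring pairs first.
    candidates = []
    for old_path, old_data in deleted.items():
        for new_path, new_data in added.items():
            score = similarity_score(old_data, new_data)
            if score >= threshold:
                candidates.append((score, old_path, new_path))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_old, used_new = set(), set()
    for score, old_path, new_path in candidates:
        if old_path in used_old or new_path in used_new:
            continue
        used_old.add(old_path)
        used_new.add(new_path)
        renames.append((old_path, new_path, score))
    return renames
```

The second pass compares every remaining deleted file against every remaining added file, so its cost grows with the product of the two counts. That quadratic growth is the usual reason to cap the number of files considered, which is presumably what the max_files patch in this series addresses.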
Follow ups
- Re: [PATCH 00/28] Tree diffing and rename detection.
  From: Jelmer Vernooij, 2010-12-06
- [PATCH 28/28] diff: Add find_copies_harder to consider unmodified files.
  From: dborowitz, 2010-12-03
- [PATCH 27/28] diff: C implementation of count_blocks.
  From: dborowitz, 2010-12-03
- [PATCH 26/28] Use ext_functest_builder for existing extension tests.
  From: dborowitz, 2010-12-03
- [PATCH 25/28] tests/utils: Add builder functions to make C extension tests.
  From: dborowitz, 2010-12-03
- [PATCH 24/28] diff: Use hashcodes as block keys instead of strings.
  From: dborowitz, 2010-12-03
- [PATCH 23/28] diff: Optimize _count_blocks inner loop.
  From: dborowitz, 2010-12-03
- [PATCH 22/28] diff: Consider dissimilar modifies to be delete/add pairs.
  From: dborowitz, 2010-12-03
- [PATCH 21/28] diff: C implementation of _merge_entries.
  From: dborowitz, 2010-12-03
- [PATCH 20/28] diff: C implementation of _is_tree.
  From: dborowitz, 2010-12-03
- [PATCH 19/28] diff: Factor out _is_tree function for TreeEntry objects.
  From: dborowitz, 2010-12-03
- [PATCH 18/28] diff: Add optional max_files limit for content rename detection.
  From: dborowitz, 2010-12-03
- [PATCH 17/28] diff: Simple content-based rename detection.
  From: dborowitz, 2010-12-03
- [PATCH 16/28] test_diff: Allow passing SHAs to commit_tree.
  From: dborowitz, 2010-12-03
- [PATCH 15/28] diff: RenameDetector with exact rename detection.
  From: dborowitz, 2010-12-03
- [PATCH 14/28] diff: Add key for sorting TreeChanges.
  From: dborowitz, 2010-12-03
- [PATCH 13/28] Move permutations from test_objects to misc.
  From: dborowitz, 2010-12-03
- [PATCH 12/28] test_diff: Simplify commit_tree helper method.
  From: dborowitz, 2010-12-03
- [PATCH 11/28] test_diff: Extract a base TestCase class.
  From: dborowitz, 2010-12-03
- [PATCH 10/28] diff: Add TreeEntry.add() and delete() factory functions.
  From: dborowitz, 2010-12-03
- [PATCH 09/28] diff: Add a function to get the similarity score between objects.
  From: dborowitz, 2010-12-03
- [PATCH 08/28] diff: Add function to count blocks in an object.
  From: dborowitz, 2010-12-03
- [PATCH 07/28] Use diff.walk_trees for BaseObjectStore.iter_tree_contents.
  From: dborowitz, 2010-12-03
- [PATCH 06/28] test_patch: Fix a tree-adding typo.
  From: dborowitz, 2010-12-03
- [PATCH 05/28] Use diff.tree_changes to implement BaseObjectStore.tree_changes.
  From: dborowitz, 2010-12-03
- [PATCH 04/28] diff: Prune identical subtrees during tree_changes.
  From: dborowitz, 2010-12-03
- [PATCH 03/28] object_store: Allow MemoryObjectStore object deletion for tests.
  From: dborowitz, 2010-12-03
- [PATCH 02/28] Add tree-diffing functionality in a 'diff' module.
  From: dborowitz, 2010-12-03
- [PATCH 01/28] objects: Allow tree entries to be sorted by name.
  From: dborowitz, 2010-12-03