dulwich-users team mailing list archive
Message #00524
[PATCH 0/33] Rewrite and speed up pack inflation
The first of several patchbombs of code in use by the servers at code.google.com :)
This one accomplishes two big things:
1. A big rewrite of the pack inflation and indexing code, which organizes reads
around chains of deltas. This guarantees that each object is read and inflated
exactly once. Overall, this improves pack indexing performance on large packs
like linux-2.6.git by at least 4x, and is "only" 2-3x slower than the optimized
C git implementation.
2. Adding an _UnpackedObject that encapsulates some of the data formerly
passed around as tuples by the various functions in pack.py. Rather than
constantly packing and unpacking multiple return values, just pass around
single objects and mutate their state.
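
To make the two points above concrete, here is a minimal toy sketch of the idea, not the code from this series. ToyUnpacked and resolve_all are hypothetical names, and the "delta" format is an assumption chosen for brevity: a delta payload is just a suffix appended to its base, which is far simpler than git's real delta encoding. What it does show is the traversal order: start from each full object, walk down to its dependents, and hand the already-resolved base data to each child, so every payload is inflated exactly once.

```python
import zlib
from collections import defaultdict


class ToyUnpacked(object):
    """Hypothetical stand-in for a mutable unpacked-object record.

    This is NOT dulwich's _UnpackedObject; the fields are illustrative.
    """

    def __init__(self, offset, base_offset, compressed):
        self.offset = offset            # position of the entry in the toy pack
        self.base_offset = base_offset  # None for a full object, else its base
        self.compressed = compressed    # zlib-compressed payload
        self.data = None                # filled in once the entry is resolved


def resolve_all(entries):
    """Resolve every entry in place and return the number of inflations.

    Toy delta format (an assumption): a delta payload is a suffix that is
    appended to the resolved data of its base.
    """
    # Index dependents by the offset of their base, so a chain can be
    # followed from the base downwards instead of from each delta upwards.
    children = defaultdict(list)
    roots = []
    for entry in entries:
        if entry.base_offset is None:
            roots.append(entry)
        else:
            children[entry.base_offset].append(entry)

    inflations = 0
    # Each stack item carries the resolved data of the entry's base.
    stack = [(root, b'') for root in roots]
    while stack:
        entry, base_data = stack.pop()
        payload = zlib.decompress(entry.compressed)  # the only inflate per entry
        inflations += 1
        entry.data = base_data + payload             # mutate state, no tuples
        for child in children.get(entry.offset, ()):
            stack.append((child, entry.data))
    # Deltas whose base is not in `entries` are never reached; this sketch
    # does not handle external bases.
    return inflations


pack = [
    ToyUnpacked(0, None, zlib.compress(b'abc')),   # full object
    ToyUnpacked(10, 0, zlib.compress(b'def')),     # delta against offset 0
    ToyUnpacked(20, 10, zlib.compress(b'ghi')),    # delta against offset 10
    ToyUnpacked(30, 0, zlib.compress(b'xyz')),     # second delta against offset 0
]
inflations = resolve_all(pack)
resolved = dict((e.offset, e.data) for e in pack)
```

Resolving each delta independently by walking up to its base would inflate the object at offset 0 once per dependent; walking down from the base avoids that, at the cost of keeping resolved base data alive while its dependents are still on the stack.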
Various additional cleanups on top of these.
4c6e275 pack: Standardize on single quotes.
8ec2987 pack: Clean up unpack_object.
9b3fc71 pack: Compute CRC32 during object unpacking.
de42ceb pack: Inline PackObjectIterator.
764ec04 object_store: Fix return type of MemoryObjectStore.get_raw.
3c4ec6a pack: Add a DeltaChainIterator for faster iteration of PackData.
5cd5f24 Make the server thread raise errors in compat tests.
8dceaac server: Fix short-circuit behavior for no-op fetches.
5651324 pack: Add a PackIndexer to index packs more quickly.
5690a9b pack: PackStreamReader SHA calculation and docstring cleanup.
7caefca pack: Expose which refs were external in DeltaChainIterator.
280f4b0 pack: Allow write_pack_object to compute a SHA.
f2000f2 server: Make PackStreamCopier optionally record delta chains.
3ded386 tests: Move write_pack_data to utils.build_pack.
b18c613 misc: Add SEEK_CUR.
2e0ffd3 pack: Include offset in PackStreamReader results.
d8eb15a tests/utils: Pass a file object into build_pack.
0d9deaa Move PackStreamReader from server to pack.
1e8d184 pack: use SEEK_END for PackData.get_stored_checksum().
b032b1e test_pack: Test checksum and length mismatch conditions.
9286dd6 pack: Extract a method to check pack length and SHA.
fe363ec pack: Extract a function to compute the SHA of a file.
17e1378 Rewrite add_thin_pack to use the fast PackIndexer.
9facb62 pack: Pass a zlib buffer size through to read_zlib_chunks.
870e006 pack: Fix a buffering issue with PackStreamReader; add tests.
40b145c pack: Add PackInflater to quickly inflate pack objects.
3f7b7dc pack: Nuke ThinPackData.
a2a6078 _compat: Use namedtuple recipe rather than hard-coding.
911076c _compat: Inline specific namedtuple instances.
23095f8 pack: Create an _UnpackedObject for better encapsulation.
00e2b20 pack: Add option to include compressed data in _UnpackedObjects.
91aba9d pack: Remove comp_len from _UnpackedObject.
d0174c1 pack: Extract a function to write a packed object header.
dulwich/_compat.py                  |  203 ++++++----
dulwich/diff_tree.py                |    6 +-
dulwich/object_store.py             |  116 ++++--
dulwich/objects.py                  |    6 +-
dulwich/pack.py                     |  766 ++++++++++++++++++++++++-----------
dulwich/repo.py                     |    2 -
dulwich/server.py                   |   80 ++--
dulwich/tests/compat/test_server.py |    3 +-
dulwich/tests/compat/test_web.py    |   11 +-
dulwich/tests/test_object_store.py  |   29 ++-
dulwich/tests/test_pack.py          |  435 ++++++++++++++++++--
dulwich/tests/test_server.py        |    2 +-
dulwich/tests/utils.py              |   83 ++++
13 files changed, 1287 insertions(+), 455 deletions(-)
Follow ups
-
Re: [PATCH 0/33] Rewrite and speed up pack inflation
From: Jelmer Vernooij, 2011-07-27
-
[PATCH 26/33] pack: Add PackInflater to quickly inflate pack objects.
From: dborowitz, 2011-07-27
-
Re: [PATCH 0/33] Rewrite and speed up pack inflation
From: Dave Borowitz, 2011-07-27
-
[PATCH 33/33] pack: Extract a function to write a packed object header.
From: dborowitz, 2011-07-27
-
[PATCH 32/33] pack: Remove comp_len from _UnpackedObject.
From: dborowitz, 2011-07-27
-
[PATCH 31/33] pack: Add option to include compressed data in _UnpackedObjects.
From: dborowitz, 2011-07-27
-
[PATCH 30/33] pack: Create an _UnpackedObject for better encapsulation.
From: dborowitz, 2011-07-27
-
[PATCH 29/33] _compat: Inline specific namedtuple instances.
From: dborowitz, 2011-07-27
-
[PATCH 28/33] _compat: Use namedtuple recipe rather than hard-coding.
From: dborowitz, 2011-07-27
-
[PATCH 27/33] pack: Nuke ThinPackData.
From: dborowitz, 2011-07-27
-
[PATCH 25/33] pack: Fix a buffering issue with PackStreamReader; add tests.
From: dborowitz, 2011-07-27
-
[PATCH 24/33] pack: Pass a zlib buffer size through to read_zlib_chunks.
From: dborowitz, 2011-07-27
-
[PATCH 22/33] pack: Extract a function to compute the SHA of a file.
From: dborowitz, 2011-07-27
-
[PATCH 23/33] Rewrite add_thin_pack to use the fast PackIndexer.
From: dborowitz, 2011-07-27
-
[PATCH 21/33] pack: Extract a method to check pack length and SHA.
From: dborowitz, 2011-07-27
-
[PATCH 20/33] test_pack: Test checksum and length mismatch conditions.
From: dborowitz, 2011-07-27
-
[PATCH 19/33] pack: use SEEK_END for PackData.get_stored_checksum().
From: dborowitz, 2011-07-27
-
[PATCH 18/33] Move PackStreamReader from server to pack.
From: dborowitz, 2011-07-27
-
[PATCH 17/33] tests/utils: Pass a file object into build_pack.
From: dborowitz, 2011-07-27
-
[PATCH 16/33] pack: Include offset in PackStreamReader results.
From: dborowitz, 2011-07-27
-
[PATCH 15/33] misc: Add SEEK_CUR.
From: dborowitz, 2011-07-27
-
[PATCH 14/33] tests: Move write_pack_data to utils.build_pack.
From: dborowitz, 2011-07-27
-
[PATCH 13/33] server: Make PackStreamCopier optionally record delta chains.
From: dborowitz, 2011-07-27
-
[PATCH 12/33] pack: Allow write_pack_object to compute a SHA.
From: dborowitz, 2011-07-27
-
[PATCH 10/33] pack: PackStreamReader SHA calculation and docstring cleanup.
From: dborowitz, 2011-07-27
-
[PATCH 11/33] pack: Expose which refs were external in DeltaChainIterator.
From: dborowitz, 2011-07-27
-
[PATCH 09/33] pack: Add a PackIndexer to index packs more quickly.
From: dborowitz, 2011-07-27
-
[PATCH 08/33] server: Fix short-circuit behavior for no-op fetches.
From: dborowitz, 2011-07-27
-
[PATCH 07/33] Make the server thread raise errors in compat tests.
From: dborowitz, 2011-07-27
-
[PATCH 06/33] pack: Add a DeltaChainIterator for faster iteration of PackData.
From: dborowitz, 2011-07-27
-
[PATCH 05/33] object_store: Fix return type of MemoryObjectStore.get_raw.
From: dborowitz, 2011-07-27
-
[PATCH 04/33] pack: Inline PackObjectIterator.
From: dborowitz, 2011-07-27
-
[PATCH 03/33] pack: Compute CRC32 during object unpacking.
From: dborowitz, 2011-07-27
-
[PATCH 02/33] pack: Clean up unpack_object.
From: dborowitz, 2011-07-27
-
[PATCH 01/33] pack: Standardize on single quotes.
From: dborowitz, 2011-07-27