
dulwich-users team mailing list archive

[PATCH 0/33] Rewrite and speed up pack inflation

 

The first of several patchbombs of code in use by the servers at code.google.com :)

This one accomplishes two big things:
1. A big rewrite of the pack inflation and indexing code, which organizes reads
   around chains of deltas. This guarantees that each object is read and inflated
   exactly once. Overall, this improves indexing performance on large packs like
   linux-2.6.git by at least 4x, and the result is "only" 2-3x slower than the
   optimized C git implementation.
2. An _UnpackedObject class that encapsulates some of the data formerly
   passed around as tuples by the various functions in pack.py. Rather than
   constantly packing and unpacking multiple return values, functions pass
   around a single object and mutate its state.
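The delta-chain idea in item 1 can be illustrated with a simplified sketch. This is not dulwich's DeltaChainIterator: `Entry`, `iter_delta_chains` and `apply_delta` are hypothetical names. The sketch assumes entries have already been parsed into (offset, type, base offset, compressed bytes). It handles only in-pack OFS_DELTA bases, with no REF_DELTA or thin-pack external bases, and it keeps every inflated object in memory. Each base is inflated once, and that result is reused for all of its deltas, instead of re-inflating the chain for every delta. `Entry` is a mutable record in the spirit of item 2.

```python
import hashlib
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator, List, Optional

OFS_DELTA = 6
TYPE_NAMES = {1: b'commit', 2: b'tree', 3: b'blob', 4: b'tag'}


@dataclass
class Entry:
    # Hypothetical stand-in for _UnpackedObject: one mutable record per
    # pack entry instead of a tuple of return values.
    offset: int
    type_num: int                   # 1-4 for full objects, OFS_DELTA otherwise
    base_offset: Optional[int]      # offset of the delta base, or None
    comp_data: bytes                # zlib-compressed payload as stored
    obj_type: Optional[int] = None  # filled in during resolution
    data: Optional[bytes] = None
    sha: Optional[bytes] = None


def _varint(buf: bytes, pos: int):
    # Little-endian base-128 integer, as used in git delta headers.
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    src_size, pos = _varint(delta, 0)
    dest_size, pos = _varint(delta, pos)
    if src_size != len(base):
        raise ValueError('delta does not match base size')
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy from base: bits 0-3 flag offset bytes, bits 4-6 size bytes.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            out += base[offset:offset + (size or 0x10000)]
        elif op:
            # Insert the next `op` literal bytes.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError('reserved delta opcode 0')
    if len(out) != dest_size:
        raise ValueError('delta produced wrong size')
    return bytes(out)


def iter_delta_chains(entries: List[Entry]) -> Iterator[Entry]:
    # Index each delta under its base so chains can be walked top-down.
    children = defaultdict(list)
    for e in entries:
        if e.base_offset is not None:
            children[e.base_offset].append(e)
    for root in entries:
        if root.base_offset is not None:
            continue
        root.obj_type = root.type_num
        root.data = zlib.decompress(root.comp_data)
        stack = [root]
        while stack:
            entry = stack.pop()
            header = TYPE_NAMES[entry.obj_type] + b' %d\x00' % len(entry.data)
            entry.sha = hashlib.sha1(header + entry.data).digest()
            yield entry
            # Each child is inflated once, against its already-resolved base.
            for child in children.pop(entry.offset, []):
                child.obj_type = entry.obj_type
                child.data = apply_delta(entry.data, zlib.decompress(child.comp_data))
                stack.append(child)
```

Entries can arrive in any order. Deltas are skipped in the outer loop and reached only through their base, so a chain of N deltas costs N inflations rather than one inflation per link for each of the N deltas.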

Various additional cleanups on top of these.

4c6e275 pack: Standardize on single quotes.
8ec2987 pack: Clean up unpack_object.
9b3fc71 pack: Compute CRC32 during object unpacking.
de42ceb pack: Inline PackObjectIterator.
764ec04 object_store: Fix return type of MemoryObjectStore.get_raw.
3c4ec6a pack: Add a DeltaChainIterator for faster iteration of PackData.
5cd5f24 Make the server thread raise errors in compat tests.
8dceaac server: Fix short-circuit behavior for no-op fetches.
5651324 pack: Add a PackIndexer to index packs more quickly.
5690a9b pack: PackStreamReader SHA calculation and docstring cleanup.
7caefca pack: Expose which refs were external in DeltaChainIterator.
280f4b0 pack: Allow write_pack_object to compute a SHA.
f2000f2 server: Make PackStreamCopier optionally record delta chains.
3ded386 tests: Move write_pack_data to utils.build_pack.
b18c613 misc: Add SEEK_CUR.
2e0ffd3 pack: Include offset in PackStreamReader results.
d8eb15a tests/utils: Pass a file object into build_pack.
0d9deaa Move PackStreamReader from server to pack.
1e8d184 pack: use SEEK_END for PackData.get_stored_checksum().
b032b1e test_pack: Test checksum and length mismatch conditions.
9286dd6 pack: Extract a method to check pack length and SHA.
fe363ec pack: Extract a function to compute the SHA of a file.
17e1378 Rewrite add_thin_pack to use the fast PackIndexer.
9facb62 pack: Pass a zlib buffer size through to read_zlib_chunks.
870e006 pack: Fix a buffering issue with PackStreamReader; add tests.
40b145c pack: Add PackInflater to quickly inflate pack objects.
3f7b7dc pack: Nuke ThinPackData.
a2a6078 _compat: Use namedtuple recipe rather than hard-coding.
911076c _compat: Inline specific namedtuple instances.
23095f8 pack: Create an _UnpackedObject for better encapsulation.
00e2b20 pack: Add option to include compressed data in _UnpackedObjects.
91aba9d pack: Remove comp_len from _UnpackedObject.
d0174c1 pack: Extract a function to write a packed object header.
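Several of the commits above touch the on-disk pack format: the packed object header, the CRC32 computed during unpacking, and the trailing pack SHA-1 read with SEEK_END. The following is a minimal sketch of those pieces for a non-delta entry. It uses hypothetical helper names (`encode_object_header`, `decode_object_header`, `write_full_object`, `trailer_matches`) rather than dulwich's actual functions. The entry header packs the type into bits 4-6 of the first byte and the size into a little-endian base-128 number. The CRC32 is taken over the raw bytes as written, covering the header plus compressed data. The trailer is compared against a SHA-1 of everything before it.

```python
import hashlib
import io
import os
import struct
import zlib


def encode_object_header(type_num, size):
    # First byte: continuation bit, 3-bit type, low 4 bits of the size.
    # Later bytes: 7 more size bits each, least significant first.
    byte = (type_num << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_object_header(buf):
    # Inverse of encode_object_header; returns (type_num, size, header_len).
    byte = buf[0]
    type_num = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift, pos = 4, 1
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return type_num, size, pos


def write_full_object(f, type_num, data):
    # Write one non-delta entry; the CRC32 covers the raw header and
    # compressed bytes, i.e. exactly what lands in the pack file.
    raw = encode_object_header(type_num, len(data)) + zlib.compress(data)
    f.write(raw)
    return zlib.crc32(raw)


def trailer_matches(f):
    # A pack ends with the SHA-1 of everything before it: seek from the
    # end to read the stored value, then hash the preceding bytes.
    f.seek(-20, os.SEEK_END)
    stored = f.read(20)
    remaining = f.tell() - 20
    f.seek(0)
    sha = hashlib.sha1()
    while remaining:
        chunk = f.read(min(remaining, 1 << 16))
        sha.update(chunk)
        remaining -= len(chunk)
    return sha.digest() == stored


# Usage: build a one-object pack in memory and verify its trailer.
buf = io.BytesIO()
buf.write(b'PACK' + struct.pack('>LL', 2, 1))   # signature, version 2, 1 object
crc = write_full_object(buf, 3, b'hello world')  # type 3 = blob
buf.write(hashlib.sha1(buf.getvalue()).digest())
assert trailer_matches(buf)
```

A delta entry would also carry its base reference (an encoded negative offset or a 20-byte SHA) between the header and the compressed data. That reference is part of the raw bytes the CRC32 covers.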

 dulwich/_compat.py                  |  203 ++++++----
 dulwich/diff_tree.py                |    6 +-
 dulwich/object_store.py             |  116 ++++--
 dulwich/objects.py                  |    6 +-
 dulwich/pack.py                     |  766 ++++++++++++++++++++++++-----------
 dulwich/repo.py                     |    2 -
 dulwich/server.py                   |   80 ++--
 dulwich/tests/compat/test_server.py |    3 +-
 dulwich/tests/compat/test_web.py    |   11 +-
 dulwich/tests/test_object_store.py  |   29 ++-
 dulwich/tests/test_pack.py          |  435 ++++++++++++++++++--
 dulwich/tests/test_server.py        |    2 +-
 dulwich/tests/utils.py              |   83 ++++
 13 files changed, 1287 insertions(+), 455 deletions(-)

