← Back to team overview

launchpad-reviewers team mailing list archive

Re: [Merge] lp:~stub/launchpad/swift-librarian into lp:launchpad

 

On Wed, Oct 23, 2013 at 11:51 AM, William Grant <me@xxxxxxxxxxxxxxxxxx> wrote:

> Review: Needs Fixing code
>
> 478     + container = swift.SWIFT_CONTAINER_PREFIX + str(container_num)
> 479     + names = sorted(
> 480     + swift_connection.get_container(
> 481     + container, full_listing=True)[1],
> 482     + key=lambda x: int(x['name']))
>
> This loop will terminate cleanly at the first missing block of 1 million files, but we have gaps of 16 million due to cross-volume symlink dances.

Urgh. Correct fix is to update the mock Swift to support listing
containers. But instead I've cheated and am just calculating the last
container name so we know when to stop looking.


> 728     + # Walk the Librarian on disk file store, searching for matching
> 729     + # files that may need to be copied into Swift. We need to follow
> 730     + # symlinks as they are being used span disk partitions.
> 731     + for dirpath, dirnames, filenames in os.walk(fs_root, followlinks=True):
>
> Is it worth doing the walk rather than just checking whether each file exists by brute force?

I think so, yes. During the transition period, it makes little
difference. Once we start actually removing files from local disk we
will only ever see new uploads, so an os.walk over a handful of files
vs. several million stats almost entirely 'file not found'.


> 749     + # Skip files which have been modified recently, as they
> 750     + # may be uploads still in progress.
> 751     + if os.path.getmtime(fs_path) > time.time() - (60 * 60):
>
> librariangc uses 24 hours. Might be worth being consistent.

Yup. It means our disk will need to buffer at least 1 days of uploads,
but that seems fine. Changed.

This also exposed a bug where tests would only pass in timezones UTC+1
or greater. Relevant tests now have the on-disk timestamps set into
the past so they are not detected as recent uploads and ignored.


> 776     + except swiftclient.ClientException as x:
> 777     + if x.http_status != 404:
> 778     + raise
>
> A 404 isn't a separate exception class?

Nope. Every time I wrote this stanza here and in another project I
felt like writing a Pythonic API for the Swift Python API. Maybe I
should have. Or submitted patches upstream.

I think I'm going to take what I've learned here and on my SwiftWAL
project and wrap the low level Python API into something usable
without having to first digest the Swift HTTP API documentation.


> 783     + swift_connection.head_object(container, obj_name)
> 784     + log.debug(
> 785     + "{0} already exists in Swift({1}, {2})".format(
> 786     + lfc, container, obj_name))
>
> Is it worth comparing the hash, or at least the size?

I can compare the size for non-large objects (this branch doesn't
support large objects, but the next update will). The mock swift
doesn't give me a hash in the headers, and I would need to calculate
the hash on the local file myself since this method doesn't have a
database connection around. I've updated to abort if there is a size
mismatch for <5GB sized files. Even this is probably unnecessary,
given the next bit of this review.


> 793     + swift_connection.put_object(
> 794     + container, obj_name,
> 795     + open(fs_path, 'rb'), os.path.getsize(fs_path))
>
> I'd really like to check that Swift agrees with the hash that we have in the DB. The chance of some kind of undetected corruption during the conversion of over 14TiB of data sounds non-negligible.

I've looked into this, and found that the way we should do this is pto
compare the etag returned from Swift with the md5 hash. <aside>This is
all stuff the Python library should be doing :-/</aside>

This will become more complicated with large files (the returned hash
is the hash of all the segment hashes concatenated), but that is the
next branch. I'm using the hash of the file on disk, rather than
whatever is stored in the db, as at the moment this code does not have
a database connection.


> 813     + max_objects_per_container = 1000000
> 814     +
> 815     + container_num = lfc_id // max_objects_per_container
> 816     +
> 817     + return (SWIFT_CONTAINER_PREFIX + str(container_num), str(lfc_id))
>
> All current storage is hex based, but I guess moving to decimal isn't terrible.

I think we just blindly copied Squid with the original filesystem
layout. It seemed silly with Swift to keep the 1990's design in play.


> 887     +class ConnectionPool:
> 888     + MAX_POOL_SIZE = 10
>
> This seems like it should be a fairly common pattern. There's no common implementation used by other swiftclient consumers?

Not a simple one like this that I can see. There is something overly
complex in swiftclient.multithreading, unlikely to be helpful with
Twisted.


> 915     + def _new_connection(self):
> 916     + return swiftclient.Connection(
> 917     + authurl=os.environ.get('OS_AUTH_URL', None),
> 918     + user=os.environ.get('OS_USERNAME', None),
> 919     + key=os.environ.get('OS_PASSWORD', None),
> 920     + tenant_name=os.environ.get('OS_TENANT_NAME', None),
> 921     + auth_version='2.0',
> 922     + )
>
> Should these settings be read from a config file?

I'm not sure. By using the standard OpenStack environment variables,
you get to share credentials with other tools like the swift(1) tool.
On the other hand, it is a bit magic.


> 1595    + swift_enabled = getFeatureFlag('librarian.swift.enabled') or False
> 1596    + if swift_enabled:
>
> We'd normally just say "if getFeatureFlag('librarian.swift.enabled'):".

Fixed.


> 1614    + @write_transaction
> 1615    + def _getFileAlias_swift(self, aliasID, token, path):
>
> This is identical to _getFileAlias, except that it also returns filesize.

Yes. I wanted to leave the existing code paths as untouched as
possible so we keep them until we turn on the feature flag. When we
drop the feature flag I'm hoping it is just a simple case of removing
the original methods and classes, and renaming the _swift versions
into place.


> 1631    + @defer.inlineCallbacks
> 1632    + def _cb_getFileAlias_swift(
> 1633    + self,
> 1634    + (dbcontentID, dbfilename, mimetype, date_created, size,
> 1635    + restricted),
> 1636    + filename, request
> 1637    + ):
>
> Most of this method is suspiciously identical to _cb_getFileAlias.

It is similar. I think the original is untouched. The Swift variant
gets the inlineCallbacks decorator allowing it to not block when it is
opening a connection to Swift.


> 1675    +class File_swift(static.File):
>
> That's no class name.

Right. I was being consistent with the plan to replace Foo with
Foo_swift when dropping the feature flag. I can CamelCase it if it
grates too much.


> 1678    + def __init__(
> 1679    + self, contentType, encoding, modification_time, stream, size):
>
> This is almost identical to File.__init__.

Yes, leaving the original File untouched.

> 1713    + def makeProducer(self, request, fileForReading):
> 1714    + # Unfortunately, by overriding the static.File's more
> 1715    + # complex makeProducer method we lose HTTP range support.
>
> This seems reasonable. I assume FileProducer will just ignore the Range bit of the request and return the whole thing.

Yes. FileProducer is a subclass of NoRangeStaticProducer which doesn't
look for the headers.


> 1730    +class FileProducer(static.NoRangeStaticProducer):
> 1731    + @defer.inlineCallbacks
> 1732    + def resumeProducing(self):
>
> resumeProducing's callers don't expect a deferred AFAICT. Are you sure this actually does anything?

It doesn't return a deferred - the call site of an inlineCallbacks
decorated method ends up with what is sent from returnValue (in this
case, None because there isn't one). The decorator means we can use
yield to avoid blocking calls. And test_librarian_serves_from_swift
fails if I remove it :)


> 1748    === added file 'lib/lp/services/twistedsupport/features.py'
> 1749    --- lib/lp/services/twistedsupport/features.py 1970-01-01 00:00:00 +0000
> 1750    +++ lib/lp/services/twistedsupport/features.py 2013-09-23 13:55:36 +0000
>
> Did you consider implementing this as a LoopingCall?

No, I didn't know about LoopingCall. I'll leave the current
implementation if that is ok, as it copes with the
twisted.flags.refresh flag getting a new value after it is started up.
This feature looks a bit messy with LoopingCall.



-- 
Stuart Bishop <stuart.bishop@xxxxxxxxxxxxx>

https://code.launchpad.net/~stub/launchpad/swift-librarian/+merge/179142
Your team Launchpad code reviewers is subscribed to branch lp:launchpad.


Follow ups

References