[Bug 1884596] [NEW] image import copy-to-store will start multiple importing threads due to race condition
Public bug reported:
I'm filing this bug a little prematurely because Abhi and I didn't get a
chance to fully discuss it. However, looking at the code and the
behavior I'm seeing due to another bug (1884587), I feel rather
confident.
Especially in a situation where glance is running on multiple control
plane nodes (i.e. any real-world situation), I believe there is a race
condition whereby two closely-timed requests to copy an image to a store
will result in two copy operations in glance proceeding in parallel. I
believe this to be the case due to a common "test-and-set that isn't
atomic" error.
In the API layer, glance checks that an import copy-to-store operation
isn't already in progress here:
https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/api/v2/images.py#L167
And if that passes, it proceeds to set up the task as a thread here:
https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/api/v2/images.py#L197
which may start running immediately or sometime in the future. Once
running, that code updates a property on the image to indicate that the
task is running here:
https://github.com/openstack/glance/blob/e6db0b10a703037f754007bef6f56451086850cd/glance/async_/flows/api_image_import.py#L479-L484
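To make the shape of the problem concrete, here is a heavily condensed
sketch of those two steps. This is not the actual glance code; the helpers
(spawn_task, copy_to_store, Conflict) and the dict-based image are
stand-ins for the machinery in the files linked above:

    import threading

    class Conflict(Exception):
        pass

    def spawn_task(fn, *args):
        # Stand-in for glance's task executor: the work may not begin
        # until some time after the API call has already returned.
        threading.Timer(0.5, fn, args).start()

    def copy_to_store(image, store):
        pass  # stand-in for the actual data copy to the backend

    def copy_image_to_store(image, store):      # API layer (images.py)
        if store in image['os_glance_importing_to_stores']:
            raise Conflict('copy already in progress')   # the "test"
        spawn_task(run_import, image, store)    # may run now or much later

    def run_import(image, store):        # async task (api_image_import.py)
        # The "set" happens only once the task actually executes, leaving
        # a window between the API-layer check above and this update.
        image['os_glance_importing_to_stores'].append(store)
        copy_to_store(image, store)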
Between those two events, if another API user makes the same request,
glance will not realize that a thread is already running to complete the
initial task and will start another. In a situation where a user spawns
a thousand new instances to a thousand compute nodes in a single
operation where the image needs copying first, it's highly plausible to
have _many_ duplicate glance operations going, impacting write
performance on the rbd cluster at the very least.
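Using the sketch above, two closely-timed calls for the same (hypothetical)
store both pass the check before either task has recorded anything:

    image = {'os_glance_importing_to_stores': []}
    copy_image_to_store(image, 'ceph-az1')   # check passes, task queued
    copy_image_to_store(image, 'ceph-az1')   # check passes again: no Conflict
    # Later both run_import() calls fire and both copy the image.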
As evidence that this can happen, we see an abnormally extended race
window because of the aforementioned bug (1884587) where we fail to
update the property that indicates the task is running. In a test we see
a large number of these import tasks get started, followed by a cascade of
failures when they fail to update that image property, implying that many
such threads are running. If this situation is allowed to happen when the
property does *not* fail to update, I believe we would end up with
glance copying the image to the destination in multiple threads
simultaneously. That is much harder to simulate in practice in a
development environment, but the other bug makes it happen every time
since we never update the image property to prevent it and thus the
window is long.
Abhi also brought up the case where if this race occurs on the same
node, the second attempt *may* actually start copying the partial image
in the staging directory to the destination, finish early, and then mark
the image as "copied to $store" such that nova will attempt to use the
partial image immediately, resulting in a corrupted disk and various
levels of failure after that. Note that it's not clear if that's really
possible or not, but I'm putting it here so the glance gurus can
validate.
The use of the os_glance_importing_to_stores property to "lock" a copy
to a particular store is good, except that the list is not updated
atomically, so the loser of the above-mentioned race has nothing to check
after its own update to discover that it lost. I don't see any checks in
the persistence layer to ensure that an UPDATE to the row does not add a
store that is already present in this property, nor any kind of merge of
concurrent updates. This also leads me to worry that two parallel requests
to copy an image to two different stores may result in clobbering the list
of stores-in-progress, and potentially also the final list of stores at
rest. This is just conjecture at this point; I just haven't seen anywhere
that situation is accounted for.
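For the two-different-stores concern, the worry is a classic read-modify-
write lost update. A minimal illustration, again with made-up names and
store values rather than glance's persistence code:

    # Each task reads the current value, appends its own store, and
    # writes the whole string back; interleaved, the second blind write
    # erases the first.
    row = {'os_glance_importing_to_stores': ''}   # stand-in for the image row

    def record_importing(snapshot, store):
        # 'snapshot' is what the task read before doing its own update.
        stores = [s for s in snapshot.split(',') if s] + [store]
        row['os_glance_importing_to_stores'] = ','.join(stores)  # blind UPDATE

    snap_a = row['os_glance_importing_to_stores']   # task A reads ''
    snap_b = row['os_glance_importing_to_stores']   # task B also reads ''
    record_importing(snap_a, 'ceph-az1')            # row is now 'ceph-az1'
    record_importing(snap_b, 'ceph-az2')            # row is now 'ceph-az2'
    print(row)   # task A's in-progress marker has been clobbered

A merge in the persistence layer, or an UPDATE guarded by the previously
read value so the loser can detect the conflict and retry, would avoid
this; I don't see either in the current code.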
** Affects: glance
Importance: Undecided
Status: New