[Bug 1732976] Re: [OSSA-2017-006] Potential DoS by rebuilding the same instance with a new image multiple times (CVE-2017-17051)
Reviewed: https://review.openstack.org/521662
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=25a1d78e83065c5bea5d8e0a017fd9d0914d41d9
Submitter: Zuul
Branch: master
commit 25a1d78e83065c5bea5d8e0a017fd9d0914d41d9
Author: Dan Smith <dansmith@xxxxxxxxxx>
Date: Mon Nov 20 13:24:24 2017 -0800
Fix doubling allocations on rebuild
Commit 984dd8ad6add4523d93c7ce5a666a32233e02e34 makes a rebuild
with a new image go through the scheduler again to validate the
image against the instance.host (we rebuild to the same host that
the instance already lives on). This fixes the subsequent doubling
of allocations that will occur by skipping the claim process if
a policy-only scheduler check is being performed.
Closes-Bug: #1732976
Related-CVE: CVE-2017-17051
Related-OSSA: OSSA-2017-006
Change-Id: I8a9157bc76ba1068ab966c4abdbb147c500604a8
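Rough shape of what "skipping the claim process" means here, as I read the commit message (the function and parameter names below are made up for illustration; this is not the actual nova scheduler code):

    def pick_host_for_request(spec, host, placement, policy_only_check):
        # policy_only_check would be True for a rebuild-to-same-host that only
        # needs to re-validate the new image. The instance already holds
        # allocations on this host, so claiming again would double them in
        # Placement.
        if policy_only_check:
            return host
        if placement.claim_resources(spec["instance_uuid"], host, spec["resources"]):
            return host
        return None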
** Changed in: nova
Status: In Progress => Fix Released
--
https://bugs.launchpad.net/bugs/1732976
Title:
[OSSA-2017-006] Potential DoS by rebuilding the same instance with a
new image multiple times (CVE-2017-17051)
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
In Progress
Status in OpenStack Security Advisory:
Fix Released
Bug description:
The fix for bug 1664931 (OSSA-2017-005, CVE-2017-16239) introduced a
regression which allows a potential denial of service.
Once all computes are upgraded to >=Pike and the (default)
FilterScheduler is in use, a rebuild with a new image will go through
the scheduler. The FilterScheduler doesn't know that this is a rebuild
on the same host, so it creates VCPU/MEMORY_MB/DISK_GB allocations in
Placement against the compute node that the instance is running on.
The ResourceTracker in the nova-compute service does not adjust the
allocations after the rebuild, so over multiple rebuilds of the same
instance with a new image, the Placement service will eventually
report the compute node as having no capacity left and take it out of
scheduling consideration.
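To make the failure mode concrete, here is a self-contained toy model of the leak (plain Python, not nova code; the inventory and flavor numbers are made up): each rebuild that goes through the scheduler records a fresh set of allocations for the same instance against the same node, nothing removes the previous set, and the node eventually looks full to Placement.

    # Toy model: one compute node's inventory and one flavor's request.
    INVENTORY = {"VCPU": 16, "MEMORY_MB": 32768, "DISK_GB": 500}
    FLAVOR = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 50}

    allocations = []  # what Placement has recorded against this node


    def usage(resource_class):
        return sum(alloc[resource_class] for alloc in allocations)


    def node_has_capacity(flavor):
        return all(usage(rc) + amount <= INVENTORY[rc]
                   for rc, amount in flavor.items())


    def rebuild_with_new_image(flavor):
        # Each rebuild with a new image goes through the scheduler and claims
        # again; the allocation from the previous rebuild is never cleaned up.
        if not node_has_capacity(flavor):
            raise RuntimeError("NoValidHost: Placement reports the node as full")
        allocations.append(dict(flavor))


    rebuilds = 0
    try:
        while True:
            rebuild_with_new_image(FLAVOR)
            rebuilds += 1
    except RuntimeError as exc:
        print(f"one instance exhausted the node after {rebuilds} rebuilds: {exc}")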
Eventually the rebuild would fail once the compute node is at
capacity, but an attacker could then simply create a new instance (on
a new host) and start the process all over again.
I have a recreate of the bug here:
https://review.openstack.org/#/c/521153/
This would not be a problem for anyone using another scheduler driver
since only FilterScheduler uses Placement, and it wouldn't be a
problem for any deployment that still has at least one compute service
running Ocata code, because the ResourceTracker in the nova-compute
service will adjust the allocations every 60 seconds.
Beyond this issue, however, there are other problems with the fix for
bug 1664931:
1. Even if you're not using the FilterScheduler (e.g. you use the
CachingScheduler) but have the RamFilter, DiskFilter, or CoreFilter
enabled, a rebuild with a new image may now fail if the compute node
that the instance is running on is at capacity, whereas before it
wouldn't. This is a behavioral regression, and the user would have to
delete and recreate the instance with the new image.
2. Before the fix for bug 1664931, one could rebuild an instance on a
disabled compute service, but now that is not possible if the
ComputeFilter is enabled (which it is by default and presumably
enabled in all deployments).
3. Because of the way instance.image_ref is used with volume-backed
instances, we are now *always* going through the scheduler during a
rebuild of a volume-backed instance, regardless of whether or not the
image ref provided to the rebuild API is the same as the original in
the root disk (a toy illustration follows this list). I've already
reported bug 1732947 for this.
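Here is the toy illustration mentioned in item 3; it reflects my reading of the linked bug (for a volume-backed instance, instance.image_ref is the empty string), not the actual nova code:

    def rebuild_needs_scheduler(instance_image_ref, requested_image_ref):
        # Naive "did the image change?" test used to decide whether the
        # rebuild has to go back through the scheduler.
        return instance_image_ref != requested_image_ref


    # Image-backed instance rebuilt with the same image: no scheduler run.
    print(rebuild_needs_scheduler("image-a", "image-a"))  # False

    # Volume-backed instance: image_ref is '', so the check is always True and
    # every rebuild goes through the scheduler, even with an unchanged image.
    print(rebuild_needs_scheduler("", "image-a"))  # True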
--
The nova team has looked at some potential solutions, but at this
point none of them are straightforward, and some involve using
scheduler hints tied to filters that are not enabled by default (e.g.
the same_host scheduler hint, which requires that the SameHostFilter
is enabled; see the example below). Hacking a fix in would likely
introduce more bugs in subtle or unforeseen ways that wouldn't be
caught during testing.
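For reference, the example below shows the shape of the same_host hint as it exists on the server create API (the UUIDs are placeholders); the rebuild API has no equivalent parameter, which is part of why this route isn't straightforward.

    # Server create request body using the same_host scheduler hint; it only
    # has an effect if SameHostFilter is among the enabled filters.
    create_request = {
        "server": {
            "name": "example-server",
            "imageRef": "IMAGE_UUID",              # placeholder
            "flavorRef": "FLAVOR_ID",              # placeholder
        },
        "os:scheduler_hints": {
            "same_host": ["OTHER_INSTANCE_UUID"],  # placeholder
        },
    }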
Long-term, we think a better way to fix the rebuild + new image
validation is to categorize each scheduler filter as either a
'resource' or a 'policy' filter, and on a rebuild with a new image
only run the filters that enforce policy constraints (like
ImagePropertiesFilter), skipping RamFilter/DiskFilter/CoreFilter (and
Placement, for that matter). This would likely require an internal RPC
API version change on the nova-scheduler interface, which we wouldn't
want to backport to stable branches because of the upgrade
implications of an RPC API version bump.
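A minimal sketch of that long-term idea, assuming a hypothetical POLICY_FILTER class attribute and simplified host/spec dicts (the real nova filter interface is different):

    class BaseFilter:
        POLICY_FILTER = False  # hypothetical classification flag

        def host_passes(self, host, spec):
            raise NotImplementedError


    class ImagePropertiesFilter(BaseFilter):
        POLICY_FILTER = True  # policy constraint: which images may run where

        def host_passes(self, host, spec):
            return spec["image_arch"] in host["supported_arches"]


    class RamFilter(BaseFilter):
        POLICY_FILTER = False  # resource constraint: irrelevant on rebuild

        def host_passes(self, host, spec):
            return host["free_ram_mb"] >= spec["ram_mb"]


    def filters_to_run(enabled_filters, rebuild_with_new_image):
        # On a rebuild with a new image, only re-validate policy constraints;
        # skip resource filters (and, by the same logic, the Placement claim).
        if rebuild_with_new_image:
            return [f for f in enabled_filters if f.POLICY_FILTER]
        return list(enabled_filters)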
At this point it might be best to just revert the fix for bug 1664931.
We can still revert it through all of the upstream branches that the
fix was applied to (Newton is not EOL yet). This is obviously a pain
for downstream consumers that have already picked up and shipped fixes
for the CVE. It would also mean publishing an erratum for
CVE-2017-16239 (we probably have to do that anyway) and stating that
it is no longer fixed but is a publicly known issue.
Another possible alternative is shipping a new policy rule in nova
that allows operators to disable rebuilding an instance with a new
image, so they could decide, based on the types of images and the
scheduler configuration they have, whether rebuilding with a new image
is safe. Public and private cloud providers might find such a rule
useful in different ways, e.g. disabling rebuild with a new image if
you allow tenants to upload their own images to your cloud.
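A sketch of what such a knob could look like with oslo.policy; the rule name and default below are hypothetical, and nothing like this is defined in nova as of this bug:

    from oslo_policy import policy

    # Hypothetical rule controlling who may rebuild a server with an image
    # different from the original one. Operators could set it to "!" to
    # disable the operation entirely.
    rebuild_with_new_image_rule = policy.RuleDefault(
        name="os_compute_api:servers:rebuild_with_new_image",
        check_str="rule:admin_or_owner",
        description="Rebuild a server using an image different from the one "
                    "it was originally built with.",
    )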