yahoo-eng-team team mailing list archive

[Bug 1611443] [NEW] nova-scheduler doesn't account for create-new-volume disk space when using ceph

Public bug reported:


Description
===========

It seems that nova-scheduler may not account for disk space
appropriately when creating a new instance backed by a new cinder
volume. We have ceph backing cinder and glance, so in theory, if we spin
up a new instance (boot from image, create new volume) that is backed by
ceph, the scheduler should only take ceph disk space into account.
However, it appears to also count local disk space on the compute nodes
(which we use for ephemeral disks when not using cinder) during
scheduling.

This causes an issue when local disk space is limited but ceph has
plenty of free storage: if we try to spin up an instance fully backed by
ceph using a flavor whose root disk specification is too large for the
compute nodes' local disks, the instance fails to boot, even though the
flavor's root disk size is overridden by the manually specified volume
size.

Steps to reproduce
==================

1. Environment uses ephemeral storage local to the compute nodes; ceph backs cinder/glance.
2. Create a flavor whose root disk size is larger than the available ephemeral storage on the compute nodes.
3. Launch an instance from an image (create new volume) using that flavor, so the instance is fully backed by ceph and should not need any ephemeral storage on the compute nodes. Specify a volume size that is smaller than the available ceph space but larger than the available ephemeral disk space.
4. The instance fails to launch with the errors pasted below.
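For concreteness, the reproduction might look like the following CLI sketch. The flavor and instance names and the image UUID are placeholders (not taken from the bug report), and the exact sizes depend on the environment:

```shell
# Hypothetical reproduction sketch; names and the image UUID are placeholders.

# Flavor whose 500 GB root disk exceeds the ~89 GB of local disk the
# scheduler believes is free, but is well within ceph capacity.
openstack flavor create --ram 2048 --vcpus 2 --disk 500 ceph-repro

# Boot from an image into a new 100 GB cinder (ceph-backed) volume. The
# volume size overrides the flavor's root disk, yet DiskFilter still
# rejects every host based on the 500 GB flavor value.
nova boot --flavor ceph-repro \
  --block-device source=image,id=<image-uuid>,dest=volume,size=100,shutdown=remove,bootindex=0 \
  repro-instance
```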

Now,

1. Create another flavor whose root disk size is smaller than the available ephemeral storage on the compute nodes.
2. Launch an instance again with the same settings: still create a new volume, and make sure the volume is larger than the available ephemeral space.
3. The instance launches and works without issue.

This shows that the disk space specified on the flavor has no real
effect on the ability to spin up the instance beyond initial scheduling,
because that space is never actually used when spinning up an instance
whose cinder/glance storage is backed by ceph. Its only role is during
scheduling, where nova will refuse to even try to create the instance if
there isn't enough ephemeral space.
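This matches what Liberty's DiskFilter effectively computes. A minimal sketch of that decision, assuming the ~89 GB free-disk figure from the log below and illustrative flavor sizes (this is not the actual nova source, just the decision it makes):

```python
# Simplified sketch of what Liberty's DiskFilter effectively decides; NOT
# the actual nova code. Flavor sizes are illustrative; the log shows the
# scheduler believed ~89 GB of local disk was free.

def requested_disk_gb(flavor):
    # The scheduler sums root + ephemeral from the flavor, even when a new
    # cinder (ceph-backed) volume will actually hold the root disk.
    return flavor["root_gb"] + flavor["ephemeral_gb"]

def disk_filter_passes(free_disk_gb, flavor, disk_allocation_ratio=1.0):
    # Host passes only if usable local disk covers the flavor's request.
    return free_disk_gb * disk_allocation_ratio >= requested_disk_gb(flavor)

big_flavor = {"root_gb": 500, "ephemeral_gb": 0}    # root disk > local space
small_flavor = {"root_gb": 50, "ephemeral_gb": 0}   # root disk < local space

print(disk_filter_passes(89, big_flavor))    # False: "DiskFilter returned 0 hosts"
print(disk_filter_passes(89, small_flavor))  # True: instance launches fine
```

Because the check never asks whether the instance is volume-backed, the second flavor passes and the first fails, exactly as observed above.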

Logs
====

<180>Aug  9 15:55:37 controller01 nova-scheduler: 2016-08-09 15:55:37.261 155487 WARNING nova.scheduler.host_manager [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Host compute02.gsrt.paloaltonetworks.local has more disk space than database expected (116gb > 89gb)
<182>Aug  9 15:55:37 controller01 nova-scheduler: 2016-08-09 15:55:37.262 155487 INFO nova.filters [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Filter DiskFilter returned 0 hosts
<182>Aug  9 15:55:37 controller01 nova-scheduler: 2016-08-09 15:55:37.263 155487 INFO nova.filters [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Filtering removed all hosts for the request with reservation ID 'r-4ziwar65' and instance ID '6034d716-fe3b-4d41-a564-47b36ae441e5'. Filter results: ['DifferentHostFilter: (start: 2, end: 2)', 'RetryFilter: (start: 2, end: 2)', 'AvailabilityZoneFilter: (start: 2, end: 2)', 'RamFilter: (start: 2, end: 2)', 'CoreFilter: (start: 2, end: 2)', 'DiskFilter: (start: 2, end: 0)']
<180>Aug  9 15:55:37 controller01 nova-conductor: 2016-08-09 15:55:37.266 155590 WARNING nova.scheduler.utils [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 84, in select_destinations
    filter_properties)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 90, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.
<180>Aug  9 15:55:37 controller01 nova-conductor: 2016-08-09 15:55:37.267 155590 WARNING nova.scheduler.utils [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] [instance: 6034d716-fe3b-4d41-a564-47b36ae441e5] Setting instance to ERROR state.

Expected Result
===============

If local ephemeral disk space will not be utilized or touched as part of
launching an instance, it should not be considered by DiskFilter during
scheduling, since doing so leads to unnecessary failures.
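A sketch of the behaviour being asked for: when the instance is volume-backed, the flavor's root disk should not count against local host disk. The function name and fields here are illustrative, not nova's actual API:

```python
# Illustrative sketch of the expected behaviour (not actual nova code):
# a root disk living on a cinder/ceph volume should not consume local
# host disk in the scheduler's accounting.

def requested_local_disk_gb(flavor, is_volume_backed):
    # Only count the flavor's root disk when it will really live locally.
    root = 0 if is_volume_backed else flavor["root_gb"]
    return root + flavor["ephemeral_gb"]

# Flavor from the failing reproduction: large root disk, no ephemeral.
flavor = {"root_gb": 500, "ephemeral_gb": 0}

print(requested_local_disk_gb(flavor, is_volume_backed=True))   # 0
print(requested_local_disk_gb(flavor, is_volume_backed=False))  # 500
```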

Actual Result
=============

Cannot launch the instance because the scheduler fails to find a valid
host (even though enough ceph disk space is available).

Environment
===========

Liberty-based, MOS 8.0. It looks like Mirantis ships some of their own
packages for nova-scheduler; I'm not sure whether this issue exists
upstream in liberty, but I imagine it does.

ii  nova-api                             2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - compute API frontend
ii  nova-cert                            2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - certificate manager
ii  nova-common                          2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - common files
ii  nova-conductor                       2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - conductor service
ii  nova-consoleauth                     2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - Console Authenticator
ii  nova-consoleproxy                    2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - NoVNC proxy
ii  nova-objectstore                     2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - object store
ii  nova-scheduler                       2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - virtual machine scheduler
ii  python-nova                          2:12.0.2-1~u14.04+mos21                all          OpenStack Compute - libraries
ii  python-novaclient                    2:2.30.2-1~u14.04+mos5                 all          client library for OpenStack Compute API

Libvirt+KVM
Ceph 0.94.5 for cinder/glance; the nova/ephemeral disk is using LVM, I believe.
Neutron + Open vSwitch

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1611443