yahoo-eng-team team mailing list archive
Message #54842
[Bug 1611443] [NEW] nova-scheduler doesn't account for create-new-volume disk space when using ceph
Public bug reported:
Description
===========
It seems that nova-scheduler may not account for disk space correctly
when creating a new instance on a new cinder volume. We have ceph
backing cinder and glance, so in theory, if we spin up a new instance
(boot from image, create new volume) that is backed by ceph, the
scheduler should only take ceph disk space into account. However, it
appears to check local disk space on the compute nodes when scheduling
(we use that space for ephemeral disks when not using cinder).
This causes a problem when we have limited local disk space but plenty
of storage space in ceph, and we try to spin up a new instance fully
backed by ceph using a flavor whose root disk size is too large for the
local nodes (even though the flavor's root disk size is overridden when
booting from a new volume, since you specify the volume size manually).
The instance fails to boot.
Steps to reproduce
==================
1. Environment uses ephemeral storage local to compute nodes; ceph backs cinder/glance.
2. Create a flavor that has root disk size > available ephemeral storage on compute nodes.
3. Launch an instance from an image (create new volume) using the flavor created in step 2, so that it is fully backed by ceph and should not need the ephemeral storage on the compute nodes. Specify a size for the new volume that is smaller than the available ceph space but larger than the available ephemeral disk space.
4. The instance fails to launch and logs the errors pasted below.
Now:
1. Create another flavor with root disk size < available ephemeral storage on compute nodes.
2. Launch an instance again with the same settings: still create a new volume, and make sure the volume is larger than the available ephemeral space.
3. Note that the instance launches and works without issue.
This shows that the disk size specified on the flavor has no real
effect on the ability to spin up the instance outside of initial
scheduling, because that space isn't actually used when spinning up an
instance where cinder/glance is backed by ceph. It is only taken into
consideration during scheduling, which refuses to place the instance if
there isn't enough ephemeral space.
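The two reproduction runs above can be illustrated with a simplified model of a flavor-only disk check. This is a sketch, not nova's actual DiskFilter code: the `Flavor` tuple, the function names, and all sizes (100 GB free, 200 GB and 20 GB flavors) are hypothetical values chosen for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a flavor; not nova's Flavor object.
Flavor = namedtuple("Flavor", ["root_gb", "ephemeral_gb", "swap_mb"])


def requested_local_disk_mb(flavor):
    """Local disk a flavor-only check would ask for, in MB."""
    return 1024 * (flavor.root_gb + flavor.ephemeral_gb) + flavor.swap_mb


def host_passes_disk_check(flavor, free_disk_mb):
    """Flavor-only check: it cannot tell whether the root disk is a volume."""
    return free_disk_mb >= requested_local_disk_mb(flavor)


free_mb = 100 * 1024  # assumed free local disk on every compute node
big = Flavor(root_gb=200, ephemeral_gb=0, swap_mb=0)
small = Flavor(root_gb=20, ephemeral_gb=0, swap_mb=0)

big_ok = host_passes_disk_check(big, free_mb)      # host is filtered out
small_ok = host_passes_disk_check(small, free_mb)  # host is kept
print(big_ok, small_ok)
```

In this model the size of the new cinder volume never enters the check, which matches the observation that only the flavor's disk size decides whether scheduling succeeds.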
Logs
=====
<180>Aug 9 15:55:37 controller01 nova-scheduler: 2016-08-09 15:55:37.261 155487 WARNING nova.scheduler.host_manager [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Host compute02.gsrt.paloaltonetworks.local has more disk space than database expected (116gb > 89gb)
<182>Aug 9 15:55:37 controller01 nova-scheduler: 2016-08-09 15:55:37.262 155487 INFO nova.filters [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Filter DiskFilter returned 0 hosts
<182>Aug 9 15:55:37 controller01 nova-scheduler: 2016-08-09 15:55:37.263 155487 INFO nova.filters [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Filtering removed all hosts for the request with reservation ID 'r-4ziwar65' and instance ID '6034d716-fe3b-4d41-a564-47b36ae441e5'. Filter results: ['DifferentHostFilter: (start: 2, end: 2)', 'RetryFilter: (start: 2, end: 2)', 'AvailabilityZoneFilter: (start: 2, end: 2)', 'RamFilter: (start: 2, end: 2)', 'CoreFilter: (start: 2, end: 2)', 'DiskFilter: (start: 2, end: 0)']
<180>Aug 9 15:55:37 controller01 nova-conductor: 2016-08-09 15:55:37.266 155590 WARNING nova.scheduler.utils [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 84, in select_destinations
    filter_properties)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 90, in select_destinations
    raise exception.NoValidHost(reason=reason)
NoValidHost: No valid host was found. There are not enough hosts available.
<180>Aug 9 15:55:37 controller01 nova-conductor: 2016-08-09 15:55:37.267 155590 WARNING nova.scheduler.utils [req-c10f09a5-2909-4f84-bb01-2ff0114f2a43 9c9903e9d5d24411abfc6d9867ab054e 3bbe4dd24a5f497ca081d5b22011d39e - - -] [instance: 6034d716-fe3b-4d41-a564-47b36ae441e5] Setting instance to ERROR state.
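To see at a glance which filter eliminated the hosts, the "Filter results" list from the nova-scheduler log line above can be parsed with a few lines of Python. This is only a convenience sketch: the helper name `filters_dropping_hosts` is made up here, and the regular expression assumes the exact `Name: (start: N, end: M)` format shown in this log.

```python
import re

# Filter results copied from the nova-scheduler log line in this report.
filter_results = [
    "DifferentHostFilter: (start: 2, end: 2)",
    "RetryFilter: (start: 2, end: 2)",
    "AvailabilityZoneFilter: (start: 2, end: 2)",
    "RamFilter: (start: 2, end: 2)",
    "CoreFilter: (start: 2, end: 2)",
    "DiskFilter: (start: 2, end: 0)",
]

ENTRY = re.compile(r"^(\w+): \(start: (\d+), end: (\d+)\)$")


def filters_dropping_hosts(results):
    """Return (filter name, hosts removed) for each filter that removed hosts."""
    dropped = []
    for entry in results:
        match = ENTRY.match(entry)
        if match is None:
            continue  # skip entries that do not follow the assumed format
        name = match.group(1)
        start, end = int(match.group(2)), int(match.group(3))
        if end < start:
            dropped.append((name, start - end))
    return dropped


culprits = filters_dropping_hosts(filter_results)
print(culprits)  # [('DiskFilter', 2)]
```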
Expected Result
===============
If ephemeral disk space will not be used or touched when launching an
instance, it should not be considered by DiskFilter / scheduling, as
doing so may lead to unnecessary errors.
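One way to picture the expected behaviour is a disk check that ignores the flavor's root disk size when the root disk lives on a volume. The sketch below is an assumption about how such a check could look, not a patch against nova: the function names, the `boot_from_volume` flag, and the sizes are all hypothetical.

```python
def requested_local_disk_mb(root_gb, ephemeral_gb, swap_mb, boot_from_volume):
    """Local disk needed on the compute node, in MB."""
    # Assumption: a volume-backed root disk consumes no local disk.
    local_root_gb = 0 if boot_from_volume else root_gb
    return 1024 * (local_root_gb + ephemeral_gb) + swap_mb


def host_passes(free_disk_mb, **request):
    """True if the host has enough free local disk for the request."""
    return free_disk_mb >= requested_local_disk_mb(**request)


free_mb = 100 * 1024  # assumed free local disk on the compute node
request = dict(root_gb=200, ephemeral_gb=0, swap_mb=0)

image_backed = host_passes(free_mb, boot_from_volume=False, **request)
volume_backed = host_passes(free_mb, boot_from_volume=True, **request)
print(image_backed, volume_backed)
```

Any ephemeral or swap disk defined on the flavor would still count against local storage in this sketch, since those disks are not moved to the volume.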
Actual Result
=============
Cannot launch instance because scheduler fails to find a valid host
(even if enough disk space is available).
Environment
===========
Liberty-based, MOS 8.0. It looks like Mirantis ships some of its own
packages for nova-scheduler; I'm not sure whether this issue exists
upstream in Liberty, but I imagine it does.
ii  nova-api           2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - compute API frontend
ii  nova-cert          2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - certificate manager
ii  nova-common        2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - common files
ii  nova-conductor     2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - conductor service
ii  nova-consoleauth   2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - Console Authenticator
ii  nova-consoleproxy  2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - NoVNC proxy
ii  nova-objectstore   2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - object store
ii  nova-scheduler     2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - virtual machine scheduler
ii  python-nova        2:12.0.2-1~u14.04+mos21  all  OpenStack Compute - libraries
ii  python-novaclient  2:2.30.2-1~u14.04+mos5   all  client library for OpenStack Compute API
Libvirt+KVM
Ceph 0.94.5 for cinder/glance. Nova ephemeral disks use LVM, I believe.
Neutron+openvswitch
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1611443
Title:
nova-scheduler doesn't account for create-new-volume disk space when
using ceph
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1611443/+subscriptions