[Bug 1053931] Re: Volume hangs in "creating" status even though scheduler raises "No valid host" exception
Dafna,
I think this bug has been fixed.
Let me explain the workflow of creating a volume in a little more detail:
1) the user sends a request to the Cinder API service;
2) the API service creates a DB entry for the volume and sets its status
to 'creating'
(https://github.com/openstack/cinder/blob/stable/havana/cinder/volume/flows/create_volume/__init__.py#L545)
and sends an RPC message to the scheduler;
3) the scheduler picks up the message and makes a placement decision;
if a back-end is available, it sends the request via RPC to the volume
service;
4) the volume service picks up the message and performs the real job of
creating the volume for the user (see the sketch below).
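To make the failure modes concrete, here is a minimal, self-contained
toy model of that workflow. It is only an illustration: none of these
names are real Cinder APIs, the in-memory dict stands in for the
volumes table, and the direct function calls stand in for the RPC
casts.

    # Toy model of the create-volume workflow; all names are illustrative.
    class NoValidHost(Exception):
        pass

    DB = {}        # volume_id -> record; stands in for the volumes table
    BACKENDS = []  # available back-ends; leave empty to trigger NoValidHost

    def api_create_volume(volume_id, size):
        # Step 2: the DB entry is marked 'creating' BEFORE scheduling starts.
        DB[volume_id] = {'size': size, 'status': 'creating'}
        # Steps 2->3: in the real system this is an asynchronous RPC cast;
        # if that message is lost, nothing moves the volume out of 'creating'.
        scheduler_create_volume(volume_id)

    def scheduler_create_volume(volume_id):
        # Step 3: placement decision.
        if not BACKENDS:
            raise NoValidHost('No valid host was found.')
        volume_service_create(volume_id, BACKENDS[0])

    def volume_service_create(volume_id, host):
        # Step 4: the chosen back-end performs the real work.
        DB[volume_id].update(status='available', host=host)

With BACKENDS left empty, api_create_volume('vol-1', 1) raises
NoValidHost and leaves DB['vol-1']['status'] == 'creating', which is
exactly the symptom reported in this bug.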
There are multiple cases in which a volume's status can get stuck in
'creating':
a) something went wrong while the RPC message was being processed by
the scheduler (e.g. the scheduler service is down - related to this
change & bug: https://review.openstack.org/#/c/64014/ - the message is
lost, or the scheduler service goes down while processing the message);
b) something went wrong AFTER a back-end was chosen, meaning the
scheduler successfully sent the message to the target back-end, but
somehow the message was not picked up by the target volume service, or
there was an unhandled exception while the volume service was handling
the request. A sketch of the general shape of the fix follows below.
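The essence of the fix is to make sure every failure path flips the
volume out of 'creating'. In terms of the toy model above (again only
a sketch of the pattern, not the actual patch from the review linked
in case a):

    def scheduler_create_volume_fixed(volume_id):
        try:
            if not BACKENDS:
                raise NoValidHost('No valid host was found.')
            volume_service_create(volume_id, BACKENDS[0])
        except Exception:
            # Any scheduling failure marks the volume 'error' instead of
            # leaving it 'creating', so the user can see it and delete it.
            DB[volume_id]['status'] = 'error'
            raise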
If this bug somehow happens again, can you describe the steps to
reproduce it?
** Changed in: cinder
Status: New => Fix Released
https://bugs.launchpad.net/bugs/1053931
Title:
Volume hangs in "creating" status even though scheduler raises "No
valid host" exception
Status in Cinder:
Fix Released
Status in OpenStack Compute (Nova):
Fix Released
Bug description:
When the volume creation process fails during scheduling (i.e. there
is no appropriate host), the status in the DB (and, as a result, in
the nova volume-list output) hangs at "creating".
In such a case, the only way to find out that volume creation failed
is to go and read /var/log/nova/nova-scheduler.log (which is not an
obvious thing to do). Moreover, a volume stuck in the "creating"
status cannot be deleted with nova volume-delete. To delete it, one
has to change its status to "error" in the DB.
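For the record, the manual workaround is a direct database update
along these lines (a hedged example against the volumes table of that
release, matching the row with id 15 shown below; not an officially
supported procedure):

    UPDATE volumes SET status = 'error' WHERE id = 15;

After that, nova volume-delete can remove the volume.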
The simple scheduler is being used (nova.conf):
--scheduler_driver=nova.scheduler.simple.SimpleScheduler
Here is a sample output from the DB:
*************************** 3. row ***************************
created_at: 2012-09-21 09:55:42
updated_at: NULL
deleted_at: NULL
deleted: 0
id: 15
ec2_id: NULL
user_id: b0aadfc80b094d94b78d68dcdc7e8757
project_id: 3b892f660ea2458aa9aa9c9a21352632
host: NULL
size: 1
availability_zone: nova
instance_id: NULL
mountpoint: NULL
attach_time: NULL
status: creating
attach_status: detached
scheduled_at: NULL
launched_at: NULL
terminated_at: NULL
display_name: NULL
display_description: NULL
provider_location: NULL
provider_auth: NULL
snapshot_id: NULL
volume_type_id: NULL
Here is the part of interest from nova-scheduler.log:
pic': u'volume', u'filter_properties': {u'scheduler_hints': {}}, u'snapshot_id': None, u'volume_id': 16}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'3b892f660ea2458aa9aa9c9a21352632', u'_context_timestamp': u'2012-09-21T10:15:47.091307', u'_context_user_id': u'b0aadfc80b094d94b78d68dcdc7e8757', u'method': u'create_volume', u'_context_remote_address': u'172.18.67.146'} from (pid=11609) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-09-21 10:15:47 DEBUG nova.rpc.amqp [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] unpacked context: {'user_id': u'b0aadfc80b094d94b78d68dcdc7e8757', 'roles': [u'admin'], 'timestamp': '2012-09-21T10:15:47.091307', 'auth_token': '<SANITIZED>', 'remote_address': u'172.18.67.146', 'is_admin': True, 'request_id': u'req-01f7dd30-0421-4ef3-a675-16b0cf1362eb', 'project_id': u'3b892f660ea2458aa9aa9c9a21352632', 'read_deleted': u'no'} from (pid=11609) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
2012-09-21 10:15:47 WARNING nova.scheduler.manager [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] Failed to schedule_create_volume: No valid host was found. Is the appropriate service running?
2012-09-21 10:15:47 ERROR nova.rpc.amqp [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] Exception during message handling
2012-09-21 10:15:47 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 253, in _process_data
2012-09-21 10:15:47 TRACE nova.rpc.amqp     rval = node_func(context=ctxt, **node_args)
2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 97, in _schedule
2012-09-21 10:15:47 TRACE nova.rpc.amqp     context, ex, *args, **kwargs)
2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-21 10:15:47 TRACE nova.rpc.amqp     self.gen.next()
2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 92, in _schedule
2012-09-21 10:15:47 TRACE nova.rpc.amqp     return driver_method(*args, **kwargs)
2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/scheduler/simple.py", line 227, in schedule_create_volume
2012-09-21 10:15:47 TRACE nova.rpc.amqp     raise exception.NoValidHost(reason=msg)
2012-09-21 10:15:47 TRACE nova.rpc.amqp NoValidHost: No valid host was found. Is the appropriate service running?
2012-09-21 10:15:47 TRACE nova.rpc.amqp