[Bug 1053931] Re: Volume hangs in "creating" status even though scheduler raises "No valid host" exception

 

Dafna,

I think this bug has been fixed.

Let me explain the workflow of creating a volume in a little more detail
(a rough sketch of the hand-off follows the list):

1) The user sends a request to the Cinder API service.
2) The API service creates a DB entry for the volume and sets its status
to 'creating'
(https://github.com/openstack/cinder/blob/stable/havana/cinder/volume/flows/create_volume/__init__.py#L545),
then sends an RPC message to the scheduler.
3) The scheduler picks up the message and makes a placement decision; if
a back-end is available, it sends the request via RPC to the volume
service.
4) The volume service picks up the message and performs the real work of
creating the volume for the user.
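
For illustration only, here is a minimal runnable sketch of that
hand-off. The classes and the in-memory queues below stand in for
Cinder's real API/scheduler/volume services and RPC layer, so every name
here is an assumption made for the example, not Cinder's actual code:

    import queue

    class FakeDB(object):
        """Stands in for the volumes table; tracks only size/status."""
        def __init__(self):
            self.volumes = {}

    class ApiService(object):
        def __init__(self, db, scheduler_q):
            self.db, self.scheduler_q = db, scheduler_q

        def create_volume(self, volume_id, size):
            # Step 2: create the DB entry with status 'creating', then
            # "cast" an RPC message to the scheduler.
            self.db.volumes[volume_id] = {'size': size, 'status': 'creating'}
            self.scheduler_q.put({'method': 'create_volume',
                                  'volume_id': volume_id})

    class SchedulerService(object):
        def __init__(self, volume_q, backends):
            self.volume_q, self.backends = volume_q, backends

        def process(self, msg):
            # Step 3: placement decision; forward to a back-end if any.
            if not self.backends:
                raise RuntimeError('No valid host was found.')
            msg['host'] = self.backends[0]
            self.volume_q.put(msg)

    class VolumeService(object):
        def __init__(self, db):
            self.db = db

        def process(self, msg):
            # Step 4: do the real work, then flip the status.
            self.db.volumes[msg['volume_id']]['status'] = 'available'

    db = FakeDB()
    sched_q, vol_q = queue.Queue(), queue.Queue()
    ApiService(db, sched_q).create_volume('vol-1', size=1)   # 'creating'
    SchedulerService(vol_q, ['lvm-backend-1']).process(sched_q.get())
    VolumeService(db).process(vol_q.get())
    print(db.volumes['vol-1']['status'])                     # 'available'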

There are multiple cases in which a volume's status can get stuck in
'creating':

a) something went wrong while the RPC message was being processed by the
scheduler (e.g. the scheduler service is down, which relates to this
change and bug: https://review.openstack.org/#/c/64014/; the message is
lost; or the scheduler service goes down while it is processing the
message);

b) something went wrong AFTER a back-end was chosen, meaning the
scheduler successfully sent the message to the target back-end, but the
message was never picked up by the target volume service, or an
unhandled exception occurred while the volume service was handling the
request.

In both cases the remedy is the same: fail the volume instead of leaving
it in 'creating'; a sketch of that pattern follows.
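
As a rough illustration of that pattern (not the exact Cinder code; the
driver and db_api helpers below are stand-ins for the example), the
scheduler manager catches any scheduling failure, such as the
NoValidHost exception in the log below, and flips the volume to 'error'
before re-raising:

    class SchedulerManager(object):
        def __init__(self, driver, db_api):
            self.driver = driver
            self.db_api = db_api

        def create_volume(self, context, volume_id, **kwargs):
            try:
                self.driver.schedule_create_volume(context, volume_id,
                                                   **kwargs)
            except Exception:
                # Key point: never leave the volume in 'creating'. Mark
                # it 'error' so the user can see the failure and delete
                # the volume.
                self.db_api.volume_update(context, volume_id,
                                          {'status': 'error'})
                raise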

If this bug somehow happens again, could you describe the steps to
reproduce it?

** Changed in: cinder
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1053931

Title:
  Volume hangs in "creating" status even though scheduler raises "No
  valid host" exception

Status in Cinder:
  Fix Released
Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  When the volume creation process fails during scheduling (i.e. there
  is no appropriate host), the status in the DB (and consequently in
  nova volume-list output) hangs with a "creating..." value.

  In such a case, the only way to find out that volume creation failed
  is to look at /var/log/nova/nova-scheduler.log (which is not an
  obvious thing to do). Moreover, a volume stuck in the "creating..."
  status cannot be deleted with nova volume-delete. To delete it, one
  has to change its status to 'error' in the DB, as sketched below.
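
  For example, a minimal workaround sketch assuming direct access to the
  Nova database (the connection URL and credentials are placeholders,
  and the volume id matches the row dump below):

      import sqlalchemy

      # Placeholder URL; point this at the real Nova database.
      engine = sqlalchemy.create_engine('mysql://nova:secret@localhost/nova')

      with engine.begin() as conn:
          # Flip the stuck volume to 'error' so volume-delete can proceed.
          conn.execute(
              sqlalchemy.text("UPDATE volumes SET status = 'error'"
                              " WHERE id = :id"),
              {'id': 15},
          )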

  
  Simple scheduler is being used (nova.conf):

  --scheduler_driver=nova.scheduler.simple.SimpleScheduler

  
  Here is a sample output from DB:

  *************************** 3. row ***************************
           created_at: 2012-09-21 09:55:42
           updated_at: NULL
           deleted_at: NULL
              deleted: 0
                   id: 15
               ec2_id: NULL
              user_id: b0aadfc80b094d94b78d68dcdc7e8757
           project_id: 3b892f660ea2458aa9aa9c9a21352632
                 host: NULL
                 size: 1
    availability_zone: nova
          instance_id: NULL
           mountpoint: NULL
          attach_time: NULL
               status: creating
        attach_status: detached
         scheduled_at: NULL
          launched_at: NULL
        terminated_at: NULL
         display_name: NULL
  display_description: NULL
    provider_location: NULL
        provider_auth: NULL
          snapshot_id: NULL
       volume_type_id: NULL

  
  Here is a part of interest in nova-scheduler.log:

  pic': u'volume', u'filter_properties': {u'scheduler_hints': {}}, u'snapshot_id': None, u'volume_id': 16}, u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True, u'_context_project_id': u'3b892f660ea2458aa9aa9c9a21352632', u'_context_timestamp': u'2012-09-21T10:15:47.091307', u'_context_user_id': u'b0aadfc80b094d94b78d68dcdc7e8757', u'method': u'create_volume', u'_context_remote_address': u'172.18.67.146'} from (pid=11609) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
  2012-09-21 10:15:47 DEBUG nova.rpc.amqp [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] unpacked context: {'user_id': u'b0aadfc80b094d94b78d68dcdc7e8757', 'roles': [u'admin'], 'timestamp': '2012-09-21T10:15:47.091307', 'auth_token': '<SANITIZED>', 'remote_address': u'172.18.67.146', 'is_admin': True, 'request_id': u'req-01f7dd30-0421-4ef3-a675-16b0cf1362eb', 'project_id': u'3b892f660ea2458aa9aa9c9a21352632', 'read_deleted': u'no'} from (pid=11609) _safe_log /usr/lib/python2.7/dist-packages/nova/rpc/common.py:160
  2012-09-21 10:15:47 WARNING nova.scheduler.manager [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] Failed to schedule_create_volume: No valid host was found. Is the appropriate service running?
  2012-09-21 10:15:47 ERROR nova.rpc.amqp [req-01f7dd30-0421-4ef3-a675-16b0cf1362eb b0aadfc80b094d94b78d68dcdc7e8757 3b892f660ea2458aa9aa9c9a21352632] Exception during message handling
  2012-09-21 10:15:47 TRACE nova.rpc.amqp Traceback (most recent call last):
  2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 253, in _process_data
  2012-09-21 10:15:47 TRACE nova.rpc.amqp     rval = node_func(context=ctxt, **node_args)
  2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 97, in _schedule
  2012-09-21 10:15:47 TRACE nova.rpc.amqp     context, ex, *args, **kwargs)
  2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
  2012-09-21 10:15:47 TRACE nova.rpc.amqp     self.gen.next()
  2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 92, in _schedule
  2012-09-21 10:15:47 TRACE nova.rpc.amqp     return driver_method(*args, **kwargs)
  2012-09-21 10:15:47 TRACE nova.rpc.amqp   File "/usr/lib/python2.7/dist-packages/nova/scheduler/simple.py", line 227, in schedule_create_volume
  2012-09-21 10:15:47 TRACE nova.rpc.amqp     raise exception.NoValidHost(reason=msg)
  2012-09-21 10:15:47 TRACE nova.rpc.amqp NoValidHost: No valid host was found. Is the appropriate service running?
  2012-09-21 10:15:47 TRACE nova.rpc.amqp

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1053931/+subscriptions