
yahoo-eng-team team mailing list archive

[Bug 1457359] [NEW] race condition in quick detach/attach to the same volume and vm

 

Public bug reported:

Tested on Juno with Cells enabled.

The race condition happens as follows (a rough reproduction sketch is included below):
1. Send a detach request for a volume attached to an existing VM;
2. Immediately after #1, send a request from another process to attach the same volume to the same VM.

Expected result:
a. #2 is refused because #1 is still in progress, or
b. #2 finishes after #1 has finished.
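
For illustration, the two requests can be fired back-to-back roughly as follows. This is only a sketch, not the actual test: it assumes an authenticated python-novaclient client, and the credentials, server/volume IDs and device name are placeholders.

# Reproduction sketch (illustrative only; credentials and IDs are placeholders).
import threading

from novaclient import client as nova_client

nova = nova_client.Client("2", "USER", "PASSWORD", "PROJECT",
                          "http://keystone:5000/v2.0")

SERVER_ID = "<instance uuid>"
VOLUME_ID = "<volume uuid>"


def detach():
    # Req #1: detach the volume currently attached to the VM
    nova.volumes.delete_server_volume(SERVER_ID, VOLUME_ID)


def attach():
    # Req #2: immediately attach the same volume to the same VM
    nova.volumes.create_server_volume(SERVER_ID, VOLUME_ID, "/dev/vdb")


t1 = threading.Thread(target=detach)
t2 = threading.Thread(target=attach)
t1.start()
t2.start()  # sent right after #1, from a separate thread/process
t1.join()
t2.join()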

However, the race can play out in the following sequence:

 Req #1 finished the physical detach >>
 Req #1 finished the Cinder call (setting the volume to "available") >>
 Req #2 entered the Nova API and got through the call flow, since the volume was now available >>
 Req #2 ran faster than Req #1 and updated the Nova DB BDMs with the volume info >>
 Req #1 then finished and removed the existing volume info from the BDMs >>
 Now the Cinder volume status and the Nova BDM state were mismatched. The volume became inoperable: both further attach and detach operations were refused.
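
The resulting mismatch can be seen by comparing Cinder's view of the volume with Nova's attachment records for the server. A rough check (continuing the sketch above, and additionally assuming an authenticated python-cinderclient client named `cinder`) looks like this:

# Mismatch check sketch (illustrative only). `nova`, SERVER_ID and VOLUME_ID
# come from the reproduction sketch above; `cinder` is an authenticated
# python-cinderclient client.
vol = cinder.volumes.get(VOLUME_ID)
attachments = nova.volumes.get_server_volumes(SERVER_ID)

print("cinder volume status:", vol.status)
print("nova attachment records:", attachments)
# After the race, the two sides disagree (e.g. Cinder still shows the volume
# as attached while Nova has no matching BDM/attachment record), and both a
# follow-up attach and a follow-up detach are refused.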

Also, in our test case the child cell Nova DB and the parent cell Nova DB
became mismatched, since Req #2 overtook Req #1 while Req #1 was still
propagating its update from the child cell to the parent cell.

This issue is caused by the lack of a guard check against the Nova BDM
table in the attach path. The suggested fix is to add a volume ID check
against the Nova BDM table at the beginning of the request, so that for
a single volume/instance pair no parallel modification can happen.
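
As a sketch of what such a guard could look like (illustrative only, not a proposed patch: the helper name is made up, it assumes the Juno-era nova.objects BDM API, and the real check would have to live in the actual attach code path, e.g. nova/compute/api.py):

# Hypothetical guard sketch for the attach path (not the actual fix).
from nova import exception
from nova import objects


def _reject_attach_if_volume_still_mapped(context, instance, volume_id):
    """Refuse the attach while the BDM table still references the volume.

    If a detach of the same volume is still in flight, its BDM row has not
    been removed yet, so a parallel attach for the same volume/instance
    pair gets rejected instead of racing with it.
    """
    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)
    if any(bdm.volume_id == volume_id for bdm in bdms):
        raise exception.InvalidVolume(
            reason="volume %s is still referenced by a block device "
                   "mapping of instance %s (detach in progress?)" %
                   (volume_id, instance.uuid))

The point is simply that the check and the refusal happen before any state is touched, so two requests for the same volume/instance pair cannot interleave their BDM updates.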

The attachment is a slice of the logs showing the message disorder
triggered in the test case.

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "the message disorder triggered in the test case"
   https://bugs.launchpad.net/bugs/1457359/+attachment/4401486/+files/attachdetach%20logs.txt

