yahoo-eng-team team mailing list archive
Message #37637
[Bug 1467581] Re: Concurrent interface attachment corrupts info cache
** Changed in: nova
Status: Fix Committed => Fix Released
** Changed in: nova
Milestone: None => liberty-3
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1467581
Title:
Concurrent interface attachment corrupts info cache
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Concurrently attaching multiple network interfaces to a single
instance often corrupts the instance's network info cache in Nova.
As a result, some network interfaces may be missing from 'nova list',
and 'nova interface-detach' silently fails to detach them. The ports
are, however, listed in 'nova interface-list' and can be seen in
'neutron port-list'.
Initially seen on CentOS 7 running Juno. Reproduced on Ubuntu 14.04
running DevStack (master branch).
This issue is similar (possibly identical) to bug 1326183, and the
steps to reproduce it are also similar.
1) Set up DevStack from trunk with the following local.conf:
disable_service n-net
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-meta
RECLONE=yes
# and other options as set in the trunk's local
2) Create a few networks:
$> neutron net-create testnet1
$> neutron net-create testnet2
$> neutron net-create testnet3
$> neutron subnet-create testnet1 192.168.1.0/24
$> neutron subnet-create testnet2 192.168.2.0/24
$> neutron subnet-create testnet3 192.168.3.0/24
3) Create a VM named testvm on testnet1:
$> nova boot --flavor m1.tiny --image cirros-0.3.4-x86_64-uec --nic net-id=`neutron net-list | grep testnet1 | cut -f 2 -d ' '` testvm
4) Run the following shell script, which repeatedly attaches interfaces on the remaining two networks to this VM and detaches them again, until the issue occurs:
---------
#! /bin/bash
# Number of attach/detach rounds to run.
c=10000
# IDs of testnet2 and testnet3: the second space-separated field of each table row.
netid1=`neutron net-list | grep testnet2 | cut -f 2 -d ' '`
netid2=`neutron net-list | grep testnet3 | cut -f 2 -d ' '`
while [ $c -gt 0 ]
do
    echo "Round: " $c
    echo -n "Attaching two interfaces concurrently... "
    nova interface-attach --net-id $netid1 testvm &
    nova interface-attach --net-id $netid2 testvm &
    wait
    echo "Done"
    echo "Sleeping until both those show up in nova show"
    # Poll for up to 60 s until 'nova show' lists all three testnet networks.
    waittime=0
    while [ $waittime -lt 60 ]
    do
        count=`nova show testvm | grep testnet | wc -l`
        if [ $count -eq 3 ]
        then
            break
        fi
        sleep 2
        (( waittime+=2 ))
    done
    echo "Waited for " $waittime " seconds"
    if [ $waittime -ge 60 ]
    then
        echo "bad case"
        exit 1
    fi
    echo "Detaching both... "
    # Field 4 of each matching interface-list row is the port ID.
    nova interface-list testvm | grep $netid1 | awk '{print "deleting ",$4; system("nova interface-detach testvm "$4 " ; sleep 2");}'
    nova interface-list testvm | grep $netid2 | awk '{print "deleting ",$4; system("nova interface-detach testvm "$4 " ; sleep 2");}'
    echo "Done; check interfaces are gone in a minute."
    # Poll for up to 60 s until only the original interface remains (a 5-line table).
    waittime=0
    while [ $waittime -lt 60 ]
    do
        count=`nova interface-list testvm | wc -l`
        echo "line count: " $count
        if [ $count -eq 5 ]
        then
            break
        fi
        sleep 2
        (( waittime+=2 ))
    done
    if [ $waittime -ge 60 ]
    then
        echo "failed to detach interfaces - raise another bug!"
        exit 1
    fi
    echo "Interfaces are gone"
    (( c-- ))
done
---------
Eventually the test stops with a failure ("bad case"), and the
remaining interface, on either testnet2 or testnet3, cannot be
detached at all.
In my environment, this failure occurs on every run.
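The reproduction script depends on parsing the ASCII tables printed by the neutron and nova CLIs. `cut -f 2 -d ' '` works because each data row starts with "| ", so the second space-separated field is the ID, and `wc -l` on such a table counts the data rows plus four border and header lines. The helpers below are a hypothetical sketch, not part of the original report: `net_id_by_name` matches the name column exactly (a plain `grep testnet1` would also match a network named testnet10), and `table_rows` counts data rows directly. The sample tables are made up and abbreviated, and the sketch assumes one line per row.

```shell
#!/bin/bash
# Hypothetical helpers for reading CLI ASCII tables; the samples below are made up.

# Print the id (column 1) of the row whose name (column 2) exactly equals $1.
net_id_by_name() {
    awk -F'|' -v name="$1" '{
        id = $2; nm = $3
        gsub(/^ +| +$/, "", id)   # trim the padding around each cell
        gsub(/^ +| +$/, "", nm)
        if (nm == name) print id
    }'
}

# Count data rows: lines starting with "|" minus the header line.
table_rows() {
    local lines
    lines=$(grep -c '^|')
    if (( lines > 0 )); then
        lines=$(( lines - 1 ))
    fi
    echo "$lines"
}

nets='+--------------------------------------+----------+
| id                                   | name     |
+--------------------------------------+----------+
| 11111111-1111-1111-1111-111111111111 | testnet1 |
| 22222222-2222-2222-2222-222222222222 | testnet2 |
+--------------------------------------+----------+'

ifaces='+------------+--------------------------------------+
| Port State | Port ID                              |
+------------+--------------------------------------+
| ACTIVE     | 33333333-3333-3333-3333-333333333333 |
+------------+--------------------------------------+'

printf '%s\n' "$nets" | net_id_by_name testnet2   # prints the testnet2 id
printf '%s\n' "$ifaces" | table_rows              # prints 1
```

Against a live deployment, the same helpers would be used as `netid1=$(neutron net-list | net_id_by_name testnet2)` or `nova interface-list testvm | table_rows`, assuming the CLIs print this table layout.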
Based on my analysis of the source code, the concurrent requests corrupt
the instance's network info cache. Each request takes a copy of the info
cache when it starts processing; at that point the copy contains only
the initial network. Each request thread then allocates a network port,
adds it to its own copy of the network info, and saves that copy back to
the DB. Each saved copy therefore contains the initial network plus only
the network added by that thread, so the last thread to save wins and
the other network is lost.
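The lost update described above can be reproduced without Nova. This is a minimal sketch, assuming a temporary file stands in for the info cache (one network name per line) and that the two requests' steps are interleaved by hand in the order described: both take their copy first, then both save. The function name is hypothetical.

```shell
#!/bin/bash
# Simulate two attach requests that read the cache before either one saves.
simulate_lost_update() {
    local cache copy_a copy_b
    cache=$(mktemp)
    echo testnet1 > "$cache"           # initial network only

    copy_a=$(cat "$cache")             # request A takes its copy
    copy_b=$(cat "$cache")             # request B takes its copy

    # Each request adds its own network and saves its whole copy back.
    printf '%s\n%s\n' "$copy_a" testnet2 > "$cache"
    printf '%s\n%s\n' "$copy_b" testnet3 > "$cache"

    cat "$cache"                       # the last save wins: testnet2 is gone
    rm -f "$cache"
}

simulate_lost_update
```

The output lists testnet1 and testnet3 only, matching the symptom that one of the two attached interfaces disappears from the cache.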
I have a patch that appears to fix the issue by refreshing the info
cache whilst holding the refresh-cache-<id> lock. However, I'm not
intimately familiar with the nova networking code, so I would appreciate
more experienced eyes on it. I will submit the change to Gerrit for
review and comments.
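A minimal sketch of the locking idea behind the fix, using `flock(1)` in place of Nova's refresh-cache-<id> lock. It assumes Linux with util-linux installed, and the file and function names are hypothetical. Each concurrent "request" takes an exclusive lock, re-reads the cache while holding it, appends its network, and saves before releasing. This illustrates the pattern only; it is not the actual patch.

```shell
#!/bin/bash
# Read-modify-write the cache under an exclusive lock on file descriptor 9.
add_network_locked() {
    local cache=$1 lockfile=$2 net=$3
    (
        flock -x 9 || exit 1           # block until this request holds the lock
        current=$(cat "$cache")        # refresh: re-read the cache under the lock
        sleep 0.2                      # widen the window an unlocked version would lose in
        printf '%s\n%s\n' "$current" "$net" > "$cache"
    ) 9> "$lockfile"
}

demo_locked_attach() {
    local dir
    dir=$(mktemp -d)
    echo testnet1 > "$dir/cache"
    # Two concurrent "interface-attach" requests, as in the reproduction script.
    add_network_locked "$dir/cache" "$dir/refresh-cache-testvm.lock" testnet2 &
    add_network_locked "$dir/cache" "$dir/refresh-cache-testvm.lock" testnet3 &
    wait
    sort "$dir/cache"                  # which request wins the lock first varies
    rm -rf "$dir"
}

demo_locked_attach
```

Because the second request re-reads the cache only after the first has saved, all three networks survive. The `sort` is there only because the order in which the two requests acquire the lock is not deterministic.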
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1467581/+subscriptions