yahoo-eng-team mailing list archive: Message #79319
[Bug 1836253] Re: Sometimes InstanceMetadata API returns 404 due to invalid InstanceID returned by _get_instance_and_tenant_id()
I don't know when William will read my previous comment, but overall
here is what I found:
The metadata-agent cache was designed to be invalidated by time-based
expiry. That approach has the reported kind of side effect when a client
is too fast. That is not perfect, but it can usually be addressed by
tweaking the cache TTL and/or waiting longer in the client.
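For reference, the TTL tweak is exposed through oslo.cache options in metadata_agent.ini. A minimal sketch, assuming the Queens-era [cache] section and the standard oslo.cache option names (verify against your deployment's configuration reference; the value 60 is only an example):

```ini
[cache]
enabled = true
backend = dogpile.cache.memory
# Lower expiration_time so stale IP -> port entries age out faster.
# The oslo.cache default is 600 seconds.
expiration_time = 60
```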
A more correct cache invalidation is theoretically possible, but I
think it is not feasible, because it would introduce cross-dependencies
between the metadata-agent and far-away parts of neutron.
Therefore I'm inclined to mark this bug report as Invalid (not a bug).
Please let me know if I missed something here.
** Changed in: neutron
Status: Confirmed => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836253
Title:
Sometimes InstanceMetadata API returns 404 due to invalid InstanceID
returned by _get_instance_and_tenant_id()
Status in neutron:
Invalid
Bug description:
Sometimes, on instance initialization, the metadata step fails.
In metadata-agent.log there are lots of 404s:
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404 len: 297 time: 0.0771070
In nova-api.log we get 404 too:
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404
After some debugging we found that the problem occurs when a new instance gets the same IP that was used by a deleted instance.
The problem is related to the cache implementation in the method "_get_ports_for_remote_address()" in "/neutron/agent/metadata/agent.py": it returns the port of the deleted instance (with the same IP), which yields the wrong InstanceID. That InstanceID is then sent to nova-api, which fails because the instance no longer exists.
This problem only occurs with the cache enabled on the neutron metadata-agent.
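The race described above can be illustrated with a minimal time-based cache. This is a hypothetical simplification, not neutron's actual code: the keys, values, and TTL here are invented for illustration only.

```python
import time

class TTLCache:
    """Minimal time-based cache, similar in spirit to the memoization
    in the metadata agent (hypothetical simplification)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (value, insertion timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, ts = entry
        if time.monotonic() - ts > self.ttl:
            # Entry expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

# Simulate the reported race: the IP is reused by a new instance
# before the cached entry for the old instance has expired.
cache = TTLCache(ttl=300)
cache.put("10.0.0.5", "port-of-deleted-instance")
# The old instance is deleted and a new one gets 10.0.0.5, but until
# the TTL elapses the cache still returns the stale port, and hence
# the wrong InstanceID ends up being sent to nova-api.
stale = cache.get("10.0.0.5")
print(stale)  # port-of-deleted-instance
```

Until the entry ages out (or the agent is restarted), every metadata request from the reused IP resolves to the deleted instance, which is why nova-api answers 404.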
Version: Queens
How to reproduce:
---
#!/bin/bash
computenodelist=(
    'computenode00.test.openstack.net'
    'computenode01.test.openstack.net'
    'computenode02.test.openstack.net'
    'computenode03.test.openstack.net'
)

validate_metadata() {
    cat << EOF > /tmp/metadata
#!/bin/sh -x
if curl 192.168.10.2
then
    echo "ControllerNode00 - OK"
else
    echo "ControllerNode00 - ERROR"
fi
EOF
    source /root/admin-openrc
    openstack server delete "${node}" 2>/dev/null
    openstack server create --image cirros --nic net-id=internal \
        --flavor Cirros --security-group default \
        --user-data /tmp/metadata \
        --availability-zone "nova:${node}" --wait "${node}" &> /dev/null
    i=0
    until [ "$i" -gt 3 ] || openstack console log show "${node}" | grep -q "ControllerNode00"
    do
        i=$((i+1))
        sleep 1
    done
    if openstack console log show "${node}" | grep -q "ControllerNode00 - OK"; then
        echo "Metadata Servers OK: ${node}"
    else
        echo "Metadata Servers ERROR: ${node}"
    fi
    rm /tmp/metadata
}

for node in "${computenodelist[@]}"
do
    export node
    validate_metadata
done
echo -e "\n"
---
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836253/+subscriptions