
yahoo-eng-team team mailing list archive

[Bug 1836253] Re: Sometimes InstanceMetada API returns 404 due to invalid InstaceID returned by _get_instance_and_tenant_id()

 

I don't know when William will read my previous comment, but overall
what I found is this:

The metadata-agent cache was designed to be invalidated by time-based
expiry only. That approach has the reported side effect when a client is
too fast: if a new instance reuses an IP before the old cache entry
expires, the agent still returns the port of the deleted instance. This
is not perfect, but it can usually be addressed by lowering the cache
TTL and/or waiting longer in the client.
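
To illustrate the side effect, here is a minimal sketch in Python. It is
not the metadata-agent code: TTLCache, demo(), the 5 second TTL and the
instance names are all made up for illustration, and a plain dict stands
in for the port lookup. The clock is injected so the timing can be shown
without sleeping.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Toy time-based cache: an entry is only refreshed once it is older than ttl."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, key: str, loader: Callable[[str], Optional[str]]) -> Optional[str]:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            # Nothing invalidates this entry when the backing data changes,
            # so it may be stale until the TTL elapses.
            return hit[1]
        value = loader(key)
        self._store[key] = (now, value)
        return value


def demo() -> List[Optional[str]]:
    # Hypothetical stand-in for the port lookup: fixed IP -> instance ID.
    ports = {"10.0.0.5": "instance-old"}
    fake_now = [0.0]
    cache = TTLCache(ttl=5.0, clock=lambda: fake_now[0])

    # Cold cache at t=0: the loader is called and the result is stored.
    results = [cache.get("10.0.0.5", ports.get)]

    # The old instance is deleted and a new one reuses the IP at t=1.
    ports["10.0.0.5"] = "instance-new"
    fake_now[0] = 1.0
    results.append(cache.get("10.0.0.5", ports.get))  # within TTL: stale ID

    # A client that waits past the TTL (t=6) gets the fresh ID.
    fake_now[0] = 6.0
    results.append(cache.get("10.0.0.5", ports.get))
    return results


if __name__ == "__main__":
    print(demo())
```

The second lookup corresponds to the reported 404: the stale instance ID
is what would be passed on to nova-api. A shorter TTL narrows that
window, and a client that retries for longer than the TTL gets past it.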

A more correct cache invalidation is theoretically possible, but I think
it is not feasible, because it would introduce cross-dependencies
between metadata-agent and far-away parts of neutron.

Therefore I'm inclined to mark this bug report as Invalid (not a bug).
Please let me know if I missed something here.

** Changed in: neutron
       Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836253

Title:
  Sometimes InstanceMetada API returns 404 due to invalid InstaceID
  returned by _get_instance_and_tenant_id()

Status in neutron:
  Invalid

Bug description:
  Sometimes on instance initialization, the metadata step fails.

  In metadata-agent.log there are lots of 404s:
  "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404  len: 297 time: 0.0771070

  In nova-api.log we get 404s too:
  "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404

  After some debugging we found that the problem occurs when a new instance gets the same IP that was used by a deleted instance.
  The problem is related to the cache on the method "_get_ports_for_remote_address()" in "/neutron/agent/metadata/agent.py": it returns a port from the deleted instance (with the same IP), which yields the wrong instance ID. That ID is sent to nova-api, which fails because the instance no longer exists.
  This problem only occurs with the cache enabled on the neutron metadata-agent.

  Version: Queens

  How to reproduce:
  ---
  #!/bin/bash

  computenodelist=(
    'computenode00.test.openstack.net'
    'computenode01.test.openstack.net'
    'computenode02.test.openstack.net'
    'computenode03.test.openstack.net'
  )

  validate_metadata(){
  cat << EOF > /tmp/metadata
  #!/bin/sh -x
  if curl 192.168.10.2
  then
   echo "ControllerNode00 - OK"
  else
   echo "ControllerNode00 - ERROR"
  fi
  EOF

    #SUBNAME=$(date +%s)
    # Load credentials first so the delete below can authenticate.
    source /root/admin-openrc
    openstack server delete "${node}" 2>/dev/null
    openstack server create --image cirros --nic net-id=internal --flavor Cirros --security-group default --user-data /tmp/metadata --availability-zone "nova:${node}" --wait "${node}" &> /dev/null

    i=0
    until [ $i -gt 3 ] || openstack console log show "${node}" | grep -q "ControllerNode00"
    do
      i=$((i+1))
      sleep 1
    done
    # Test the pipeline directly instead of inspecting $? afterwards.
    if openstack console log show "${node}" | grep -q "ControllerNode00 - OK"; then
          echo "Metadata Servers OK: ${node}"
    else
          echo "Metadata Servers ERROR: ${node}"
    fi

    rm /tmp/metadata
  }

  for node in "${computenodelist[@]}"
  do
    export node
    validate_metadata
  done
  echo -e "\n"
  ---

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836253/+subscriptions

