yahoo-eng-team team mailing list archive

[Bug 1856523] Re: Sometimes instance can't get public keys due to cirros metadata request failure

 

This was probably fixed in the Cirros image, so I'm closing this bug now.

** Changed in: neutron
       Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1856523

Title:
  Sometimes instance can't get public keys due to cirros metadata
  request failure

Status in neutron:
  Fix Released

Bug description:
  On our CI we see random failures in various jobs, all related to fetching public keys from the metadata service.
  As an example, consider this change [1]. On top of the current test implementation, it adds three instances and security group tests.

  Sometimes random jobs, such as:
  neutron-tempest-plugin-scenario-linuxbridge
  neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-stein
  and others, fail when checking SSH connectivity to a newly created instance.

  * The connection failed because the instance refused public key authentication, for example:
  ------------------------------------------------------------------------------------------------
  2019-12-13 14:43:48,694 31953 INFO     [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.186:22' as 'cirros' with public key authentication
  2019-12-13 14:43:48,704 31953 WARNING  [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.186 ([Errno None] Unable to connect to port 22 on 172.24.5.186). Number attempts: 1. Retry after 2 seconds.
  ------------------------------------------------------------------------------------------------

  * In the instance console log we can see that the instance failed to get the public keys list on boot (FIP: 172.24.5.186, instance IP: 10.1.0.10):
  -------------------------------------------------------------
  cirros-ds 'net' up at 11.67
  checking http://169.254.169.254/2009-04-04/instance-id
  successful after 1/20 tries: up 12.13. iid=i-0000003c
  failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys
  warning: no ec2 metadata for public-keys
  -------------------------------------------------------------

  * In addition to the existing Neutron logs, I added more debug logging to the Neutron Metadata Agent to find out whether the response from Nova Metadata was empty. I then checked the Neutron Metadata Agent logs related to this instance:
  -----------------------------------------------------------------------------
  Dec 13 14:43:49.572244 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] REQUEST: HEADERS {'X-Forwarded-For': '10.1.0.10', 'X-Instance-ID': 'e77a44fc-249f-4c85-8f9c-40f299534c12', 'X-Tenant-ID': '8975f89b119046b48f5a674fa6a296c3', 'X-Instance-ID-Signature': '908153d94493c68c9cb8fae8aa78fab18244a260d7fe55b5b707ed9b369f45cd'} DATA: b'' URL: http://10.210.224.88:8775/2009-04-04/meta-data/public-keys {{(pid=17720) _proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:214}}
  Dec 13 14:43:49.572451 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] RESPONSE: HEADERS: {'Content-Length': '32', 'Content-Type': 'text/plain; charset=UTF-8', 'Connection': 'close'} DATA: b'0=tempest-keypair-test-231375855' {{(pid=17720) _proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:217}}
  Dec 13 14:43:49.572977 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: INFO eventlet.wsgi.server [-] 10.1.0.10,<local> "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200  len: 168 time: 0.3123491
  -----------------------------------------------------------------------------

  The response was 200 with body '0=tempest-keypair-test-231375855'. This
  is the same key that is used for the other instances, so the response
  itself was correct.
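
  For readers unfamiliar with the format: the body shown above follows the
  EC2-style public-keys listing, one "index=key-name" entry per line. A
  minimal Python sketch of how such a body could be interpreted follows. The
  function name parse_public_keys_listing is hypothetical and is not part of
  cirros, Nova or Neutron.

```python
from typing import Dict


def parse_public_keys_listing(body: bytes) -> Dict[int, str]:
    """Parse an EC2-style public-keys listing into {index: key_name}.

    Hypothetical helper for illustration only. Each non-empty line is
    expected to look like "0=my-key-name".
    """
    keys: Dict[int, str] = {}
    for line in body.decode("utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        index, sep, name = line.partition("=")
        if not sep:
            raise ValueError("malformed public-keys entry: %r" % line)
        keys[int(index)] = name
    return keys


# The body logged by the Neutron Metadata Agent in this report:
listing = parse_public_keys_listing(b"0=tempest-keypair-test-231375855")
```

  An empty mapping from this helper would indicate an empty response, which
  is the case the extra debug logging was meant to rule out.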

  
  Conclusions:
  1) Neutron metadata responds with 200
  2) Nova metadata responds with 200 and valid data

  Questions:
  1) Is this a cirros issue? Why is there no retry?
  2) Maybe it's a network issue, and the data is not sent back (connection dropped during delivery)?
  3) Why don't we have more logs in cirros for this request failure?
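
  To illustrate question 1: the console log above shows a try counter for
  the instance-id check ("successful after 1/20 tries") but none for
  public-keys. Below is a rough Python sketch of what a retried fetch could
  look like. It is not how cirros implements its metadata client; the names
  fetch_with_retry and fetch_public_keys, the attempt count and the delay
  are all assumptions made for this example.

```python
import time
import urllib.request
from typing import Callable

# URL taken from the console log in this report.
PUBLIC_KEYS_URL = "http://169.254.169.254/2009-04-04/meta-data/public-keys"


def fetch_public_keys(url: str = PUBLIC_KEYS_URL, timeout: float = 5.0) -> bytes:
    """Single attempt; urllib raises URLError (an OSError) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 20,      # assumed value, mirrors the "1/20 tries" counter
    delay: float = 1.0,      # assumed pause between attempts, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it succeeds or the attempts are exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:
            # Covers dropped connections, timeouts and urllib's URLError.
            last_error = exc
            print("attempt %d/%d failed: %s" % (attempt, attempts, exc))
            if attempt < attempts:
                sleep(delay)
    raise RuntimeError("gave up after %d attempts" % attempts) from last_error
```

  Inside an instance this would be called as
  fetch_with_retry(fetch_public_keys). Logging each failed attempt would
  also help with question 3, since the console log would then show why the
  request failed.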

  [1] https://review.opendev.org/#/c/682369/
  [2] https://review.opendev.org/#/c/698001/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1856523/+subscriptions


