yahoo-eng-team team mailing list archive
Message #87862
[Bug 1856523] Re: Sometimes instance can't get public keys due to cirros metadata request failure
This was probably fixed in the Cirros image, so I'm closing this bug now.
** Changed in: neutron
Status: Confirmed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1856523
Title:
Sometimes instance can't get public keys due to cirros metadata
request failure
Status in neutron:
Fix Released
Bug description:
In our CI we see intermittent failures in random jobs related to fetching public keys from the metadata service.
As an example, take change [1]. In addition to the current test implementation, it adds three instances and tests security groups.
Sometimes random jobs such as:
neutron-tempest-plugin-scenario-linuxbridge
neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-stein
and others fail when checking SSH connectivity to a newly created instance.
* SSH failed because the instance refused public key authentication, for example:
------------------------------------------------------------------------------------------------
2019-12-13 14:43:48,694 31953 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.186:22' as 'cirros' with public key authentication
2019-12-13 14:43:48,704 31953 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.186 ([Errno None] Unable to connect to port 22 on 172.24.5.186). Number attempts: 1. Retry after 2 seconds.
------------------------------------------------------------------------------------------------
* The instance console log shows that the instance failed to get the public keys list on boot (FIP: 172.24.5.186, Instance IP: 10.1.0.10):
-------------------------------------------------------------
cirros-ds 'net' up at 11.67
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 12.13. iid=i-0000003c
failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys
warning: no ec2 metadata for public-keys
-------------------------------------------------------------
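The console output shows the instance-id lookup succeeding after 1/20 tries. The public-keys lookup fails with no visible retry and no detail about the error. The following is a minimal Python sketch of a retrying fetch that logs each failure. It is not the CirrOS implementation. The function `fetch_with_retries`, its parameters, and the injected `fetch` callable are hypothetical. The sketch assumes a failed request surfaces as an `OSError`, which covers dropped connections, socket timeouts and urllib's `URLError`.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    tries: int = 20,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Try fetch(url) up to `tries` times and return the body, or None.

    Hypothetical helper: `fetch` is any callable that returns the response
    body and raises OSError (ConnectionResetError, timeouts, urllib's
    URLError, ...) when the request fails, e.g. a dropped connection.
    """
    for attempt in range(1, tries + 1):
        try:
            body = fetch(url)
        except OSError as exc:
            # Report every failed attempt instead of failing silently.
            print(f"failed to get {url} ({attempt}/{tries}): {exc}")
            if attempt < tries:
                sleep(delay)
            continue
        print(f"successful after {attempt}/{tries} tries")
        return body
    # All attempts failed; the caller decides how to report it.
    return None
```

For a real request, `fetch` could be `lambda u: urllib.request.urlopen(u, timeout=10).read()`. Injecting `fetch` and `sleep` keeps the retry policy separate from the transport, so it can be exercised without a metadata server.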
* In addition to the existing Neutron logs, I added more debug logging to the Neutron Metadata Agent to find out whether the response from Nova Metadata is empty, and then checked the Neutron Metadata logs related to this instance:
-----------------------------------------------------------------------------
Dec 13 14:43:49.572244 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] REQUEST: HEADERS {'X-Forwarded-For': '10.1.0.10', 'X-Instance-ID': 'e77a44fc-249f-4c85-8f9c-40f299534c12', 'X-Tenant-ID': '8975f89b119046b48f5a674fa6a296c3', 'X-Instance-ID-Signature': '908153d94493c68c9cb8fae8aa78fab18244a260d7fe55b5b707ed9b369f45cd'} DATA: b'' URL: http://10.210.224.88:8775/2009-04-04/meta-data/public-keys {{(pid=17720) _proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:214}}
Dec 13 14:43:49.572451 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: DEBUG neutron.agent.metadata.agent [-] RESPONSE: HEADERS: {'Content-Length': '32', 'Content-Type': 'text/plain; charset=UTF-8', 'Connection': 'close'} DATA: b'0=tempest-keypair-test-231375855' {{(pid=17720) _proxy_request /opt/stack/neutron/neutron/agent/metadata/agent.py:217}}
Dec 13 14:43:49.572977 ubuntu-bionic-rax-ord-0013383633 neutron-metadata-agent[17234]: INFO eventlet.wsgi.server [-] 10.1.0.10,<local> "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 168 time: 0.3123491
-----------------------------------------------------------------------------
The response was 200 with body '0=tempest-keypair-test-231375855'. This key is also used by the other instances, so the metadata lookup itself worked.
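The body uses the EC2-style `public-keys` listing: one `<index>=<key name>` entry per line. A client then fetches the key material from `public-keys/<index>/openssh-key`. The following minimal parser sketch uses hypothetical function names. It also checks that the 32-byte `Content-Length` in the agent log matches this body. That match is consistent with the agent receiving the complete response from Nova.

```python
from typing import Dict


def parse_public_keys(body: str) -> Dict[int, str]:
    """Parse an EC2-style public-keys listing such as '0=mykey'."""
    keys: Dict[int, str] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        index, _, name = line.partition("=")
        keys[int(index)] = name
    return keys


def openssh_key_path(index: int) -> str:
    """Relative metadata path of the key material for one listing entry."""
    return f"2009-04-04/meta-data/public-keys/{index}/openssh-key"


body = b"0=tempest-keypair-test-231375855"  # body from the agent log above
assert len(body) == 32  # matches 'Content-Length': '32' in the log
keys = parse_public_keys(body.decode("utf-8"))
print(keys)
print(openssh_key_path(0))
```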
Conclusions:
1) Neutron metadata responds with 200
2) Nova metadata responds with 200 and valid data
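Conclusion 2 also implies that the proxied request passed authentication. The metadata agent signs the `X-Instance-ID` value with HMAC-SHA256 using the shared `metadata_proxy_shared_secret`. Nova refuses requests whose `X-Instance-ID-Signature` does not match. The 64 hex characters in the logged header are consistent with a SHA-256 digest. The sketch below reproduces that scheme with a hypothetical secret, so its output will not equal the signature in the log. The helper names are also hypothetical.

```python
import hashlib
import hmac


def sign_instance_id(shared_secret: str, instance_id: str) -> str:
    """Hex-encoded HMAC-SHA256 of the instance ID (64 characters)."""
    return hmac.new(
        shared_secret.encode("utf-8"),
        instance_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def signature_is_valid(shared_secret: str, instance_id: str, signature: str) -> bool:
    """Constant-time check of a received X-Instance-ID-Signature value."""
    expected = sign_instance_id(shared_secret, instance_id)
    return hmac.compare_digest(expected, signature)


# Hypothetical secret; a real deployment reads it from its configuration.
SECRET = "example-shared-secret"
INSTANCE_ID = "e77a44fc-249f-4c85-8f9c-40f299534c12"  # X-Instance-ID from the log

sig = sign_instance_id(SECRET, INSTANCE_ID)
print(len(sig), signature_is_valid(SECRET, INSTANCE_ID, sig))  # prints "64 True"
```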
Questions:
1) Is this a cirros issue? Why is there no retry?
2) Could it be a network issue where the data is not sent back (connection dropped during delivery)?
3) Why doesn't cirros log more details about this request failure?
[1] https://review.opendev.org/#/c/682369/
[2] https://review.opendev.org/#/c/698001/
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1856523/+subscriptions