yahoo-eng-team team mailing list archive

[Bug 1862315] [NEW] Sometimes VMs can't get IP when spawned concurrently

 

Public bug reported:

Version: Stein
Scenario description:
Rally creates 60 VMs with 6 threads. Each thread:
 - creates a VM
 - pings it
 - if the ping succeeds, tries to reach the VM via ssh and execute a command, retrying for up to 2 minutes
 - if ssh succeeds, deletes the VM
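
For readers who want to reproduce the load pattern outside Rally, the per-thread steps above can be sketched roughly as follows. This is not Rally's implementation: `create_vm`, `ping`, `ssh_exec` and `delete_vm` are hypothetical callables supplied by the caller, the 5-second retry delay is an assumption, and "60 VMs with 6 threads" is modelled as 60 iterations on a pool of 6 workers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_iteration(create_vm, ping, ssh_exec, delete_vm,
                  ssh_timeout=120.0, retry_delay=5.0):
    """Run one create/ping/ssh/delete cycle and return the outcome."""
    vm = create_vm()
    if not ping(vm):
        return "ping_failed"  # the failure mode described in this report
    deadline = time.monotonic() + ssh_timeout
    while True:
        if ssh_exec(vm):
            delete_vm(vm)  # the VM is deleted only after a successful ssh
            return "ok"
        if time.monotonic() >= deadline:
            return "ssh_failed"
        time.sleep(retry_delay)  # assumed pause between ssh attempts


def run_scenario(create_vm, ping, ssh_exec, delete_vm,
                 total=60, workers=6, **kwargs):
    """Run `total` iterations with at most `workers` running concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_iteration, create_vm, ping, ssh_exec,
                        delete_vm, **kwargs)
            for _ in range(total)
        ]
        return [f.result() for f in futures]
```

Counting the "ping_failed" entries in the returned list gives the number of VMs that never became reachable.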

For some VMs the ping fails. The console log shows that the VM failed to
get an IP from DHCP.

tcpdump on the corresponding DHCP port shows the VM's DHCP requests, but dnsmasq does not reply.
From the dnsmasq logs:

Feb  6 00:15:43 dnsmasq[4175]: read /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/addn_hosts - 28 addresses
Feb  6 00:15:43 dnsmasq[4175]: duplicate dhcp-host IP address 10.2.0.194 at line 28 of /var/lib/neutron/dhcp/da73026e-09b9-4f8d-bbdd-84d89c2487b2/host
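
To confirm which entries collide, the host file can be scanned for repeated IP addresses. The following is a minimal sketch, assuming each entry is a comma-separated line that contains the IP as one of its fields (for example MAC, hostname, IP); the MAC addresses and hostnames in the sample are made up for illustration.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(lines):
    """Map each IP found on more than one entry to its line numbers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for field in line.split(","):
            # IPv6 addresses may be wrapped in brackets; strip them.
            candidate = field.strip().strip("[]")
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not an IP: MAC address, hostname, tag, ...
            seen[str(ip)].append(lineno)
    return {ip: nums for ip, nums in seen.items() if len(nums) > 1}


# Hypothetical sample entries, not taken from the affected host file.
sample = [
    "fa:16:3e:00:00:01,host-10-2-0-193.openstacklocal,10.2.0.193",
    "fa:16:3e:00:00:02,host-10-2-0-194.openstacklocal,10.2.0.194",
    "fa:16:3e:00:00:03,host-10-2-0-194.openstacklocal,10.2.0.194",
]
print(find_duplicate_ips(sample))
```

On a real node the function could be fed the lines of the host file named in the dnsmasq message, e.g. `find_duplicate_ips(open(path))`.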

So something must be wrong with the neutron-dhcp-agent network cache.

From the neutron-dhcp-agent log:

2020-02-06 00:15:20.282 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Resync event has been scheduled _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:276
2020-02-06 00:15:20.282 40 DEBUG neutron.common.utils [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Calling throttled function clear wrapper /var/lib/openstack/lib/python3.6/site-packages/neutron/common/utils.py:102
2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] resync (da73026e-09b9-4f8d-bbdd-84d89c2487b2): ['Duplicate IP addresses found, DHCP cache is out of sync'] _periodic_resync_helper /var/lib/openstack/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py:293

So the agent is aware of the invalid cache for the network, but for an unknown
reason the actual network resync happens only more than 8 minutes later:

2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent [req-f5107bdd-d53a-4171-a283-de3d7cf7c708 - - - - -] Synchronizing state
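
The delay between the resync request and the actual sync can be computed from the timestamps of the two log lines. A small sketch, assuming every line starts with a 23-character timestamp in the format shown above:

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # len("2020-02-06 00:15:20.283")


def log_gap(earlier_line, later_line):
    """Return the time between two log lines as a timedelta."""
    start = datetime.strptime(earlier_line[:TS_LENGTH], TS_FORMAT)
    end = datetime.strptime(later_line[:TS_LENGTH], TS_FORMAT)
    return end - start


# Only the beginning of each log line is needed for the calculation.
resync_requested = "2020-02-06 00:15:20.283 40 DEBUG neutron.agent.dhcp.agent"
sync_started = "2020-02-06 00:23:55.297 40 INFO neutron.agent.dhcp.agent"

gap = log_gap(resync_requested, sync_started)
print(gap.total_seconds())
```

For the two lines quoted in this report the gap is 8 minutes 35 seconds.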

** Affects: neutron
     Importance: High
     Assignee: Oleg Bondarev (obondarev)
         Status: New


** Tags: l3-ipam-dhcp

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1862315

Title:
  Sometimes VMs can't get IP when spawned concurrently

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1862315/+subscriptions

