
yahoo-eng-team team mailing list archive

[Bug 1825345] [NEW] admin-state-down doesn't evacuate bindings in the dhcp_agent_id column


Public bug reported:

Hi,

This is a real report from the production front: partially broken
hardware in one of our deployments caused us a lot of head-scratching.

If, for some reason, a node running the neutron-dhcp-agent has a
hardware issue, an admin will probably want to disable the agent
there. This can be done, for example, with:

neutron agent-update --admin-state-down e865d619-b122-4234-aebb-3f5c24df1c8e

or, equivalently:

openstack network agent set --disable e865d619-b122-4234-aebb-3f5c24df1c8e

This works, and no new network will be assigned to this agent in the
future. However, networks that were already assigned to this agent
won't be evacuated.

What needs to be done is:

1/ Update the networkdhcpagentbindings table and remove every row whose dhcp_agent_id is e865d619-b122-4234-aebb-3f5c24df1c8e. The affected networks should be reassigned to another agent. Ideally the load would be spread across several agents if possible; otherwise, reassigning all networks to a single agent would be ok-ish.
2/ Restart the neutron-dhcp-agent process on the node(s) the networks were moved to, so that a new dnsmasq process starts for each of these networks.
3/ Attempt to restart the disabled agent as well, knowing that reaching it may fail (if it was disabled, it is probably broken somehow...).
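Step 1 can be sketched in Python against an in-memory SQLite database. This is a minimal illustration, not neutron code: the table below is a hypothetical, simplified model of networkdhcpagentbindings that keeps only the two columns named in this report (the real schema may have more columns depending on the release), and `evacuate_agent` is a hypothetical helper. It spreads the load by always picking the healthy agent that currently serves the fewest networks, and never binds a network twice to the same agent.

```python
import sqlite3

# Hypothetical, simplified model of neutron's networkdhcpagentbindings table:
# only the two columns mentioned in the bug report are kept.
SCHEMA = """
CREATE TABLE networkdhcpagentbindings (
    network_id    TEXT NOT NULL,
    dhcp_agent_id TEXT NOT NULL,
    PRIMARY KEY (network_id, dhcp_agent_id)
)
"""


def evacuate_agent(conn, dead_agent, healthy_agents):
    """Rebind every network of dead_agent to the least-loaded healthy agent.

    Returns {network_id: new_agent_id} for the networks that were moved.
    A network already served by every healthy agent simply loses its
    binding to dead_agent.
    """
    healthy = [a for a in healthy_agents if a != dead_agent]
    if not healthy:
        raise ValueError("no healthy DHCP agent to evacuate to")

    # Current number of networks per healthy agent, used to spread the load.
    load = dict.fromkeys(healthy, 0)
    for agent, count in conn.execute(
            "SELECT dhcp_agent_id, COUNT(*) FROM networkdhcpagentbindings "
            "GROUP BY dhcp_agent_id"):
        if agent in load:
            load[agent] = count

    networks = [row[0] for row in conn.execute(
        "SELECT network_id FROM networkdhcpagentbindings "
        "WHERE dhcp_agent_id = ? ORDER BY network_id", (dead_agent,))]

    moved = {}
    for net in networks:
        # Agents already hosting this network must not get a duplicate binding.
        already = {row[0] for row in conn.execute(
            "SELECT dhcp_agent_id FROM networkdhcpagentbindings "
            "WHERE network_id = ?", (net,))}
        candidates = [a for a in healthy if a not in already]
        if candidates:
            # Least-loaded agent first; ties broken by agent id for determinism.
            target = min(candidates, key=lambda a: (load[a], a))
            conn.execute(
                "UPDATE networkdhcpagentbindings SET dhcp_agent_id = ? "
                "WHERE network_id = ? AND dhcp_agent_id = ?",
                (target, net, dead_agent))
            load[target] += 1
            moved[net] = target
        else:
            conn.execute(
                "DELETE FROM networkdhcpagentbindings "
                "WHERE network_id = ? AND dhcp_agent_id = ?", (net, dead_agent))
    conn.commit()
    return moved
```

This only covers step 1; as step 2 says, the agents receiving the networks still need a restart so that dnsmasq is started for them.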

Currently, one needs to do all of this by hand. I've done that, and
restored connectivity to a working DHCP server, as our user expected.
This is painful and tedious to do, and it's not really what an
OpenStack user expects.

It would be super nice if we could also provide something like:

openstack network agent evacuate e865d619-b122-4234-aebb-3f5c24df1c8e

which we could then use as part of the "set --disable" process.
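As a hedged sketch of what such an evacuate step could do, the Python below only builds calls to existing python-openstackclient commands (`openstack network agent add network --dhcp` and `openstack network agent remove network --dhcp`), which go through the Neutron API instead of editing the database by hand. The networks hosted by the disabled agent can be listed with `openstack network list --agent <agent-id>`. `evacuate_commands`, `run`, the `plan` mapping (network ID to target agent ID) and the `net-a`/`agent-1` placeholders are all hypothetical.

```python
import shlex
import subprocess


def evacuate_commands(dead_agent, plan):
    """Build the openstack CLI calls that move each network in plan
    ({network_id: target_agent_id}, hypothetical input) off dead_agent."""
    cmds = []
    for network, target in sorted(plan.items()):
        # Add the new DHCP agent first, so the network is never left
        # without any scheduled DHCP agent.
        cmds.append(["openstack", "network", "agent", "add", "network",
                     "--dhcp", target, network])
        # Then drop the binding to the disabled agent.
        cmds.append(["openstack", "network", "agent", "remove", "network",
                     "--dhcp", dead_agent, network])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder network and agent IDs, for illustration only.
    run(evacuate_commands("e865d619-b122-4234-aebb-3f5c24df1c8e",
                          {"net-a": "agent-1", "net-b": "agent-2"}))
```

Adding before removing keeps at least one agent scheduled for each network during the move; whether the removal actually reaches a broken agent is the same concern as step 3.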

Cheers,

Thomas Goirand (zigo)

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1825345

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1825345/+subscriptions

