
yahoo-eng-team team mailing list archive

[Bug 1650611] Re: dhcp agent reporting state as down during the initial sync

 

Reviewed:  https://review.openstack.org/413010
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f15851b98974dc16606da195cf3ecee577cd0ef8
Submitter: Jenkins
Branch:    master

commit f15851b98974dc16606da195cf3ecee577cd0ef8
Author: Bertrand Lallau <bertrand.lallau@xxxxxxxxxxxxxxx>
Date:   Tue Dec 20 10:53:41 2016 +0100

    DHCP: enhance DHCPAgent startup procedure
    
    During the DhcpAgent startup procedure, all of the following network
    initialization steps are actually performed twice:
     * killing old dnsmasq processes
     * setting up and configuring all TAP interfaces
     * building all Dnsmasq config files (lease and host files)
     * launching dnsmasq processes
    The second iteration just cleans up and redoes exactly the same work.
    This is really inefficient and dramatically increases DHCP startup
    time (to nearly twice what is needed).
    
    The initialization method sync_state() is called twice:
     * once during init_host()
     * and again during _report_state()
    
    sync_state() call must stay in init_host() due to bug #1420042.
    
    sync_state() is always called during startup in init_host()
    and will be periodically called by periodic_resync()
    to do reconciliation.
    Hence it can safely be removed from the run() method.
    
    Change-Id: Id6433598d5c833d2e86be605089d42feee57c257
    Closes-bug: #1651368
    Closes-Bug: #1650611
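
The double initialization described in the commit message can be illustrated with a toy model. This is a simplified sketch, not the neutron code: the class SketchDhcpAgent, the sync_in_run switch and the startup_sync_count() helper are hypothetical, and the sketch assumes that the first state report (the one carrying the start flag) is what invokes run().

```python
class SketchDhcpAgent:
    """Toy stand-in for the DHCP agent; not the real neutron class."""

    def __init__(self, sync_in_run):
        # sync_in_run=True models the old behaviour, False the fixed one.
        self.sync_in_run = sync_in_run
        self.sync_calls = 0
        self.start_flag = True

    def sync_state(self):
        # The real method rebuilds the dnsmasq state of every network;
        # here we only count how often it is invoked.
        self.sync_calls += 1

    def init_host(self):
        # The sync must stay here (see bug #1420042 in the commit message).
        self.sync_state()

    def run(self):
        if self.sync_in_run:
            self.sync_state()
        # The periodic resync would be started here.

    def report_state(self):
        # Assumption: only the first report triggers run().
        if self.start_flag:
            self.start_flag = False
            self.run()


def startup_sync_count(sync_in_run):
    """Return how many full syncs one startup performs."""
    agent = SketchDhcpAgent(sync_in_run)
    agent.init_host()
    agent.report_state()
    return agent.sync_calls
```

Under these assumptions, startup_sync_count(True) counts two full syncs and startup_sync_count(False) counts one, which matches the "near twice" startup cost the commit message describes.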


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1650611

Title:
  dhcp agent reporting state as down during the initial sync

Status in neutron:
  Fix Released

Bug description:
  When the dhcp agent is started, neutron agent-list reports its state
  as dead until the initial sync is complete.

  This can lead to unwanted alarms in monitoring systems, especially in
  large environments where the initial sync may take hours. During this
  time, systemctl shows that the agent is actually alive while neutron
  agent-list reports it as down.

  Technical details:

  If I'm right, this line [0] is the exact point where the initial sync
  takes place right after the first state report (with start_flag=True)
  is sent to the server. As it's being done in the same thread, it won't
  send a second state report until it's done with the sync.

  Doing it in a separate thread would let the heartbeat task continue
  sending state reports to the server, but I don't know whether this
  would have any unwanted side effects.

  
  [0] https://github.com/openstack/neutron/blob/master/neutron/agent/dhcp/agent.py#L751
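
The blocking behaviour described under "Technical details" can be reproduced with a small simulation. This is only a sketch under stated assumptions: it uses the standard library threading module rather than the agent's actual concurrency machinery, the function count_reports() and all of its parameters are hypothetical, and time.sleep() stands in for the initial sync.

```python
import threading
import time


def count_reports(sync_in_thread, sync_seconds=0.4, interval=0.05,
                  duration=0.3):
    """Count state reports sent within `duration` seconds of startup."""
    reports = 0
    first = True
    worker = None
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        reports += 1  # stands in for sending one state report
        if first:
            first = False
            if sync_in_thread:
                # Initial sync runs in the background; heartbeat goes on.
                worker = threading.Thread(target=time.sleep,
                                          args=(sync_seconds,))
                worker.start()
            else:
                # Initial sync runs inline and blocks the heartbeat loop.
                time.sleep(sync_seconds)
        time.sleep(interval)
    if worker is not None:
        worker.join()
    return reports
```

With the sync done inline, only the first report is sent before the deadline, which is the window in which the server would see the agent as down. With the sync in a separate thread, reports keep going out at the usual interval. Whether that is safe in the real agent is exactly the open question raised in the description.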

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1650611/+subscriptions

