yahoo-eng-team team mailing list archive

[Bug 1622833] Re: timestamp mechanism in linux bridge false positives

 

Reviewed:  https://review.openstack.org/369179
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a2bd0b4b53db8468681eb2905e2fbc2f9073869a
Submitter: Jenkins
Branch:    master

commit a2bd0b4b53db8468681eb2905e2fbc2f9073869a
Author: Kevin Benton <kevin@xxxxxxxxxx>
Date:   Mon Sep 12 22:27:33 2016 -0700

    LinuxBridge: Use ifindex for logical 'timestamp'
    
    With Xenial (and maybe older versions), the modified timestamps
    in /sys/class/net/(device_name) are not stable. They appear to
    work for a period of time, and then when some kind of cache clears
    on the kernel side, all of the timestamps are reset to the latest
    access time.
    
    This was causing the Linux Bridge agent to think that the interfaces
    were experiencing local changes much more frequently than they actually
    were, resulting in more polling to the Neutron server and subsequently
    more BUILD->ACTIVE->BUILD->ACTIVE transitions in the logical model.
    
    The purpose of the timestamp patch was to catch rapid server REBUILD
    operations where the interface would be deleted and re-added within
    a polling interval. Without it, these would be stuck in the BUILD
    state since the agent wouldn't realize it needed to wire the ports.
    
    This patch switches to using the ifindex of each interface as a
    logical timestamp. If an interface gets removed and re-added, it
    will get a different index, so the original timestamp comparison
    logic will still work.
    
    In the future, the agent should undergo a larger refactor to watch
    'ip monitor' for netlink events, replacing both the polling of the
    interface listing and the timestamp logic entirely. However, this
    approach was taken because of the near-term release and because it
    can be back-ported to older releases.
    
    This was verified with both Nova rebuild actions and Nova interface
    attach/detach actions.
    
    Change-Id: I016019885446bff6806268ab49cd5476d93ec61f
    Closes-Bug: #1622833
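
The ifindex idea from the commit message above can be sketched roughly as
follows. This is a minimal illustration, not the code from the Neutron
patch: the function names read_ifindex and logical_timestamps and the
sysfs_root parameter are hypothetical, and the sketch only assumes that
the kernel exposes the index at /sys/class/net/(device_name)/ifindex.

```python
import os

SYSFS_NET = "/sys/class/net"  # standard sysfs location for network devices


def read_ifindex(device_name, sysfs_root=SYSFS_NET):
    """Return the kernel ifindex of a device, or None if it cannot be read.

    Hypothetical helper for illustration; sysfs_root is a parameter only so
    the function can be exercised against a fake directory tree.
    """
    path = os.path.join(sysfs_root, device_name, "ifindex")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # The device vanished between listing and reading, or the file
        # content was not an integer.
        return None


def logical_timestamps(device_names, sysfs_root=SYSFS_NET):
    """Map each device name to its ifindex, used as a logical 'timestamp'."""
    return {name: read_ifindex(name, sysfs_root) for name in device_names}
```

Because the value is read from the kernel's own index rather than from a
file modification time, it stays the same across polls until the
interface is actually deleted and created again.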


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1622833

Title:
  timestamp mechanism in linux bridge false positives

Status in neutron:
  Fix Released

Bug description:
  The Linux bridge agent is picking up too many false positives in its
  mechanism for detecting when devices have been modified locally. In
  the following log excerpt, the four tap devices attached to a
  particular bridge had timestamps that jumped forward even though none
  of the interfaces actually changed:

  2016-09-13 00:13:38.744 14179 DEBUG neutron.plugins.ml2.drivers.agent._common_agent [req-82c02245-80fd-4712-baa6-cdd4033315d1 - -] Adding locally changed devices to updated set: set(['tap422b85d9-95', 'tap9b365584-34', 'tapee2684f8-51', 'tap66ef2d8e-3b']) scan_devices /opt/stack/new/neutron/neutron/plugins/ml2/drivers/agent/_common_agent.py:397
  2016-09-13 00:13:38.744 14179 DEBUG neutron.plugins.ml2.drivers.agent._common_agent [req-82c02245-80fd-4712-baa6-cdd4033315d1 - -] Agent loop found changes! {'current': set(['tap422b85d9-95', 'tapee2684f8-51', 'tap6028e7a2-c0', 'tap9b365584-34', 'tap0960ffac-f9', 'tap7ba5f865-54', 'tap66ef2d8e-3b', 'tapfe427ba3-63', 'tap475f33ef-c3']), 'timestamps': {'tap422b85d9-95': 1473725618.73996, 'tapee2684f8-51': 1473725618.73996, 'tap6028e7a2-c0': None, 'tap9b365584-34': 1473725618.73996, 'tap0960ffac-f9': 1473725618.73996, 'tap7ba5f865-54': 1473725616.7399597, 'tap66ef2d8e-3b': 1473725618.73996, 'tapfe427ba3-63': 1473725616.7399597, 'tap475f33ef-c3': None}, 'removed': set([]), 'added': set([]), 'updated': set(['tap422b85d9-95', 'tap9b365584-34', 'tapee2684f8-51', 'tap66ef2d8e-3b'])} daemon_loop /opt/stack/new/neutron/neutron/plugins/ml2/drivers/agent/_common_agent.py:448

  
  This causes the agent to refetch the device details, which puts the port in BUILD and then back to ACTIVE. Tempest tests that assert a port is in the ACTIVE status then fail sporadically.
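
The false positive described above can be illustrated with a small
sketch of the per-poll comparison. This is an assumption about the
general shape of the logic, not the agent's actual code: the function
name locally_changed, the device name tap-example, and all marker values
are hypothetical.

```python
def locally_changed(previous, current):
    """Return the devices whose marker differs between two polls.

    'previous' and 'current' map device name -> marker (an mtime or an
    ifindex). Devices absent from the previous poll, or with a None
    marker on either side, are skipped in this sketch.
    """
    changed = set()
    for device, marker in current.items():
        old = previous.get(device)
        if old is None or marker is None:
            continue
        if marker != old:
            changed.add(device)
    return changed


# Hypothetical values: the sysfs mtime jumps although nothing changed,
# so an mtime-based marker reports a spurious local change.
mtime_before = {"tap-example": 1000.0}
mtime_after = {"tap-example": 1002.0}
spurious = locally_changed(mtime_before, mtime_after)

# With the ifindex as the marker, an untouched interface compares equal,
# while a delete and re-add yields a new index and is still detected.
index_before = {"tap-example": 12}
index_unchanged = {"tap-example": 12}
index_readded = {"tap-example": 13}
stable = locally_changed(index_before, index_unchanged)
rebuilt = locally_changed(index_before, index_readded)
```

Here spurious contains tap-example, stable is empty, and rebuilt contains
tap-example, which matches the behaviour the fix is aiming for.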

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1622833/+subscriptions