
yahoo-eng-team team mailing list archive

[Bug 1390511] Re: conductor: race between processes to join servicegroups when zk driver is used

 

** Changed in: nova
       Status: Fix Committed => Fix Released

** Changed in: nova
    Milestone: None => kilo-2

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1390511

Title:
  conductor: race between processes to join servicegroups when zk driver
  is used

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  The bug manifests only when two processes of the same service try
  to register the same node.

  When multiple processes of one service (nova-conductor for now) try to join a servicegroup and the zookeeper driver is used,
  there is a race between the processes: every process tries to register itself in zookeeper as the same member in the same namespace.

  Zookeeper path looks like this:

  /servicegroups/conductor/MEMBER_ID

  Each process tries to create this node, which already exists.

  Each process then tries endlessly to register itself, which ends
  with the following traceback:

  Traceback (most recent call last):
    File "/opt/stack/nova/nova/servicegroup/drivers/zk.py", line 106, in join
      member = membership.Membership(self._session, path, member_id)
    File "/usr/local/lib/python2.7/dist-packages/evzookeeper/membership.py", line 130, in __init__
      self.refresh(quiet=False)
    File "/usr/local/lib/python2.7/dist-packages/evzookeeper/membership.py", line 155, in refresh
      self._join()
    File "/usr/local/lib/python2.7/dist-packages/evzookeeper/membership.py", line 203, in _join
      raise RuntimeError("Duplicated membership name %s" % path)
  RuntimeError: Duplicated membership name /servicegroups/conductor/MEMBER_ID

  
  For now only nova-conductor is affected because it is the only service that forks.
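
The collision described above can be sketched without a real zookeeper server. This is only an illustration: FakeMembershipRegistry, member_path and simulate are hypothetical names, the in-memory set stands in for the zookeeper namespace, and the per-process suffix is one possible way to avoid the clash, not necessarily the fix that was released for this bug.

```python
class FakeMembershipRegistry:
    """In-memory stand-in for the zookeeper namespace (not a real client)."""

    def __init__(self):
        self._nodes = set()

    def join(self, path):
        # Mirrors the behaviour seen in the traceback: joining under a
        # path that already exists is rejected.
        if path in self._nodes:
            raise RuntimeError("Duplicated membership name %s" % path)
        self._nodes.add(path)


def member_path(service, member_id, pid=None):
    """Build the node path; optionally make it unique per process."""
    base = "/servicegroups/%s/%s" % (service, member_id)
    if pid is None:
        return base
    # Hypothetical mitigation: suffix the member id with the worker pid.
    return "%s-%d" % (base, pid)


def simulate(worker_pids, unique_per_process):
    """Return how many workers fail to join the servicegroup."""
    registry = FakeMembershipRegistry()
    failures = 0
    for pid in worker_pids:
        suffix = pid if unique_per_process else None
        path = member_path("conductor", "MEMBER_ID", suffix)
        try:
            registry.join(path)
        except RuntimeError:
            failures += 1
    return failures
```

With a shared member id, every worker after the first fails to join; with a per-process id, all workers join under distinct paths.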

  There are no other consequences except polluted logs and operator
  confusion.

  ubuntu 14.04 + zookeeper 3.4.5

  The bug is related to another bug [1], in which none of the processes registers itself because the processes lock up
  while communicating with zookeeper.

  [1] https://bugs.launchpad.net/nova/+bug/1389782

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1390511/+subscriptions

