yahoo-eng-team team mailing list archive
Message #24366
[Bug 1390511] [NEW] conductor: race between processes to join servicegroups when zk driver is used
Public bug reported:
The bug manifests only when two processes of the same service try to
register the same node.
When multiple processes of one service (for now, nova-conductor) try to join a servicegroup and the ZooKeeper driver is used,
there is a race between the processes: every process tries to register itself in ZooKeeper as the same member in the same namespace.
The ZooKeeper path looks like this:
/servicegroups/conductor/MEMBER_ID
Each process tries to create this node, which already exists.
As a result, each process keeps retrying the registration endlessly,
which ends with the following traceback:
Traceback (most recent call last):
  File "/opt/stack/nova/nova/servicegroup/drivers/zk.py", line 106, in join
    member = membership.Membership(self._session, path, member_id)
  File "/usr/local/lib/python2.7/dist-packages/evzookeeper/membership.py", line 130, in __init__
    self.refresh(quiet=False)
  File "/usr/local/lib/python2.7/dist-packages/evzookeeper/membership.py", line 155, in refresh
    self._join()
  File "/usr/local/lib/python2.7/dist-packages/evzookeeper/membership.py", line 203, in _join
    raise RuntimeError("Duplicated membership name %s" % path)
RuntimeError: Duplicated membership name /servicegroups/conductor/MEMBER_ID
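The collision can be reproduced without a ZooKeeper server. The following is a minimal, hypothetical model, not nova's or evzookeeper's code: `FakeZooKeeper` only mimics the rule that creating a node at an existing path fails, `join` raises the same `RuntimeError("Duplicated membership name ...")` message seen in the traceback, and threads stand in for forked conductor workers that all use the same `MEMBER_ID`.

```python
import threading


class NodeExistsError(Exception):
    """Raised when a node is created at a path that already exists."""


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper namespace."""

    def __init__(self):
        self._nodes = set()
        self._lock = threading.Lock()

    def create(self, path):
        # Creation is atomic here, as in ZooKeeper: only the first caller wins.
        with self._lock:
            if path in self._nodes:
                raise NodeExistsError(path)
            self._nodes.add(path)


def join(zk, group, member_id):
    """Register member_id under /servicegroups/<group>/ (simplified model)."""
    path = "/servicegroups/%s/%s" % (group, member_id)
    try:
        zk.create(path)
    except NodeExistsError:
        # Same message as the error raised from evzookeeper's membership._join.
        raise RuntimeError("Duplicated membership name %s" % path)
    return path


def race(workers=4, member_id="MEMBER_ID"):
    """Let several workers join with the same member id; collect outcomes."""
    zk = FakeZooKeeper()
    results = []
    results_lock = threading.Lock()

    def worker():
        try:
            outcome = ("joined", join(zk, "conductor", member_id))
        except RuntimeError as exc:
            outcome = ("failed", str(exc))
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for status, detail in race():
        print(status, detail)
```

Whichever worker creates the node first succeeds; every other worker gets the duplicated-membership error for the same path, which is the pattern the logs show.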
For now, only nova-conductor is affected, because it is the only service that forks.
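Why forking matters can be shown from the path layout alone. This sketch assumes, as the `MEMBER_ID` placeholder suggests, that every conductor worker derives its member id from the same host-level value, so forked workers all compute the same path. `member_path`, `worker_paths`, and the PID-suffixed `per_process_member_id` are hypothetical helpers for illustration, not nova code; the PID variant only contrasts distinct paths with colliding ones and is not presented as the fix for this bug.

```python
import os
import socket


def member_path(group, member_id):
    """Path a worker registers under (layout taken from the bug report)."""
    return "/servicegroups/%s/%s" % (group, member_id)


def per_process_member_id(host, pid=None):
    """Hypothetical variant: make the id unique per worker by adding the PID."""
    if pid is None:
        pid = os.getpid()
    return "%s-%d" % (host, pid)


def worker_paths(host, pids, unique=False):
    """Paths that forked workers with the given PIDs would try to create.

    Assumption: without `unique`, each worker uses the shared host value
    as its member id, as forked children of one service would.
    """
    paths = []
    for pid in pids:
        member_id = per_process_member_id(host, pid) if unique else host
        paths.append(member_path("conductor", member_id))
    return paths


if __name__ == "__main__":
    host = socket.gethostname()
    pids = [1001, 1002, 1003]  # hypothetical worker PIDs
    print(worker_paths(host, pids))               # identical paths: collision
    print(worker_paths(host, pids, unique=True))  # one distinct path per worker
```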
There are no other consequences besides polluted logs and operator
confusion.
Environment: Ubuntu 14.04 + ZooKeeper 3.4.5
This bug is related to another bug [1], in which none of the processes registers itself, because the processes lock up
while communicating with ZooKeeper.
[1] https://bugs.launchpad.net/nova/+bug/1389782
** Affects: nova
Importance: Undecided
Status: New
** Tags: conductor servicegroups zookeeper
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1390511