
yahoo-eng-team team mailing list archive

[Bug 1239864] Re: nova-api fails to query ServiceGroup status from Zookeeper

 

** Changed in: nova
       Status: Fix Committed => Fix Released

** Changed in: nova
    Milestone: None => juno-rc1

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1239864

Title:
  nova-api fails to query ServiceGroup status from Zookeeper

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  I am running the ZooKeeper servicegroup driver on CentOS 6.4
  (Python 2.6) with the RDO distribution of Grizzly.

  All nova services are successfully connecting to ZooKeeper, which I've
  verified using zkCli.

  However, when I run `nova service-list` I get an HTTP 500 error from
  nova-api.  The nova-api log (/var/log/nova/api.log) shows:

  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack   File "/usr/lib/python2.6/site-packages/nova/servicegroup/api.py", line 93, in service_is_up
  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack     return self._driver.is_up(member)
  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack   File "/usr/lib/python2.6/site-packages/nova/servicegroup/drivers/zk.py", line 116, in is_up
  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack     all_members = self.get_all(group_id)
  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack   File "/usr/lib/python2.6/site-packages/nova/servicegroup/drivers/zk.py", line 141, in get_all
  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack     raise exception.ServiceGroupUnavailable(driver="ZooKeeperDriver")
  2013-10-14 16:33:15.110 6748 TRACE nova.api.openstack ServiceGroupUnavailable: The service from servicegroup driver ZooKeeperDriver is temporarily unavailable.

  The problem appears to involve evzookeeper (I am using version 0.4.0).

  To isolate the problem, I added some synchronous evzookeeper.ZKSession
  get() calls to test the round trip to ZooKeeper.  When I call
  `self._session.get(CONF.zookeeper.sg_prefix)` in the __init__() method
  of ZooKeeperDriver in zk.py, it works fine.  The logs show that this
  call runs immediately before the wsgi server starts up.

  When I do the get() operation from within the ZooKeeperDriver
  get_all() method, the web request hangs indefinitely.  However, if I
  recreate the evzookeeper.ZKSession within the get_all() method (after
  the wsgi server has started) the nova-api request is successful.

  diff --git a/nova/servicegroup/drivers/zk.py b/nova/servicegroup/drivers/zk.py
  index 2a3edae..7de2488 100644
  --- a/nova/servicegroup/drivers/zk.py
  +++ b/nova/servicegroup/drivers/zk.py
  @@ -122,7 +122,14 @@ class ZooKeeperDriver(api.ServiceGroupDriver):
           monitor = self._monitors.get(group_id, None)
           if monitor is None:
               path = "%s/%s" % (CONF.zookeeper.sg_prefix, group_id)
  -            monitor = membership.MembershipMonitor(self._session, path)
  +
  +            null = open(os.devnull, "w")
  +            local_session = evzookeeper.ZKSession(CONF.zookeeper.address,
  +                                                  recv_timeout=
  +                                                    CONF.zookeeper.recv_timeout,
  +                                                  zklog_fd=null)
  +
  +            monitor = membership.MembershipMonitor(local_session, path)
               self._monitors[group_id] = monitor
               # Note(maoy): When initialized for the first time, it takes a
               # while to retrieve all members from zookeeper. To prevent
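
The workaround in the diff above builds a new session for every monitor. A variation on the same idea is to defer creation of a single session until the first request, so that it is created after the wsgi server has started. The following is only a minimal sketch of that pattern, not the fix that was merged: `LazySessionDriver`, `session_factory`, and `monitor_factory` are hypothetical names, and the factories stand in for the `evzookeeper.ZKSession(...)` and `membership.MembershipMonitor(...)` calls shown in the diff.

```python
import threading


class LazySessionDriver(object):
    """Create the backend session on first use instead of in __init__()."""

    def __init__(self, session_factory):
        # session_factory is a hypothetical zero-argument callable; in the
        # report above it would wrap the evzookeeper.ZKSession(...) call.
        self._session_factory = session_factory
        self._session = None
        self._lock = threading.Lock()
        self._monitors = {}

    def _get_session(self):
        # Build the session only when it is first needed, and only once.
        with self._lock:
            if self._session is None:
                self._session = self._session_factory()
            return self._session

    def get_monitor(self, group_id, monitor_factory):
        # monitor_factory is a hypothetical callable taking
        # (session, group_id); it stands in for MembershipMonitor.
        monitor = self._monitors.get(group_id)
        if monitor is None:
            monitor = monitor_factory(self._get_session(), group_id)
            self._monitors[group_id] = monitor
        return monitor
```

Unlike the diff, this sketch shares one session across all monitors; whether that avoids the hang in nova-api is an assumption that would need to be tested against a real ZooKeeper deployment.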

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1239864/+subscriptions