← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1326901] Re: ServiceBinaryExists - binary for nova-conductor already exists

 

** Changed in: nova/icehouse
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1326901

Title:
  ServiceBinaryExists - binary for nova-conductor already exists

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) icehouse series:
  Fix Released
Status in nova package in Ubuntu:
  Fix Released
Status in nova source package in Trusty:
  Triaged
Status in nova source package in Utopic:
  Fix Released

Bug description:
  We're hitting an intermittent issue where ServiceBinaryExists is
  raised for nova-conductor on deployment.

  From nova-conductor's upstart log ( /var/log/upstart/nova-
  conductor.log ):

  2014-05-15 12:02:25.206 34494 INFO nova.openstack.common.periodic_task [-] Skipping periodic task _periodic_update_dns because its interval is negative
  2014-05-15 12:02:25.241 34494 INFO nova.openstack.common.service [-] Starting 8 workers
  2014-05-15 12:02:25.242 34494 INFO nova.openstack.common.service [-] Started child 34501
  2014-05-15 12:02:25.244 34494 INFO nova.openstack.common.service [-] Started child 34502
  2014-05-15 12:02:25.246 34494 INFO nova.openstack.common.service [-] Started child 34503
  2014-05-15 12:02:25.246 34501 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.247 34502 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.247 34494 INFO nova.openstack.common.service [-] Started child 34504
  2014-05-15 12:02:25.249 34503 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.251 34504 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.254 34505 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.250 34494 INFO nova.openstack.common.service [-] Started child 34505
  2014-05-15 12:02:25.261 34494 INFO nova.openstack.common.service [-] Started child 34506
  2014-05-15 12:02:25.263 34494 INFO nova.openstack.common.service [-] Started child 34507
  2014-05-15 12:02:25.266 34494 INFO nova.openstack.common.service [-] Started child 34508
  2014-05-15 12:02:25.267 34507 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.268 34506 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  2014-05-15 12:02:25.271 34508 AUDIT nova.service [-] Starting conductor node (version 2014.1)
  /usr/lib/python2.7/dist-packages/nova/openstack/common/db/sqlalchemy/session.py:379: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
    match = pattern.match(integrity_error.message)
  /usr/lib/python2.7/dist-packages/nova/openstack/common/db/sqlalchemy/session.py:379: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
    match = pattern.match(integrity_error.message)
  Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 346, in fire_timers
      timer()
    File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 56, in __call__
      cb(*args, **kw)
    File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
  2014-05-15 12:02:25.862 34502 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on localhost:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
      result = function(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 480, in run_service
      service.start()
    File "/usr/lib/python2.7/dist-packages/nova/service.py", line 172, in start
      self.service_ref = self._create_service_ref(ctxt)
    File "/usr/lib/python2.7/dist-packages/nova/service.py", line 224, in _create_service_ref
      service = self.conductor_api.service_create(context, svc_values)
    File "/usr/lib/python2.7/dist-packages/nova/conductor/api.py", line 202, in service_create
      return self._manager.service_create(context, values)
    File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 966, in wrapper
      return func(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 461, in service_create
      svc = self.db.service_create(context, values)
    File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 139, in service_create
      return IMPL.service_create(context, values)
    File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 146, in wrapper
      return f(*args, **kwargs)
    File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 521, in service_create
      binary=values.get('binary'))
  ServiceBinaryExists: Service with host glover binary nova-conductor exists.
  2014-05-15 12:02:25.864 34503 ERROR nova.openstack.common.threadgroup [-] Service with host glover binary nova-conductor exists.
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 117, in wait
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     x.wait()
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 49, in wait
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return self.thread.wait()
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 168, in wait
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return self._exit_event.wait()
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return hubs.get_hub().switch()
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return self.greenlet.switch()
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     result = function(*args, **kwargs)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 480, in run_service
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     service.start()
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 172, in start
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     self.service_ref = self._create_service_ref(ctxt)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 224, in _create_service_ref
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     service = self.conductor_api.service_create(context, svc_values)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/conductor/api.py", line 202, in service_create
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return self._manager.service_create(context, values)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 966, in wrapper
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return func(*args, **kwargs)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 461, in service_create
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     svc = self.db.service_create(context, values)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 139, in service_create
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return IMPL.service_create(context, values)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 146, in wrapper
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     return f(*args, **kwargs)
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 521, in service_create
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup     binary=values.get('binary'))
  2014-05-15 12:02:25.864 34503 TRACE nova.openstack.common.threadgroup ServiceBinaryExists: Service with host glover binary nova-conductor exists.

  After looking into the traceback, I came across this piece of code
  which starts a service.  Of note is the 2nd except path.  This
  exception was introduced in January to handle races when creating a
  record.  It seems as if we are racing similarly in our situation and
  might need to handle ServiceBinaryExists the same way
  ServiceTopicExists is being handled.  I'll submit a patch for this.

  From nova/service.py:

   158     def start(self):
   159         verstr = version.version_string_with_package()
   160         LOG.audit(_('Starting %(topic)s node (version %(version)s)'),
   161                   {'topic': self.topic, 'version': verstr})
   162         self.basic_config_check()
   163         self.manager.init_host()
   164         self.model_disconnected = False
   165         ctxt = context.get_admin_context()
   166         try:
   167             self.service_ref = self.conductor_api.service_get_by_args(ctxt,
   168                     self.host, self.binary)
   169             self.service_id = self.service_ref['id']
   170         except exception.NotFound:
   171             try:
   172                 self.service_ref = self._create_service_ref(ctxt)
   173             except exception.ServiceTopicExists:
   174                 # NOTE(danms): If we race to create a record with a sibling
   175                 # worker, don't fail here.
   176                 self.service_ref = self.conductor_api.service_get_by_args(ctxt,
   177                     self.host, self.binary)

  $ git blame -L 158,178  nova/service.py
  b3f5aba0 (Andy Smith         2010-12-09 15:25:14 -0800 158)     def start(self):
  481d6ff1 (Daniel P. Berrange 2012-12-17 12:17:59 +0000 159)         verstr = version.version_string_with_package()
  481d6ff1 (Daniel P. Berrange 2012-12-17 12:17:59 +0000 160)         LOG.audit(_('Starting %(topic)s node (version %(versi
  481d6ff1 (Daniel P. Berrange 2012-12-17 12:17:59 +0000 161)                   {'topic': self.topic, 'version': verstr})
  e8386a27 (Rafi Khardalian    2013-01-23 01:55:09 +0000 162)         self.basic_config_check()
  065257fb (Vishvananda Ishaya 2010-09-23 12:43:41 -0700 163)         self.manager.init_host()
  9c98cfb4 (Vishvananda Ishaya 2010-08-30 00:55:19 -0700 164)         self.model_disconnected = False
  5e3da586 (Vishvananda Ishaya 2010-10-01 05:57:17 -0700 165)         ctxt = context.get_admin_context()
  57a103b3 (Vishvananda Ishaya 2010-09-02 14:13:22 -0700 166)         try:
  e9d34263 (Russell Bryant     2013-01-14 17:39:29 -0500 167)             self.service_ref = self.conductor_api.service_get
  e9d34263 (Russell Bryant     2013-01-14 17:39:29 -0500 168)                     self.host, self.binary)
  e34bc343 (Wenhao Xu          2013-01-21 19:07:34 +0800 169)             self.service_id = self.service_ref['id']
  57a103b3 (Vishvananda Ishaya 2010-09-02 14:13:22 -0700 170)         except exception.NotFound:
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 171)             try:
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 172)                 self.service_ref = self._create_service_ref(c
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 173)             except exception.ServiceTopicExists:
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 174)                 # NOTE(danms): If we race to create a record 
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 175)                 # worker, don't fail here.
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 176)                 self.service_ref = self.conductor_api.service
  f6c341b4 (Dan Smith          2014-01-27 14:03:57 -0800 177)                     self.host, self.binary)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1326901/+subscriptions


References