
yahoo-eng-team team mailing list archive

[Bug 1589880] Re: report state failed

 

*** This bug is a duplicate of bug 1517926 ***
    https://bugs.launchpad.net/bugs/1517926

kaka,

I think it's a duplicate of another problem (#1517926) that was fixed in:

rpodolyaka@rpodolyaka-pc:~/src/nova$ git tag --contains e0647dd4b2ae9f5f6f908102d2ac447440622785
12.0.1
12.0.2
12.0.3
12.0.4
rpodolyaka@rpodolyaka-pc:~/src/nova$ git show e0647dd4b2ae9f5f6f908102d2ac447440622785
commit e0647dd4b2ae9f5f6f908102d2ac447440622785
Author: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
Date:   Thu Nov 19 16:00:01 2015 +0200

    servicegroup: stop zombie service due to exception
    
    If an exception is raised out of the _report_state call, we find that
    the service no longer reports any updates to the database, so the
    service is considered dead, thus creating a kind of zombie service.
    
    I55417a5b91282c69432bb2ab64441c5cea474d31 seems to introduce a
    regression, which leads to nova-* services marked as 'down', if an
    error happens in a remote nova-conductor while processing a state
    report: only Timeout errors are currently handled, but other errors
    are possible, e.g. a DBError (wrapped with RemoteError on RPC
    client side), if a DB temporarily goes away. This unhandled exception
    will effectively break the state reporting thread - service will be
    up again only after restart.
    
    While the intention of I55417a5b91282c69432bb2ab64441c5cea474d31 was
    to avoid catching all the possible exceptions, it looks like we must
    do that to avoid creating a zombie.
    The other part of that change was to ensure that during upgrade, we do
    not spam the log server about MessagingTimeouts while the
    nova-conductors are being restarted. This change ensures that still
    happens.
    
    Closes-Bug: #1517926
    
    Change-Id: I44f118f82fbb811b790222face4c74d79795fe21
    (cherry picked from commit 49b0d1741c674714fabf24d8409810064b953202)


and you seem to be using version 12.0.0.

Please try to update to the latest stable/liberty version of the code and
re-open the bug if it still reproduces (I did not manage to reproduce it
locally, presumably because I am running a version with the commit above ^
applied).
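
For reference, the shape of the fix is roughly the following. This is a
minimal sketch of the approach described in the commit message above, not
the actual nova code; the function is shown standalone and the log wording
is illustrative:

import oslo_messaging
from oslo_log import log as logging

LOG = logging.getLogger(__name__)


def _report_state(service):
    # Sketch of DbDriver._report_state: bump the report count and save the
    # service record so the service keeps being reported as 'up'.
    try:
        service.service_ref.report_count += 1
        service.service_ref.save()
    except oslo_messaging.MessagingTimeout:
        # Expected while nova-conductors are restarted during an upgrade:
        # warn without a traceback so the logs are not spammed.
        LOG.warning('Lost connection to nova-conductor for reporting '
                    'service status.')
    except Exception:
        # Anything else (e.g. a DBError wrapped in RemoteError when the DB
        # goes read-only, as in this bug) must not escape, otherwise the
        # periodic looping call dies and the service stays 'down' until it
        # is restarted.
        LOG.exception('Unexpected error while reporting service status')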

** No longer affects: oslo.service

** This bug has been marked a duplicate of bug 1517926
   Nova services stop to report state via remote conductor

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1589880

Title:
  report state failed

Status in OpenStack Compute (nova):
  Incomplete

Bug description:
  Description:
  =================
  I set read_only=on on the master database while switching the master nova database over to the slave. After that, I checked the nova service status:
  # nova-manage service list
  Binary           Host                              Zone             Status     State Updated_At
  nova-consoleauth 11_120                            internal         enabled    XXX   2016-06-07 08:28:46
  nova-conductor   11_120                            internal         enabled    XXX   2016-06-07 08:28:45
  nova-cert        11_120                            internal         enabled    XXX   2016-05-17 08:12:10
  nova-scheduler   11_120                            internal         enabled    XXX   2016-05-17 08:12:24
  nova-compute     11_121                            bx               enabled    XXX   2016-06-07 08:28:49
  nova-compute     11_122                            bx               enabled    XXX   2016-06-07 08:28:42
  =================

  Steps to reproduce
  =================
  # mysql
  MariaDB [nova]> set global read_only=on;
  =================

  Environment
  ================
  Version: Liberty
  openstack-nova-conductor-12.0.0-1.el7.noarch

  Logs
  ================

  2016-05-12 11:01:20.343 9198 ERROR oslo.service.loopingcall 
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' failed
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall Traceback (most recent call last):
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 113, in _run_loop
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 87, in _report_state
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     service.service_ref.save()
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 213, in wrapper
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     return fn(self, *args, **kwargs)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/nova/objects/service.py", line 250, in save
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     db_service = db.service_update(self._context, self.id, updates)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 153, in service_update
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     return IMPL.service_update(context, service_id, values)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 146, in wrapper
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     ectxt.value = e.inner_exc
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     six.reraise(self.type_, self.value, self.tb)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 136, in wrapper
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     return f(*args, **kwargs)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 532, in service_update
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     service_ref.update(values)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 490, in __exit__
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     self.rollback()
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     compat.reraise(exc_type, exc_value, exc_tb)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 487, in __exit__
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     self.commit()
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     self._prepare_impl()
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     self.session.flush()
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     self._flush(objects)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     transaction.rollback(_capture_exception=True)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     compat.reraise(exc_type, exc_value, exc_tb)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     flush_context.execute()
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     rec.execute(self)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     uow
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     mapper, table, update)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 672, in _emit_update_statements
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     execute(statement, multiparams)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall     return meth(self, multiparams, params)
  2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     return connection._execute_clauseelement(self, multiparams, params)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     compiled_sql, distilled_params
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     context)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1335, in _handle_dbapi_exception
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     util.raise_from_cause(newraise, exc_info)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     reraise(type(exception), exception, tb=exc_tb)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     context)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 442, in do_execute
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     cursor.execute(statement, parameters)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     self.errorhandler(self, exc, value)
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall   File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall     raise errorclass, errorvalue
  2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall OperationalError: (_mysql_exceptions.OperationalError) (1290, 'The MariaDB server is running with the --read-only option so it cannot execute this statement') [SQL: u'UPDATE services SET updated_at=%s, report_count=%s, last_seen_up=%s WHERE services.id = %s'] [parameters: (datetime.datetime(2016, 5, 12, 3, 1, 17, 572767), 1189606, datetime.datetime(2016, 5, 12, 3, 1, 17, 571934), 2L)]
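
The traceback above also shows why a single failure is enough to leave the
service marked 'down' until it is restarted: the state report runs inside an
oslo.service FixedIntervalLoopingCall, and once the callback raises, the loop
stops. A minimal standalone illustration, assuming the Liberty-era
(eventlet-based) oslo.service and using a stand-in exception instead of the
real OperationalError:

import logging

from oslo_service import loopingcall

logging.basicConfig()  # so oslo.service's error message is visible


def report_state():
    # Stand-in for DbDriver._report_state failing with the DB error above.
    raise RuntimeError('DB is read-only')


timer = loopingcall.FixedIntervalLoopingCall(report_state)
timer.start(interval=10)
try:
    # oslo.service logs a "Fixed interval looping call ... failed" error and
    # re-raises the exception here; no further iterations are scheduled.
    timer.wait()
except RuntimeError:
    pass
# From this point on the service record is never updated again, so
# 'nova-manage service list' shows the service as XXX (down) even though
# the process itself is still running.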

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1589880/+subscriptions

