yahoo-eng-team team mailing list archive
Message #57154
[Bug 1589880] Re: report state failed
*** This bug is a duplicate of bug 1517926 ***
https://bugs.launchpad.net/bugs/1517926
kaka
I think it's a duplicate of another problem (bug #1517926) that was fixed in:
rpodolyaka@rpodolyaka-pc:~/src/nova$ git tag --contains e0647dd4b2ae9f5f6f908102d2ac447440622785
12.0.1
12.0.2
12.0.3
12.0.4
rpodolyaka@rpodolyaka-pc:~/src/nova$ git show e0647dd4b2ae9f5f6f908102d2ac447440622785
commit e0647dd4b2ae9f5f6f908102d2ac447440622785
Author: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
Date: Thu Nov 19 16:00:01 2015 +0200
servicegroup: stop zombie service due to exception
If an exception is raised out of the _report_state call, we find that
the service no longer reports any updates to the database, so the
service is considered dead, thus creating a kind of zombie service.
I55417a5b91282c69432bb2ab64441c5cea474d31 seems to introduce a
regression, which leads to nova-* services marked as 'down', if an
error happens in a remote nova-conductor while processing a state
report: only Timeout errors are currently handled, but other errors
are possible, e.g. a DBError (wrapped with RemoteError on RPC
client side), if a DB temporarily goes away. This unhandled exception
will effectively break the state reporting thread - service will be
up again only after restart.
While the intention of I55417a5b91282c69432bb2ab64441c5cea474d31 was
to avoid catching all possible exceptions, it looks like we must
do that to avoid creating a zombie.
The other part of that change was to ensure that during upgrade, we do
not spam the log server about MessagingTimeouts while the
nova-conductors are being restarted. This change ensures that still
happens.
Closes-Bug: #1517926
Change-Id: I44f118f82fbb811b790222face4c74d79795fe21
(cherry picked from commit 49b0d1741c674714fabf24d8409810064b953202)
and you seem to be using version 12.0.0.
Please try to update to the latest stable/liberty version of the code and
re-open the bug if it still reproduces (I did not manage to reproduce
it locally, presumably because I use a version with the
commit above ^ applied).
** No longer affects: oslo.service
** This bug has been marked a duplicate of bug 1517926
Nova services stop to report state via remote conductor
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1589880
Title:
report state failed
Status in OpenStack Compute (nova):
Incomplete
Bug description:
Description:
=================
I set read_only=ON on the master database while switching the master nova database over to the slave; after that, I checked the nova service status:
# nova-manage service list
Binary            Host    Zone      Status   State  Updated_At
nova-consoleauth  11_120  internal  enabled  XXX    2016-06-07 08:28:46
nova-conductor    11_120  internal  enabled  XXX    2016-06-07 08:28:45
nova-cert         11_120  internal  enabled  XXX    2016-05-17 08:12:10
nova-scheduler    11_120  internal  enabled  XXX    2016-05-17 08:12:24
nova-compute      11_121  bx        enabled  XXX    2016-06-07 08:28:49
nova-compute      11_122  bx        enabled  XXX    2016-06-07 08:28:42
=================
Steps to reproduce
=================
# mysql
MariaDB [nova]> set global read_only=on;
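The effect of `read_only=ON` can be illustrated without a MariaDB server: any write against a read-only database raises an operational error, which is what surfaces in nova's traceback below. A sketch using SQLite as a stand-in (SQLite's read-only URI mode is only an analogy for MariaDB's global `read_only` flag; the table layout here is simplified from nova's `services` table):

```python
import os
import sqlite3
import tempfile

# Create a small database with a services-like table.
path = os.path.join(tempfile.mkdtemp(), "nova.db")
conn = sqlite3.connect(path)
conn.execute(
    "CREATE TABLE services (id INTEGER PRIMARY KEY, report_count INTEGER)")
conn.execute("INSERT INTO services VALUES (1, 0)")
conn.commit()
conn.close()

# Re-open read-only, analogous to the master after 'set global read_only=on'.
ro = sqlite3.connect("file:%s?mode=ro" % path, uri=True)
try:
    # The same kind of UPDATE the state-report loop issues.
    ro.execute(
        "UPDATE services SET report_count = report_count + 1 WHERE id = 1")
    failed = False
except sqlite3.OperationalError:
    # Corresponds to MariaDB error 1290 in the log below.
    failed = True
ro.close()

print(failed)  # True: the write is rejected
```

In nova, that rejected UPDATE propagates out of `_report_state` as an unhandled exception, which on the affected version kills the reporting thread.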
=================
Environment
================
Version: Liberty
openstack-nova-conductor-12.0.0-1.el7.noarch
=================
Logs
=================
2016-05-12 11:01:20.343 9198 ERROR oslo.service.loopingcall
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.servicegroup.drivers.db.DbDriver._report_state' failed
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall Traceback (most recent call last):
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 113, in _run_loop
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line 87, in _report_state
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall service.service_ref.save()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 213, in wrapper
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return fn(self, *args, **kwargs)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/objects/service.py", line 250, in save
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall db_service = db.service_update(self._context, self.id, updates)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 153, in service_update
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return IMPL.service_update(context, service_id, values)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 146, in wrapper
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall ectxt.value = e.inner_exc
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall six.reraise(self.type_, self.value, self.tb)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 136, in wrapper
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return f(*args, **kwargs)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 532, in service_update
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall service_ref.update(values)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 490, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self.rollback()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall compat.reraise(exc_type, exc_value, exc_tb)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 487, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self.commit()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self._prepare_impl()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self.session.flush()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall self._flush(objects)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall transaction.rollback(_capture_exception=True)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall compat.reraise(exc_type, exc_value, exc_tb)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall flush_context.execute()
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall rec.execute(self)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall uow
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall mapper, table, update)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 672, in _emit_update_statements
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall execute(statement, multiparams)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall return meth(self, multiparams, params)
2016-05-12 11:01:20.473 9178 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall return connection._execute_clauseelement(self, multiparams, params)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall compiled_sql, distilled_params
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall context)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1335, in _handle_dbapi_exception
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall util.raise_from_cause(newraise, exc_info)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall reraise(type(exception), exception, tb=exc_tb)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall context)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 442, in do_execute
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall cursor.execute(statement, parameters)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall self.errorhandler(self, exc, value)
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall raise errorclass, errorvalue
2016-05-12 11:01:17.575 9177 ERROR oslo.service.loopingcall OperationalError: (_mysql_exceptions.OperationalError) (1290, 'The MariaDB server is running with the --read-only option so it cannot execute this statement') [SQL: u'UPDATE services SET updated_at=%s, report_count=%s, last_seen_up=%s WHERE services.id = %s'] [parameters: (datetime.datetime(2016, 5, 12, 3, 1, 17, 572767), 1189606, datetime.datetime(2016, 5, 12, 3, 1, 17, 571934), 2L)]
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1589880/+subscriptions