openstack team mailing list archive
Message #03789
Re: New nova service proposal
I would think we have enough tracking information to support the goal of
identifying failures. In any scenario, some of the failures will simply be
unrecoverable.
Regarding the process crashing, who's to say the retry process wouldn't
also crash? We could argue endlessly that the arbiter/watchdog processes
will crash at each tier. As such, I think it's better to say we need a
simpler mechanism for identifying failures and perhaps a best-effort
retry.
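To make "best-effort" concrete, here is a rough Python sketch of what I have in mind. It is not nova code: the name best_effort, the max_attempts parameter, and the failures list are all made up for illustration. The idea is only that every failure gets recorded so it can be identified later, and the retry is bounded, so anything that keeps failing is treated as unrecoverable.

```python
import logging

log = logging.getLogger("retry_sketch")  # hypothetical logger name


def best_effort(operation, max_attempts=2, failures=None):
    """Run operation with a bounded number of attempts (illustrative only).

    Returns (True, result) on success, or (False, None) once the attempts
    are used up. Each failure is appended to `failures` as
    (attempt_number, repr_of_exception) so it can be identified later.
    """
    failures = failures if failures is not None else []
    for attempt in range(1, max_attempts + 1):
        try:
            return True, operation()
        except Exception as exc:
            # Record the failure first; identification matters more than retry.
            failures.append((attempt, repr(exc)))
            log.warning("attempt %d of %d failed: %s", attempt, max_attempts, exc)
    # Out of attempts: give up instead of retrying forever.
    return False, None
```

Note that this only covers failures that surface as exceptions inside a live process. If the process itself dies, nothing here runs, which is the crash case discussed in this thread.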
Retrying can be scary, to say the least. You can't possibly handle all of
the possible failure scenarios, and some of the ones you think you can
handle might differ in subtle ways, such that retrying them only causes
more issues.
I agree with Lamar that we could make things significantly more reliable,
and I think that's where we should start. We may find that, after some
stabilization work, the failure rate is acceptably low and any retry
mechanism is no longer required.
On 8/29/11 11:24 AM, "Kevin L. Mitchell" <kevin.mitchell@xxxxxxxxxxxxx>
wrote:
>On Fri, 2011-08-26 at 23:10 +0000, Monsyne Dragon wrote:
>> First off, I think it would be better if whatever had the failure
>> responded by sending a request somewhere (a cast) to say "Hey, this
>> bombed. Retry it."
>
>What if the failure was due to the process crashing, so that it can't
>possibly send a request/cast off for retry?
>--
>Kevin L. Mitchell <kevin.mitchell@xxxxxxxxxxxxx>
>
>This email may include confidential information. If you received it in
>error, please delete it.
>_______________________________________________
>Mailing list: https://launchpad.net/~openstack
>Post to : openstack@xxxxxxxxxxxxxxxxxxx
>Unsubscribe : https://launchpad.net/~openstack
>More help : https://help.launchpad.net/ListHelp