openstack team mailing list archive
Message #06010
Re: [Orchestration] Handling error events ... explicit vs. implicit
On Wed, Dec 7, 2011 at 7:26 AM, Sandy Walsh <sandy.walsh@xxxxxxxxxxxxx> wrote:
> For orchestration (and now the scheduler improvements) we need to know when an operation fails ... and specifically, which resource was involved. In the majority of the cases it's an instance_uuid we're looking for, but it could be a security group id or a reservation id.
>
> With most of the compute.manager calls the resource id is the third parameter in the call (after self & context), but there are some oddities. And sometimes we need to know the additional parameters (like a migration id related to an instance uuid). So simply enforcing parameter orders may be insufficient and impossible to enforce programmatically.
>
> A little background:
>
> In nova, exceptions are generally handled in the RPC or middleware layers as a logged event and life goes on. In an attempt to tie this into the notification system, a while ago I added stuff to the wrap_exception decorator. I'm sure you've seen this nightmare scattered around the code:
> @exception.wrap_exception(notifier=notifier, publisher_id=publisher_id())
>
> What started as a simple decorator now takes parameters and the code has become nasty.
>
> But it works ... no matter where the exception was generated, the notifier gets:
> * compute.<host_id>
> * <method name>
> * and whatever arguments the method takes.
>
> So, we know what operation failed and the host it failed on, but someone needs to crack the argument nut to get the goodies. It's a fragile coupling from publisher to receiver.
I'm just wondering if we can get the notification message down to
something more standardized, and avoid including the full argument
list.
That is one way to reduce the coupling.
What is the minimum information we need to know when a failure occurs?
I think we have:
* operation
* host it failed on
* instance_id
* migration_id (maybe)
* reservation_id (maybe)
* security group id (maybe)
If we can avoid cracking open the remaining arguments, a list this
long might be manageable.
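
As a rough sketch of what such a standardized message could look like
(the helper name and field names below are hypothetical, taken from the
list above rather than from any existing nova API, and treating every id
as optional is an assumption):

```python
# Hypothetical helper: field names follow the list above, not an existing nova API.
OPTIONAL_IDS = ("instance_id", "migration_id", "reservation_id",
                "security_group_id")


def build_failure_payload(operation, host, **resource_ids):
    """Return a small, fixed-shape dict describing a failed operation."""
    # Reject anything outside the agreed field list so the message
    # cannot quietly grow back into "the full argument list".
    unknown = set(resource_ids) - set(OPTIONAL_IDS)
    if unknown:
        raise ValueError("unexpected fields: %s" % ", ".join(sorted(unknown)))

    payload = {"operation": operation, "host": host}
    # Only include the ids that actually apply to this failure.
    for name in OPTIONAL_IDS:
        value = resource_ids.get(name)
        if value is not None:
            payload[name] = value
    return payload
```

A receiver would then only need to look up known keys, for example
payload.get("instance_id"), instead of unpacking method arguments.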
>
> One, less fragile, alternative is to put a try/except block inside every top-level nova.compute.manager method and send meaningful exceptions right from the source. More fidelity, but messier code. Although "explicit is better than implicit" keeps ringing in my head.
>
I like explicit better than implicit, but I think we need to trigger
off any and all exceptions to make all of this reliable.
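
One possible way to combine the two, sketched under assumptions: a
decorator that fires on any exception, but where each method declares
explicitly which of its arguments carries which resource id. The names
notify_on_failure and id_args, and the notifier being a plain callable,
are hypothetical; this is not nova's wrap_exception, and it relies on
inspect.signature, so it assumes Python 3.

```python
import functools
import inspect


def notify_on_failure(notifier, host, **id_args):
    """Hypothetical decorator: report any exception, then re-raise it.

    id_args maps a payload field (e.g. instance_id) to the name of the
    method argument that carries it, so the link is declared next to the
    method instead of being inferred from argument order.
    """
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                try:
                    # Resolve positional and keyword arguments by name.
                    bound = sig.bind_partial(*args, **kwargs).arguments
                except TypeError:
                    # The call itself was malformed; report without ids.
                    bound = {}
                payload = {"operation": func.__name__, "host": host,
                           "error": str(exc)}
                for field, arg_name in id_args.items():
                    if arg_name in bound:
                        payload[field] = bound[arg_name]
                notifier(payload)
                raise  # The caller still sees the original exception.
        return wrapper
    return decorator


# Usage: the mapping sits right above the method it describes.
events = []


@notify_on_failure(events.append, "host1", instance_id="instance_uuid")
def reboot_instance(context, instance_uuid):
    raise RuntimeError("boom")
```

The trade-off is that a developer who renames an argument still has to
update the mapping, but both now live on adjacent lines.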
> Or, we make a general event parser that anyone can use ... but again, the link between the actual method and the parser is fragile. The developers have to remember to update both.
>
> Opinions?
>
> -S
>
>