openstack team mailing list archive
Message #09306
Re: Validation of floating IP operations in Essex codebase?
Thanks Vish, that makes it clearer. I guess the validation can be handled by whichever manager picks up the call rather than having to be validated on the manager of a specific host (assuming multi-host of course), which should mean it's still reasonably responsive.
Just looking through the code, it looks to me like there are a few things that might still need cleaning up to make this separation work. For example:
_add_floating_ip calls into compute.api (makes sense now), which could do whatever validation makes sense at the instance level and then pass on to network.api. But _remove_floating_ip calls network_api directly, so even if the instance layer wanted to do some validation it can't. Shouldn't both pass through compute.api in this new model?
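Something along these lines is what I have in mind; class and method names here are only an approximation of the Essex extension code, not lifted from the tree:

    # Rough sketch only: both operations go via compute.api, which can
    # validate against what it knows about the instance before handing
    # off to network.api.
    class FloatingIPActionController(object):
        def __init__(self, compute_api):
            self.compute_api = compute_api

        def _add_floating_ip(self, context, instance, address):
            # instance-level validation happens inside compute_api, which
            # then delegates the actual association to network.api
            self.compute_api.associate_floating_ip(context, instance,
                                                   address)

        def _remove_floating_ip(self, context, instance, address):
            # today this goes straight to network_api; routing it through
            # compute_api gives the instance layer the same chance to
            # validate on removal
            self.compute_api.disassociate_floating_ip(context, instance,
                                                      address)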
There are also a few other casts left in the Network API layer:
release_floating_ip
deallocate_for_instance
add_fixed_ip_to_instance
remove_fixed_ip_from_instance
add_network_to_project
If the network manager is now the only thing that can perform validation, shouldn't all of these be turned into calls as well?
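For concreteness, here is roughly what turning one of them into a call would look like; this is Essex-style rpc usage written from memory, so treat the details as a sketch rather than the actual nova/network/api.py:

    from nova import flags
    from nova import rpc

    FLAGS = flags.FLAGS

    class API(object):
        def release_floating_ip(self, context, address,
                                affect_auto_assigned=False):
            msg = {'method': 'release_floating_ip',
                   'args': {'address': address,
                            'affect_auto_assigned': affect_auto_assigned}}
            # as a cast this is fire-and-forget: a validation error raised
            # by the manager never reaches the caller
            # rpc.cast(context, FLAGS.network_topic, msg)

            # as a call it blocks until the manager answers, so exceptions
            # propagate back and can be surfaced to the user
            return rpc.call(context, FLAGS.network_topic, msg)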
Cheers,
Phil
From: Vishvananda Ishaya [mailto:vishvananda@xxxxxxxxx]
Sent: 28 March 2012 23:26
To: Day, Phil
Cc: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] Validation of floating IP operations in Essex codebase?
On Mar 28, 2012, at 10:04 AM, Day, Phil wrote:
Hi Folks,
At the risk of looking lazy in my first question by following up with a second:
So I tracked this down in the code and can see that the validation has moved into network/manager.py, and what was a validation/cast in network/api.py has been replaced with a call - but that seems to make the system more tightly coupled across components (i.e. if there is a problem getting the message to the Network Manager then even an invalid request will be blocked until the call returns or times out).
This is a side effect of trying to decouple compute and network, see the explanation below.
It also looks as if the validation for disassociate_floating_ip has been moved to the manager, but this is still a cast from the api layer - so those error messages never get back to the user.
Good point. This probably needs to be a call with the current model.
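Roughly, once it is a call the extension layer could surface the manager-side failure instead of returning 202 unconditionally; the names below are indicative only, not the actual extension code:

    import webob
    import webob.exc

    # method sketch from the floating-IP extension controller
    def _remove_floating_ip(self, req, id, body):
        context = req.environ['nova.context']
        address = body['removeFloatingIp']['address']
        try:
            # only useful as a call: with a cast, a failure here is raised
            # (and logged) on the network host and the user still gets a
            # 202 back
            self.network_api.disassociate_floating_ip(context, address)
        except Exception as error:
            raise webob.exc.HTTPBadRequest(explanation=str(error))
        return webob.Response(status_int=202)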
Coming from Diablo it all feels kind of odd to me - I thought we were trying to validate what we could of a request in the API server, return immediate errors at that stage, and then cast into the system (so that after that point only internal errors can stop something from working). Was there a deliberate design policy around this at some stage?
There are a few things going on here.
First we have spent a lot of time decoupling network and compute. Ultimately network will be an external service, so we can't depend on having access to the network database on the compute api side. We can do some checks in compute_api to make sure that the address isn't attached to another instance that we know about, but ultimately the network service has to be responsible for saying what can happen with the ip address.
So the second part is about why it is happening in network_manager vs network_api. This is a side-effect of the decision to plug in quantum/melange/etc. at the manager layer instead of the api layer. The api layer is therefore being very dumb, just passing requests on to the manager.
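Concretely, the plug point today is just the configured manager class; the api layer in front stays the same either way (flag and module paths from memory, so double-check against the tree):

    # nova.conf: legacy manager
    --network_manager=nova.network.manager.FlatDHCPManager
    # or, to plug quantum in at the manager layer:
    --network_manager=nova.network.quantum.manager.QuantumManager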
So that explains where we are. Here is the plan (as I understand it) for the future:
a) move the quantum plugin to the api layer
(At this point we could move validation into the api if necessary.)
b) define a more complete network api which includes all of the necessary features that are currently compute extensions
c) make a client to talk to the api
d) make compute talk through the client to the api instead of using rabbit messages (see the sketch after this list)
(this decouples network completely, allowing us to deploy and run it as a separate service if need be. At this point the quantum-api-plugin could be part of quantum or a new shared NaaS project. More to decide at the summit here)
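As a purely hypothetical sketch of (c) and (d), compute would end up doing something like this over http instead of casting to the network topic; no such client exists yet, so every name below is made up:

    import json

    import httplib2

    class NetworkClient(object):
        def __init__(self, endpoint, token):
            self.endpoint = endpoint
            self.token = token

        def disassociate_floating_ip(self, address):
            http = httplib2.Http()
            url = '%s/floating_ips/%s/disassociate' % (self.endpoint,
                                                       address)
            resp, body = http.request(
                url, 'POST',
                headers={'X-Auth-Token': self.token,
                         'Content-Type': 'application/json'})
            if int(resp.status) >= 400:
                # errors come back synchronously over http, which is the
                # point: compute sees validation failures directly instead
                # of losing them in a cast
                raise Exception(body)
            return json.loads(body) if body else None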
In general, we are hoping to switch to quantum as the default by Folsom, and not have to touch the legacy network code very much. If there are serious performance issues we could make some optimizations by doing checks in network-api, but these will quickly become moot if we are moving towards using a client and talking through a rest interface.
So it looks like the following could be done in the meantime:
a) switch disassociate from a cast to a call -> I would consider this one a bug and would appreciate someone verifying that it fails and reporting it
b) add some validation in compute api -> I'm not sure what we can assert here. Perhaps we could use the network_info cache and check for duplicates etc. (sketched below)
c) if we have serious performance issues, we could add another layer of checks in the compute_api, but we may have to make sure it is ignored for quantum.
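For (b), the check might look something like the sketch below; the real cached network_info is shaped differently between the legacy and new models, so treat the accessors as stand-ins rather than the actual api:

    # placeholder sketch for a compute_api-side duplicate check
    def _assert_floating_ip_not_attached(network_info, address):
        for vif in network_info:
            if address in vif.get('floating_ips', []):
                raise ValueError('%s is already associated with this '
                                 'instance' % address)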