
openstack team mailing list archive

Diablo multi-zone, basescheduler, host_filter, various bugs running over KVM

 

Sandy, Ed, Chris,

We're addressing this mail to you directly because the source lists you as
the authors of the scheduler module. Here's our setup:

- 2 Nova Diablo clusters plus Keystone (16 nodes each: 1 controller running
the scheduler, API and network services, and 15 compute nodes; KVM as the
hypervisor)
-- BaseScheduler configured in nova.conf, with the AllHostsFilter host filter
-- Zone_db_update_interval set to 10 seconds
-- Cluster1 is the parent zone, and Cluster2 is its only child
-- Zone capabilities are listed and refreshed successfully from the parent
zone
-- Keystone integration works without problems

What we have run into so far:

- A bug in host_filter.py (in the scheduler module): the filter lookup never
matches the whole key
"--default_host_filter=nova.scheduler.filters.AllHostsFilter", because
filter_class.__name__ only returns the last part of it, "AllHostsFilter".
After dirty-hacking host_filter.py to add a split to the if condition,
" if filter_class.__name__ == filter_name.split(".")[3]: ", we managed to keep
debugging, since the "AllHostsFilter" class was now actually loaded.
- After that, and after adding a lot of debug output, we realized that in
base_scheduler.py, when we tried to return "AllHostsFilter.filter_hosts",
which uses the zone_manager.service_state dict, that dict was empty. After
reading https://answers.launchpad.net/nova/+question/163986 we tried calling
ZoneManager.update_service_capabilities by hand through nova-manage, as
suggested in the Launchpad post. The service_state dict then gets filled,
but only with None and the timestamp. So when we try to run instances, the
scheduler doesn't know where to cast them: there is no host to select
because there is no service info in the service_state dict, and
nova-scheduler.log clearly shows a "casting to compute.None" message.
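The host_filter.py mismatch can be reproduced outside nova. The following is a minimal, hypothetical stand-in for the filter lookup (the function `choose_host_filter`, the `FILTER_CLASSES` list and the stub classes are assumptions for illustration, not Diablo's code). Instead of a fixed `[3]` index, it compares the class name against only the last dotted component of the configured name, so it also works for bare class names or paths with a different number of dots.

```python
class AllHostsFilter(object):
    """Stub standing in for nova.scheduler.filters.AllHostsFilter."""


class InstanceTypeFilter(object):
    """Second stub so the lookup has more than one candidate."""


# Hypothetical registry of known host filter classes.
FILTER_CLASSES = [AllHostsFilter, InstanceTypeFilter]


def choose_host_filter(filter_name):
    """Return an instance of the filter class named by filter_name.

    filter_name may be a bare class name ("AllHostsFilter") or a dotted
    path ("nova.scheduler.filters.AllHostsFilter").
    """
    # rsplit(".", 1)[-1] keeps only the final component regardless of how
    # many dots precede it; a fixed [3] index only fits 4-part paths.
    short_name = filter_name.rsplit(".", 1)[-1]
    for filter_class in FILTER_CLASSES:
        if filter_class.__name__ == short_name:
            return filter_class()
    raise LookupError("Unknown host filter: %s" % filter_name)


# Comparing the class name against the full path never matches:
print(AllHostsFilter.__name__ == "nova.scheduler.filters.AllHostsFilter")  # False
print(type(choose_host_filter("nova.scheduler.filters.AllHostsFilter")).__name__)  # AllHostsFilter
```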

So we managed to update the service capabilities of the 2 cluster controller
nodes, but it seems we're not calling it with all the required parameters,
because we're not getting the state of any other server in the cluster. Is
there a way to call update_service_capabilities periodically, with all the
required parameters, on both clusters, so that we can actually get
multi-zone working on Diablo with KVM as the hypervisor?
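To make the symptom and the question concrete, here is a simplified, hypothetical model of the scheduler-side capability cache (not nova's real ZoneManager). It assumes service_state maps host -> {service_name: capabilities}; the capability keys and the helpers `pick_host`, `collect_capabilities` and `report_periodically` are illustrative assumptions. It shows why a call carrying only None plus a timestamp leaves no selectable host ("compute.None"), and how a node that periodically pushes a full (service_name, host, capabilities) triple becomes selectable. It sketches the pattern only; it is not a drop-in fix for Diablo.

```python
import time

# Assumed cache shape: host -> {service_name: capabilities}.
service_states = {}


def update_service_capabilities(service_name, host, capabilities):
    """Hypothetical, simplified stand-in for ZoneManager.update_service_capabilities."""
    caps = dict(capabilities or {})
    caps["timestamp"] = time.time()  # every update is stamped on arrival
    service_states.setdefault(host, {})[service_name] = caps


def pick_host():
    """Return the first host whose compute entry holds more than a timestamp, else None."""
    for host, services in service_states.items():
        if set(services.get("compute", {})) - {"timestamp"}:
            return host
    return None


def collect_capabilities():
    """Hypothetical hypervisor report; the keys are illustrative only."""
    return {"hypervisor_type": "QEMU", "host_memory_free": 8192}


def report_periodically(service_name, host, interval, iterations):
    """Push a full (service_name, host, capabilities) triple every `interval` seconds.

    `iterations` bounds the loop so the sketch terminates.
    """
    for _ in range(iterations):
        update_service_capabilities(service_name, host, collect_capabilities())
        time.sleep(interval)


# A call without a service name or capabilities stores only a timestamp,
# so no host qualifies and the cast target becomes "compute.None".
update_service_capabilities(None, "controller1", None)
print("compute.%s" % pick_host())  # compute.None

# Once a compute node reports its capabilities periodically, it can be selected.
report_periodically("compute", "compute01", interval=0, iterations=3)
print("compute.%s" % pick_host())  # compute.compute01
```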

PS: We're copying the OpenStack users list as well; we haven't had any help
since we posted this question there last week, and we filed it as a bug today.

Best Regards
Lean