Message #15886
Re: [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)
vishvananda@xxxxxxxxx wrote on 08/15/2012 06:54:58 PM:
> From: Vishvananda Ishaya <vishvananda@xxxxxxxxx>
> To: OpenStack Development Mailing List
<openstack-dev@xxxxxxxxxxxxxxxxxxx>,
> Cc: "openstack@xxxxxxxxxxxxxxxxxxx \(openstack@xxxxxxxxxxxxxxxxxxx
> \)" <openstack@xxxxxxxxxxxxxxxxxxx>
> Date: 08/15/2012 06:58 PM
> Subject: Re: [Openstack] [openstack-dev] Discussion about where to
> put database for bare-metal provisioning (review 10726)
> Sent by: openstack-bounces+mjfork=us.ibm.com@xxxxxxxxxxxxxxxxxxx
>
> On Aug 15, 2012, at 3:17 PM, Michael J Fork <mjfork@xxxxxxxxxx> wrote:
>
> > I am interested in finding a solution that enables bare-metal and
> > virtualized requests to be serviced through the same scheduler where
> > the compute_nodes table has a full view of schedulable resources.
> > This would seem to simplify the end-to-end flow while opening up
> > some additional use cases (e.g. dynamic allocation of a node from
> > bare-metal to hypervisor and back).
> >
> > One approach would be to have a proxy running a single nova-compute
> > daemon fronting the bare-metal nodes. That nova-compute daemon
> > would report up many HostState objects (1 per bare-metal node) to
> > become entries in the compute_nodes table and accessible through the
> > scheduler HostManager object.
> > The HostState object would set cpu_info, vcpus, memory_mb and
> > local_gb values to be used for scheduling with the hypervisor_host
> > field holding the bare-metal machine address (e.g. for IPMI based
> > commands) and hypervisor_type = NONE. The bare-metal Flavors are
> > created with an extra_spec of hypervisor_type = NONE and the
> > corresponding compute_capabilities_filter would reduce the available
> > hosts to those bare_metal nodes. The scheduler would need to
> > understand that hypervisor_type = NONE means you need an exact fit
> > (or best-fit) host vs weighting them (perhaps through the multi-
> > scheduler). The scheduler would cast out the message to the
> > <topic>.<service-hostname> (code today uses the HostState hostname),
> > with the compute driver having to understand if it must be serviced
> > elsewhere (but does not break any existing implementations since it
> > is 1 to 1).
> >
> > Does this solution seem workable? Anything I missed?
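To make the flavor piece above concrete, here is roughly what I have in mind (sketch only; the flavor values and the matching helper are illustrative, and the real compute_capabilities_filter logic is more involved than this):

    # Bare-metal flavor tagged so the capabilities filter only considers
    # the bare-metal "hosts" reported by the proxy nova-compute.
    bm_flavor = {
        'name': 'bm.medium',
        'memory_mb': 16384,
        'vcpus': 8,
        'root_gb': 500,
        'extra_specs': {'hypervisor_type': 'NONE'},
    }

    def capabilities_match(extra_specs, host_capabilities):
        # Stand-in for the capabilities filter check: every extra_spec
        # must equal the capability the host reported.
        return all(str(host_capabilities.get(k)) == str(v)
                   for k, v in extra_specs.items())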
> The bare metal driver already is proxying for the other nodes so it
> sounds like we need a couple of things to make this happen:
>
> a) modify driver.get_host_stats to be able to return a list of host
> stats instead of just one. Report the whole list back to the
> scheduler. We could modify the receiving end to accept a list as
> well or just make multiple calls to
> self.update_service_capabilities(capabilities)
>
> b) make a few minor changes to the scheduler to make sure filtering
> still works. Note the changes here may be very helpful:
>
> https://review.openstack.org/10327
>
> c) we have to make sure that instances launched on those nodes take
> up the entire host state somehow. We could probably do this by
> making sure that the instance_type ram, mb, gb etc. matches what the
> node has, but we may want a new boolean field "used" if those aren't
> sufficient.
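On a), here is a rough sketch of the reporting side. I am guessing at the compute manager method name, but get_host_stats and update_service_capabilities are the calls you mention; the list handling itself is hypothetical:

    def _report_driver_status(self, context):
        # Sketch: the bare-metal driver returns a list of stats dicts,
        # one per bare-metal node, instead of a single dict.
        capabilities = self.driver.get_host_stats(refresh=True)
        if not isinstance(capabilities, list):
            capabilities = [capabilities]
        for node_caps in capabilities:
            # Simplest option: one call per node rather than teaching
            # the receiving end to accept a list.
            self.update_service_capabilities(node_caps)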
On c), my initial thought is that showing the actual resources the guest requested
as being consumed in HostState would enable use cases like migrating a
guest running on a too-big machine to a right-size one. However, that
would require the bare-metal node to store the state of the requested
guest when that information could be obtained from the instance_type.
For now, the simplest is probably to have the bare-metal virt driver set
the disk_available = 0 and host_memory_free = 0 so the scheduler removes
them from consideration, with the vcpus, disk_total, host_memory_total set
to the physical machine values. If the requested guest size is easily
accessible, the _used values could be set to those values (although it is
not clear whether anything would break with _total != _free + _used; if
something does, setting _used = _total would seem acceptable for now).
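As a sketch of that reporting from the bare-metal virt driver (the node dict keys here are made up; the capability names are the HostState values above):

    def _node_capabilities(node, occupied):
        # Report physical totals so the node is visible to the scheduler,
        # but zero the free values once a guest holds the machine so it
        # drops out of consideration.
        return {
            'hypervisor_type': 'NONE',
            'hypervisor_hostname': node['ipmi_address'],  # bare-metal machine address
            'cpu_info': node['cpu_info'],
            'vcpus': node['cpus'],
            'host_memory_total': node['memory_mb'],
            'disk_total': node['local_gb'],
            'host_memory_free': 0 if occupied else node['memory_mb'],
            'disk_available': 0 if occupied else node['local_gb'],
        }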
Another option is to add num_instances to HostState and have the
bare-metal filter remove hypervisor_type = NONE hosts with num_instances > 0.
The scheduler would never see them, so there would be no need to show them
as fully consumed. The drawback is that the num_instances lookup is marked
as expensive and would incur some overhead.
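If we did go the num_instances route, the filter might look something like the following (loosely modeled on the existing host filters; the attribute names are a guess at how num_instances would hang off HostState):

    class BareMetalAvailableFilter(object):
        """Sketch: pass bare-metal (hypervisor_type NONE) hosts only when
        they are not already running an instance; other hosts pass through
        untouched."""

        def host_passes(self, host_state, filter_properties):
            if host_state.capabilities.get('hypervisor_type') != 'NONE':
                return True
            # num_instances would be the new (expensive) HostState field.
            return host_state.num_instances == 0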
> This approach seems pretty good. We could potentially get rid of
> the shared bare_metal_node table. I guess the only other concern is
> how you populate the capabilities that the bare metal nodes are
> reporting. I guess an api extension that rpcs to a baremetal node to
> add the node. Maybe someday this could be autogenerated by the bare
> metal host looking in its arp table for dhcp requests! :)
>
> Vish
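On populating the capabilities, something like the following request body to an admin API extension could be rpc'd to the bare-metal proxy to register a node. Everything here is hypothetical; no such extension exists today:

    # Hypothetical "add bare-metal node" request; the proxy nova-compute
    # would start reporting a HostState entry for the node on its next
    # capability update.
    add_node_body = {
        'node': {
            'service_host': 'bm-proxy-1',   # the proxying nova-compute
            'ipmi_address': '10.1.2.3',
            'cpus': 8,
            'memory_mb': 16384,
            'local_gb': 500,
        },
    }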
Michael
-------------------------------------------------
Michael Fork
Cloud Architect, Emerging Solutions
IBM Systems & Technology Group