openstack team mailing list archive

Re: [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

To: OpenStack Development Mailing List <openstack-dev@xxxxxxxxxxxxxxxxxxx>
From: David Kang <dkang@xxxxxxx>
Date: Wed, 15 Aug 2012 18:57:29 -0700 (PDT)
Cc: "openstack@xxxxxxxxxxxxxxxxxxx \(openstack@xxxxxxxxxxxxxxxxxxx\)" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <B74307F6-13B0-4B1E-AB04-DE9704FE2F3B@gmail.com>

 I see.
We (NTT and USC/ISI) will discuss this further,
and we will implement according to your input.
From now on, we will assume that we use the current db.

 Here is my understanding of your comments. Please correct me if I am wrong.

 a) is clear.
 b) NTT Docomo has implemented some of the features you mentioned in a new file, baremetal_host_manager.py.
  We will not use baremetal_host_manager.py but will modify host_manager.py directly.
  I'm not sure whether we have to change the scheduler.
 c) I think the "used" field may work. We'll look into it.
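
 To make (a) and (c) concrete, here is a minimal sketch in Python. It is
 not actual nova code: BareMetalNode, get_host_stats_list,
 report_capabilities and the "used" key are hypothetical names chosen for
 illustration. Only the field names (cpu_info, vcpus, memory_mb, local_gb,
 hypervisor_type, hypervisor_host) and the idea of making one
 update_service_capabilities call per node come from the proposal quoted
 below.

```python
from dataclasses import dataclass


@dataclass
class BareMetalNode:
    """Hypothetical record for one bare-metal machine behind the proxy."""
    address: str        # e.g. the address used for IPMI-based commands
    cpu_info: str
    vcpus: int
    memory_mb: int
    local_gb: int
    used: bool = False  # assumed boolean from (c): the node is fully taken


def get_host_stats_list(nodes):
    """Return one capabilities dict per bare-metal node (idea (a))."""
    stats = []
    for node in nodes:
        stats.append({
            "hypervisor_host": node.address,
            "hypervisor_type": "NONE",
            "cpu_info": node.cpu_info,
            "vcpus": node.vcpus,
            "memory_mb": node.memory_mb,
            "local_gb": node.local_gb,
            "used": node.used,
        })
    return stats


def report_capabilities(nodes, update_service_capabilities):
    """Make one update call per node, leaving the receiving end unchanged."""
    for capabilities in get_host_stats_list(nodes):
        update_service_capabilities(capabilities)
```

 Here update_service_capabilities is passed in as a plain callable, so the
 sketch runs without nova; in the real driver it would be the existing
 method mentioned in (a).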

 Thanks,
 David

----------------------
Dr. Dong-In "David" Kang
Computer Scientist
USC/ISI

----- Original Message -----
> This was an immediate goal: the bare-metal nova-compute node could
> keep an internal database, but report capabilities through nova in the
> common way with the changes below. Then the scheduler wouldn't need
> access to the bare metal database at all.
> 
> On Aug 15, 2012, at 4:23 PM, David Kang <dkang@xxxxxxx> wrote:
> 
> >
> > Hi Vish,
> >
> > Is this discussion for long-term goal or for this Folsom release?
> >
> > We still believe that a bare-metal database is needed
> > because there is no automated way for bare-metal nodes to report
> > their capabilities
> > to their bare-metal nova-compute node.
> >
> > Thanks,
> > David
> >
> >>
> >> I am interested in finding a solution that enables bare-metal and
> >> virtualized requests to be serviced through the same scheduler
> >> where
> >> the compute_nodes table has a full view of schedulable resources.
> >> This
> >> would seem to simplify the end-to-end flow while opening up some
> >> additional use cases (e.g. dynamic allocation of a node from
> >> bare-metal to hypervisor and back).
> >>
> >> One approach would be to have a proxy running a single nova-compute
> >> daemon fronting the bare-metal nodes. That nova-compute daemon
> >> would
> >> report up many HostState objects (1 per bare-metal node) to become
> >> entries in the compute_nodes table and accessible through the
> >> scheduler HostManager object.
> >>
> >>
> >>
> >>
> >> The HostState object would set cpu_info, vcpus, memory_mb and
> >> local_gb
> >> values to be used for scheduling with the hypervisor_host field
> >> holding the bare-metal machine address (e.g. for IPMI based
> >> commands)
> >> and hypervisor_type = NONE. The bare-metal Flavors are created with
> >> an
> >> extra_spec of hypervisor_type= NONE and the corresponding
> >> compute_capabilities_filter would reduce the available hosts to
> >> those
> >> bare_metal nodes. The scheduler would need to understand that
> >> hypervisor_type = NONE means you need an exact fit (or best-fit)
> >> host
> >> vs weighting them (perhaps through the multi-scheduler). The
> >> scheduler
> >> would cast out the message to the <topic>.<service-hostname> (code
> >> today uses the HostState hostname), with the compute driver having
> >> to
> >> understand if it must be serviced elsewhere (but does not break any
> >> existing implementations since it is 1 to 1).
> >>
> >>
> >>
> >>
> >>
> >> Does this solution seem workable? Anything I missed?
> >>
> >> The bare metal driver already is proxying for the other nodes so it
> >> sounds like we need a couple of things to make this happen:
> >>
> >>
> >> a) modify driver.get_host_stats to be able to return a list of host
> >> stats instead of just one. Report the whole list back to the
> >> scheduler. We could modify the receiving end to accept a list as
> >> well
> >> or just make multiple calls to
> >> self.update_service_capabilities(capabilities)
> >>
> >>
> >> b) make a few minor changes to the scheduler to make sure filtering
> >> still works. Note the changes here may be very helpful:
> >>
> >>
> >> https://review.openstack.org/10327
> >>
> >>
> >> c) we have to make sure that instances launched on those nodes take
> >> up
> >> the entire host state somehow. We could probably do this by making
> >> sure that the instance_type ram, mb, gb etc. matches what the node
> >> has, but we may want a new boolean field "used" if those aren't
> >> sufficient.
> >>
> >>
> >> This approach seems pretty good. We could potentially get rid of
> >> the
> >> shared bare_metal_node table. I guess the only other concern is how
> >> you populate the capabilities that the bare metal nodes are
> >> reporting.
> >> I guess an api extension that rpcs to a baremetal node to add the
> >> node. Maybe someday this could be autogenerated by the bare metal
> >> host
> >> looking in its arp table for dhcp requests! :)
> >>
> >>
> >> Vish
> >>
> >> _______________________________________________
> >> OpenStack-dev mailing list
> >> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

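The exact-fit selection described in the quoted proposal (hosts reporting
hypervisor_type = NONE must match the request exactly rather than be
weighted) could be sketched as follows. This is an assumption-laden
illustration, not the nova scheduler: exact_fit_hosts is a hypothetical
helper, hosts and flavors are plain dicts, and the "used" key is the
boolean suggested in (c).

```python
def exact_fit_hosts(host_states, flavor):
    """Keep unused bare-metal hosts whose resources equal the flavor's.

    host_states: list of dicts as reported per bare-metal node (assumed).
    flavor: dict with vcpus, memory_mb and local_gb (assumed shape).
    """
    keys = ("vcpus", "memory_mb", "local_gb")
    return [
        host for host in host_states
        if host.get("hypervisor_type") == "NONE"
        and not host.get("used", False)
        and all(host[key] == flavor[key] for key in keys)
    ]
```

A best-fit variant would instead keep hosts whose resources are greater
than or equal to the flavor's and pick the one with the least excess;
which of the two is wanted is left open in the thread.
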
References

Re: [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)
From: Vishvananda Ishaya, 2012-08-16