Message #16180
Re: [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)
Let's summarize what we are agreeing on.
One service entry with multiple bare-metal compute_node entries is registered at the start of bare-metal nova-compute.
'hypervisor_hostname' must be different for each bare-metal machine, such as 'bare-metal-0001.xxx.com', 'bare-metal-0002.xxx.com', etc. [I think Arata suggests using just 0001, 0002, ....]
One extension we need on the scheduler side is to use (host, hypervisor_hostname) instead of (host) alone in host_manager.py. [I think Arata suggests using "host/baremetal-node-id" as the key.]
'HostManager.service_state' is currently { <host> : { <service> : { cap k : v }}}.
The following layouts were suggested for the new 'HostManager.service_state':
{ <host> : { <service> : { <hypervisor_name> : { cap k : v }}}}
{ <host>/<bm_node_id> : { <service> : { cap k : v }}}
Please correct or edit this summary.
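
To make the two candidate layouts concrete, here is a minimal Python sketch of each, with a lookup helper for both. The host name 'bm-proxy', the node ids, the capability values and the helper names are all made up for illustration; none of this is taken from the nova code.

```python
# Hypothetical capability dicts, for illustration only.
caps_a = {"free_ram_mb": 4096}
caps_b = {"free_ram_mb": 8192}

# Layout 1: keep <host> as the top-level key and nest by hypervisor name.
nested = {
    "bm-proxy": {
        "compute": {
            "bare-metal-0001": caps_a,
            "bare-metal-0002": caps_b,
        }
    }
}

# Layout 2: flat, with "<host>/<bm_node_id>" as the top-level key.
flat = {
    "bm-proxy/0001": {"compute": caps_a},
    "bm-proxy/0002": {"compute": caps_b},
}


def lookup_nested(states, host, service, node):
    """Return the capabilities of one node under layout 1, or None."""
    return states.get(host, {}).get(service, {}).get(node)


def lookup_flat(states, host, service, node):
    """Return the capabilities of one node under layout 2, or None."""
    return states.get("%s/%s" % (host, node), {}).get(service)


def rpc_host(key):
    """Recover the service host (the RPC target) from a layout-2 key."""
    return key.split("/", 1)[0]
```

With layout 2 the existing two-level consumers keep working because the dict depth is unchanged, but anything that needs the RPC target has to strip the node suffix, as rpc_host does here. Layout 1 keeps the host key intact at the cost of one more nesting level.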
Thanks,
David
----- Original Message -----
> Hi Michael,
>
> > Looking at line 203 in nova/scheduler/filter_scheduler.py, the
> > target host in the cast call is weighted_host.host_state.host
> > and not a service host. (My guess is this will likely require a fair
> > number of changes in the scheduler area to change cast calls to
> > target a service host instead of a compute node)
>
> weighted_host.host_state.host still seems to be service['host']...
> Please look at it again with me.
>
> # First, HostManager.get_all_host_states:
> # host_manager.py:264
> compute_nodes = db.compute_node_get_all(context)
> for compute in compute_nodes:
>     # service is from services table (joined-loaded with compute_nodes)
>     service = compute['service']
>     if not service:
>         LOG.warn(_("No service for compute ID %s") % compute['id'])
>         continue
>     host = service['host']
>     capabilities = self.service_states.get(host, None)
>     # go to HostState constructor:
>     # the 1st parameter 'host' is service['host']
>     host_state = self.host_state_cls(host, topic,
>             capabilities=capabilities,
>             service=dict(service.iteritems()))
>
> # host_manager.py:101
> def __init__(self, host, topic, capabilities=None, service=None):
>     self.host = host
>     self.topic = topic
>     # here, HostState.host is service['host']
>
> Then, update_from_compute_node(compute) is called, but it leaves
> self.host unchanged.
> WeightedHost.host_state is this HostState. So, host at
> filter_scheduler.py:203 is service['host']. We can use the existing code
> for the RPC target. Am I missing something?
>
> Thanks,
> Arata
>
>
> (2012/08/28 6:45), Michael J Fork wrote:
> > VTJ NOTSU Arata <notsu@xxxxxxxxxxxxxx> wrote on 08/27/2012 05:19:40
> > PM:
> >
> > > From: VTJ NOTSU Arata <notsu@xxxxxxxxxxxxxx>
> > > To: Michael J Fork/Rochester/IBM@IBMUS,
> > > Cc: David Kang <dkang@xxxxxxx>, OpenStack Development Mailing
> > > List
> > > <openstack-dev@xxxxxxxxxxxxxxxxxxx>, openstack-bounces
> > > +mjfork=us.ibm.com@xxxxxxxxxxxxxxxxxxx,
> > > "openstack@xxxxxxxxxxxxxxxxxxx (openstack@xxxxxxxxxxxxxxxxxxx)"
> > > <openstack@xxxxxxxxxxxxxxxxxxx>
> > > Date: 08/27/2012 05:19 PM
> > > Subject: Re: [Openstack] [openstack-dev] Discussion about where
> > > to
> > > put database for bare-metal provisioning (review 10726)
> > >
> > > Hello all,
> > >
> > > It seems that the only requirement for keys of
> > > HostManager.service_state
> > > is that they be unique;
> > > they do not have to be valid hostnames or queues (existing
> > > code already casts
> > > messages to <topic>.<service-hostname>. Michael, doesn't it?).
> >
> > Looking at line 203 in nova/scheduler/filter_scheduler.py, the
> > target host in the cast call is weighted_host.host_state.host
> > and not a service host. (My guess is this will likely require a fair
> > number of changes in the scheduler area to change cast calls to
> > target a service host instead of a compute node)
> >
> > > So, I tried
> > > '<host>/<bm_node_id>' as 'host' of capabilities. Then,
> > > HostManager.service_state is:
> > > { <host>/<bm_node_id> : { <service> : { cap k : v }}}.
> > > So far, it works fine. How about this way?
> >
> > I will defer to Vish here, but seems like a reasonable solution.
> >
> > > I paste the relevant code at the bottom of this mail just to make
> > > sure.
> > > NOTE: I added a new column 'nodename' to compute_nodes to store
> > > bm_node_id,
> > > but storing it in 'hypervisor_hostname' may be the right solution.
> >
> > Again, I will defer to Vish, but seems like using the existing
> > "hypervisor_hostname" would be correct (otherwise I have no idea
> > what that field would have been intended for).
> >
> > > (The whole code is in our github(NTTdocomo-openstack/nova, branch
> > > 'multinode'),
> > > multiple resource_trackers are also implemented.)
> > >
> > > Thanks,
> > > Arata
> > >
> > >
> > > diff --git a/nova/scheduler/host_manager.py b/nova/scheduler/host_manager.py
> > > index 33ba2c1..567729f 100644
> > > --- a/nova/scheduler/host_manager.py
> > > +++ b/nova/scheduler/host_manager.py
> > > @@ -98,9 +98,10 @@ class HostState(object):
> > >      previously used and lock down access.
> > >      """
> > >
> > > -    def __init__(self, host, topic, capabilities=None, service=None):
> > > +    def __init__(self, host, topic, capabilities=None, service=None, nodename=None):
> > >          self.host = host
> > >          self.topic = topic
> > > +        self.nodename = nodename
> > >
> > >          # Read-only capability dicts
> > >
> > > @@ -175,8 +176,8 @@ class HostState(object):
> > >          return True
> > >
> > >      def __repr__(self):
> > > -        return ("host '%s': free_ram_mb:%s free_disk_mb:%s" %
> > > -                (self.host, self.free_ram_mb, self.free_disk_mb))
> > > +        return ("host '%s' / nodename '%s': free_ram_mb:%s free_disk_mb:%s" %
> > > +                (self.host, self.nodename, self.free_ram_mb, self.free_disk_mb))
> > >
> > >
> > >  class HostManager(object):
> > > @@ -268,11 +269,16 @@ class HostManager(object):
> > >                  LOG.warn(_("No service for compute ID %s") % compute['id'])
> > >                  continue
> > >              host = service['host']
> > > -            capabilities = self.service_states.get(host, None)
> > > +            if compute['nodename']:
> > > +                host_node = '%s/%s' % (host, compute['nodename'])
> > > +            else:
> > > +                host_node = host
> > > +            capabilities = self.service_states.get(host_node, None)
> > >              host_state = self.host_state_cls(host, topic,
> > >                      capabilities=capabilities,
> > > -                    service=dict(service.iteritems()))
> > > +                    service=dict(service.iteritems()),
> > > +                    nodename=compute['nodename'])
> > >              host_state.update_from_compute_node(compute)
> > > -            host_state_map[host] = host_state
> > > +            host_state_map[host_node] = host_state
> > >
> > >          return host_state_map
> > >
> > > diff --git a/nova/virt/baremetal/driver.py b/nova/virt/baremetal/driver.py
> > > index 087d1b6..dbcfbde 100644
> > > --- a/nova/virt/baremetal/driver.py
> > > +++ b/nova/virt/baremetal/driver.py
> > > (skip...)
> > > +    def _create_node_cap(self, node):
> > > +        dic = self._node_resources(node)
> > > +        dic['host'] = '%s/%s' % (FLAGS.host, node['id'])
> > > +        dic['cpu_arch'] = self._extra_specs.get('cpu_arch')
> > > +        dic['instance_type_extra_specs'] = self._extra_specs
> > > +        dic['supported_instances'] = self._supported_instances
> > > +        # TODO: put node's extra specs
> > > +        return dic
> > >
> > >      def get_host_stats(self, refresh=False):
> > > -        return self._get_host_stats()
> > > +        caps = []
> > > +        context = nova_context.get_admin_context()
> > > +        nodes = bmdb.bm_node_get_all(context,
> > > +                                     service_host=FLAGS.host)
> > > +        for node in nodes:
> > > +            node_cap = self._create_node_cap(node)
> > > +            caps.append(node_cap)
> > > +        return caps
> > >
> > >
> > > (2012/08/28 5:55), Michael J Fork wrote:
> > > > openstack-bounces+mjfork=us.ibm.com@xxxxxxxxxxxxxxxxxxx wrote
> > > > on
> > > 08/27/2012 02:58:56 PM:
> > > >
> > > > > From: David Kang <dkang@xxxxxxx>
> > > > > To: Vishvananda Ishaya <vishvananda@xxxxxxxxx>,
> > > > > Cc: OpenStack Development Mailing List <openstack-
> > > > > dev@xxxxxxxxxxxxxxxxxxx>, "openstack@xxxxxxxxxxxxxxxxxxx \
> > > > > (openstack@xxxxxxxxxxxxxxxxxxx\)"
> > > > > <openstack@xxxxxxxxxxxxxxxxxxx>
> > > > > Date: 08/27/2012 03:06 PM
> > > > > Subject: Re: [Openstack] [openstack-dev] Discussion about
> > > > > where to
> > > > > put database for bare-metal provisioning (review 10726)
> > > > > Sent by:
> > > > > openstack-bounces+mjfork=us.ibm.com@xxxxxxxxxxxxxxxxxxx
> > > > >
> > > > >
> > > > > Hi Vish,
> > > > >
> > > > > I think I understand your idea.
> > > > > One service entry with multiple bare-metal compute_node
> > > > > entries is registered at the start of bare-metal nova-compute.
> > > > > 'hypervisor_hostname' must be different for each bare-metal
> > > > > machine, such as 'bare-metal-0001.xxx.com',
> > > > > 'bare-metal-0002.xxx.com', etc.
> > > > > But their IP addresses must be the IP address of the bare-metal
> > > > > nova-compute, so that an instance is cast not to the bare-metal
> > > > > machine directly but to the bare-metal nova-compute.
> > > >
> > > > I believe the change here is to cast out the message to the
> > > <topic>.<service-hostname>. Existing code sends it to the
> > > compute_node hostname (see line 202 of nova/scheduler/
> > > filter_scheduler.py, specifically
> > > host=weighted_host.host_state.host). Changing that to cast to the
> > > service hostname would send the message to the bare-metal proxy
> > > node
> > > and should not have an effect on current deployments since the
> > > service hostname and the host_state.host would always be equal.
> > > This model will also let you keep the bare-metal compute node IP
> > > in
> > > the compute node table.
> > > >
> > > > > One extension we need to do at the scheduler side is using
> > > > > (host,
> > > > > hypervisor_hostname) instead of (host) only in
> > > > > host_manager.py.
> > > > > 'HostManager.service_state' is { <host> : { <service > : {
> > > > > cap k : v }}}.
> > > > > It needs to be changed to { <host> : { <service> : {
> > > > > <hypervisor_name> : { cap k : v }}}}.
> > > > > Most functions of HostState need to be changed to use (host,
> > > > > hypervisor_name) pair to identify a compute node.
> > > >
> > > > Would an alternative here be to change the top level "host" to
> > > > be
> > > the hypervisor_hostname and enforce uniqueness?
> > > >
> > > > > Are we on the same page, now?
> > > > >
> > > > > Thanks,
> > > > > David
> > > > >
> > > > > ----- Original Message -----
> > > > > > Hi David,
> > > > > >
> > > > > > I just checked out the code more extensively and I don't
> > > > > > see why you
> > > > > > need to create a new service entry for each compute_node
> > > > > > entry. The
> > > > > > code in host_manager to get all host states explicitly
> > > > > > gets all
> > > > > > compute_node entries. I don't see any reason why multiple
> > > > > > compute_node
> > > > > > entries can't share the same service. I don't see any
> > > > > > place in the
> > > > > > scheduler that is grabbing records by "service" instead of
> > > > > > by "compute
> > > > > > node", but if there is one that I missed, it should be
> > > > > > fairly easy to
> > > > > > change it.
> > > > > >
> > > > > > The compute_node record is created in the
> > > > > > compute/resource_tracker.py
> > > > > > as of a recent commit, so I think the path forward would
> > > > > > be to make
> > > > > > sure that one of the records is created for each bare
> > > > > > metal node by
> > > > > > the bare metal compute, perhaps by having multiple
> > > > > > resource_trackers.
> > > > > >
> > > > > > Vish
> > > > > >
> > > > > > On Aug 27, 2012, at 9:40 AM, David Kang <dkang@xxxxxxx>
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > > Vish,
> > > > > > >
> > > > > > > I think I don't understand your statement fully.
> > > > > > > Unless we use different hostnames, (hostname,
> > > > > > > hypervisor_hostname)
> > > > > > > must be the
> > > > > > > same for all bare-metal nodes under a bare-metal
> > > > > > > nova-compute.
> > > > > > >
> > > > > > > Could you elaborate the following statement a little
> > > > > > > bit more?
> > > > > > >
> > > > > > >> You would just have to use a little more than hostname.
> > > > > > >> Perhaps
> > > > > > >> (hostname, hypervisor_hostname) could be used to update
> > > > > > >> the entry?
> > > > > > >>
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >> I would investigate changing the capabilities to key
> > > > > > >> off of
> > > > > > >> something
> > > > > > >> other than hostname. It looks from the table structure
> > > > > > >> like
> > > > > > > >> compute_nodes could have a many-to-one relationship
> > > > > > >> with
> > > > > > >> services.
> > > > > > >> You would just have to use a little more than hostname.
> > > > > > >> Perhaps
> > > > > > >> (hostname, hypervisor_hostname) could be used to update
> > > > > > >> the entry?
> > > > > > >>
> > > > > > >> Vish
> > > > > > >>
> > > > > > >> On Aug 24, 2012, at 11:23 AM, David Kang
> > > > > > >> <dkang@xxxxxxx> wrote:
> > > > > > >>
> > > > > > >>>
> > > > > > >>> Vish,
> > > > > > >>>
> > > > > > >>> I've tested your code and did more testing.
> > > > > > >>> There are a couple of problems.
> > > > > > >>> 1. host name should be unique. If not, any repetitive
> > > > > > >>> updates of
> > > > > > >>> new
> > > > > > >>> capabilities with the same host name are simply
> > > > > > >>> overwritten.
> > > > > > >>> 2. We cannot generate arbitrary host names on the fly.
> > > > > > >>> The scheduler (I tested filter scheduler) gets host
> > > > > > >>> names from
> > > > > > >>> db.
> > > > > > >>> So, if a host name is not in the 'services' table,
> > > > > > >>> it is not
> > > > > > >>> considered by the scheduler at all.
> > > > > > >>>
> > > > > > >>> So, to make your suggestions possible, nova-compute
> > > > > > >>> should
> > > > > > >>> register
> > > > > > >>> N different host names in 'services' table,
> > > > > > >>> and N corresponding entries in 'compute_nodes' table.
> > > > > > >>> Here is an example:
> > > > > > >>>
> > > > > > > >>> mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
> > > > > > > >>> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > > > > > > >>> | id | host        | binary         | topic     | report_count | disabled | availability_zone |
> > > > > > > >>> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > > > > > > >>> |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
> > > > > > > >>> |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
> > > > > > > >>> |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
> > > > > > > >>> |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
> > > > > > > >>> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > > > > > > >>>
> > > > > > > >>> mysql> select id, service_id, hypervisor_hostname from compute_nodes;
> > > > > > > >>> +----+------------+------------------------+
> > > > > > > >>> | id | service_id | hypervisor_hostname    |
> > > > > > > >>> +----+------------+------------------------+
> > > > > > > >>> |  1 |          3 | bespin101.east.isi.edu |
> > > > > > > >>> |  2 |          4 | bespin101.east.isi.edu |
> > > > > > > >>> +----+------------+------------------------+
> > > > > > >>>
> > > > > > >>> Then, nova db (compute_nodes table) has entries of
> > > > > > >>> all bare-metal
> > > > > > >>> nodes.
> > > > > > > >>> What do you think of this approach?
> > > > > > >>> Do you have any better approach?
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>> David
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> ----- Original Message -----
> > > > > > > >>>> To elaborate, something like the below. I'm not absolutely
> > > > > > > >>>> sure you need to be able to set service_name and host, but
> > > > > > > >>>> this gives you the option to do so if needed.
> > > > > > >>>>
> > > > > > > >>>> diff --git a/nova/manager.py b/nova/manager.py
> > > > > > > >>>> index c6711aa..c0f4669 100644
> > > > > > > >>>> --- a/nova/manager.py
> > > > > > > >>>> +++ b/nova/manager.py
> > > > > > > >>>> @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
> > > > > > > >>>>
> > > > > > > >>>>      def update_service_capabilities(self, capabilities):
> > > > > > > >>>>          """Remember these capabilities to send on next periodic update."""
> > > > > > > >>>> +        if not isinstance(capabilities, list):
> > > > > > > >>>> +            capabilities = [capabilities]
> > > > > > > >>>>          self.last_capabilities = capabilities
> > > > > > > >>>>
> > > > > > > >>>>      @periodic_task
> > > > > > > >>>> @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
> > > > > > > >>>>          """Pass data back to the scheduler at a periodic interval."""
> > > > > > > >>>>          if self.last_capabilities:
> > > > > > > >>>>              LOG.debug(_('Notifying Schedulers of capabilities ...'))
> > > > > > > >>>> -            self.scheduler_rpcapi.update_service_capabilities(context,
> > > > > > > >>>> -                    self.service_name, self.host, self.last_capabilities)
> > > > > > > >>>> +            for capability_item in self.last_capabilities:
> > > > > > > >>>> +                name = capability_item.get('service_name', self.service_name)
> > > > > > > >>>> +                host = capability_item.get('host', self.host)
> > > > > > > >>>> +                self.scheduler_rpcapi.update_service_capabilities(context,
> > > > > > > >>>> +                        name, host, capability_item)
> > > > > > >>>>
> > > > > > >>>> On Aug 21, 2012, at 1:28 PM, David Kang
> > > > > > >>>> <dkang@xxxxxxx> wrote:
> > > > > > >>>>
> > > > > > >>>>>
> > > > > > >>>>> Hi Vish,
> > > > > > >>>>>
> > > > > > >>>>> We are trying to change our code according to your
> > > > > > >>>>> comment.
> > > > > > >>>>> I want to ask a question.
> > > > > > >>>>>
> > > > > > >>>>>>>> a) modify driver.get_host_stats to be able to
> > > > > > >>>>>>>> return a list
> > > > > > >>>>>>>> of
> > > > > > >>>>>>>> host
> > > > > > >>>>>>>> stats instead of just one. Report the whole list
> > > > > > >>>>>>>> back to the
> > > > > > >>>>>>>> scheduler. We could modify the receiving end to
> > > > > > >>>>>>>> accept a list
> > > > > > >>>>>>>> as
> > > > > > >>>>>>>> well
> > > > > > >>>>>>>> or just make multiple calls to
> > > > > > >>>>>>>> self.update_service_capabilities(capabilities)
> > > > > > >>>>>
> > > > > > > >>>>> Modifying driver.get_host_stats to return a list of
> > > > > > > >>>>> host stats is easy.
> > > > > > > >>>>> Making multiple calls to
> > > > > > > >>>>> self.update_service_capabilities(capabilities)
> > > > > > > >>>>> doesn't seem to work,
> > > > > > > >>>>> because 'capabilities' is overwritten each time.
> > > > > > > >>>>>
> > > > > > > >>>>> Modifying the receiving end to accept a list seems
> > > > > > > >>>>> to be easy.
> > > > > > > >>>>> However, 'capabilities' is assumed to be a dictionary
> > > > > > > >>>>> by all other scheduler routines,
> > > > > > > >>>>> so it looks like we would have to change all of them
> > > > > > > >>>>> to handle 'capabilities' as a list of dictionaries.
> > > > > > > >>>>>
> > > > > > > >>>>> If my understanding is correct, it would affect
> > > > > > > >>>>> many parts of the scheduler.
> > > > > > > >>>>> Is that what you recommended?
> > > > > > >>>>>
> > > > > > >>>>> Thanks,
> > > > > > >>>>> David
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> ----- Original Message -----
> > > > > > >>>>>> This was an immediate goal, the bare-metal
> > > > > > >>>>>> nova-compute node
> > > > > > >>>>>> could
> > > > > > >>>>>> keep an internal database, but report capabilities
> > > > > > >>>>>> through nova
> > > > > > >>>>>> in
> > > > > > >>>>>> the
> > > > > > >>>>>> common way with the changes below. Then the
> > > > > > >>>>>> scheduler wouldn't
> > > > > > >>>>>> need
> > > > > > >>>>>> access to the bare metal database at all.
> > > > > > >>>>>>
> > > > > > >>>>>> On Aug 15, 2012, at 4:23 PM, David Kang
> > > > > > >>>>>> <dkang@xxxxxxx> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> Hi Vish,
> > > > > > >>>>>>>
> > > > > > >>>>>>> Is this discussion for long-term goal or for this
> > > > > > >>>>>>> Folsom
> > > > > > >>>>>>> release?
> > > > > > >>>>>>>
> > > > > > > >>>>>>> We still believe that a bare-metal database is
> > > > > > > >>>>>>> needed because there is no automated way for
> > > > > > > >>>>>>> bare-metal nodes to report their capabilities
> > > > > > > >>>>>>> to their bare-metal nova-compute node.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Thanks,
> > > > > > >>>>>>> David
> > > > > > >>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I am interested in finding a solution that
> > > > > > >>>>>>>> enables bare-metal
> > > > > > >>>>>>>> and
> > > > > > >>>>>>>> virtualized requests to be serviced through the
> > > > > > >>>>>>>> same
> > > > > > >>>>>>>> scheduler
> > > > > > >>>>>>>> where
> > > > > > >>>>>>>> the compute_nodes table has a full view of
> > > > > > >>>>>>>> schedulable
> > > > > > >>>>>>>> resources.
> > > > > > >>>>>>>> This
> > > > > > >>>>>>>> would seem to simplify the end-to-end flow while
> > > > > > >>>>>>>> opening up
> > > > > > >>>>>>>> some
> > > > > > >>>>>>>> additional use cases (e.g. dynamic allocation of
> > > > > > >>>>>>>> a node from
> > > > > > >>>>>>>> bare-metal to hypervisor and back).
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> One approach would be to have a proxy running a
> > > > > > >>>>>>>> single
> > > > > > >>>>>>>> nova-compute
> > > > > > >>>>>>>> daemon fronting the bare-metal nodes . That
> > > > > > >>>>>>>> nova-compute
> > > > > > >>>>>>>> daemon
> > > > > > >>>>>>>> would
> > > > > > >>>>>>>> report up many HostState objects (1 per
> > > > > > >>>>>>>> bare-metal node) to
> > > > > > >>>>>>>> become
> > > > > > >>>>>>>> entries in the compute_nodes table and accessible
> > > > > > >>>>>>>> through the
> > > > > > >>>>>>>> scheduler HostManager object.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > > >>>>>>>> The HostState object would set cpu_info, vcpus,
> > > > > > > >>>>>>>> memory_mb and
> > > > > > > >>>>>>>> local_gb
> > > > > > >>>>>>>> values to be used for scheduling with the
> > > > > > >>>>>>>> hypervisor_host
> > > > > > >>>>>>>> field
> > > > > > >>>>>>>> holding the bare-metal machine address (e.g. for
> > > > > > >>>>>>>> IPMI based
> > > > > > >>>>>>>> commands)
> > > > > > >>>>>>>> and hypervisor_type = NONE. The bare-metal
> > > > > > >>>>>>>> Flavors are
> > > > > > >>>>>>>> created
> > > > > > >>>>>>>> with
> > > > > > >>>>>>>> an
> > > > > > >>>>>>>> extra_spec of hypervisor_type= NONE and the
> > > > > > >>>>>>>> corresponding
> > > > > > >>>>>>>> compute_capabilities_filter would reduce the
> > > > > > >>>>>>>> available hosts
> > > > > > >>>>>>>> to
> > > > > > >>>>>>>> those
> > > > > > >>>>>>>> bare_metal nodes. The scheduler would need to
> > > > > > >>>>>>>> understand that
> > > > > > >>>>>>>> hypervisor_type = NONE means you need an exact
> > > > > > >>>>>>>> fit (or
> > > > > > >>>>>>>> best-fit)
> > > > > > >>>>>>>> host
> > > > > > >>>>>>>> vs weighting them (perhaps through the
> > > > > > >>>>>>>> multi-scheduler). The
> > > > > > >>>>>>>> scheduler
> > > > > > >>>>>>>> would cast out the message to the
> > > > > > >>>>>>>> <topic>.<service-hostname>
> > > > > > >>>>>>>> (code
> > > > > > >>>>>>>> today uses the HostState hostname), with the
> > > > > > >>>>>>>> compute driver
> > > > > > >>>>>>>> having
> > > > > > >>>>>>>> to
> > > > > > >>>>>>>> understand if it must be serviced elsewhere (but
> > > > > > >>>>>>>> does not
> > > > > > >>>>>>>> break
> > > > > > >>>>>>>> any
> > > > > > >>>>>>>> existing implementations since it is 1 to 1).
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Does this solution seem workable? Anything I
> > > > > > >>>>>>>> missed?
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> The bare metal driver already is proxying for the
> > > > > > >>>>>>>> other nodes
> > > > > > >>>>>>>> so
> > > > > > >>>>>>>> it
> > > > > > >>>>>>>> sounds like we need a couple of things to make
> > > > > > >>>>>>>> this happen:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> a) modify driver.get_host_stats to be able to
> > > > > > >>>>>>>> return a list
> > > > > > >>>>>>>> of
> > > > > > >>>>>>>> host
> > > > > > >>>>>>>> stats instead of just one. Report the whole list
> > > > > > >>>>>>>> back to the
> > > > > > >>>>>>>> scheduler. We could modify the receiving end to
> > > > > > >>>>>>>> accept a list
> > > > > > >>>>>>>> as
> > > > > > >>>>>>>> well
> > > > > > >>>>>>>> or just make multiple calls to
> > > > > > >>>>>>>> self.update_service_capabilities(capabilities)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> b) make a few minor changes to the scheduler to
> > > > > > >>>>>>>> make sure
> > > > > > >>>>>>>> filtering
> > > > > > >>>>>>>> still works. Note the changes here may be very
> > > > > > >>>>>>>> helpful:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> https://review.openstack.org/10327
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> c) we have to make sure that instances launched
> > > > > > >>>>>>>> on those
> > > > > > >>>>>>>> nodes
> > > > > > >>>>>>>> take
> > > > > > >>>>>>>> up
> > > > > > >>>>>>>> the entire host state somehow. We could probably
> > > > > > >>>>>>>> do this by
> > > > > > >>>>>>>> making
> > > > > > >>>>>>>> sure that the instance_type ram, mb, gb etc.
> > > > > > >>>>>>>> matches what the
> > > > > > >>>>>>>> node
> > > > > > >>>>>>>> has, but we may want a new boolean field "used"
> > > > > > >>>>>>>> if those
> > > > > > >>>>>>>> aren't
> > > > > > >>>>>>>> sufficient.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > > >>>>>>>> This approach seems pretty good. We could
> > > > > > >>>>>>>> potentially get
> > > > > > >>>>>>>> rid
> > > > > > >>>>>>>> of
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>> shared bare_metal_node table. I guess the only
> > > > > > >>>>>>>> other concern
> > > > > > >>>>>>>> is
> > > > > > >>>>>>>> how
> > > > > > >>>>>>>> you populate the capabilities that the bare metal
> > > > > > >>>>>>>> nodes are
> > > > > > >>>>>>>> reporting.
> > > > > > >>>>>>>> I guess an api extension that rpcs to a baremetal
> > > > > > >>>>>>>> node to add
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>> node. Maybe someday this could be autogenerated
> > > > > > >>>>>>>> by the bare
> > > > > > >>>>>>>> metal
> > > > > > >>>>>>>> host
> > > > > > >>>>>>>> looking in its arp table for dhcp requests! :)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Vish
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> _______________________________________________
> > > > > > >>>>>>>> OpenStack-dev mailing list
> > > > > > >>>>>>>> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
> > > > > > >>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/
> > > openstack-dev
> > > > > > >>>>>>>
> > > > > > >>>>>>> _______________________________________________
> > > > > > >>>>>>> OpenStack-dev mailing list
> > > > > > >>>>>>> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
> > > > > > >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/
> > > openstack-dev
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> _______________________________________________
> > > > > > >>>>>> OpenStack-dev mailing list
> > > > > > >>>>>> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
> > > > > > >>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/
> > > openstack-dev
> > > > > > >>>>>
> > > > > > >>>>> _______________________________________________
> > > > > > >>>>> OpenStack-dev mailing list
> > > > > > >>>>> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
> > > > > > >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> _______________________________________________
> > > > > > >>>> OpenStack-dev mailing list
> > > > > > >>>> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
> > > > > > >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > > > >
> > > > > _______________________________________________
> > > > > Mailing list: https://launchpad.net/~openstack
> > > > > Post to : openstack@xxxxxxxxxxxxxxxxxxx
> > > > > Unsubscribe : https://launchpad.net/~openstack
> > > > > More help : https://help.launchpad.net/ListHelp
> > > > >
> > > >
> > > > Michael
> > > >
> > > > -------------------------------------------------
> > > > Michael Fork
> > > > Cloud Architect, Emerging Solutions
> > > > IBM Systems & Technology Group
> > > >
> > > >
> > > >
> > >
> >
> > Michael
> >
> > -------------------------------------------------
> > Michael Fork
> > Cloud Architect, Emerging Solutions
> > IBM Systems & Technology Group
> >
>
>
> --
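> VirtualTech Japan Inc. (http://VirtualTech.jp)
> Engineering Department, Development Section, Section Manager: Arata Notsu (notsu@xxxxxxxxxxxxxx)
>
> 8F Dai-3 Nishi-Aoyama Bldg., 1-8-1 Shibuya, Shibuya-ku, Tokyo 150-0002, Japan
> TEL:03-6419-7841 FAX:03-5774-9462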