openstack team mailing list archive
Message #16128
Re: [openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)
I would investigate changing the capabilities to key off of something other than hostname. From the table structure, it looks like compute_nodes could have a many-to-one relationship with services. You would just have to use a little more than hostname. Perhaps (hostname, hypervisor_hostname) could be used to update the entry?
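A minimal sketch of why keying on hostname alone loses updates while a (hostname, hypervisor_hostname) key does not. It assumes capabilities land in a plain dict; the helper names and node values are hypothetical, and this is not nova's actual HostManager code.

```python
# Hypothetical capability reports from one proxying nova-compute host.
reports = [
    {"host": "bespin101", "hypervisor_hostname": "node-0", "memory_mb": 4096},
    {"host": "bespin101", "hypervisor_hostname": "node-1", "memory_mb": 8192},
]


def cache_by_host(reports):
    """Key on hostname only: later reports from the same host overwrite earlier ones."""
    cache = {}
    for report in reports:
        cache[report["host"]] = report
    return cache


def cache_by_host_and_node(reports):
    """Key on (hostname, hypervisor_hostname): one entry per bare-metal node."""
    cache = {}
    for report in reports:
        cache[(report["host"], report["hypervisor_hostname"])] = report
    return cache


print(len(cache_by_host(reports)))           # 1
print(len(cache_by_host_and_node(reports)))  # 2
```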
Vish
On Aug 24, 2012, at 11:23 AM, David Kang <dkang@xxxxxxx> wrote:
>
> Vish,
>
> I've tested your code and done some more testing.
> There are a couple of problems.
> 1. Host names must be unique. Otherwise, repeated capability updates with the same host name simply overwrite one another.
> 2. We cannot generate arbitrary host names on the fly.
> The scheduler (I tested the filter scheduler) gets host names from the db.
> So, if a host name is not in the 'services' table, it is not considered by the scheduler at all.
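A sketch of the second point, under the assumption (taken from the message) that the scheduler starts from the hosts listed in 'services' and only then consults cached capabilities. The data and the schedulable_hosts helper are hypothetical.

```python
# Hypothetical rows from the services table.
services = [
    {"host": "bespin101", "topic": "scheduler"},
    {"host": "bespin101-0", "topic": "compute"},
]

# Hypothetical cached capabilities, keyed by host name.
capabilities = {
    "bespin101-0": {"memory_mb": 4096},
    "bespin101-1": {"memory_mb": 8192},  # reported, but never registered as a service
}


def schedulable_hosts(services, capabilities, topic="compute"):
    """Return capabilities only for hosts that have a service row for `topic`."""
    registered = {s["host"] for s in services if s["topic"] == topic}
    return {host: caps for host, caps in capabilities.items() if host in registered}


print(sorted(schedulable_hosts(services, capabilities)))  # ['bespin101-0']
```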
>
> So, to make your suggestion possible, nova-compute would have to register N different host names in the 'services' table,
> and N corresponding entries in the 'compute_nodes' table.
> Here is an example:
>
> mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> | id | host        | binary         | topic     | report_count | disabled | availability_zone |
> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
> |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
> |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
> |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
> +----+-------------+----------------+-----------+--------------+----------+-------------------+
>
> mysql> select id, service_id, hypervisor_hostname from compute_nodes;
> +----+------------+------------------------+
> | id | service_id | hypervisor_hostname    |
> +----+------------+------------------------+
> |  1 |          3 | bespin101.east.isi.edu |
> |  2 |          4 | bespin101.east.isi.edu |
> +----+------------+------------------------+
>
> Then the nova db (the compute_nodes table) has entries for all bare-metal nodes.
> What do you think of this approach?
> Do you have a better approach?
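The registration scheme above can be sketched with an in-memory SQLite database. The two tables are heavily simplified versions of nova's services and compute_nodes, and register_bare_metal_nodes is a hypothetical helper; it only shows one services row plus one compute_nodes row per bare-metal node, using the "-N" host suffix from the example.

```python
import sqlite3

# Simplified stand-ins for the two nova tables; the real schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (
        id INTEGER PRIMARY KEY,
        host TEXT NOT NULL,
        "binary" TEXT NOT NULL,
        topic TEXT NOT NULL
    );
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        service_id INTEGER NOT NULL REFERENCES services(id),
        hypervisor_hostname TEXT NOT NULL
    );
""")


def register_bare_metal_nodes(conn, physical_host, hypervisor_hostname, n):
    """Hypothetical helper: one services row and one compute_nodes row per node."""
    for i in range(n):
        cur = conn.execute(
            'INSERT INTO services (host, "binary", topic) VALUES (?, ?, ?)',
            (f"{physical_host}-{i}", "nova-compute", "compute"),
        )
        conn.execute(
            "INSERT INTO compute_nodes (service_id, hypervisor_hostname) VALUES (?, ?)",
            (cur.lastrowid, hypervisor_hostname),
        )
    conn.commit()


register_bare_metal_nodes(conn, "bespin101", "bespin101.east.isi.edu", 2)
rows = conn.execute(
    "SELECT s.host, c.hypervisor_hostname FROM compute_nodes c "
    "JOIN services s ON s.id = c.service_id ORDER BY s.id"
).fetchall()
print(rows)
# [('bespin101-0', 'bespin101.east.isi.edu'), ('bespin101-1', 'bespin101.east.isi.edu')]
```

Each bare-metal node becomes visible to the scheduler through its own service row, at the cost of synthetic host names that do not correspond to real machines.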
>
> Thanks,
> David
>
>
>
> ----- Original Message -----
>> To elaborate, something like the below. I'm not absolutely sure you need to
>> be able to set service_name and host, but this gives you the option to
>> do so if needed.
>>
>> diff --git a/nova/manager.py b/nova/manager.py
>> index c6711aa..c0f4669 100644
>> --- a/nova/manager.py
>> +++ b/nova/manager.py
>> @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
>>
>>      def update_service_capabilities(self, capabilities):
>>          """Remember these capabilities to send on next periodic update."""
>> +        if not isinstance(capabilities, list):
>> +            capabilities = [capabilities]
>>          self.last_capabilities = capabilities
>>
>>      @periodic_task
>> @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
>>          """Pass data back to the scheduler at a periodic interval."""
>>          if self.last_capabilities:
>>              LOG.debug(_('Notifying Schedulers of capabilities ...'))
>> -            self.scheduler_rpcapi.update_service_capabilities(context,
>> -                    self.service_name, self.host, self.last_capabilities)
>> +            for capability_item in self.last_capabilities:
>> +                name = capability_item.get('service_name', self.service_name)
>> +                host = capability_item.get('host', self.host)
>> +                self.scheduler_rpcapi.update_service_capabilities(context,
>> +                        name, host, capability_item)
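To check the intent of the patch, here is a self-contained simulation of the two patched methods. FakeSchedulerRpcApi and the trimmed-down Manager are stand-ins; the name publish_service_capabilities for the periodic method is an assumption, since the diff does not show it, and @periodic_task is omitted so the method can be called directly.

```python
class FakeSchedulerRpcApi:
    """Stand-in that records each call instead of casting over RPC."""

    def __init__(self):
        self.calls = []

    def update_service_capabilities(self, context, service_name, host, capabilities):
        self.calls.append((service_name, host, capabilities))


class Manager:
    """Trimmed-down stand-in for SchedulerDependentManager with the patch applied."""

    def __init__(self, service_name, host):
        self.service_name = service_name
        self.host = host
        self.last_capabilities = None
        self.scheduler_rpcapi = FakeSchedulerRpcApi()

    def update_service_capabilities(self, capabilities):
        """Remember these capabilities to send on next periodic update."""
        if not isinstance(capabilities, list):
            capabilities = [capabilities]
        self.last_capabilities = capabilities

    def publish_service_capabilities(self, context):
        """Send one update per capability dict; per-item keys override the defaults."""
        if self.last_capabilities:
            for capability_item in self.last_capabilities:
                name = capability_item.get("service_name", self.service_name)
                host = capability_item.get("host", self.host)
                self.scheduler_rpcapi.update_service_capabilities(
                    context, name, host, capability_item)


mgr = Manager("compute", "bespin101")
mgr.update_service_capabilities([
    {"host": "bespin101-0", "memory_mb": 4096},
    {"host": "bespin101-1", "memory_mb": 8192},
])
mgr.publish_service_capabilities(context=None)
print([(name, host) for name, host, _ in mgr.scheduler_rpcapi.calls])
# [('compute', 'bespin101-0'), ('compute', 'bespin101-1')]
```

A single dict is still accepted and is wrapped into a one-element list, so existing drivers keep working unchanged.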
>>
>> On Aug 21, 2012, at 1:28 PM, David Kang <dkang@xxxxxxx> wrote:
>>
>>>
>>> Hi Vish,
>>>
>>> We are trying to change our code according to your comment.
>>> I want to ask a question.
>>>
>>>>>> a) modify driver.get_host_stats to be able to return a list of host
>>>>>> stats instead of just one. Report the whole list back to the
>>>>>> scheduler. We could modify the receiving end to accept a list as well
>>>>>> or just make multiple calls to
>>>>>> self.update_service_capabilities(capabilities)
>>>
>>> Modifying driver.get_host_stats to return a list of host stats is
>>> easy. Making multiple calls to
>>> self.update_service_capabilities(capabilities) doesn't seem to work,
>>> because 'capabilities' is overwritten each time.
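The overwrite problem is easy to reproduce with a stand-in that keeps only the most recent value, as the unpatched update_service_capabilities does. The class below is a simplified sketch, not nova's manager.

```python
class UnpatchedManager:
    """Simplified stand-in: stores only the latest capabilities it was given."""

    def __init__(self):
        self.last_capabilities = None

    def update_service_capabilities(self, capabilities):
        """Remember these capabilities to send on next periodic update."""
        self.last_capabilities = capabilities


mgr = UnpatchedManager()
# Three calls between periodic updates: only the last one survives.
for node_stats in ({"node": "bm-0"}, {"node": "bm-1"}, {"node": "bm-2"}):
    mgr.update_service_capabilities(node_stats)

print(mgr.last_capabilities)  # {'node': 'bm-2'}
```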
>>>
>>> Modifying the receiving end to accept a list seems to be easy.
>>> However, 'capabilities' is assumed to be a dictionary by all the other
>>> scheduler routines, so it looks like we would have to change all of
>>> them to handle 'capabilities' as a list of dictionaries.
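One way to confine the change to the scheduler's entry point, sketched under assumptions: the receiving side fans a list out into one call per dict, so code that expects a single dictionary stays unchanged. FakeHostManager and receive_capabilities are hypothetical stand-ins, and each item still needs a distinct host, which is the host-name uniqueness issue David raises in his later reply.

```python
class FakeHostManager:
    """Stand-in for scheduler code that expects one capabilities dict per call."""

    def __init__(self):
        self.service_states = {}

    def update_service_capabilities(self, service_name, host, capabilities):
        # Unchanged dict-only logic: key on host, as the existing code does.
        self.service_states.setdefault(host, {})[service_name] = dict(capabilities)


def receive_capabilities(host_manager, service_name, host, capabilities):
    """Hypothetical entry point: fan a list out so downstream code only sees dicts."""
    items = capabilities if isinstance(capabilities, list) else [capabilities]
    for item in items:
        host_manager.update_service_capabilities(
            service_name, item.get("host", host), item)


hm = FakeHostManager()
receive_capabilities(hm, "compute", "bespin101", [
    {"host": "bespin101-0", "memory_mb": 4096},
    {"host": "bespin101-1", "memory_mb": 8192},
])
receive_capabilities(hm, "compute", "kvm-host", {"memory_mb": 2048})
print(sorted(hm.service_states))  # ['bespin101-0', 'bespin101-1', 'kvm-host']
```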
>>>
>>> If my understanding is correct, it would affect many parts of the
>>> scheduler. Is that what you recommended?
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> ----- Original Message -----
>>>> This was an immediate goal. The bare-metal nova-compute node could
>>>> keep an internal database, but report capabilities through nova in the
>>>> common way with the changes below. Then the scheduler wouldn't need
>>>> access to the bare-metal database at all.
>>>>
>>>> On Aug 15, 2012, at 4:23 PM, David Kang <dkang@xxxxxxx> wrote:
>>>>
>>>>>
>>>>> Hi Vish,
>>>>>
>>>>> Is this discussion about a long-term goal, or about this Folsom release?
>>>>>
>>>>> We still believe that the bare-metal database is needed, because there
>>>>> is no automated way for bare-metal nodes to report their capabilities
>>>>> to their bare-metal nova-compute node.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>>
>>>>>> I am interested in finding a solution that enables bare-metal and
>>>>>> virtualized requests to be serviced through the same scheduler where
>>>>>> the compute_nodes table has a full view of schedulable resources. This
>>>>>> would seem to simplify the end-to-end flow while opening up some
>>>>>> additional use cases (e.g. dynamic allocation of a node from
>>>>>> bare-metal to hypervisor and back).
>>>>>>
>>>>>> One approach would be to have a proxy running a single nova-compute
>>>>>> daemon fronting the bare-metal nodes. That nova-compute daemon would
>>>>>> report up many HostState objects (1 per bare-metal node), which would
>>>>>> become entries in the compute_nodes table and be accessible through
>>>>>> the scheduler's HostManager object.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The HostState object would set cpu_info, vcpus, memory_mb and local_gb
>>>>>> values to be used for scheduling, with the hypervisor_host field
>>>>>> holding the bare-metal machine address (e.g. for IPMI-based commands)
>>>>>> and hypervisor_type = NONE. The bare-metal Flavors are created with an
>>>>>> extra_spec of hypervisor_type = NONE, and the corresponding
>>>>>> compute_capabilities_filter would reduce the available hosts to those
>>>>>> bare-metal nodes. The scheduler would need to understand that
>>>>>> hypervisor_type = NONE means you need an exact-fit (or best-fit) host
>>>>>> rather than weighting them (perhaps through the multi-scheduler). The
>>>>>> scheduler would cast the message to <topic>.<service-hostname> (the
>>>>>> code today uses the HostState hostname), and the compute driver would
>>>>>> have to understand whether the request must be serviced elsewhere
>>>>>> (this does not break any existing implementations, since for them the
>>>>>> mapping is 1 to 1).
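The exact-fit rule can be sketched as a simple host filter. The dict layout, the "QEMU" example host, and host_passes are assumptions for illustration (this is not nova's filter API), and the numeric values are treated as the host's free resources.

```python
RESOURCE_KEYS = ("vcpus", "memory_mb", "local_gb")


def host_passes(host_state, flavor):
    """Hypothetical filter: bare-metal hosts (hypervisor_type "NONE") need an
    exact fit, since one instance takes the whole machine; other hosts only
    need at least as many free resources as the flavor asks for."""
    wanted_type = flavor.get("extra_specs", {}).get("hypervisor_type")
    if wanted_type is not None and host_state["hypervisor_type"] != wanted_type:
        return False
    if host_state["hypervisor_type"] == "NONE":
        return all(host_state[k] == flavor[k] for k in RESOURCE_KEYS)
    return all(host_state[k] >= flavor[k] for k in RESOURCE_KEYS)


bm = {"hypervisor_type": "NONE", "vcpus": 8, "memory_mb": 16384, "local_gb": 500}
kvm = {"hypervisor_type": "QEMU", "vcpus": 32, "memory_mb": 65536, "local_gb": 2000}
bm_flavor = {"vcpus": 8, "memory_mb": 16384, "local_gb": 500,
             "extra_specs": {"hypervisor_type": "NONE"}}
small = {"vcpus": 2, "memory_mb": 4096, "local_gb": 40, "extra_specs": {}}

print([host_passes(h, bm_flavor) for h in (bm, kvm)])  # [True, False]
print([host_passes(h, small) for h in (bm, kvm)])      # [False, True]
```

A best-fit variant would instead rank the bare-metal hosts that are large enough by how little capacity they leave unused.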
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Does this solution seem workable? Anything I missed?
>>>>>>
>>>>>> The bare-metal driver is already proxying for the other nodes, so it
>>>>>> sounds like we need a couple of things to make this happen:
>>>>>>
>>>>>>
>>>>>> a) modify driver.get_host_stats to be able to return a list of host
>>>>>> stats instead of just one. Report the whole list back to the
>>>>>> scheduler. We could modify the receiving end to accept a list as well
>>>>>> or just make multiple calls to
>>>>>> self.update_service_capabilities(capabilities)
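Item a) could look roughly like the following for a proxying driver: get_host_stats returns a list with one stats dict per bare-metal node. BareMetalNode, ProxyDriver, the addresses, and the exact dict keys are assumptions; the "-N" host suffix follows the naming David uses later in the thread.

```python
from dataclasses import dataclass


@dataclass
class BareMetalNode:
    """Hypothetical record for one machine behind the proxy."""
    address: str  # e.g. the address used for IPMI commands
    cpus: int
    memory_mb: int
    local_gb: int


class ProxyDriver:
    """Hypothetical driver fronting several bare-metal nodes."""

    def __init__(self, service_host, nodes):
        self.service_host = service_host
        self.nodes = nodes

    def get_host_stats(self):
        """Return one stats dict per bare-metal node instead of a single dict."""
        return [
            {
                "host": f"{self.service_host}-{i}",
                "hypervisor_hostname": node.address,
                "hypervisor_type": "NONE",
                "vcpus": node.cpus,
                "memory_mb": node.memory_mb,
                "local_gb": node.local_gb,
            }
            for i, node in enumerate(self.nodes)
        ]


driver = ProxyDriver("bespin101", [
    BareMetalNode("10.0.0.10", 8, 16384, 500),
    BareMetalNode("10.0.0.11", 16, 32768, 1000),
])
stats = driver.get_host_stats()
print([(s["host"], s["hypervisor_hostname"]) for s in stats])
# [('bespin101-0', '10.0.0.10'), ('bespin101-1', '10.0.0.11')]
```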
>>>>>>
>>>>>>
>>>>>> b) make a few minor changes to the scheduler to make sure filtering
>>>>>> still works. Note the changes here may be very helpful:
>>>>>>
>>>>>>
>>>>>> https://review.openstack.org/10327
>>>>>>
>>>>>>
>>>>>> c) we have to make sure that instances launched on those nodes take up
>>>>>> the entire host state somehow. We could probably do this by making
>>>>>> sure that the instance_type values (RAM, memory MB, disk GB, etc.)
>>>>>> match what the node has, but we may want a new boolean field "used"
>>>>>> if those aren't sufficient.
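A sketch of the proposed "used" flag, assuming node records are plain dicts and claim_node is a hypothetical helper: once a node is claimed it is marked used, so even a smaller instance consumes the whole machine.

```python
# Hypothetical node records carrying the proposed boolean "used" field.
nodes = [
    {"hypervisor_hostname": "10.0.0.10", "memory_mb": 16384, "used": False},
    {"hypervisor_hostname": "10.0.0.11", "memory_mb": 16384, "used": False},
]


def claim_node(nodes, memory_mb):
    """Claim the first unused node large enough for the request and mark it used."""
    for node in nodes:
        if not node["used"] and node["memory_mb"] >= memory_mb:
            node["used"] = True
            return node
    return None


first = claim_node(nodes, 8192)   # takes 10.0.0.10 entirely
second = claim_node(nodes, 8192)  # 10.0.0.10 is used, so 10.0.0.11 is claimed
third = claim_node(nodes, 8192)   # nothing left
print(first["hypervisor_hostname"], second["hypervisor_hostname"], third)
# 10.0.0.10 10.0.0.11 None
```

Without the flag, the first node would still appear to have 8192 MB free after the first claim.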
>>>>>>
>>>>>>
>>>>>> This approach seems pretty good. We could potentially get rid of the
>>>>>> shared bare_metal_node table. I guess the only other concern is how
>>>>>> you populate the capabilities that the bare-metal nodes are reporting.
>>>>>> Perhaps via an API extension that RPCs to a bare-metal node to add the
>>>>>> node. Maybe someday this could be autogenerated by the bare-metal host
>>>>>> looking in its ARP table for DHCP requests! :)
>>>>>>
>>>>>>
>>>>>> Vish
>>>>>>
>>>>>> _______________________________________________
>>>>>> OpenStack-dev mailing list
>>>>>> OpenStack-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>>
>>>>
>>>>
>>>
>>
>>