
fuel-dev team mailing list archive

Re: Nodes discovering mechanism in nailgun (nailgun agent)

 

OK, that covers AMQP vs. HTTP.

About the async worker:

The officially supported OpenStack implementation of REST is Pecan+WSME,
and whether it runs under a threaded worker or under a reactor is up to
you, but I doubt this is the place where we need to think about an async
worker. The Ironic agent will mostly run deferred tasks, which need to be
forked off (into a thread or a separate process).
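A minimal sketch of the "deferred task" pattern described above, assuming a plain threading model: whatever framework serves the REST API hands the long-running job to a background thread and immediately returns a task id that the caller can poll. `DeferredTaskRunner` and its methods are hypothetical names, not part of Pecan, WSME or the Ironic agent.

```python
import threading
import uuid


class DeferredTaskRunner:
    """Hypothetical helper: run long tasks off the request-handling thread.

    A REST handler would call submit() and return the task id at once;
    clients poll status() later instead of holding the request open.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}    # task_id -> {"state": ..., "result": ...}
        self._threads = {}  # task_id -> worker thread

    def submit(self, func, *args):
        task_id = str(uuid.uuid4())
        with self._lock:
            self._tasks[task_id] = {"state": "running", "result": None}

        def run():
            try:
                result, state = func(*args), "done"
            except Exception as exc:  # record failures instead of killing the agent
                result, state = repr(exc), "failed"
            with self._lock:
                self._tasks[task_id] = {"state": state, "result": result}

        thread = threading.Thread(target=run, daemon=True)
        self._threads[task_id] = thread
        thread.start()
        return task_id

    def status(self, task_id):
        with self._lock:
            return dict(self._tasks[task_id])

    def wait(self, task_id, timeout=None):
        self._threads[task_id].join(timeout)
```

A separate process (e.g. via `multiprocessing`) would be the alternative for tasks that must survive or isolate crashes; a thread is simply the smallest example.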

Vladimir Kozhukalov


On Thu, Mar 20, 2014 at 3:33 PM, Vladimir Kozhukalov <
vkozhukalov@xxxxxxxxxxxx> wrote:

> Vladimir,
>
> The reasons are as follows:
>
> 0) HTTP does not depend on any other intermediate service. ZeroMQ is the
> only available broker-independent alternative to an AMQP broker, but it is
> not even a ready-to-use implementation; it is rather a framework, and it is
> much more complicated to use than plain HTTP, which is also programming
> language independent.
> 1) If you want your nodes (remote agents) to be able to receive requests,
> they need to keep their connections to the AMQP service alive, but as far
> as I know RabbitMQ, for example, does not work properly with thousands of
> nodes. HTTP seems to be a much more scalable solution.
> 2) It is easy to set up a secure transport using HTTPS, independent of the
> implementation.
>
>
>
> Vladimir Kozhukalov
>
>
> On Thu, Mar 20, 2014 at 2:59 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx> wrote:
>
>> Guys,
>>
>> Why not use AMQP for this purpose instead of a REST API, or some kind of
>> async worker if you want to keep the REST API?
>>
>>
>> On Thu, Mar 20, 2014 at 2:28 PM, Oleg Gelbukh <ogelbukh@xxxxxxxxxxxx> wrote:
>>
>>> Vladimir,
>>>
>>> Thank you for the detailed answer; this is interesting information.
>>>
>>> --
>>> Best regards,
>>> Oleg Gelbukh
>>>
>>>
>>> On Thu, Mar 20, 2014 at 2:25 PM, Vladimir Kozhukalov <
>>> vkozhukalov@xxxxxxxxxxxx> wrote:
>>>
>>>> Oleg,
>>>>
>>>> As far as I know, nobody on the Ironic core team thinks that discovery
>>>> is out of Ironic's scope. Just to be sure, we discussed this topic
>>>> yesterday with R. Prykhodchenko face to face and then with Devananda and
>>>> others in #openstack-ironic. The discovery blueprint
>>>> https://blueprints.launchpad.net/ironic/+spec/discovery-ramdisk has been
>>>> postponed only to focus on compatibility with the nova baremetal driver,
>>>> but it has not been cancelled.
>>>>
>>>> And yes, it is assumed that users can have their own CMDB and must be
>>>> able to add their nodes into Ironic via ir-api, but this does not
>>>> restrict Ironic's ability to discover nodes. Right now we (A. Gordeev and
>>>> I) are working together with some Rackspace guys on a generic pluggable
>>>> Ironic agent that is supposed to be able to do a variety of tasks such as
>>>> OS provisioning, node discovery, firmware updates, RAID configuration,
>>>> etc. This agent is supposed to have a REST API and to expose hardware
>>>> info via this API. Discovery will follow this flow:
>>>> 0) the node boots via PXE, and the heartbeat URL (where the node sends
>>>> its "I'm here and alive" requests) is passed via a kernel parameter;
>>>> 1) the agent starts and sends an "I'm here and alive" request;
>>>> 2) the conductor sends a hardware info request to the agent REST API.
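Steps 0 and 1 of this flow can be sketched as follows, assuming the heartbeat URL arrives as a kernel parameter named `heartbeat_url` and the heartbeat is a small JSON POST. Both the parameter name and the payload fields are hypothetical, not the Ironic agent's actual interface.

```python
import json
import urllib.request


def parse_kernel_cmdline(cmdline):
    """Split a /proc/cmdline-style string into a dict of key=value params."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True  # bare flags such as "quiet"
    return params


def build_heartbeat(cmdline, node_mac):
    # Step 0: the heartbeat URL is passed via a kernel parameter.
    # "heartbeat_url" is a hypothetical parameter name.
    url = parse_kernel_cmdline(cmdline)["heartbeat_url"]
    # Step 1: the agent announces "I'm here and alive" (hypothetical payload).
    body = json.dumps({"mac": node_mac, "status": "alive"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

On a real node the agent would read `/proc/cmdline` and send the request with `urllib.request.urlopen(req, timeout=10)`; step 2 is then the conductor calling the agent's own REST API.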
>>>>
>>>> Nick,
>>>>
>>>> It's not expected (although it is possible) that thousands of nodes will
>>>> send HTTP requests to the nailgun API. The list of discovered nodes and
>>>> their state could be obtained from the Ironic API. Of course, this does
>>>> not remove the need for nailgun performance improvements.
>>>>
>>>> Vladimir Kozhukalov
>>>>
>>>>
>>>> On Thu, Mar 20, 2014 at 12:27 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>
>>>>> I agree that it's important to improve nailgun performance and to use
>>>>> the uWSGI server, but that will not solve the problem when thousands of
>>>>> nodes try to register in nailgun, because we have to create a lot of
>>>>> objects (nodes, interfaces, etc.). For that we need a separate service
>>>>> that retrieves data from the nodes and sends only a few nodes at a time
>>>>> to nailgun for registration.
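The "send only a few nodes at a time" idea amounts to batching. A minimal sketch, assuming the middleware already holds the discovered node reports; `register_batch` is a hypothetical stand-in for whatever call the service would make to nailgun (one request per batch), and the batch size is an arbitrary assumption.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def register_in_batches(node_reports, register_batch, batch_size=10):
    """Forward discovered nodes to nailgun a few at a time.

    `register_batch` is hypothetical: it receives one list of node
    reports per call, so nailgun sees bounded bursts instead of
    thousands of concurrent registrations.
    """
    registered = 0
    for chunk in batched(node_reports, batch_size):
        register_batch(chunk)
        registered += len(chunk)
    return registered
```

A real service would probably also pause between batches or wait for nailgun to acknowledge each one; that throttling policy is left out here.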
>>>>>
>>>>>
>>>>> On Wed, Mar 19, 2014 at 4:06 PM, Nikolay Markov <nmarkov@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> The problem with the current discovery approach is the low performance
>>>>>> of the Nailgun app and its inefficient interaction with the database.
>>>>>> If we keep using the same code for registering new nodes in the Nailgun
>>>>>> DB and for keepalives, we will still experience performance issues.
>>>>>>
>>>>>> I would start with these two steps, which don't require any serious
>>>>>> changes:
>>>>>>
>>>>>> 1) Moving Nailgun from the built-in Python server to Nginx+uWSGI (its
>>>>>> performance is being tested with help from Igor Shishkin right now, and
>>>>>> uWSGI shows a really good improvement).
>>>>>> 2) Refactoring and optimizing DB queries using joinedload and indexes,
>>>>>> and profiling code execution. Almost any fix here will be a huge RPS
>>>>>> improvement, because right now we're overloading the DB with queries and
>>>>>> some places really need code optimization.
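Why `joinedload` and indexes help can be shown without SQLAlchemy. The sketch below uses the standard-library `sqlite3` module and a hypothetical, heavily simplified schema (not Nailgun's real models) to compare the "N+1" pattern, one query per node for its interfaces, with a single JOIN, which is roughly what SQLAlchemy's `joinedload` option emits.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema: nodes and their network interfaces.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, mac TEXT);
        CREATE TABLE interfaces (id INTEGER PRIMARY KEY,
                                 node_id INTEGER REFERENCES nodes(id),
                                 name TEXT);
        CREATE INDEX ix_interfaces_node_id ON interfaces (node_id);
    """)
    for node_id in range(1, 4):
        db.execute("INSERT INTO nodes VALUES (?, ?)", (node_id, f"mac-{node_id}"))
        for nic in ("eth0", "eth1"):
            db.execute("INSERT INTO interfaces (node_id, name) VALUES (?, ?)",
                       (node_id, nic))
    return db


def lazy_load(db):
    """N+1 pattern: one query for the nodes, then one query per node."""
    queries, result = 1, {}
    for node_id, _mac in db.execute("SELECT id, mac FROM nodes").fetchall():
        rows = db.execute("SELECT name FROM interfaces WHERE node_id = ?",
                          (node_id,)).fetchall()
        queries += 1
        result[node_id] = sorted(name for (name,) in rows)
    return result, queries


def joined_load(db):
    """Eager loading: a single JOIN fetches nodes and interfaces together."""
    result = {}
    rows = db.execute("SELECT n.id, i.name FROM nodes n "
                      "LEFT JOIN interfaces i ON i.node_id = n.id").fetchall()
    for node_id, name in rows:
        result.setdefault(node_id, [])
        if name is not None:
            result[node_id].append(name)
    return {k: sorted(v) for k, v in result.items()}, 1
```

With three nodes the lazy version issues 4 queries and the joined one issues 1; the gap grows linearly with the number of nodes, which is where the RPS gains come from.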
>>>>>>
>>>>>> On Wed, Mar 19, 2014 at 2:20 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > Let me describe the main points of the document:
>>>>>> > 1. create a middleware service between nailgun and the nodes (for
>>>>>> > discovery and online/offline status monitoring)
>>>>>> > 2. remove the agents' ability to make requests directly to nailgun;
>>>>>> > instead, we request data from the nodes when we need it
>>>>>> >
>>>>>> > This approach is very similar to what Vladimir described, but in my
>>>>>> > doc I described the solution with mcollective, because we already
>>>>>> > have it and it works. In fact, any other transport could be used.
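The online/offline monitoring part of such a middleware could be as small as a last-seen table. A minimal sketch, assuming a node counts as online if it was heard from (heartbeat or successful poll) within a fixed timeout; `NodeStatusTracker` and the 60-second default are hypothetical, and the clock is injectable so the logic can be tested.

```python
import time


class NodeStatusTracker:
    """Hypothetical middleware component that derives online/offline status.

    Only status transitions would need to be written to nailgun, instead of
    one DB update per keepalive.
    """

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout  # seconds without contact before "offline" (assumed)
        self.clock = clock
        self.last_seen = {}     # node_id -> time of last contact

    def touch(self, node_id):
        """Record a heartbeat or a successful poll for `node_id`."""
        self.last_seen[node_id] = self.clock()

    def online(self):
        now = self.clock()
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}

    def offline(self):
        return set(self.last_seen) - self.online()
```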
>>>>>> >
>>>>>> > I have several questions about the Ironic solution:
>>>>>> > 1. when (roughly speaking) will the agent in Ironic be ready?
>>>>>> > 2. do we want to build this system with mcollective first and then
>>>>>> > replace it with the HTTP-based solution from Ironic?
>>>>>> > 3. how are you going to update data in nailgun if an interface or
>>>>>> > disk is added to or removed from a node?
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Mar 19, 2014 at 12:07 PM, Oleg Gelbukh <ogelbukh@xxxxxxxxxxxx>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Vladimir,
>>>>>> >>
>>>>>> >> I might be wrong, but I heard directly from Devananda that Ironic
>>>>>> >> doesn't plan to have discovery as part of its scope. Things might
>>>>>> >> have changed since then (it was at the HK summit), but the general
>>>>>> >> idea was that Ironic won't serve as a host directory or CMDB, and
>>>>>> >> nodes will be enrolled into it from some external source.
>>>>>> >>
>>>>>> >> However, I think it is natural that discovery capabilities should be
>>>>>> >> supported by a unified agent used by both Ironic and a hypothetical
>>>>>> >> Discovery service (e.g. Nailgun).
>>>>>> >>
>>>>>> >> --
>>>>>> >> Best regards,
>>>>>> >> Oleg
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Mar 19, 2014 at 11:49 AM, Vladimir Kozhukalov
>>>>>> >> <vkozhukalov@xxxxxxxxxxxx> wrote:
>>>>>> >>>
>>>>>> >>> My suggestion is to stop inventing a discovery mechanism on our own.
>>>>>> >>> OpenStack is supposed to use Ironic for provisioning, discovery,
>>>>>> >>> firmware updates, RAID configuration and power management. In the
>>>>>> >>> Ironic project there is a blueprint for a utility ramdisk (it is
>>>>>> >>> similar to the Fuel bootstrap)
>>>>>> >>> https://blueprints.launchpad.net/ironic/+spec/utility-ramdisk.
>>>>>> >>> Our current work on replacing Cobbler with Ironic includes
>>>>>> >>> contributing to the python ironic agent
>>>>>> >>> https://wiki.openstack.org/wiki/Ironic-python-agent. We discussed
>>>>>> >>> the general architecture of this agent and agreed that it should
>>>>>> >>> expose a REST API and that every piece of its functionality should
>>>>>> >>> be implemented as a pluggable driver.
>>>>>> >>>
>>>>>> >>> The discovery flow could be implemented as a series of HTTP requests
>>>>>> >>> to these agents running on the nodes. Discovery will be just one part
>>>>>> >>> of the agents' full functionality. The list of IP addresses to send
>>>>>> >>> discovery requests to could be taken from the list of addresses
>>>>>> >>> leased by the DHCP server.
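Building the list of discovery targets from DHCP leases can be sketched as below, assuming a dnsmasq-style lease file (one lease per line: expiry, MAC, IP, hostname, client id). The agent port and path are hypothetical placeholders, not the Ironic agent's actual API.

```python
import ipaddress


def leased_ips(lease_text):
    """Extract IPv4 addresses from dnsmasq-style lease lines.

    Assumed line format: <expiry> <mac> <ip> <hostname> <client-id>.
    """
    ips = []
    for line in lease_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines or the "duid ..." line
        try:
            addr = ipaddress.ip_address(fields[2])
        except ValueError:
            continue  # not an address in the expected column
        if addr.version == 4:  # keep URL building simple (no IPv6 brackets)
            ips.append(str(addr))
    return ips


def discovery_urls(lease_text, port=9999, path="/hardware"):
    # Port and path are hypothetical placeholders for the agent's REST API.
    return [f"http://{ip}:{port}{path}" for ip in leased_ips(lease_text)]
```

Each URL would then receive one discovery request; leases for nodes that are not running the agent simply fail to answer.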
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> Vladimir Kozhukalov
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Tue, Mar 18, 2014 at 4:25 PM, Mike Scherbakov
>>>>>> >>> <mscherbakov@xxxxxxxxxxxx> wrote:
>>>>>> >>>>
>>>>>> >>>> Looks like it's still an open question.
>>>>>> >>>> Andrew, can you please respond to Eugene's question in the doc?
>>>>>> >>>>
>>>>>> >>>> My personal opinion: refactor the current approach so that it's
>>>>>> >>>> more performant (reduce the amount of data), as that will be
>>>>>> >>>> required anyway. See how it works. If we still have issues, go
>>>>>> >>>> further, perhaps with a re-implementation that polls the servers
>>>>>> >>>> instead, whether using tiny REST services on the nodes, AMQP or
>>>>>> >>>> anything else.
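One way to "reduce the amount of data" without changing the transport is to send the full hardware inventory only when it changes and a tiny keepalive otherwise. A minimal sketch, assuming the agent can produce its inventory as a JSON-serializable dict; `InventoryReporter` and the payload fields are hypothetical.

```python
import hashlib
import json


class InventoryReporter:
    """Hypothetical agent-side helper that avoids resending unchanged data."""

    def __init__(self):
        self._last_digest = None

    @staticmethod
    def digest(inventory):
        # Canonical JSON (sorted keys) so key order does not change the hash.
        blob = json.dumps(inventory, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def next_payload(self, node_id, inventory):
        d = self.digest(inventory)
        if d == self._last_digest:
            # Nothing changed since the last report: a keepalive is enough.
            return {"id": node_id, "digest": d}
        self._last_digest = d
        return {"id": node_id, "digest": d, "inventory": inventory}
```

A side effect of this design is that an added or removed disk or interface changes the digest, so the server would receive the full inventory again exactly when it needs updating.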
>>>>>> >>>>
>>>>>> >>>> Basically, let's eliminate issues step by step.
>>>>>> >>>> Thanks,
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Mon, Mar 3, 2014 at 12:46 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Hi,
>>>>>> >>>>>
>>>>>> >>>>> We had a discussion about the nailgun agent, which some of us
>>>>>> >>>>> want to rewrite in Python. I don't think we need to rewrite the
>>>>>> >>>>> nailgun agent one-to-one to solve a single problem.
>>>>>> >>>>> I tried to describe the problems we have and how we can solve
>>>>>> >>>>> them [0].
>>>>>> >>>>>
>>>>>> >>>>> Comments are welcome.
>>>>>> >>>>>
>>>>>> >>>>> [0]
>>>>>> >>>>> https://docs.google.com/a/mirantis.com/document/d/1zqV58LZBLQ-0gllb_i3MyIKIMj-Qx8ELJohjcWs459s/edit#
>>>>>> >>>>>
>>>>>> >>>>> --
>>>>>> >>>>> Mailing list: https://launchpad.net/~fuel-dev
>>>>>> >>>>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> >>>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> >>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> --
>>>>>> >>>> Mike Scherbakov
>>>>>> >>>> #mihgen
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Nick Markov
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Yours Faithfully,
>> Vladimir Kuklin,
>> Senior Deployment Engineer,
>> Mirantis, Inc.
>> +7 (495) 640-49-04
>> +7 (926) 702-39-68
>> Skype kuklinvv
>> 45bk3, Vorontsovskaya Str.
>> Moscow, Russia,
>> www.mirantis.com
>> www.mirantis.ru
>> vkuklin@xxxxxxxxxxxx
>>
>
>
