
fuel-dev team mailing list archive

Re: Stop openstack patching feature

 

I don't think that we should implement this feature even
in the API, because in that case the user would be able to interrupt
patching via the CLI. I think it's really risky to provide such a feature,
especially since we know the user can lose his production
nodes.

My suggestion is to remove the ticket [1] from 5.1 or set
it as won't fix.

[1] https://bugs.launchpad.net/fuel/+bug/1364907

On Tue, Sep 9, 2014 at 1:44 PM, <bdobrelia@xxxxxxxxxxxx> wrote:

>  Perhaps some ideas could be taken from [0] ([1]).
> Note that the linked full spec doc [1] is currently more of a brainstorming
> discussion than a spec ready for implementation.
> I strongly believe we should follow the suggested concepts (finite-state-machine
> states in Nailgun DB, running in HA mode, of course) in order to track
> offline / interrupted statuses for nodes (including the master node) as
> well.
>
> [0]
> https://blueprints.launchpad.net/fuel/+spec/nailgun-unified-object-model
> [1] https://etherpad.openstack.org/p/nailgun-unified-object-model
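
As an illustration of the finite-state-machine idea above, node statuses tracked in the Nailgun DB might look roughly like the sketch below. This is a minimal, hypothetical example: the state names and allowed transitions are assumptions for illustration, not the actual Nailgun object model.

    # Illustrative sketch only -- not the real Nailgun object model.
    # State names and transitions are hypothetical.
    from enum import Enum

    class NodeState(Enum):
        READY = "ready"
        PATCHING = "patching"
        INTERRUPTED = "interrupted"
        OFFLINE = "offline"
        ERROR = "error"

    # Allowed transitions; anything else is rejected, so an interrupted
    # node can never silently become "ready" again without re-deployment.
    TRANSITIONS = {
        NodeState.READY: {NodeState.PATCHING, NodeState.OFFLINE},
        NodeState.PATCHING: {NodeState.READY, NodeState.INTERRUPTED,
                             NodeState.OFFLINE, NodeState.ERROR},
        NodeState.INTERRUPTED: {NodeState.PATCHING, NodeState.ERROR},
        NodeState.OFFLINE: {NodeState.READY, NodeState.ERROR},
        NodeState.ERROR: {NodeState.PATCHING},
    }

    def transition(current, new):
        """Return the new state, or raise if the transition is not allowed."""
        if new not in TRANSITIONS[current]:
            raise ValueError("illegal transition: %s -> %s" % (current, new))
        return new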
>
> Regards,
> Bogdan Dobrelya.
>
> Sent from Windows Mail
>
> *From:* Mike Scherbakov <mscherbakov@xxxxxxxxxxxx>
> *Sent:* Tuesday, September 9, 2014 10:15 AM
> *To:* Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
> *Cc:* Igor Kalnitsky <ikalnitsky@xxxxxxxxxxxx>, fuel-dev
> <fuel-dev@xxxxxxxxxxxxxxxxxxx>
>
> Folks,
> I was the one who initially requested this. I thought it was going to be
> pretty similar to Stop Deployment. It becomes obvious that it is not.
>
> I'm fine if we have it in the API. Though I think what is much more important
> here is the ability for the user to choose a few hosts for patching first,
> in order to check how patching would work on a very small part of the
> cluster. Ideally we would even move workloads to other nodes before doing
> patching. We should definitely disable scheduling of workloads on these
> experimental hosts.
> Then the user can run patching against these nodes and see how it goes. If
> all goes fine, patching can be applied to the rest of the environment. I do
> not think, though, that we should do all, let's say, 100 nodes at once. This
> sounds dangerous to me. I think we would need to come up with some less
> dangerous scenario.
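
A minimal sketch of the canary-first rollout described above, assuming hypothetical helpers disable_scheduling(), patch_node() and health_check(); none of these are real Fuel or Nailgun calls, they only stand in for the steps being discussed.

    # Hypothetical sketch of the "patch a few canary hosts first" idea.
    # disable_scheduling(), patch_node() and health_check() are placeholders,
    # not real Fuel or Nailgun calls.
    def patch_in_batches(nodes, canary_count=2, batch_size=10):
        canaries, rest = nodes[:canary_count], nodes[canary_count:]

        # Keep new workloads off the experimental hosts while they are patched.
        for node in canaries:
            disable_scheduling(node)
            patch_node(node)

        if not all(health_check(node) for node in canaries):
            raise RuntimeError("canary patching failed; not touching "
                               "the rest of the environment")

        # Roll out to the remaining nodes in small batches, never all at once.
        for i in range(0, len(rest), batch_size):
            for node in rest[i:i + batch_size]:
                patch_node(node)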
>
> Also, let's think about and work on possible failures. What if the Fuel Master
> node goes down during patching? What is going to be affected? How can we
> complete patching when the Fuel Master comes back online?
>
> Or if a compute node under patching breaks for some reason (e.g. disk or
> memory issues), how would that affect the patching process? How can we safely
> continue patching the other nodes?
>
> Thanks,
>
> On Tue, Sep 9, 2014 at 12:08 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
> wrote:
>
>> Sorry again. Please look two messages below.
>> On 9 Sep 2014 at 12:06, "Vladimir Kuklin" <vkuklin@xxxxxxxxxxxx> wrote:
>>
>>> Sorry, hit reply instead of reply-all.
>>> On 9 Sep 2014 at 12:05, "Vladimir Kuklin" <vkuklin@xxxxxxxxxxxx> wrote:
>>>
>>>> +1
>>>>
>>>> Also, I think we should add stop patching at least to the API in order to
>>>> allow advanced users and the service team to do what they want.
>>>> On 9 Sep 2014 at 12:02, "Igor Kalnitsky" <ikalnitsky@xxxxxxxxxxxx> wrote:
>>>>
>>>>> What should we do with the nodes if patching is interrupted? I think
>>>>> we need to mark them for re-deployment, since the nodes' state may be
>>>>> broken.
>>>>>
>>>>> Any opinion?
>>>>>
>>>>> - Igor
>>>>>
>>>>> On Mon, Sep 8, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > We were working on the implementation of an experimental feature
>>>>> > where the user could interrupt the openstack patching procedure [1].
>>>>> >
>>>>> > It's not as easy to implement as we thought it would be.
>>>>> > The current stop deployment mechanism [2] stops puppet, erases the
>>>>> > nodes and reboots them into bootstrap. That's ok for stop
>>>>> > deployment, but it's not ok for patching, because the user
>>>>> > can lose his data. We could rewrite some logic in nailgun
>>>>> > and in the orchestrator to stop puppet and not erase the nodes,
>>>>> > but I'm not sure it would work correctly, because such a use
>>>>> > case hasn't been tested. And I can see problems like cleaning
>>>>> > up yum/apt-get locks after a puppet interruption.
>>>>> >
>>>>> > As a result, I have several questions:
>>>>> > 1. should we try to make it work for the current release?
>>>>> > 2. if we shouldn't, will we need this feature for future
>>>>> >     releases? Additional design and research would definitely
>>>>> >     be required.
>>>>> >
>>>>> > [1] https://bugs.launchpad.net/fuel/+bug/1364907
>>>>> > [2]
>>>>> >
>>>>> https://github.com/stackforge/fuel-astute/blob/b622d9b36dbdd1e03b282b9ee5b7435ba649e711/lib/astute/server/dispatcher.rb#L163-L164
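
To make the yum/apt-get lock concern above concrete, here is a rough sketch of the kind of check a resumed patching run would need before touching a node again. The lock file paths and semantics are the conventional Linux ones and are an assumption for illustration, not Fuel or Astute code.

    # Rough sketch of the yum/apt-get lock problem: if puppet is killed in the
    # middle of a package transaction, the package manager's lock may still be
    # held (or its pid file left behind), so a resumed run should check first.
    # Paths and semantics below are the conventional ones, not Fuel code.
    import fcntl
    import os

    def dpkg_lock_held(path="/var/lib/dpkg/lock"):
        """dpkg/apt hold an flock on this file while they are running."""
        try:
            fd = open(path, "r+")
        except IOError:
            return False  # no such file, e.g. an RPM-based node
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(fd, fcntl.LOCK_UN)
            return False
        except IOError:
            return True
        finally:
            fd.close()

    def yum_pid_present(path="/var/run/yum.pid"):
        """yum writes its pid here and removes the file on a clean exit."""
        return os.path.exists(path)

    if __name__ == "__main__":
        if dpkg_lock_held() or yum_pid_present():
            print("package manager lock still in place; "
                  "not safe to resume patching yet")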
>>>>> >
>>>>> >
>>>>
>
>
> --
> Mike Scherbakov
> #mihgen
>
>
