fuel-dev team mailing list archive
Message #01532
Re: Stop openstack patching feature
Moved to 6.0 with "Opinion" status.
Please see my questions in the previous email, which are still unanswered:
how well do we handle (or plan to handle, if we don't yet) different
sorts of failures during patching?
On Tue, Sep 9, 2014 at 9:07 PM, David Easter <deaster@xxxxxxxxxxxx> wrote:
> Deleting a node and forcing it to be re-deployed from scratch during an
> update would not be a positive user experience. I’d rather explain to a
> customer that while new deployments can be stopped, updating can’t (but it
> can be rolled back). This would be preferable to explaining that stopping
> the update would result in having to redeploy the entire cloud again.
>
> +1 for reducing https://bugs.launchpad.net/fuel/+bug/1364907 down from
> critical status. However, I think it is reasonable to defer it rather than
> close it as Won't Fix, to give us a chance to consider whether we can
> solve it in a future release.
>
> Thanks,
>
> - David J. Easter
> Director of Product Management, Mirantis, Inc.
>
> From: Evgeniy L <eli@xxxxxxxxxxxx>
> Date: Tuesday, September 9, 2014 at 9:14 AM
> To: Bogdan Dobrelya <bdobrelia@xxxxxxxxxxxx>
> Cc: Igor Kalnitsky <ikalnitsky@xxxxxxxxxxxx>, fuel-dev <
> fuel-dev@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Fuel-dev] Stop openstack patching feature
>
> I don't think we should implement this feature even
> in the API, because then the user would be able to interrupt
> patching via the CLI. I think it's really risky to provide such a feature,
> especially since we know that the user can lose their production
> nodes.
>
> My suggestion is to remove the ticket [1] from 5.1 or mark
> it as Won't Fix.
>
> [1] https://bugs.launchpad.net/fuel/+bug/1364907
>
> On Tue, Sep 9, 2014 at 1:44 PM, <bdobrelia@xxxxxxxxxxxx> wrote:
>
>> Perhaps some ideas could be taken from [0] ([1]).
>> Note that the linked full spec doc [1] is more of a brainstorming
>> discussion than a spec ready for implementation.
>> I strongly believe we should follow the suggested concepts
>> (finite-state machine in the Nailgun DB, running in HA mode, of course) in
>> order to track offline / interrupted statuses for nodes (including the
>> master node) as well.
>>
>> [0]
>> https://blueprints.launchpad.net/fuel/+spec/nailgun-unified-object-model
>> [1] https://etherpad.openstack.org/p/nailgun-unified-object-model
>>
>> Regards,
>> Bogdan Dobrelya.
>>
>> *From:* Mike Scherbakov <mscherbakov@xxxxxxxxxxxx>
>> *Sent:* Tuesday, September 9, 2014 10:15 AM
>>
>> *To:* Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
>> *Cc:* Igor Kalnitsky <ikalnitsky@xxxxxxxxxxxx>, fuel-dev
>> <fuel-dev@xxxxxxxxxxxxxxxxxxx>
>>
>> Folks,
>> I was the one who initially requested this. I thought it was going to be
>> pretty similar to Stop Deployment. It has become obvious that it is not.
>>
>> I'm fine with having it in the API. However, I think what is much more
>> important here is the ability for the user to choose a few hosts to patch
>> first, in order to check how patching works on a very small part of the
>> cluster. Ideally, we would even move workloads to other nodes before
>> patching. We should definitely disable scheduling of workloads on these
>> experimental hosts.
>> Then the user can run patching against these nodes and see how it goes. If
>> all goes well, patching can be applied to the rest of the environment. I do
>> not think, though, that we should patch all nodes (say, 100) at once. This
>> sounds dangerous to me. I think we need to come up with a less
>> dangerous scenario.
>>
>> Also, let's think about and work on possible failures. What if the Fuel
>> Master node goes down during patching? What is going to be affected? How
>> can we complete patching when the Fuel Master comes back online?
>>
>> Or if a compute node under patching breaks for some reason (e.g. disk or
>> memory issues), how would it affect the patching process? How can we safely
>> continue patching the other nodes?
>>
>> Thanks,
>>
>> On Tue, Sep 9, 2014 at 12:08 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
>> wrote:
>>
>>> Sorry again. Please look two messages below.
>>> On Sep 9, 2014 at 12:06, "Vladimir Kuklin" <vkuklin@xxxxxxxxxxxx> wrote:
>>>
>>>> Sorry, I hit Reply instead of Reply All.
>>>> On Sep 9, 2014 at 12:05, "Vladimir Kuklin" <vkuklin@xxxxxxxxxxxx> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Also, I think we should add stop patching at least to the API, in order
>>>>> to allow advanced users and the service team to do what they want.
>>>>> On Sep 9, 2014 at 12:02, "Igor Kalnitsky" <ikalnitsky@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> What should we do with nodes if patching is interrupted? I think
>>>>>> we need to mark them for redeployment, since the nodes' state may be
>>>>>> broken.
>>>>>>
>>>>>> Any opinion?
>>>>>>
>>>>>> - Igor
>>>>>>
>>>>>> On Mon, Sep 8, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > We have been working on implementing an experimental feature
>>>>>> > that lets the user interrupt the OpenStack patching procedure [1].
>>>>>> >
>>>>>> > It's not as easy to implement as we thought it would be.
>>>>>> > The current stop deployment mechanism [2] stops puppet, erases
>>>>>> > nodes and reboots them into bootstrap. That's fine for stopping
>>>>>> > deployment, but not for patching, because the user
>>>>>> > can lose their data. We could rewrite some logic in Nailgun
>>>>>> > and in the orchestrator to stop puppet without erasing nodes,
>>>>>> > but I'm not sure it would work correctly, because such a use
>>>>>> > case hasn't been tested. And I can foresee problems such as
>>>>>> > cleaning up yum/apt-get locks after puppet is interrupted.
>>>>>> >
>>>>>> > As a result, I have several questions:
>>>>>> > 1. Should we try to make it work for the current release?
>>>>>> > 2. If not, will we need this feature in future
>>>>>> > releases? Additional design and research would definitely be
>>>>>> > required.
>>>>>> >
>>>>>> > [1] https://bugs.launchpad.net/fuel/+bug/1364907
>>>>>> > [2] https://github.com/stackforge/fuel-astute/blob/b622d9b36dbdd1e03b282b9ee5b7435ba649e711/lib/astute/server/dispatcher.rb#L163-L164
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Mailing list: https://launchpad.net/~fuel-dev
>>>>>> > Post to : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> > Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> > More help : https://help.launchpad.net/ListHelp
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>
>>
>> --
>> Mike Scherbakov
>> #mihgen
>>
>>
>>
>>
>
>
>
--
Mike Scherbakov
#mihgen