fuel-dev team mailing list archive

Re: Stop openstack patching feature

 

Evgeniy is right. For non-critical roles we continue the deployment. The
list of critical roles for HA is here:
https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/orchestrator/deployment_serializers.py#L340
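
To make the rule concrete, here is a rough Python sketch. It is not the
nailgun or astute code: run_patching, patch_node, NodeResult and
PatchingReport are made-up names for illustration, and CRITICAL_ROLES holds
only primary-controller because that is the one role named in this thread
(the real list is at the link above).

```python
from dataclasses import dataclass, field

# Assumption: only "primary-controller" is named as critical in this thread.
# The actual HA list lives in deployment_serializers.py.
CRITICAL_ROLES = frozenset({"primary-controller"})


@dataclass
class NodeResult:
    node_id: int
    role: str
    ok: bool


@dataclass
class PatchingReport:
    aborted: bool = False                       # True if a critical role failed
    results: list = field(default_factory=list)  # one NodeResult per attempted node


def run_patching(nodes, patch_node, critical_roles=CRITICAL_ROLES):
    """Patch nodes in order.

    nodes: iterable of (node_id, role) pairs.
    patch_node: hypothetical callable(node_id, role) returning True on success.
    A failure on a critical role stops the whole run; a failure on any other
    role is recorded and the run continues with the next node.
    """
    report = PatchingReport()
    for node_id, role in nodes:
        try:
            ok = bool(patch_node(node_id, role))
        except Exception:
            ok = False  # treat any exception as a failed node
        report.results.append(NodeResult(node_id, role, ok))
        if not ok and role in critical_roles:
            report.aborted = True  # remaining nodes are left untouched
            break
    return report
```

With nodes [(1, "primary-controller"), (2, "compute"), (3, "compute")], a
failure on node 2 is recorded and node 3 is still patched, while a failure
on node 1 aborts the run before nodes 2 and 3 are touched.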


On Thu, Sep 11, 2014 at 4:02 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:

> Hi,
>
> >> Also, let's think and work on possible failures. What if Fuel Master
> >> node goes off during patching? What is going to be affected? How we can
> >> complete patching when Fuel Master comes back online?
>
> The question can be summarised as "What if you kill orchestrator during
> the deployment?"
> In this case the user will see a hung progress bar in the UI until he
> removes the task from nailgun.
> And I'm not sure whether he will then be able to continue the deployment
> without additional changes in the db.
> Actually, the same questions apply not only to patching, but to every
> task which we run under the orchestrator.
> The reason for this is our architecture: the orchestrator was designed as
> a worker without persistent state.
> But you need to keep the state somewhere in order to complete a task after
> a failure.
> As far as I understand, Mistral can help us with this issue.
>
> >> Or compute node under patching breaks for some reason (e.g. disk
> >> issues or memory), how would it affect the patching process? How we can
> >> safely continue patching of other nodes?
>
> How it works now, Vladimir Sharshov, correct me if I'm wrong.
> We use the same strategy as for deployment.
>
> Error during primary-controller patching - fail whole patching process
> Error during patching of other roles -  continue patching process
>
> And I'm not sure whether the current strategy is right or wrong.
> On the one hand, we shouldn't leave the user's env in a half-patched state.
> On the other hand, we can break the user's whole cluster because we ignore
> the fact that several computes died during the patching procedure.
>
> Thanks,
>
>
> On Tue, Sep 9, 2014 at 12:15 PM, Mike Scherbakov <mscherbakov@xxxxxxxxxxxx>
> wrote:
>
>> Folks,
>> I was the one who initially requested this. I thought it was going to be
>> pretty similar to Stop Deployment. It has become obvious that it is not.
>>
>> I'm fine if we have it in API. Though I think what is much more important
>> here is an ability for the user to choose a few hosts for patching first,
>> in order to check how patching would work on a very small part of the
>> cluster. Ideally we would even move workloads to other nodes before doing
>> patching. We should disable scheduling of workloads for sure for these
>> experimental hosts.
>> Then user can run patching against these nodes, and see how it goes. If
>> all goes fine, patching can be applied to the rest of the environment. I do
>> not think though that we should do all, let's say 100 nodes, at once. This
>> sounds dangerous to me. I think we would need to come up with some less
>> dangerous scenario.
>>
>> Also, let's think and work on possible failures. What if Fuel Master node
>> goes off during patching? What is going to be affected? How we can complete
>> patching when Fuel Master comes back online?
>>
>> Or compute node under patching breaks for some reason (e.g. disk issues
>> or memory), how would it affect the patching process? How we can safely
>> continue patching of other nodes?
>>
>> Thanks,
>>
>> On Tue, Sep 9, 2014 at 12:08 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
>> wrote:
>>
>>> Sorry again. Look 2 messages below, please.
>>> On Sep 9, 2014, at 12:06, "Vladimir Kuklin" <
>>> vkuklin@xxxxxxxxxxxx> wrote:
>>>
>>>> Sorry, hit reply instead of replyall.
>>>> On Sep 9, 2014, at 12:05, "Vladimir Kuklin" <
>>>> vkuklin@xxxxxxxxxxxx> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Also, I think, we should add stop patching at least to api in order to
>>>>> allow advanced users and service team to do what they want.
>>>>> On Sep 9, 2014, at 12:02, "Igor Kalnitsky" <
>>>>> ikalnitsky@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> What should we do with nodes if patching is interrupted? I think
>>>>>> we need to mark them for re-deployment, since the nodes' state may be
>>>>>> broken.
>>>>>>
>>>>>> Any opinion?
>>>>>>
>>>>>> - Igor
>>>>>>
>>>>>> On Mon, Sep 8, 2014 at 3:28 PM, Evgeniy L <eli@xxxxxxxxxxxx> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > We were working on implementation of experimental feature
>>>>>> > where user could interrupt openstack patching procedure [1].
>>>>>> >
>>>>>> > It's not as easy to implement as we thought it would be.
>>>>>> > Current stop deployment mechanism [2] stops puppet, erases
>>>>>> > nodes and reboots them into bootstrap. It's ok for stop
>>>>>> > deployment, but it's not ok for patching, because user
>>>>>> > can lose his data. We can rewrite some logic in nailgun
>>>>>> > and in orchestrator to stop puppet and not to erase nodes.
>>>>>> > But I'm not sure if it works correctly because such use
>>>>>> > case wasn't tested. And I can see the problems like
>>>>>> > yum/apt-get locks cleaning after puppet interruption.
>>>>>> >
>>>>>> > As a result I have several questions:
>>>>>> > 1. should we try to make it work for the current release?
>>>>>> > 2. if we shouldn't, will we need this feature for the future
>>>>>> >     releases? Definitely additional design and research is
>>>>>> >     required.
>>>>>> >
>>>>>> > [1] https://bugs.launchpad.net/fuel/+bug/1364907
>>>>>> > [2]
>>>>>> > https://github.com/stackforge/fuel-astute/blob/b622d9b36dbdd1e03b282b9ee5b7435ba649e711/lib/astute/server/dispatcher.rb#L163-L164
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Mailing list: https://launchpad.net/~fuel-dev
>>>>>> > Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> > Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> > More help   : https://help.launchpad.net/ListHelp
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>
>>
>> --
>> Mike Scherbakov
>> #mihgen
>>
>>
>>
>>
>
>
>
