fuel-dev team mailing list archive

Re: Stop deployment concerns

 

I think we have consensus.  Here's the way I'd paraphrase it, so please
correct me if I'm wrong:

Customer starts deployment, for example with 3 controllers (HA), 10 compute
nodes and 5 cinder nodes.  During the deployment, 2 of the compute nodes
fail.  The customer does not want to wait for the entire deployment to
"finish", so he presses the Stop Deployment button.

At this point, the UI screens remain locked, i.e. configurations cannot be
changed.  The user can correct the issues on the nodes if they are HW or OS
related.  Once corrected, the user can click the Deploy Changes button and
Fuel will retry installing any node that did not deploy correctly.  Fuel
will not redeploy any nodes that successfully installed during the first
Deploy Changes effort.
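
A minimal sketch of that retry rule, using the example environment above.
The Node class, the status strings ("ready", "error") and the function names
are hypothetical and chosen for illustration only; they are not Fuel or
Nailgun identifiers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str
    status: str  # hypothetical values: "ready" (deployed OK) or "error"


def nodes_to_redeploy(nodes):
    """Return only the nodes that did not deploy correctly.

    Nodes already in the "ready" state are skipped, so a second
    Deploy Changes never reinstalls them.
    """
    return [node for node in nodes if node.status != "ready"]


def build_example():
    """3 controllers (HA), 10 compute nodes, 5 cinder nodes; 2 computes fail."""
    nodes = [Node(f"controller-{i}", "controller", "ready") for i in range(3)]
    nodes += [Node(f"compute-{i}", "compute", "ready") for i in range(10)]
    nodes += [Node(f"cinder-{i}", "cinder", "ready") for i in range(5)]
    nodes[3].status = "error"  # compute-0
    nodes[4].status = "error"  # compute-1
    return nodes


if __name__ == "__main__":
    retry = nodes_to_redeploy(build_example())
    print([node.name for node in retry])  # ['compute-0', 'compute-1']
```

Only the two failed compute nodes are selected; the other 16 nodes are left
untouched.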

If the user does want to make changes to the configuration (e.g. the disk
layout on one of the compute nodes), then the user will have to select the
Reset Environment button which will reset the environment to a state as if
the Deploy Changes had never been clicked.  The UI will be unlocked and all
previous choices will be retained.  The user can now make any changes to the
environment.  Once the changes are made, the user can click Deploy Changes
and Fuel will begin the deployment again from the beginning.
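
The locking behaviour described above can be sketched as a small state
machine. This is an illustration under assumptions, not the Nailgun
implementation: the Environment class, its method names and the status
strings are all hypothetical.

```python
class Environment:
    """Toy model of the Stop Deployment / Reset Environment behaviour."""

    def __init__(self, settings):
        self.settings = dict(settings)
        self.status = "new"    # hypothetical statuses: new, deploying, stopped
        self.locked = False    # True while configuration may not be changed

    def deploy_changes(self):
        # Allowed for a fresh environment or to retry after a stop.
        if self.status not in ("new", "stopped"):
            raise RuntimeError(f"cannot deploy from status {self.status!r}")
        self.status = "deploying"
        self.locked = True

    def stop_deployment(self):
        if self.status != "deploying":
            raise RuntimeError("no deployment is running")
        # The configuration stays locked after a stop.
        self.status = "stopped"

    def reset_environment(self):
        # Back to the state before Deploy Changes was ever clicked;
        # previous choices are retained and the UI is unlocked.
        self.status = "new"
        self.locked = False

    def change_setting(self, key, value):
        if self.locked:
            raise PermissionError("configuration is locked; reset first")
        self.settings[key] = value


if __name__ == "__main__":
    env = Environment({"disk_layout": "default"})
    env.deploy_changes()
    env.stop_deployment()
    env.reset_environment()
    env.change_setting("disk_layout", "custom")
    print(env.status, env.settings)
```

After a stop, change_setting raises until reset_environment is called, which
mirrors the rule that configuration changes require a full reset.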


Does that cover the two backlog stories properly?

Thanks,

-Dave Easter

From:  Roman Alekseenkov <ralekseenkov@xxxxxxxxxxxx>
Date:  Friday, November 22, 2013 3:53 AM
To:  Bogdan Dobrelya <bdobrelia@xxxxxxxxxxxx>
Cc:  Mike Scherbakov <mscherbakov@xxxxxxxxxxxx>, David Easter
<deaster@xxxxxxxxxxxx>, Evgeniy L <eli@xxxxxxxxxxxx>, Nikolay Markov
<nmarkov@xxxxxxxxxxxx>, "fuel-dev@xxxxxxxxxxxxxxxxxxx"
<fuel-dev@xxxxxxxxxxxxxxxxxxx>
Subject:  Re: Stop deployment concerns

David,
1. Do we have a consensus here? Can you drive it with the team to
completion?
2. On a separate note, I think we should schedule a call to go through all
the features and discuss requirements, to ensure that you and the dev team
are on the same page.
Thanks,
Roman

On Friday, November 22, 2013, Bogdan Dobrelya  wrote:
>     
>  
> On 11/22/2013 11:16 AM, Mike Scherbakov wrote:
>  
>  
>>  
>>  
>> + fuel-dev
>>  
>> 
>>  
>>  We had a meeting on the topic yesterday. Research shows the following.
>> 
>>  
>>  
>> It would be great to be able to stop deployment at any moment, and then
>> continue by redeploying only the failed nodes. However:
>>  
>>  
>> * If the network configuration is changed, the environment will not be
>> operational after deployment
>>   * the user may change network CIDRs, and without additional
>>   functionality in Fuel it is currently not possible to reconfigure
>>   OpenStack (replace the network information in the OpenStack database)
>> * If some settings are changed, the same applies
>>   * for example passwords: the controllers are already deployed, and the
>>   computes would get new information
>>  
>> So, we have come to the decision that resetting of the whole environment is
>> essential at the moment. We expect the following workflow:
>>  
>>  
>>  
>> 1. If it becomes obvious that the deployment will not finish
>> successfully, the user goes to the Actions tab and clicks the "Reset
>> Environment" button.
>> 2. The environment changes its status to "Resetting".
>> 3. All settings on the environment become unlocked, and the user is
>> allowed to change anything. Settings stay the same as when the user
>> clicked "Deploy".
>> 4. Resetting the environment implies rebooting all the nodes to the
>> bootstrap state. When it is done, the status of the environment is changed
>> to "New", and the "Deploy" button becomes active.
>> 5. When the user is done with re-configuration, he clicks "Deploy". Fuel
>> should use the same IP addresses / hostnames as at the time of the initial
>> deployment, if no changes are made to networking.
>>  
>> Thanks,
>>  
>>  
>> 
>>  
>>  
>> On Wed, Nov 20, 2013 at 7:14 PM, Mike Scherbakov <mscherbakov@xxxxxxxxxxxx>
>> wrote:
>>  
>>>  
>>> + Evgeniy, Nick
>>>  
>>>  
>>>  
>>> 
>>>  
>>>  
>>> On Wed, Nov 20, 2013 at 7:01 PM, David Easter <deaster@xxxxxxxxxxxx> wrote:
>>>  
>>>>  
>>>>  
>>>> I thought about this some more last night and what about this for a
>>>> resolution?
>>>>  
>>>> 
>>>>  
>>>>  
>>>> 1. When stop deployment is done, any successfully deployed nodes are
>>>> flagged as successful and would not be reinstalled when Deploy Changes is
>>>> pressed again.
>>>> 2. If a customer wants to reset the environment and start over, they can
>>>> use the "Reset environment" option to wipe the partially installed
>>>> environment and start over.
>>>> 3. Otherwise, when Deploy Changes is clicked again, Fuel will try to deploy
>>>> only the unfinished or error-state nodes again... just as it does today.
>>>>  
>>>> That way, the customer has the option of starting over or just continuing
>>>> from where they left off.  If controllers or network install failed, Fuel
>>>> would consider that an unrecoverable error condition and just reinstall
>>>> those nodes
>  1) I believe we should reflect the related Environment Operations changes
> in the Nailgun API as well
> https://docs.google.com/a/mirantis.com/document/d/1KQPEG62wBF-U-s8mUzAcP3_rLKOBgyEyUY9e9yKE49U/edit#heading=h.qcspsp3wasyy
>  2) Having the ability to reset a given node, as well as the deployment, is
> vital for cluster self-healing. For example, if we have STONITH'ed the
> failed controller node and just want to redeploy it from scratch, we might
> use the Nailgun API to reset the node to ensure it would be re-provisioned
> and re-deployed at the next boot...
>  
>>>>>>> 1. this feature is required only by developers (or maybe services),
>>>>>>> because in this case the user will not be able to reconfigure the
>>>>>>> cluster via the REST API (i.e. UI, CLI) after deployment was stopped.
>>>>>>> If we allow configuration, then deployment is likely to fail in 90%
>>>>>>> of cases.
>>>>>>>  2. we cannot interrupt network configuration while it is in progress;
>>>>>>> to resolve this issue we need some kind of recovery mechanism for
>>>>>>> networks
>>>>>>>  3. also we cannot interrupt apt-get (and maybe yum) because it creates
>>>>>>> a lock file and Puppet will fail when we try to run it for a
> -- 
> Best regards,
> Bogdan Dobrelya,
> Researcher TechLead, Mirantis, Inc.
> +38 (066) 051 07 53
> Skype bogdando_at_yahoo.com
> 38, Lenina ave.
> Kharkov, Ukraine
> www.mirantis.com www.mirantis.ru bdobrelia@xxxxxxxxxxxx
>  


