fuel-dev team mailing list archive

Re: Replacing failed controller in Fuel

To: Dmitriy Novakovskiy <dnovakovskiy@xxxxxxxxxxxx>
From: Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
Date: Tue, 29 Jul 2014 17:28:18 +0400
Cc: "fuel-dev@xxxxxxxxxxxxxxxxxxx" <fuel-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAKvyFTTHOXBL4PV5ZdjL8H68FO0VUhaVuOfqv7KP5H1cBB13zg@mail.gmail.com>

1) Redeployment does not affect services that are already running, apart
from an haproxy restart, which causes a couple of seconds of downtime for
the API services. The upcoming 5.1 release will contain better Galera and
RabbitMQ deployment, so redeployment should again cause no more than a
couple of seconds of downtime.

2) We do not have a complete sequence of actions to work around it. You can
put pacemaker into maintenance mode, update the corosync configs, and then
restart corosync. After that, turn pacemaker maintenance mode off again.

3) The workflow is the same as for a bug report: environment description,
sequence of actions, expected result, actual result, additional info, and
logs if available.

4) Substitution will lead to an update of the corosync configuration. You
can hack the nailgun database a little so that it issues the same IPs to
the node that replaces the old one. I am currently not aware of any
mechanism that retains IPs for substituted controllers. Anyway, this is a
good starting point for a blueprint.
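
For illustration only, the step order from (2) and the kind of config
change meant in (4) could be sketched roughly as below. This is not a
tested procedure. It assumes crmsh (the `crm` command), a `service`
wrapper, corosync 1.x with `transport: udpu`, and the default
/etc/corosync/corosync.conf path. The function names and IP addresses are
made up for the example, and nothing here is executed against a cluster.

```python
# Sketch only. Assumptions (not confirmed in this thread): crmsh is
# installed, corosync 1.x runs with "transport: udpu", and the config
# lives in /etc/corosync/corosync.conf. Function names are hypothetical.


def render_members(ips):
    """Return corosync 1.x 'member' stanzas, one per controller IP."""
    blocks = []
    for ip in ips:
        blocks.append("    member {\n        memberaddr: %s\n    }" % ip)
    return "\n".join(blocks)


def workaround_steps():
    """Return the manual steps in the order described in (2)."""
    return [
        # Stop pacemaker from managing resources while corosync restarts.
        "crm configure property maintenance-mode=true",
        # Manual edit: update the member list on every controller.
        "# edit /etc/corosync/corosync.conf (member list)",
        "service corosync restart",
        # Hand resource management back to pacemaker.
        "crm configure property maintenance-mode=false",
    ]


if __name__ == "__main__":
    # Made-up controller addresses, for illustration only.
    print(render_members(["10.20.0.3", "10.20.0.4", "10.20.0.5"]))
    for step in workaround_steps():
        print(step)
```

The script only prints the member stanzas and the commands, so an operator
can review them before running anything by hand on each controller.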





On Tue, Jul 29, 2014 at 5:19 PM, Dmitriy Novakovskiy <
dnovakovskiy@xxxxxxxxxxxx> wrote:

> Vladimir,
>
> Thanks for the answers. However, I'm getting a bit confused, so I will ask
> more questions.
>
> *2) redeployment of the cluster is needed because you need to update all
>> the config files.*
>
>
> Does re-deployment (when adding new controller and removing old one)
> involve loss of API connectivity and DB re-creation on controllers?
>
> *3) there is no workaround available right now as it requires significant
>> rewriting of puppet code and modification of the architecture to get rid of
>> all the issues.*
>
>
> Could the workaround be: go to corosync's config, add the new controller's
> IP and any other parameters, then restart corosync? I mean manual steps for
> the user to execute until the overall issue is solved in 5.1. Is it possible?
>
> *4) please share them*
>
>
> Will do, but again: what data should I capture from the user?
>
> *5) also, controller substitution may face some of the issues as it is
>> sometimes similar to controller addition.*
>
>
> Can you add more details here about the issues?
>
>
>
> ---
> Regards,
>
> *Dmitriy Novakovskiy*
> Sales Engineer, Mirantis EMEA
>
> *Skype:* dmitriy.novakovskiy
> *Operating from:* Ukraine
>
>
> On Tue, Jul 29, 2014 at 2:34 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
> wrote:
>
>> 1) corosync restart happens because you need to modify the config file in
>> unicast mode. If you use multicast, everything is fine.
>> 2) redeployment of the cluster is needed because you need to update all
>> the config files.
>> 3) there is no workaround available right now as it requires significant
>> rewriting of puppet code and modification of the architecture to get rid of
>> all the issues. I hope we will fix all of them in the upcoming 5.1 release.
>> 4) please share them
>> 5) also, controller substitution may face some of the issues as it is
>> sometimes similar to controller addition.
>>  On 29 July 2014 at 16:27, "Dmitriy Novakovskiy" <
>> dnovakovskiy@xxxxxxxxxxxx> wrote:
>>
>>  thanks guys,
>>>
>>> Q3. So do I understand correctly that the earlier behavior (adding a
>>> controller caused all controllers to redeploy and, in turn, API downtime;
>>> not sure about DB data loss) is no longer the case?
>>> Q4. Is there a documented "workaround" for adding a node to corosync?
>>> Q5. I have a user who's facing sporadic issues with the controller
>>> substitution workflow that we've discussed here. Sometimes the new
>>> controller is added fine, sometimes issues occur. Should I ask for Fuel
>>> screenshots, diagnostic snapshots, or all of them together?
>>>
>>>
>>>
>>> On Mon, Jul 28, 2014 at 4:32 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
>>> wrote:
>>>
>>>> The issue with inserting a new node into corosync is related to
>>>> https://bugs.launchpad.net/fuel/+bug/1312627 and will be addressed in
>>>> the 5.1 release.
>>>>
>>>>
>>>> On Mon, Jul 28, 2014 at 6:05 PM, Sergii Golovatiuk <
>>>> sgolovatiuk@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Hi Dmitriy,
>>>>>
>>>>> The algorithm you described is correct. Currently, Fuel is really
>>>>> close to the procedure you describe.
>>>>>
>>>>> 1. Remove controller from environment
>>>>> Puppet will remove the controller from the config files and re-trigger
>>>>> the services. However, this case requires one manual step from the
>>>>> operator, as corosync can't add or remove a node in the redundant ring
>>>>> protocol on the fly.
>>>>> 2. Not a problem and already implemented.
>>>>> 3. Everything should work fine except inserting a new node into corosync.
>>>>>
>>>>> We have a blueprint to tune/fix the addition and removal of corosync
>>>>> nodes. I hope this functionality will be implemented soon.
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Sergii Golovatiuk,
>>>>> Skype #golserge
>>>>> IRC #holser
>>>>>
>>>>>
>>>>> On Mon, Jul 28, 2014 at 3:49 PM, Dmitriy Novakovskiy <
>>>>> dnovakovskiy@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hi Fuelers,
>>>>>>
>>>>>> I recently got a question from one of our prospects: what should a
>>>>>> Fuel user do if one of the OpenStack controllers fails (completely) and
>>>>>> it needs to be replaced with a new box?
>>>>>>
>>>>>> My educated guess was:
>>>>>> 1. Remove controller from the environment in Fuel UI (*Q1:* is it
>>>>>> actually possible? assuming that server is out and Fuel won't be able to do
>>>>>> cleanup)
>>>>>> 2. Get new controller discovered
>>>>>> 3. Add new controller to the environment in Fuel UI (*Q2:* how does
>>>>>> this happen right now? Does Fuel redeploy all controllers? Will the cloud
>>>>>> experience service downtime? Will DB state be preserved?)
>>>>>>
>>>>>> Is it anywhere close to reality? Do we actually test the cases like
>>>>>> that?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Mailing list: https://launchpad.net/~fuel-dev
>>>>>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Yours Faithfully,
>>>> Vladimir Kuklin,
>>>> Fuel Library Tech Lead,
>>>> Mirantis, Inc.
>>>> +7 (495) 640-49-04
>>>> +7 (926) 702-39-68
>>>> Skype kuklinvv
>>>> 45bk3, Vorontsovskaya Str.
>>>> Moscow, Russia,
>>>> www.mirantis.com <http://www.mirantis.ru/>
>>>> www.mirantis.ru
>>>> vkuklin@xxxxxxxxxxxx
>>>>
>>>
>>>
>


