fuel-dev team mailing list archive

Re: Replacing failed controller in Fuel

 

I expect that Nailgun will reassign an IP from an old node to a new one.


On Tue, Jul 29, 2014 at 5:28 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
wrote:

> 1) Redeployment does not affect already running services, apart from an
> haproxy restart, which causes a couple of seconds of downtime for the API
> services. The upcoming 5.1 release will include improved Galera and RabbitMQ
> deployment, so there too it should cause no more than a couple of seconds
> of downtime.
>
> 2) We do not have a complete sequence of actions to work around it. You
> can put pacemaker into maintenance mode, update the corosync configs, and
> restart corosync, then turn pacemaker maintenance mode off again.
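
To make the order in point 2 explicit, here is a rough sketch that only lists the steps and runs nothing. The shell commands in it are my assumptions (crmsh syntax for the maintenance-mode property, a SysV-style service name, the default config path), not a tested Fuel procedure, so adjust them to what is actually installed on the controllers.

```python
# Sketch only: lists the workaround order from point 2; it executes nothing.
# The shell commands are assumptions (crmsh syntax, SysV service name,
# default corosync config path), not a verified Fuel procedure.

def corosync_workaround_steps(config_path="/etc/corosync/corosync.conf"):
    """Return (description, command) pairs in the order they should run."""
    return [
        ("Put pacemaker into maintenance mode",
         "crm configure property maintenance-mode=true"),
        ("Update the corosync config on every controller",
         f"$EDITOR {config_path}"),
        ("Restart corosync",
         "service corosync restart"),
        ("Turn pacemaker maintenance mode off",
         "crm configure property maintenance-mode=false"),
    ]


if __name__ == "__main__":
    steps = corosync_workaround_steps()
    for number, (description, command) in enumerate(steps, 1):
        print(f"{number}. {description}: {command}")
```

Keeping maintenance mode as the first and last step is the point: pacemaker should not react to resources while corosync is being restarted underneath it.
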
>
> 3) The workflow is the same as for any bug: env description, logs, sequence
> of actions, expected result, actual result, and additional info or logs if
> available.
>
> 4) Substitution will lead to an update of the corosync configuration. You can
> hack the nailgun database a little to issue the same IPs for the node
> being replaced - I am currently not aware of any mechanism that retains IPs
> for substituted controllers. Anyway, this is a good starting point for a
> blueprint.
>
>
>
>
>
> On Tue, Jul 29, 2014 at 5:19 PM, Dmitriy Novakovskiy <
> dnovakovskiy@xxxxxxxxxxxx> wrote:
>
>> Vladimir,
>>
>> Thanks for the answers, however I'm getting a bit confused, so will ask
>> more questions
>>
>> *2) redeployment of the cluster is needed because you need to update all
>>> the config files.*
>>
>>
>> Does re-deployment (when adding new controller and removing old one)
>> involve loss of API connectivity and DB re-creation on controllers?
>>
>> *3) there is no workaround available right now, as it requires substantial
>>> rewriting of the puppet code and modification of the architecture to get rid of
>>> all the issues.*
>>
>>
>> Could the workaround be: go to corosync's config, add the new controller's
>> IP and any other parameters, and restart corosync? I mean manual steps for
>> the user to execute until the overall issue is solved in 5.1. Is that possible?
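
For the "add the new controller's IP" part, a minimal sketch of what the unicast member list would look like after the swap. It assumes the corosync 1.x style `member { memberaddr: ... }` entries (corosync 2.x uses a `nodelist` section instead); the function names and the IPs are made up for illustration and are not taken from Fuel's puppet manifests.

```python
# Sketch only: renders unicast (udpu) member entries in the corosync 1.x
# "member { memberaddr: ... }" style. Function names and IPs are
# illustrative assumptions, not Fuel's actual templates.

def replace_controller(controller_ips, failed_ip, new_ip):
    """Return the member list with the failed controller swapped for the new one."""
    remaining = [ip for ip in controller_ips if ip != failed_ip]
    return remaining + [new_ip]


def render_members(controller_ips, indent="        "):
    """Return one member block per controller IP, de-duplicated, in input order."""
    blocks = []
    for ip in dict.fromkeys(controller_ips):
        blocks.append(
            f"{indent}member {{\n"
            f"{indent}    memberaddr: {ip}\n"
            f"{indent}}}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    current = ["10.20.0.3", "10.20.0.4", "10.20.0.5"]
    updated = replace_controller(current, failed_ip="10.20.0.4", new_ip="10.20.0.6")
    print(render_members(updated))
```

In unicast mode every controller carries the full member list, so the same rendered block would have to land in the config on each of them before corosync is restarted.
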
>>
>> *4) please share them*
>>
>>
>> Will do, but again - what data should I capture from user?
>>
>> *5) also, controller substitution may face some of the issues as it is
>>> sometimes similar to controller addition.*
>>
>>
>> Can you add more details here about the issues?
>>
>>
>>
>> ---
>> Regards,
>>
>> *Dmitriy Novakovskiy*
>> Sales Engineer, Mirantis EMEA
>>
>> *Skype:* dmitriy.novakovskiy
>> *Operating from:* Ukraine
>>
>>
>> On Tue, Jul 29, 2014 at 2:34 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
>> wrote:
>>
>>> 1) The corosync restart happens because you need to modify the config file
>>> in unicast mode. If you use multicast, everything is fine.
>>> 2) redeployment of the cluster is needed because you need to update all
>>> the config files.
>>> 3) there is no workaround available right now, as it requires substantial
>>> rewriting of the puppet code and modification of the architecture to get rid
>>> of all the issues. I hope we will fix all of them in the upcoming 5.1 release.
>>> 4) please share them
>>> 5) also, controller substitution may face some of the issues as it is
>>> sometimes similar to controller addition.
>>> On Jul 29, 2014 at 16:27, "Dmitriy Novakovskiy" <
>>> dnovakovskiy@xxxxxxxxxxxx> wrote:
>>>
>>>> Thanks guys,
>>>>
>>>> Q3. So do I understand correctly that the earlier behavior (where adding
>>>> a controller caused all controllers to re-deploy and, in turn, API
>>>> downtime; not sure about DB data loss) is no longer the case?
>>>> Q4. Is there a documented "workaround" for corosync addition?
>>>> Q5. I have a user who's facing sporadic issues with the controller
>>>> substitution workflow that we've discussed here. Sometimes new controller
>>>> is added fine, sometimes issues occur. Should I ask for Fuel screenshots,
>>>> diagnostic snapshots, all together?
>>>>
>>>> ---
>>>> Regards,
>>>>
>>>> *Dmitriy Novakovskiy*
>>>> Sales Engineer, Mirantis EMEA
>>>>
>>>> *Skype:* dmitriy.novakovskiy
>>>> *Operating from:* Ukraine
>>>>
>>>>
>>>> On Mon, Jul 28, 2014 at 4:32 PM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> The issue with inserting a new node into corosync is related to
>>>>> https://bugs.launchpad.net/fuel/+bug/1312627 and will be addressed in
>>>>> the 5.1 release.
>>>>>
>>>>>
>>>>> On Mon, Jul 28, 2014 at 6:05 PM, Sergii Golovatiuk <
>>>>> sgolovatiuk@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hi Dmitriy,
>>>>>>
>>>>>> The algorithm you described is correct. Currently, Fuel is really
>>>>>> close to the procedure you describe.
>>>>>>
>>>>>> 1. Remove controller from environment
>>>>>> Puppet will remove the controller from the config files and re-trigger
>>>>>> the services. However, this case requires one manual step from the
>>>>>> operator, as corosync can't add or remove a node in the redundant ring
>>>>>> protocol on the fly.
>>>>>> 2. Not a problem and already implemented.
>>>>>> 3. Everything should work fine except inserting a new node into
>>>>>> corosync.
>>>>>>
>>>>>> We have a blueprint to tune/fix adding and removing corosync nodes. I
>>>>>> hope this functionality will be implemented soon.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Sergii Golovatiuk,
>>>>>> Skype #golserge
>>>>>> IRC #holser
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 28, 2014 at 3:49 PM, Dmitriy Novakovskiy <
>>>>>> dnovakovskiy@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Hi Fuelers,
>>>>>>>
>>>>>>> I recently got a question from one of our prospects: what should a
>>>>>>> Fuel user do if one of the OpenStack controllers fails (completely) and
>>>>>>> needs to be replaced with a new box?
>>>>>>>
>>>>>>> My educated guess was:
>>>>>>> 1. Remove controller from the environment in Fuel UI (*Q1:* is it
>>>>>>> actually possible? assuming that server is out and Fuel won't be able to do
>>>>>>> cleanup)
>>>>>>> 2. Get new controller discovered
>>>>>>> 3. Add new controller to the environment in Fuel UI (*Q2:* how does
>>>>>>> this happen right now? Does Fuel re-deploy all controllers? Will the cloud
>>>>>>> experience service downtime? Will DB state be preserved?)
>>>>>>>
>>>>>>> Is this anywhere close to reality? Do we actually test cases like
>>>>>>> that?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> ---
>>>>>>> Regards,
>>>>>>>
>>>>>>> *Dmitriy Novakovskiy*
>>>>>>> Sales Engineer, Mirantis EMEA
>>>>>>>
>>>>>>> *Skype:* dmitriy.novakovskiy
>>>>>>> *Operating from:* Ukraine
>>>>>>>
>>>>>>> --
>>>>>>> Mailing list: https://launchpad.net/~fuel-dev
>>>>>>> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Yours Faithfully,
>>>>> Vladimir Kuklin,
>>>>> Fuel Library Tech Lead,
>>>>> Mirantis, Inc.
>>>>> +7 (495) 640-49-04
>>>>> +7 (926) 702-39-68
>>>>> Skype kuklinvv
>>>>> 45bk3, Vorontsovskaya Str.
>>>>> Moscow, Russia,
>>>>> www.mirantis.com <http://www.mirantis.ru/>
>>>>> www.mirantis.ru
>>>>> vkuklin@xxxxxxxxxxxx
>>>>>
>>>>
>>>>
>>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 45bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com <http://www.mirantis.ru/>
> www.mirantis.ru
> vkuklin@xxxxxxxxxxxx
>
>
>


-- 
Andrey Danin
adanin@xxxxxxxxxxxx
skype: gcon.monolake
