
fuel-dev team mailing list archive

Re: Bonding problems

 

The overnight flood ping through LACP didn't lose a single packet.
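
The loss count from a flood ping run can be read off the summary that ping prints on exit. A minimal sketch, assuming the usual iputils-style summary line; the function name ping_loss and the sample text are made up for illustration and are not output from these tests:

```python
import re

# Matches the summary line printed by ping on exit, for example:
# "1000 packets transmitted, 998 received, 0.2% packet loss, time 14000ms"
# The optional "packets " covers the BSD/macOS wording.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_loss(summary_text):
    """Return (sent, received, lost) parsed from ping's summary output."""
    match = SUMMARY_RE.search(summary_text)
    if match is None:
        raise ValueError("no ping summary line found")
    sent = int(match.group(1))
    received = int(match.group(2))
    return sent, received, sent - received


if __name__ == "__main__":
    # Invented sample text, only to show the expected input shape.
    sample = (
        "--- 10.0.0.2 ping statistics ---\n"
        "5000 packets transmitted, 5000 received, 0% packet loss, time 42ms\n"
    )
    print(ping_loss(sample))
```

Comparing the raw counts rather than the rounded percentage matters for long runs, where a handful of lost packets can still show up as 0% loss.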


On Wed, Feb 26, 2014 at 12:31 AM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx> wrote:

> Guys, the suggested fix https://review.openstack.org/76345 works OK, though
> it makes it impossible to understand patch names :-) So we are waiting for
> Sergey to provide a more human-readable workaround. But we can continue
> testing with this patch applied to ensure that the 1.9.3 downgrade does not
> introduce any regressions.
>
>
> On Wed, Feb 26, 2014 at 12:11 AM, Vladimir Kuklin <vkuklin@xxxxxxxxxxxx> wrote:
>
>> Guys, we are testing OVS 1.9.3 on Ubuntu right now. It seems we have some
>> problems with the l23network module:
>> https://bugs.launchpad.net/fuel/+bug/1284801
>> We are going to apply a workaround for it. If everything else goes fine,
>> we are going to move to 1.9.3, as it is the OVS LTS version for both CentOS
>> and Ubuntu.
>>
>>
>> On Tue, Feb 25, 2014 at 11:27 PM, Mike Scherbakov <
>> mscherbakov@xxxxxxxxxxxx> wrote:
>>
>>> Great news!!!
>>> Andrey, thanks for staying late and waking up early these days in order
>>> to resolve this. You deserve a good rest. Przemek, thanks for the help!
>>> Documentation will really be needed; otherwise users will keep coming back
>>> to us and complaining that something doesn't work.
>>>
>>>
>>>
>>> On Tue, Feb 25, 2014 at 11:04 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>
>>>> Okay. I have finally learned how to set up LACP between OVS and a Procurve
>>>> 2510G. It works fine, just as balance-slb does. I am leaving my flood ping
>>>> running for the night and will tell you the results tomorrow. It seems we
>>>> can go to production with the current versions of openvswitch. But here in
>>>> Moscow we are still trying to build a fully OVS-1.9.3 ISO and test it.
>>>>
>>>> Of course we need to document all the issues properly. As far as I know,
>>>> Przemek wants to publish well-written examples of OVS, Cisco, Juniper and
>>>> Arista configs for enabling LACP.
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 3:12 PM, Mike Scherbakov <
>>>> mscherbakov@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Good news.
>>>>> Thanks Andrey, keep going!
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 2:28 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>> After 14 hours of a flood ping, the hardware lab lost a few packets and
>>>>>> the virtual env lost hundreds of packets. Mode: balance-slb.
>>>>>>
>>>>>> I'm going to test LACP behaviour today.
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 3:50 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Fine. They wrote about that in the documentation too:
>>>>>>> http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf (page 14). It was
>>>>>>> introduced two years ago, in version 1.5.0. One problem less!
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 3:37 AM, Ryan Moe <rmoe@xxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> Andrey is correct. It appears that balance-tcp requires successful
>>>>>>>> LACP negotiation. See here:
>>>>>>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L610 and here:
>>>>>>>> https://github.com/osrg/openvswitch/blob/master/lib/bond.c#L1438.
>>>>>>>> This also means that when we create bonds with balance-tcp we need to
>>>>>>>> configure lacp as well.
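
The pairing of balance-tcp with LACP can be sketched as a small helper that builds the ovs-vsctl add-bond command line. This is only an illustration: the helper name add_bond_command and the bridge, bond and interface names (br0, bond0, eth0, eth1) are hypothetical, and the check that balance-tcp must come with lacp set reflects the conclusion reached in this thread rather than a rule enforced by ovs-vsctl itself.

```python
import shlex


def add_bond_command(bridge, bond, interfaces, bond_mode="balance-slb",
                     lacp=None, rebalance_interval_ms=None):
    """Build an `ovs-vsctl add-bond` argument list (hypothetical helper)."""
    if len(interfaces) < 2:
        raise ValueError("a bond needs at least two interfaces")
    # Policy check taken from the thread: balance-tcp is only useful
    # when LACP is configured on the bond as well.
    if bond_mode == "balance-tcp" and lacp not in ("active", "passive"):
        raise ValueError("balance-tcp should be configured together with LACP")
    cmd = ["ovs-vsctl", "add-bond", bridge, bond, *interfaces,
           "bond_mode=" + bond_mode]
    if lacp is not None:
        cmd.append("lacp=" + lacp)
    if rebalance_interval_ms is not None:
        # Port other_config key discussed in the thread.
        cmd.append("other_config:bond-rebalance-interval=%d"
                   % rebalance_interval_ms)
    return cmd


if __name__ == "__main__":
    # Only prints the command; nothing is executed against a real switch.
    cmd = add_bond_command("br0", "bond0", ["eth0", "eth1"],
                           bond_mode="balance-tcp", lacp="active")
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the check testable without touching a live openvswitch instance; the physical switch side still has to be configured for LACP on the same ports.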
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 24, 2014 at 3:14 PM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> And yes, bug https://bugs.launchpad.net/fuel/+bug/1272842 and the
>>>>>>>>> current problem may be unrelated, but they produce similar error
>>>>>>>>> messages in the OVS logs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 25, 2014 at 2:55 AM, Andrey Danin <adanin@xxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> Guys, I set up hardware (2 nodes) and software (3 nodes) labs
>>>>>>>>>> today with ISO #181 to test bonding. Unfortunately, balance-tcp mode is
>>>>>>>>>> totally broken. When I use it during deployment or switch to it in a
>>>>>>>>>> working cluster, all traffic stops. Changing the rebalance interval
>>>>>>>>>> doesn't help.
>>>>>>>>>> In contrast, balance-slb works fine. Both Ubuntu (hardware
>>>>>>>>>> nodes) and CentOS (virtual env) work without any traffic loss. I'm
>>>>>>>>>> running a flood ping between virtual instances inside the clouds
>>>>>>>>>> overnight and will check the number of lost packets. I also want to
>>>>>>>>>> try iperf.
>>>>>>>>>>
>>>>>>>>>> Next things we can do:
>>>>>>>>>> * Build an ISO with the stable (1.9.3) or newest (2.0.x) version of
>>>>>>>>>> OVS and play with them. Yesterday we decided to build Ubuntu 12.04 with
>>>>>>>>>> the Debian Sid 1.9.3 version of OVS. There is a ticket about that:
>>>>>>>>>> https://mirantis.jira.com/browse/OSCI-1089 Also, Igor built his
>>>>>>>>>> own version of an ISO with the Sid package.
>>>>>>>>>> * Dump the openflow rules in balance-tcp mode and try to fix them.
>>>>>>>>>> It's hard to do that because aliens developed their syntax.
>>>>>>>>>> * Run Igor's tests again and again until balance-slb starts blocking
>>>>>>>>>> traffic. Then dig into the openflow rules.
>>>>>>>>>> * Play with LACP on real hardware. Maybe balance-tcp can be
>>>>>>>>>> used only with lacp=active.
>>>>>>>>>> * Ask the openvswitch community about our problems.
>>>>>>>>>>
>>>>>>>>>> Andrew, yes, the PXE network is still nailed to an interface. I hope
>>>>>>>>>> we will fix it in 5.0.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 25, 2014 at 12:20 AM, Igor Shishkin <
>>>>>>>>>> ishishkin@xxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello, Dmitry.
>>>>>>>>>>>
>>>>>>>>>>> It’s 100% reproducible in a virtual environment when we try
>>>>>>>>>>> to deploy bonding in balance-tcp or balance-slb mode.
>>>>>>>>>>> The tests are related as a way to reproduce it, and as a warning
>>>>>>>>>>> about why these tests should fail once they are merged.
>>>>>>>>>>>
>>>>>>>>>>> As we can see, the problem is in the rebalance procedure that
>>>>>>>>>>> openvswitch tries to run once it has started the bonded interface.
>>>>>>>>>>> During this time the bonded interfaces stop accepting ARPs.
>>>>>>>>>>>
>>>>>>>>>>> I have just built openvswitch=1.9.3, which is the LTS version, and
>>>>>>>>>>> want to try it in the same case, and also try to decrease
>>>>>>>>>>> bond-rebalance-interval to 0 (as Andrey K. suggested). If either of
>>>>>>>>>>> these helps, it could be the solution (but I'm really not sure
>>>>>>>>>>> bond-rebalance-interval=0 is a good way).
>>>>>>>>>>> —
>>>>>>>>>>> Igor Shishkin
>>>>>>>>>>> QA Engineer
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 24 Feb 2014, at 23:59, Dmitry Borodaenko <
>>>>>>>>>>> dborodaenko@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Mike, Igor,
>>>>>>>>>>> >
>>>>>>>>>>> > Can you provide more details on how the integration test in review
>>>>>>>>>>> > #75161 helps to reproduce bug #1272842?
>>>>>>>>>>> >
>>>>>>>>>>> > As far as I understand, the bug is a highly intermittent problem with
>>>>>>>>>>> > ARP that was only showing up after an environment with LACP bonding
>>>>>>>>>>> > was operational for at least a few hours.
>>>>>>>>>>> >
>>>>>>>>>>> > On the other hand, the problem Igor is reporting based on the
>>>>>>>>>>> > integration test sounds like something 100% reproducible that doesn't
>>>>>>>>>>> > require real hardware or LACP and is not necessarily related to ARP.
>>>>>>>>>>> >
>>>>>>>>>>> > Are you sure you're not confusing two unrelated problems?
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > -DmitryB
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Mon, Feb 24, 2014 at 9:18 AM, Mike Scherbakov
>>>>>>>>>>> > <mscherbakov@xxxxxxxxxxxx> wrote:
>>>>>>>>>>> >> The issue is here: https://bugs.launchpad.net/fuel/+bug/1272842.
>>>>>>>>>>> >> Those who know what can be wrong with our openvswitch/kernel, please
>>>>>>>>>>> >> provide your input.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Mon, Feb 24, 2014 at 9:04 PM, Igor Shishkin <ishishkin@xxxxxxxxxxxx>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Hello,
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Currently we have this review https://review.openstack.org/#/c/75161 with
>>>>>>>>>>> >>> test cases for our brand new shiny bonding feature, but the
>>>>>>>>>>> >>> balance-tcp/balance-slb modes are not working for now.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Steps to reproduce are very simple:
>>>>>>>>>>> >>> Create a cluster with simple or HA configuration, select the
>>>>>>>>>>> >>> balance-tcp or balance-slb bonding mode and start deployment.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Deployment will not finish successfully because of problems with the
>>>>>>>>>>> >>> rebalance procedure.
>>>>>>>>>>> >>> --
>>>>>>>>>>> >>> Igor Shishkin
>>>>>>>>>>> >>> QA Engineer
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Mike Scherbakov
>>>>>>>>>>> >> #mihgen
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Mailing list: https://launchpad.net/~fuel-dev
>>>>>>>>>>> >> Post to     : fuel-dev@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>>> >> Unsubscribe : https://launchpad.net/~fuel-dev
>>>>>>>>>>> >> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>>> >>
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Dmitry Borodaenko
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Andrey Danin
>>>>>>>>>> adanin@xxxxxxxxxxxx
>>>>>>>>>> skype: gcon.monolake
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Andrey Danin
>>>>>>>>> adanin@xxxxxxxxxxxx
>>>>>>>>> skype: gcon.monolake
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Andrey Danin
>>>>>>> adanin@xxxxxxxxxxxx
>>>>>>> skype: gcon.monolake
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Andrey Danin
>>>>>> adanin@xxxxxxxxxxxx
>>>>>> skype: gcon.monolake
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mike Scherbakov
>>>>> #mihgen
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Andrey Danin
>>>> adanin@xxxxxxxxxxxx
>>>> skype: gcon.monolake
>>>>
>>>
>>>
>>>
>>> --
>>> Mike Scherbakov
>>> #mihgen
>>>
>>>
>>>
>>
>>
>> --
>> Yours Faithfully,
>> Vladimir Kuklin,
>> Senior Deployment Engineer,
>> Mirantis, Inc.
>> +7 (495) 640-49-04
>> +7 (926) 702-39-68
>> Skype kuklinvv
>> 45bk3, Vorontsovskaya Str.
>> Moscow, Russia,
>> www.mirantis.com <http://www.mirantis.ru/>
>> www.mirantis.ru
>> vkuklin@xxxxxxxxxxxx
>>
>
>
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Senior Deployment Engineer,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 45bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com <http://www.mirantis.ru/>
> www.mirantis.ru
> vkuklin@xxxxxxxxxxxx
>



-- 
Andrey Danin
adanin@xxxxxxxxxxxx
skype: gcon.monolake
