Re: A Grizzly GRE failure [SOLVED]
I've had a terrible time getting the community to help me with this
problem. So special thanks to Darragh O'Reilly, and to rkeene on
#openstack, who was mean and a bit of a wisenheimer (I'd use different
words elsewhere) but at least talked to me and got me to think twice
about my GRE setup.
But enough of that: problem solved, and a bug report has been
submitted: https://bugs.launchpad.net/quantum/+bug/1179223. I added
an "s" to the front of "persists" in the subject, but whatever. I
always leave one thing in the hotel room, and I always leave one
embarrassing typo.
Here's the part explaining how it was fixed:
SOLUTION:
mysql> delete from ovs_tunnel_endpoints where id = 1;
Query OK, 1 row affected (0.00 sec)
mysql> select * from ovs_tunnel_endpoints;
+-----------------+----+
| ip_address | id |
+-----------------+----+
| 192.168.239.110 | 3 |
| 192.168.239.114 | 4 |
| 192.168.239.115 | 5 |
| 192.168.239.99 | 2 |
+-----------------+----+
4 rows in set (0.00 sec)
* After doing that, I simply restarted the quantum OVS agents on the
network and compute nodes (restart commands sketched below). The old
GRE tunnel was not re-created, and VM network traffic to and from the
external network now proceeds without incident.
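* For the record, the agent restart was just the usual service
commands, something like this on each node (the service name is the
one from the Ubuntu packages, so adjust for your distro):

  # on the network node and every compute node
  sudo service quantum-plugin-openvswitch-agent restart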
* Should these tables be cleaned up as well, I wonder:
mysql> select * from ovs_network_bindings;
+--------------------------------------+--------------+------------------+-----------------+
| network_id | network_type | physical_network | segmentation_id |
+--------------------------------------+--------------+------------------+-----------------+
| 4e8aacca-8b38-40ac-a628-18cac3168fe6 | gre | NULL | 2 |
| af224f3f-8de6-4e0d-b043-6bcd5cb014c5 | gre | NULL | 1 |
+--------------------------------------+--------------+------------------+-----------------+
2 rows in set (0.00 sec)
mysql> select * from ovs_tunnel_allocations where allocated != 0;
+-----------+-----------+
| tunnel_id | allocated |
+-----------+-----------+
| 1 | 1 |
| 2 | 1 |
+-----------+-----------+
2 rows in set (0.00 sec)
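* My hunch is no: the two allocated tunnel IDs line up with the
segmentation_ids of the two GRE networks above, so they look like
they're still in use. A cross-check, assuming segmentation_id and
tunnel_id really are the same value for a GRE network:

mysql> select b.network_id, b.segmentation_id
    ->   from ovs_network_bindings b
    ->   join ovs_tunnel_allocations a on a.tunnel_id = b.segmentation_id
    ->  where b.network_type = 'gre' and a.allocated != 0;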
Cheers, and happy openstacking. Even you, rkeene!
--Greg Chavez
On Sat, May 11, 2013 at 2:28 PM, Greg Chavez <greg.chavez@xxxxxxxxx> wrote:
> So to be clear:
>
> * I have three NICs on my network node. VM traffic goes out the
> 1st NIC on 192.168.239.99/24 to the other compute nodes, while
> management traffic goes out the 2nd NIC on 192.168.241.99. The 3rd
> NIC is external and has no IP.
>
> * I have four GRE endpoints on the VM network, one at the network node
> (192.168.239.99) and three on compute nodes
> (192.168.239.{110,114,115}), all with IDs 2-5.
>
> * I have a fifth GRE endpoint with id 1 to 192.168.241.99, the network
> node's management interface. This was the first tunnel created when I
> deployed the network node because that is how I set the remote_ip in
> the ovs plugin ini. I corrected the setting later, but the
> 192.168.241.99 endpoint persists and, as your response implies, *this
> extraneous endpoint is the cause of my troubles*.
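>
> * Reconstructing from the IDs above, ovs_tunnel_endpoints presumably
> looked like this before the fix (the id-1 row being the stale one):
>
> +-----------------+----+
> | ip_address      | id |
> +-----------------+----+
> | 192.168.241.99  | 1  |
> | 192.168.239.99  | 2  |
> | 192.168.239.110 | 3  |
> | 192.168.239.114 | 4  |
> | 192.168.239.115 | 5  |
> +-----------------+----+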
>
> My next question, then, is what exactly is happening. My guess:
>
> * I ping a guest from the external network using its floater (10.21.166.4).
>
> * It gets NAT'd at the tenant router on the network node to
> 192.168.252.3, at which point an ARP request is sent over the unified
> GRE broadcast domain.
>
> * On a compute node, the ARP request is received by the VM, which then
> sends a reply to the tenant router's MAC (which I verified with
> tcpdump).
>
> * There are four endpoints for the packet to go down:
>
> Bridge br-tun
> Port br-tun
> Interface br-tun
> type: internal
> Port "gre-1"
> Interface "gre-1"
> type: gre
> options: {in_key=flow, out_key=flow, remote_ip="192.168.241.99"}
> Port "gre-4"
> Interface "gre-4"
> type: gre
> options: {in_key=flow, out_key=flow, remote_ip="192.168.239.114"}
> Port "gre-3"
> Interface "gre-3"
> type: gre
> options: {in_key=flow, out_key=flow, remote_ip="192.168.239.110"}
> Port patch-int
> Interface patch-int
> type: patch
> options: {peer=patch-tun}
> Port "gre-2"
> Interface "gre-2"
> type: gre
> options: {in_key=flow, out_key=flow, remote_ip="192.168.239.99"}
>
> Here's where I get confused. Does it know that gre-1 is a different
> broadcast domain than the others, or does it see all the endpoints as
> part of the same domain?
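>
> One way I could probably check this (a sketch, assuming the stock
> Open vSwitch tools are installed) is to dump the flow table on br-tun
> and see which tunnel ports the flood rules output to:
>
> # on the network node
> sudo ovs-ofctl dump-flows br-tun
>
> If gre-1 turns up in the same flood actions as gre-2 through gre-4,
> then the agent is treating all five endpoints as one broadcast domain.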
>
> What happens here? Is this the cause of my network timeouts on
> external connections to the VMs? Does this also explain the sporadic
> nature of the timeouts, why they aren't consistent in frequency or
> duration?
>
> Finally, what happens when I remove the oddball endpoint from the DB?
> Sounds risky!
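>
> (Before deleting anything I'd at least back the table up, something
> like:
>
> mysqldump quantum ovs_tunnel_endpoints > ovs_tunnel_endpoints.sql
>
> so the row could be restored if the surgery goes wrong.)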
>
> Thanks for your help
> --Greg Chavez
>
> On Fri, May 10, 2013 at 7:17 PM, Darragh O'Reilly
> <dara2002-openstack@xxxxxxxxx> wrote:
>> I'm not sure how to rectify that. You may have to delete the bad row from the DB and restart the agents:
>>
>> mysql> use quantum;
>> mysql> select * from ovs_tunnel_endpoints;
>> ...
>>
>> On Fri, May 10, 2013 at 6:43 PM, Greg Chavez <greg.chavez@xxxxxxxxx> wrote:
>>> I'm refactoring my question once again (see "A Grizzly arping
>>> failure" and "Failure to arp by quantum router").
>>>
>>> Quickly, the problem is in a multi-node Grizzly+Raring setup with a
>>> separate network node and a dedicated VLAN for VM traffic. External
>>> connections time out within a minute and don't resume until traffic is
>>> initiated from the VM.
>>>
>>> I got some rather annoying and hostile assistance just now on IRC,
>>> and while it didn't result in a fix, it got me to realize that the
>>> problem is possibly with my GRE setup.
>>>
>>> I made a mistake when I originally set this up, assigning the mgmt
>>> interface of the network node (192.168.241.99) as its GRE remote_ip
>>> instead of the vm_config network interface (192.168.239.99). I
>>> realized my mistake, reconfigured the OVS plugin on the network
>>> node, and moved on. But now, taking a look at my OVS bridges on the
>>> network node, I see that the old remote IP is still there!
>>>
>>> Bridge br-tun
>>> <snip>
>>> Port "gre-1"
>>> Interface "gre-1"
>>> type: gre
>>> options: {in_key=flow, out_key=flow, remote_ip="192.168.241.99"}
>>> <snip>
>>>
>>> This is also the case on all the compute nodes.
>>>
>>> (Full ovs-vsctl show output here: http://pastebin.com/xbre1fNV)
>>>
>>> What's more, I have this error every time I restart OVS:
>>>
>>> 2013-05-10 18:21:24 ERROR [quantum.agent.linux.ovs_lib] Unable to
>>> execute ['ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5'].
>>> Exception:
>>> Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf',
>>> 'ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5']
>>> Exit code: 1
>>> Stdout: ''
>>> Stderr: 'ovs-vsctl: cannot create a port named gre-5 because a port
>>> named gre-5 already exists on bridge br-tun\n'
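>>>
>>> (That port is easy to confirm with something like:
>>>
>>> ovs-vsctl list-ports br-tun | grep gre-5
>>>
>>> since the error says it already exists on br-tun.)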
>>>
>>> Could that be because gre-1 is vestigial and possibly fouling up the
>>> works by creating two possible paths for VM traffic?
>>>
>>> Is it as simple as removing it with ovs-vsctl or is something else required?
>>>
>>> Or is this actually needed for some reason? Argh... help!
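>>>
>>> (For what it's worth, the removal I have in mind would just be
>>> something like:
>>>
>>> sudo ovs-vsctl del-port br-tun gre-1
>>>
>>> on each node, but I don't know whether the agent would simply
>>> re-create the port from whatever it has recorded in its database.)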