
p2psp team mailing list archive

Re: NAT Traversal Set of rules implementation

 

Hi Vicente,

sorry for the long delay; I thought I would gather more results and
write them together.

This week I did a lot of further tests, and finally determined the
behaviour with SYMPP<->PRCN and SYMPP<->SYMPP connections (by tracking
every packet with iptables logging):
The SYMPP NAT (as specified by the iptables rules) sends an ICMP
Destination Unreachable packet if there is no NAT entry assigned to the
destination port. Apparently the PRCN and SYMPP NATs immediately close
the NAT entry when such an ICMP packet is received, and somehow remember
the address:port tuple as "not responding", so that further connection
attempts fail. This has the following consequences:
- In the SYMPP<->SYMPP scenario, the connection only works if the two
  peers send the first packet to each other "simultaneously" (*).
- In the SYMPP<->PRCN scenario, the connection works if the two peers
  send the first packet to each other "simultaneously" (*), or if the
  peer behind the SYMPP router sends its first packet before the other
  peer.
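The two consequences can be restated as a small decision function. This is only a transcription of the behaviour observed with the iptables rules, not code from the P2PSP repository; the function name `connection_works` and the order labels are hypothetical.

```python
VALID_ORDERS = {"simultaneous", "peer1_first", "peer2_first"}


def connection_works(nat1: str, nat2: str, order: str) -> bool:
    """Observed outcome of the first hole-punching packets (iptables simulation).

    nat1/nat2 are the NAT types of peer 1 and peer 2; order says who sent
    the first packet. Only the two SYMPP scenarios reported above are modelled.
    """
    pair = {nat1, nat2}
    if pair not in ({"SYMPP"}, {"SYMPP", "PRCN"}):
        raise ValueError("only SYMPP<->SYMPP and SYMPP<->PRCN are modelled")
    if order not in VALID_ORDERS:
        raise ValueError(f"unknown order: {order}")
    if order == "simultaneous":
        return True  # both NAT entries are in place before any ICMP error matters
    if pair == {"SYMPP"}:
        return False  # SYMPP<->SYMPP: a one-sided first packet triggers ICMP
    # SYMPP<->PRCN: only works if the peer behind the SYMPP router sends first.
    sympp_peer = "peer1" if nat1 == "SYMPP" else "peer2"
    return order == sympp_peer + "_first"
```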

(*) Assuming that peer 1 sends its first packet to peer 2,
"simultaneously" here means that either the NAT entry at peer 2 has to
be created before the packet from peer 1 arrives, or the packet from
peer 2 has to arrive at peer 1 before the ICMP packet, so that the NAT
entry at peer 1 is marked as assured.
The time difference between the first packets from each peer is
determined by the packet jitter (delay variation) of the splitter plus
the difference in delay between the two peers. This time difference has
to be less than the sum of both peers' delays for the connection to
succeed. Apparently this holds in nearly all cases, except in the
simulation, where the jitter (~5ms) is higher than the delay¹ (~3ms).
So I ran the tests again with additional delay between the peers (via
the tc command and the netem module [1]). With a delay of 4ms or more at
the peers, the connection was established reliably.
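A minimal timing model of this condition, assuming that the splitter notifies both peers at the same instant, that the notification reaches peer i after its delay d_i plus a uniformly distributed splitter jitter, and that a packet between the peers takes d_1 + d_2. The names `can_connect` and `success_rate` are hypothetical, and the model ignores packet loss and the exact ICMP timing.

```python
import random


def can_connect(t1_ms, t2_ms, delay1_ms, delay2_ms):
    """Hole punching succeeds if the first packets are sent closer together
    than the sum of both peers' delays (simplified condition from the text)."""
    return abs(t1_ms - t2_ms) < delay1_ms + delay2_ms


def success_rate(splitter_jitter_ms, delay1_ms, delay2_ms, runs=10000, seed=1):
    """Monte Carlo estimate of the share of runs satisfying can_connect()."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(runs):
        # Each peer sends its first packet when the splitter message arrives:
        # its own delay plus a random splitter jitter.
        t1 = delay1_ms + rng.uniform(0, splitter_jitter_ms)
        t2 = delay2_ms + rng.uniform(0, splitter_jitter_ms)
        ok += can_connect(t1, t2, delay1_ms, delay2_ms)
    return ok / runs
```

Under this model, 4ms at each peer with 5ms of splitter jitter always satisfies the condition, which is consistent with the observation above; with smaller peer delays some runs fail.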

To test realistic networking scenarios, I combined different delay and
jitter values with a specific rate of packet loss, and modified the test
script to run a specific number of test runs and output the percentage
of successful runs. Please see the results attached below.
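The test setup can be sketched in Python as follows. `netem_command` only builds the `tc ... netem` argument list (it could be executed with `subprocess.run(cmd, check=True)`, which needs root privileges), and `success_percentage` repeats a test and reports the share of successful runs. Both helpers and the interface name `eth0` are assumptions, not the actual test script.

```python
def netem_command(interface, delay_ms, jitter_ms, loss_percent):
    """Build a tc command line that adds delay, jitter and packet loss via netem."""
    return ["tc", "qdisc", "add", "dev", interface, "root", "netem",
            "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
            "loss", f"{loss_percent}%"]


def success_percentage(run_once, runs=20):
    """Call run_once() `runs` times; return the percentage that returned True."""
    successes = sum(1 for _ in range(runs) if run_once())
    return 100 * successes / runs


# Example: the "home network" peer settings (35ms delay, 5ms jitter, 1% loss).
cmd = netem_command("eth0", 35, 5, 1)
```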


On 27.07.2015 09:24, Vicente Gonzalez wrote:
>
>     For testing, we theoretically could just take the results from the
>     PRCN NAT, as the only difference between PRCN and SYMPP is the
>     local source port at the peer, which does not make a difference at
>     all. Though it would be great to know if this NAT behaviour
>     actually exists in real NAT implementations, and if this has to be
>     addressed by sending hello packets more often or in a burst-like
>     style (e.g. send 100 packets with 20ms between each, and then
>     pause for 1 second).
>
> To find out more about this issue, the only thing that we
> can do is to test with different (real) NATs, and this is something
> that I would do at testing time (once the development stage is
> finished). Therefore, for now, implement only the code necessary to
> solve the problem in iptables.
I tried sending packets in bursts, and sending packets at different
intervals so that after some time two packets from both peers are sent
simultaneously (with <1ms accuracy), but as noted above, apparently only
the first packet counts for this NAT type.
Testing on real NATs will be quite interesting. :)
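For reference, the burst pattern suggested in the quoted message (100 packets, 20ms apart, then a 1 second pause) corresponds to the send schedule below. The function name and its defaults are illustrative only; as described above, this pattern did not help with the simulated SYMPP behaviour.

```python
def burst_schedule(burst_size=100, interval_ms=20, pause_ms=1000, bursts=2):
    """Send times in ms for hello packets sent in bursts separated by pauses."""
    if burst_size <= 0 or bursts <= 0:
        return []
    times = []
    start = 0
    for _ in range(bursts):
        times.extend(start + i * interval_ms for i in range(burst_size))
        # The next burst begins one pause after the last packet of this burst.
        start = times[-1] + pause_ms
    return times
```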

>     Another thing I noticed is that the sequentially allocating NAT
>     type cannot be simulated by iptables rules. I thought that the
>     rules currently stated in the nts_doc branch would work like this,
>     but apparently the source port is only increased if a completely
>     new socket is used for the connection to another peer. So now we
>     have a few options to solve this:
>     1. find another possibility to simulate this NAT behaviour (e.g. a
>     specific router distribution running on the virtual machine); or I
>     could try to alter the iptables NAT code and build a kernel module
>     reflecting the wanted behaviour (iptables code seems not too
>     complicated, though I do not know if this is possible at all)
>     2. change the peer code (just for testing) to somehow force the
>     allocation of the next port number (e.g. bind to another port or
>     something similar)
>     3. test this situation on a real NAT with this behaviour: do you
>     have such a NAT or do you know where this could exist?
>     4. do not test this NAT type, as the NAT type detection and source
>     port prediction are fairly trivial and easy to verify by reading
>     the code
>     What do you think is the best step to take?
>
>
> I imagine that although this kind of NAT exists, we cannot expect
> to find a "pure" sequentially allocating NAT behaviour in real
> contexts, simply because a NAT of this type that is used concurrently
> by many other users will probably not behave that way.
>
> I would select choice 4 (at least for now; maybe in the testing
> stage we can do something different if we find a real NAT with this
> behaviour).
Ok. I added a simple port prediction algorithm to the NTS code; you can
have a look at the changes here [2]. Currently it checks whether the
difference between the source ports towards the splitter and the monitor
is <10, and if so, takes this difference as the step for the port
prediction. Other possibilities would be to assume a constant step of 1,
or to let the peers send packets to all port,port+1,...,port+port_difference
possibilities. Another approach would be to open another listening socket
at the monitor and gather another source port difference, to detect the
port allocation type more accurately.
Which approach would you suggest? What do you think about sending ~10
packets per second to each peer in the connection attempt phase; is this
too much?
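A sketch of the port prediction described above, together with the brute-force alternative of trying every port up to the observed difference. This is not the code from commit [2]; the function names, the `count` parameter and the handling of a zero difference (predicting the same port again) are assumptions.

```python
MAX_PORT = 65535


def predict_ports(port_to_splitter, port_to_monitor, count=3, max_step=10):
    """Predict the next external ports of a sequentially allocating NAT.

    The step is the difference between the source ports observed by the
    splitter and by the monitor; it is only trusted if it is below max_step.
    """
    step = port_to_monitor - port_to_splitter
    if step == 0:
        return [port_to_monitor]  # same port for both: assume it is reused
    if 0 < step < max_step:
        candidates = (port_to_monitor + step * i for i in range(1, count + 1))
        return [p for p in candidates if p <= MAX_PORT]
    return []  # negative or large difference: treat as unpredictable


def candidate_ports(base_port, port_difference):
    """Alternative: try every port base_port .. base_port + port_difference."""
    last = min(base_port + port_difference, MAX_PORT)
    return list(range(base_port, last + 1))
```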

Some NATs might be destination port-insensitive when allocating source
ports, which could lead to a wrong NAT type detection (port preservation
instead of sequential allocation) if the splitter and the monitor are on
the same host. The only option here would be to use either public STUN
servers or trusted peers with predictable NATs to determine the source
port difference for different destination addresses.
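The detection pitfall can be illustrated with a rough classifier that also reports whether its result is trustworthy. The function name, the labels and the threshold of 10 (taken from the port prediction described earlier) are assumptions for illustration; the key point is that the result is only flagged as reliable when the two observers have different IP addresses.

```python
def classify_port_allocation(local_port, port_to_splitter, port_to_monitor,
                             splitter_ip, monitor_ip, max_step=10):
    """Rough port allocation type from two observed external source ports.

    Returns (kind, reliable). With splitter and monitor on the same host, a
    destination port-insensitive NAT can reuse one mapping for both, so a
    sequentially allocating NAT may be misread as port preservation.
    """
    if port_to_splitter == port_to_monitor:
        if port_to_monitor == local_port:
            kind = "port preservation"
        else:
            kind = "same mapping for both destinations"
    elif 0 < port_to_monitor - port_to_splitter < max_step:
        kind = "sequential allocation"
    else:
        kind = "random allocation"
    reliable = splitter_ip != monitor_ip
    return kind, reliable
```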


A NAT type combination (marked as "(yes)" in the tables in previous
emails) that does not work yet is a SYMSP router at an existing peer and
a port-restrictive NAT (any type except FCN and RCN) at the arriving
peer. To handle this situation, the existing peer has to send UDP
packets to a new port at the monitor or at the splitter, to learn its
currently allocated source port and predict its next port. What do you
think about this?
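A loopback-only sketch of that probing step: a hypothetical helper stands in for the new listening port at the monitor or splitter and replies with the source port it observed, and the existing peer adds an assumed allocation step to predict its next port. None of these names exist in P2PSP, and on loopback there is no NAT, so the observed port simply equals the local port.

```python
import socket
import threading


def start_port_echo_server(host="127.0.0.1"):
    """Hypothetical stand-in for a new listening port at the monitor or splitter.

    Answers a single datagram with the source port it observed, then closes.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))  # port 0: let the OS pick a free port
    address = srv.getsockname()

    def serve():
        _, peer_addr = srv.recvfrom(1024)
        srv.sendto(str(peer_addr[1]).encode(), peer_addr)
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return address


def probe_source_port(sock, server_addr, timeout=2.0):
    """Send a probe from `sock` and return the source port the server saw."""
    sock.settimeout(timeout)
    sock.sendto(b"probe", server_addr)
    data, _ = sock.recvfrom(1024)
    return int(data.decode())


def predict_next_port(observed_port, step=1):
    """Next external port of a sequentially allocating NAT (step is assumed)."""
    return observed_port + step
```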


I updated the task list for the last two weeks and the todo list [3].
After completing the NAT detection and traversal implementation, I will
finish the documentation of the implemented techniques and test the
software on as many different NAT types as possible.

Thanks,
Max



[1] http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
[2] https://github.com/jellysheep/p2psp/commit/57c0294e762fefc4a190b2aa2b465f0ec6b870c2
[3] https://github.com/jellysheep/p2psp/wiki/GSoC-2015:-NAT-traversal-using-UDP-hole-punching---Timeline

¹ To solve those rare cases, I tried to detect such NAT behaviour by
connecting and sending packets to the NAT and detecting whether the NAT
replies with an ICMP packet, but the software would have to use raw
sockets (and would therefore need higher privileges). So the only
remaining option for the SYMPP<->PRCN case would be the following: peer 1
sends its first packet, before peer 2 does, to a temporary port number;
if that works, it then sends its first packet, before peer 2 does, to
the "real" port number used for P2PSP packets; if it does not work,
peer 2 sends its first packet before peer 1.
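The fallback order in this footnote can be sketched as a small control-flow function. The callback `try_first_packet(first_sender, port)` is hypothetical: it stands for "let this peer send its first packet to this port and report whether the connection was established".

```python
from typing import Callable


def connect_sympp_prcn(try_first_packet: Callable[[str, int], bool],
                       temp_port: int, real_port: int) -> bool:
    """Fallback order for the SYMPP<->PRCN case described in the footnote."""
    # Step 1: peer 1 sends first, but only to a throwaway port, so a failure
    # (ICMP, entry marked "not responding") does not spoil the real port.
    if try_first_packet("peer1", temp_port):
        # Peer 1 sending first works: repeat the same order on the real port.
        return try_first_packet("peer1", real_port)
    # Step 2: otherwise let peer 2 send its first packet before peer 1.
    return try_first_packet("peer2", real_port)
```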



Test results (percentage of successful runs, 20 test runs each) for
Splitter_NTS, Monitor_NTS and Peer_NTS (branch nts, commit 8b28afa);
rows are the NAT type of peer 1, columns the NAT type of peer 2:

Configuration resembling my network at home (35ms delay ±5ms jitter at
the peers, 3ms ±2ms jitter at the splitter, 1% packet loss at each host):

    Peer1\2 | rcn   | prcn  | sympp | symrp
    ========================================
    rcn     | 95    | 100   | 100   | 95
    prcn    | 100   | 100   | 90    | 0
    sympp   | 100   | 100   | 85    | 0
    symrp   | 100   | 0     | 0     | 0

Hard networking conditions (15ms delay ±5ms jitter at the peers, 40ms
±20ms jitter at the splitter, 10% packet loss at each host):

    Peer1\2 | rcn   | prcn  | sympp | symrp
    ========================================
    rcn     | 95    | 65    | 75    | 90
    prcn    | 70    | 85    | 40    | 0
    sympp   | 75    | 55    | 30    | 0
    symrp   | 90    | 0     | 0     | 0

Extremely hard networking conditions, just out of interest; this
actually was a bit like fuzz testing, and it revealed a bug that does not
appear under normal networking conditions, where I had assumed that the
peer list would not be empty (10ms delay ±5ms jitter at the peers, 60ms
±30ms jitter at the splitter, 30% packet loss at each host):

    Peer1\2 | rcn   | prcn  | sympp | symrp
    ========================================
    rcn     | 5     | 5     | 5     | 0
    prcn    | 0     | 0     | 0     | 0
    sympp   | 0     | 0     | 0     | 0
    symrp   | 0     | 0     | 0     | 0


