[RRG] Re: [RAM] Tunneling overheads and fragmentation



On the RAM list, Iljitsch van Beijnum wrote:

> [caught up, yay!]

I am moving house and haven't been able to participate in
discussions as I would like.

> On 21-jul-2007, at 16:40, Robin Whittle wrote:
> 
>> Even then, I fear that in order to preserve both reachability and
>> efficiency (and any reachability problems which arise from
>> fragmentation), that hosts in all networks, including non-upgraded
>> networks, will need to adopt a somewhat lower MTU setting - for
>> all the packets they send.
> 
> That would assume that there is something that doesn't support the
> "regular" MTU (in theory, there is no such thing, in practice: 1500
> bytes) between the place the packets are encapsulated and the place the
> packets are decapsulated.

My concern is that a host sends packets assuming a 1500 byte MTU but
that the encapsulation process (as with LISP, eFIT-APT or TRRP)
makes the packet longer, requiring fragmentation in the network (I
am discussing IPv4 only here).

I think this could be solved by all hosts adopting a smaller MTU
value, so the longest packets they send will not require
fragmentation in the event they are encapsulated by Ivip, TRRP, LISP
etc.

This would be a loss of efficiency for all packets, unless the host
could somehow figure out whether each packet was going to be
encapsulated, and so choose the highest MTU value which avoids the
need for fragmentation.
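
To put rough numbers on this, here is a minimal sketch of the
arithmetic.  It assumes a 1500 byte MTU throughout the tunneled part
of the path and uses the overhead figures quoted elsewhere in this
thread (20 or 40 bytes for IP-in-IP, plus 8 bytes of UDP header and
about 4 bytes of extra data for UDP encapsulation).  The scheme names
and the function name are illustrative only, not part of any proposal.

```python
# Encapsulation overheads in bytes.  The UDP figures follow the
# breakdown quoted in this thread (outer IP header + 8 byte UDP
# header + roughly 4 bytes of extra data).  Names are illustrative.
OVERHEAD = {
    "ipv4-ip-in-ip": 20,
    "ipv4-udp": 20 + 8 + 4,
    "ipv6-ip-in-ip": 40,
    "ipv6-udp": 40 + 8 + 4,
}


def safe_host_mtu(tunnel_path_mtu, scheme):
    """Largest packet a host can send that still fits after encapsulation."""
    return tunnel_path_mtu - OVERHEAD[scheme]


for scheme in OVERHEAD:
    print(f"{scheme:14s} host MTU {safe_host_mtu(1500, scheme)}")
```

With these assumptions, a host would need an MTU of about 1468 bytes
to be safe against IPv4 UDP encapsulation, whether or not any given
packet is actually encapsulated.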


> I'm still operating under the assumption that both those places are in
> ISP networks. Now obviously there are lots of places in ISP networks
> that only support 1500-byte packets, but what would be a better decision
> here: push out a reduced packet size EVERYWHERE which we probably won't
> be able to raise any time soon, or require ISPs to either:
> 
> 1. Implement path MTU discovery correctly, or:
> 2. Make sure that all encapsulated packets travel over paths that
> support at least 1500 byte + encapsulation sized packets

All these options look impossible or ugly to me.


>> I wonder to what extent every possible application respects the
>> operating system's MTU.
> 
> The two are unrelated. Applications talk to transport protocols. The
> most popular one, TCP, breaks large chunks of data into smaller segments
> and coalesces multiple small chunks into larger segments. UDP simply
> adds its header and hands over the packet to the IP layer. The IP layer
> will fragment packets that are too large to be sent out over the
> interface of choice and/or the packet's destination. The only time there
> are problems (except of course from firewalls that don't like
> fragmentation) is when the transport protocol or application doesn't
> want fragmentation (DF=1) but the packet is larger than the interface
> MTU. Not sure what happens then, except that there is no way that a
> packet larger than the interface MTU is sent in one piece.

If we don't mind fragmentation, then there is no problem with IPv4
and the various proposals (LISP, Ivip, eFIT-APT, TRRP etc.).
However, fragmentation is very inefficient (every fragment carries
its own IP header and must be forwarded separately) and less
reliable (losing any one fragment loses the whole packet).  I think
we should go to a lot of trouble to avoid it.
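
As an illustration of the cost, the sketch below works out how a
full-sized 1500 byte packet would be split once 32 bytes of UDP
encapsulation (an assumed figure: 20 + 8 + 4) pushes it over a
1500 byte link MTU.  It assumes an outer IPv4 header with no options;
the function name is made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_sizes(packet_len, link_mtu):
    """Total lengths (header included) of the IPv4 fragments of one packet."""
    if packet_len <= link_mtu:
        return [packet_len]
    payload = packet_len - IPV4_HEADER
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + IPV4_HEADER)
        payload -= chunk
    return sizes


outer_len = 1500 + 32  # full-sized packet plus assumed UDP encapsulation
frags = fragment_sizes(outer_len, 1500)
print(frags, "total bytes on the wire:", sum(frags))
```

Under these assumptions the one original packet becomes a 1500 byte
fragment followed by a tiny 52 byte fragment, so the packet count
doubles for every full-sized packet the host sends.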



>> I wonder if there are any widely
>> used applications, such as games, P2P programs etc. which are
>> hard-coded to assume a certain MTU which is close to, or right at,
>> the limit of what can safely be sent across most of the Net.
> 
> Mostly video streaming, although that's all quickly moving to TCP these
> days. 

Can you cite some examples of streaming applications using TCP?

> . . . 

> By the way, I'm currently working on this:
> 
> http://www.ietf.org/internet-drafts/draft-van-beijnum-multi-mtu-01.txt
> 
> It doesn't directly address this issue but it allows for systems with
> different MTUs to coexist on the same subnet so it becomes a lot easier
> to deploy jumboframes.

I will look at this.

>> This really needs to be done for all hosts in all networks - not
>> just hosts in networks which have been upgraded with ITRs and ETRs
>> etc.
> 
> Oh joy.

Indeed . . .

> Doing path MTU discovery is probably easier, and note that if one host
> in a TCP session has a reduced MTU, it will let the other know during
> the three-way handshake so the unencumbered host won't send packets that
> are too large.

The trouble is that these proposals (Ivip etc.) all involve
tunneling and it is not clear to me how to support PMTUD in the
tunneled portion of the path without efforts which I think are
unsustainable.

For those proposals (all other than Ivip) where the tunneled packet
has an outer header Source Address (SA) which is that of the ITR,
full support of PMTUD would require extreme efforts by the ITR,
caching each recently sent packet, to match ICMP packets coming back
from the tunneled section of the path to the original packet which
was received, so that a new ICMP packet can be sent back to the
originating host in a manner which will be recognised by that host.
(Actually, it only needs to cache those packets which have Don't
Fragment set - it is not as bad as having to do this for all
encapsulated packets.  But the work to match the ICMP message to the
original packet is still extremely onerous, if security against
spoofed ICMP messages is to be retained.)
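
To make the bookkeeping concrete, here is a very rough sketch of the
kind of cache an ITR might need.  It is not taken from any of the
proposals: the class, its methods and the choice of key (outer
destination address plus outer IP Identification field, both of
which appear in the header quoted inside an ICMP "fragmentation
needed" message) are assumptions made for illustration.  A real ITR
would need a stronger match to resist spoofing.

```python
import time
from collections import OrderedDict


class DfPacketCache:
    """Remembers recently tunneled DF=1 packets, keyed by outer-header fields."""

    def __init__(self, max_age=2.0, max_entries=10000):
        self.max_age = max_age          # seconds an entry stays valid
        self.max_entries = max_entries  # hard limit on ITR memory use
        self._entries = OrderedDict()

    def remember(self, outer_dst, outer_ip_id, inner_src, inner_dst, now=None):
        """Record one encapsulated packet at the time it is tunneled."""
        now = time.monotonic() if now is None else now
        self._entries[(outer_dst, outer_ip_id)] = (now, inner_src, inner_dst)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest entry

    def match_icmp(self, quoted_dst, quoted_ip_id, next_hop_mtu, overhead, now=None):
        """Return (host_to_notify, inner_dst, mtu_to_report) or None."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get((quoted_dst, quoted_ip_id))
        if entry is None or now - entry[0] > self.max_age:
            return None  # unknown or stale: treat as possibly spoofed
        _, inner_src, inner_dst = entry
        # The host must be told an MTU that leaves room for encapsulation.
        return inner_src, inner_dst, next_hop_mtu - overhead


cache = DfPacketCache()
cache.remember("192.0.2.1", 4321, "198.51.100.7", "203.0.113.9", now=0.0)
print(cache.match_icmp("192.0.2.1", 4321, 1500, 32, now=0.5))
```

Even this toy version has to store state for every DF=1 packet and
do a lookup for every ICMP message, which gives some idea of the
burden on a busy ITR.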

Ivip uses the sending host's SA in the outer IP header of the
tunneled packet - so the ITR is not involved in handling ICMP
messages.  However, a properly written host PMTUD should ignore the
ICMP packets which come back from tunneled areas of the path, since
they would contain some header details (the destination address
being that of the ETR, not the final destination host) which do not
match those of the packet the sending host sent.


>> Even if the host had its own ITFH
>> (Ingress Tunnel Function in Host) function, I don't see how the
>> operating system could tell application programs that there is one
>> MTU and MSS setting for packets going to some addresses and
>> another setting for packets going to other addresses.
> 
> Not a problem, path MTU discovery already does this today.

This is an area I know virtually nothing about.  I understand you
to mean that the IP layer tells the UDP or TCP layer what the MTU is
for a given destination address.  I have never looked at socket
programming - but I should.  This might be a place to start:
http://beej.us/guide/bgnet/
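
For what it is worth, here is a small sketch of how an application
might ask the operating system for the path MTU it currently holds
for one destination.  It is Linux-specific and makes some
assumptions: the numeric fallbacks are the values from <linux/in.h>,
used because not every Python build exposes these constants by name,
and the function name is invented for this example.

```python
import socket
import sys

# Linux values from <linux/in.h>, used as fallbacks when the Python
# socket module does not define the constants itself.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)


def cached_path_mtu(dest_ip, port=9):
    """Ask the kernel for the path MTU it currently knows for dest_ip."""
    if not sys.platform.startswith("linux"):
        raise OSError("this sketch relies on Linux-only socket options")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Set DF on outgoing packets so the kernel performs PMTUD.
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        # Connecting a UDP socket sends nothing; it only fixes the peer.
        s.connect((dest_ip, port))
        return s.getsockopt(socket.IPPROTO_IP, IP_MTU)
```

Calling cached_path_mtu() for two different destinations can give
two different answers, which is the per-destination behaviour being
described.  The value starts at the outgoing interface's MTU and is
only lowered once ICMP feedback has been received for that
destination.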



> However, I see issues with hosts doing their own en/decapsulation: that
> way, the locators are exposed to hosts and they are at risk for becoming
> just as unrenumberable as IP addresses today.

The ITR function in a host would ideally be within the TCP/IP
operating system software, not implemented by each application
program.  This function would be like a caching ITR, so it would use
an upstream Query Server (QSD) to get mapping information.  In Ivip,
those QSDs get the full mapping data and send specific messages to
all queriers (Query Server Caches - QSCs - as well as ITRCs and
ITFHs) which recently asked about mapping for an IP address whose
mapping information has just changed.



> The only way to make sure that you can renumber easily is if there are
> no firewall rules looking at locators. And the only way that will happen
> is if the relationship between location and identity is strong enough
> that spoofing it is not a viable attack vector. Concretely: today the
> routing system is unspoofable enough that people filter on IP addresses.
> The routing system is run by service providers who generally aren't in
> the business of attacking people. But if a locator mapping system is
> also open to end-users, it may be possible for attacks to use this path
> and people won't be happy to filter on just identifiers, just like they
> aren't happy to filter on just DNS names today. So either the ISPs must
> run it or there must be a heavy layer of magic security dust.

I will reread this and try to understand what you are saying when I
have more time.


>> UDP encapsulation, as used by LISP and I think eFIT-APT, involves
>> 20 bytes for the IP header, 8 bytes for the UDP header and some
>> number of bytes, such as 4, for extra stuff
> 
> What exactly was the reason for UDP encapsulation again? 

To tunnel the packet from an ITR to an ETR.

The alternative is to simply rewrite the destination address - but
then how does the ETR know where to send the packet?

> I think Dino
> said something about load balancing and firewalling. I'm not buying
> that: you can load balance on the destination IP address and anyone
> running firewalls in the middle of the routing system is best served
> with a single deny any any rule.
> 
>> This is where IPv6's long addresses and headers become really
>> ugly. There would be 40 bytes for IP-in-IP and 52 for basic UDP
>> encapsulation.
> 
> We really need bigger packets to offset the ever-increasing overhead.

That would help for larger packets - but the efficiency problem of
the tunneling overhead is quite minor for long packets.

The overhead is a big problem, I think, for short packets in terms
of packet length and therefore the time they occupy on the "wire" or
fibre.  In addition, there is the problem of the routers having to
crunch through 4 to 6 bytes of destination address bits just to
forward the packet.
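
A quick sketch of the length overhead for short and long packets.
The 32 byte figure is the IPv4 UDP encapsulation overhead quoted
above (20 + 8 + 4).  The 60 byte VoIP packet is an assumption made
for this example: 20 bytes of voice data plus 12 bytes of RTP header,
8 bytes of UDP header and 20 bytes of IPv4 header.

```python
def overhead_percent(packet_len, encap_overhead):
    """Bytes added by encapsulation, as a percentage of the original packet."""
    return 100.0 * encap_overhead / packet_len


ENCAP = 32          # IPv4 UDP encapsulation: 20 + 8 + 4 bytes
VOIP_PACKET = 60    # assumed: 20 voice + 12 RTP + 8 UDP + 20 IPv4
FULL_PACKET = 1500  # full-sized packet

print(f"VoIP packet: +{overhead_percent(VOIP_PACKET, ENCAP):.1f}%")
print(f"Full packet: +{overhead_percent(FULL_PACKET, ENCAP):.1f}%")
```

On these assumptions the short packet grows by a little over half,
while the full-sized packet grows by about 2%.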

As I have written elsewhere, every IPv6 packet involves each DFZ
router looking at up to 48 bits of address in order to forward the
packet.  This is 6 bytes of intensive, very fast, processing at each
router.

If there are 15 routers in the path, they are collectively
processing, with extremely expensive FIB hardware, 90 bytes of
address data just to deliver the little VoIP packet with 20 bytes of
data.  This is happening 50 times a second in both directions, for
the supposedly lightweight, inexpensive, voice call.  In each
direction, the routers are crunching 36kbps of address bits - just
for a lousy VoIP call!

IPv4 currently gets along with a maximum of 24 bits being crunched
by each DFZ router.
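
The arithmetic behind these figures, as a short sketch.  The inputs
(48 or 24 bits per lookup, 15 routers, 50 packets per second) are
the ones used above; the function name is just for illustration.

```python
def address_bits_per_second(bits_per_lookup, routers, packets_per_second):
    """Address bits examined along the whole path, per second, one direction."""
    return bits_per_lookup * routers * packets_per_second


ROUTERS = 15
PACKETS_PER_SECOND = 50

ipv6_bytes_per_packet = 48 // 8 * ROUTERS
ipv6_bps = address_bits_per_second(48, ROUTERS, PACKETS_PER_SECOND)
ipv4_bps = address_bits_per_second(24, ROUTERS, PACKETS_PER_SECOND)

print("IPv6 address bytes per packet:", ipv6_bytes_per_packet)
print("IPv6 address bits per second: ", ipv6_bps)
print("IPv4 address bits per second: ", ipv4_bps)
```

This reproduces the 90 bytes per packet and 36 kbps figures for
IPv6, and gives 18 kbps for the same call over IPv4.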

 - Robin

