
[RRG] MTU/fragmentation AGAIN



Sorry about that.

As I said a few days ago, Fred's fragmentation model makes a lot of sense if we want to allow fragmentation in the first place. Let's see where we stand if we don't want to allow fragmentation.

If the path between the ITR and ETR supports 1500 bytes plus the encapsulation overhead (1500+E), encapsulation shouldn't introduce any path MTU discovery issues that don't already exist without it. (I'm using LISP terminology, but this applies to all map/encap schemes.)
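
To put a number on E, here is a rough calculation. It assumes LISP-style encapsulation with an outer IP header (20 bytes for IPv4 without options, 40 for IPv6), an 8-byte UDP header and an 8-byte LISP header; other map/encap schemes will have a different E, and the function names are just for illustration.

```python
# Assumed LISP-style overhead: outer IP header + UDP header + 8-byte LISP header.
IPV4_HEADER = 20  # IPv4 header without options
IPV6_HEADER = 40
UDP_HEADER = 8
LISP_HEADER = 8


def encap_overhead(outer_ipv6: bool = False) -> int:
    """Bytes E that map/encap adds to every packet (assumed LISP layout)."""
    outer = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return outer + UDP_HEADER + LISP_HEADER


def required_path_mtu(inner_mtu: int = 1500, outer_ipv6: bool = False) -> int:
    """ITR->ETR path MTU needed so inner_mtu-sized packets fit after encapsulation."""
    return inner_mtu + encap_overhead(outer_ipv6)


print(required_path_mtu())                 # 1536
print(required_path_mtu(outer_ipv6=True))  # 1556
```

So under these assumptions "1500+E" means 1536 bytes with an IPv4 outer header and 1556 bytes with an IPv6 outer header.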

That leaves two other cases:

- the path doesn't support 1500+E and the ITR knows this
- the path doesn't support 1500+E but the ITR doesn't know this

In the first case, the ITR can simply return a too big message, and hosts/sites that implement PMTUD correctly won't have a problem. Hosts/sites that don't will experience a PMTUD black hole. I'll get back to this.
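
A minimal sketch of that first case, assuming the ITR already knows the path MTU towards the ETR and fragmentation is not allowed. The names (`itr_decide`, `Decision`) are hypothetical; the point is that the MTU reported in the too big message is the path MTU minus E, so the source ends up sending packets that still fit after encapsulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                         # "encapsulate", "too_big" or "out_of_scope"
    reported_mtu: Optional[int] = None  # MTU to put in the too big message


def itr_decide(packet_len: int, df: bool, path_mtu: int, overhead: int) -> Decision:
    """What a (hypothetical) ITR does with an inner packet when the ITR->ETR MTU is known.

    packet_len: size of the inner packet in bytes
    df:         DF bit of an IPv4 packet (treat IPv6 packets as df=True)
    path_mtu:   known MTU of the path towards the ETR
    overhead:   encapsulation overhead E
    """
    if packet_len + overhead <= path_mtu:
        return Decision("encapsulate")
    if df:
        # Largest inner packet that still fits once the encapsulation is added.
        return Decision("too_big", reported_mtu=path_mtu - overhead)
    # DF=0 would need fragmentation, which this sketch deliberately does not model.
    return Decision("out_of_scope")


print(itr_decide(1500, True, 1500, 36))  # Decision(action='too_big', reported_mtu=1464)
print(itr_decide(1500, True, 1536, 36))  # Decision(action='encapsulate', reported_mtu=None)
```

With a 1500+E path (1536 under the LISP assumption) a full-size 1500-byte packet is simply encapsulated; with only 1500 bytes, sources that honor the too big message drop to 1464-byte packets, and sources that filter or ignore it hit the black hole.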

In the latter case, the ITR would have to do PMTUD towards the ETR and then, based on the result, send too big messages to the source hosts that send packets through the ITR. This is unwanted complexity for the ITR, and it means that a relatively high number of packets flowing through the ITR could be dropped during this process. So I'd say it's probably unacceptable to design a network in which this situation is common. For example, suppose an ITR on network A and an ETR on network B both sit on paths that support 1500+E, but the two networks interconnect over an internet exchange with a 1500-byte MTU. That would trigger PMTUD, and lost packets, on ALL sessions between these ISPs, which I wouldn't find acceptable. In other words: internet exchanges used between two 1500+E networks that do map/encap must be upgraded to support at least 1500+E.
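
To show why the internet exchange example hurts, here is a rough sketch of the per-ETR state the ITR would need in that case. It is hypothetical (`EtrMtuCache` and its methods are made up for illustration) and assumes the ITR starts from its own link MTU and only learns the lower MTU when a too big message comes back from the core. Every full-size packet towards that ETR sent before that happens is lost, and each source then needs its own too big message from the ITR.

```python
from typing import Dict


class EtrMtuCache:
    """Hypothetical per-ETR path MTU cache kept by an ITR.

    Starts optimistic (the ITR's own link MTU) and lowers an entry when a
    too big message arrives from the core for traffic towards that ETR.
    """

    def __init__(self, link_mtu: int, overhead: int):
        self.link_mtu = link_mtu
        self.overhead = overhead
        self.mtu: Dict[str, int] = {}  # ETR address -> learned path MTU

    def path_mtu(self, etr: str) -> int:
        return self.mtu.get(etr, self.link_mtu)

    def on_too_big(self, etr: str, reported_mtu: int) -> None:
        # Only ever lower the estimate (no validation of the message here).
        if reported_mtu < self.path_mtu(etr):
            self.mtu[etr] = reported_mtu

    def max_inner(self, etr: str) -> int:
        """Largest inner packet that can go to this ETR without fragmentation."""
        return self.path_mtu(etr) - self.overhead


cache = EtrMtuCache(link_mtu=1536, overhead=36)
print(cache.max_inner("192.0.2.1"))  # 1500: optimistic until the IX says otherwise
cache.on_too_big("192.0.2.1", 1500)  # too big from a router at the 1500-byte IX
print(cache.max_inner("192.0.2.1"))  # 1464: now every source must be told via too big
```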

PMTUD most likely works so badly today because it doesn't have to: all hosts connected to the internet that advertise a TCP MSS of 1500 minus header overhead can successfully receive 1500-byte IPv4 packets with DF=1, so broken PMTUD rarely causes visible problems. However, if a site wants to deploy an ITR and/or ETR and PMTUD black holes suddenly appear, the site has a very strong incentive to make the changes necessary to make those black holes go away. In the case of an ITR, this means making sure the source hosts receive the too big messages. In the case of an ETR, this means announcing a smaller TCP MSS.
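
A rough sketch of the MSS arithmetic behind "announcing a smaller TCP MSS", assuming 20-byte IPv4, 40-byte IPv6 and 20-byte TCP headers without options. Whether the smaller MSS comes from host configuration or from clamping SYNs at the site edge is a local choice; `clamp_mss` is just an illustrative name.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20  # no TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """TCP MSS that keeps full-size segments within mtu."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def clamp_mss(advertised: int, mtu: int, ipv6: bool = False) -> int:
    """Lower an advertised MSS (e.g. in a SYN leaving the ETR site) so segments fit."""
    return min(advertised, mss_for_mtu(mtu, ipv6))


print(mss_for_mtu(1500))             # 1460: the usual IPv4 value
print(mss_for_mtu(1500, ipv6=True))  # 1440
# Site whose ETR path only carries 1500 bytes total, with E = 36 (LISP over IPv4):
print(clamp_mss(1460, 1500 - 36))    # 1424
```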

So I'd say that SITES should be able to deploy xTRs even if they can't support a 1500+E MTU. (There needs to be a way to communicate the ETR MTU limitation back to ITRs, though.) However, the same is not true for ISPs: in that case, the ISP operators create the problem, but site operators and host admins at a large number of sites would have to fix it, which isn't going to work in practice. So ISPs MUST support 1500+E both on their xTRs and on peering links to other ISPs that run xTRs.

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg