
[RRG] RE: Sprite & IPTM while PMTU probing is in progress



Robin,

Please note that the name of the proposal is "sprite-mtu";
not "sprite". See below for some responses:

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au] 
> Sent: Tuesday, November 27, 2007 6:09 PM
> To: Routing Research Group list
> Cc: Templin, Fred L
> Subject: Sprite & IPTM while PMTU probing is in progress
> 
> Hi Fred,
> 
> Here is a diagrammatic explanation of how IPTM would work, and a
> question about how Sprite would work.  I haven't been able to
> understand your ID clearly enough to answer this myself.
> 
>   http://tools.ietf.org/html/draft-templin-inetmtu-06
>   http://www.firstpr.com.au/ip/ivip/pmtud-frag/
> 
> IPTM only sends a PTB message to the SH after it has reliably
> ascertained the PMTU to the ETR - and then only when the SH sends it
> a packet which is of a length which would exceed that PMTU, once it
> was encapsulated.  Until then (with the first such "long" packet,
> and with any others which arrive while the ITR is probing) "long"
> outer packets are fragmented, whether or not the inner packet has
> its DF flag set, to be reassembled at the ETR before decapsulation.

On- and off-list discussions have explored the idea of
requiring a 2KB EMTU_R on all ETRs and accommodating all
1500-byte and smaller packets, even if fragmentation is
needed and the inner DF=1. This would uphold the "principle
of least surprise" for the SH, but the difficulty lies in
knowing the EMTU_R of the ETR and in avoiding excessive
fragmentation.

Since the ITR cannot know the EMTU_R of the ETR a priori
unless there is some spec that says: "all ETRs MUST configure
an EMTU_R of at least X bytes", the ITR should not simply
fragment the outer packets (or, allow the network to fragment
them) since they could black-hole. Also, a burst of packets
requiring fragmentation that arrive before the path is probed
can cause fragment misassociations at the ETR which could
result in undetected corruption per RFC4963.
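As a minimal sketch of the first point, assuming a hypothetical spec-mandated minimum (`MIN_EMTU_R` below is illustrative, loosely based on the 2KB figure discussed; it is not from any published spec), an ITR could only rely on outer fragmentation when the reassembled outer packet is guaranteed to fit the ETR's reassembly buffer. The sketch does not address the RFC4963 misassociation risk, which is a separate concern.

```python
from typing import Optional

# Hypothetical spec value ("all ETRs MUST configure an EMTU_R of at least
# X bytes"); 2048 is an assumption for illustration only.
MIN_EMTU_R = 2048


def may_fragment_outer(inner_len: int, encaps: int,
                       min_emtu_r: Optional[int]) -> bool:
    """Return True if fragmenting the outer packet cannot overflow the
    ETR's reassembly buffer.

    min_emtu_r is None when no spec guarantees a minimum EMTU_R; the ITR
    then cannot know the ETR's EMTU_R a priori and must not rely on
    fragmentation. Fragment misassociation (RFC4963) is not considered.
    """
    if min_emtu_r is None:
        return False
    outer_len = inner_len + encaps
    return outer_len <= min_emtu_r
```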

There are also other factors to consider, including that
the ITR may not have ultimate control over the setting of
the ip_id. And, the ETR may not be able to receive non-initial
fragments in the first place. (These factors can be mitigated
by the placement of the ITR and ETR in some use cases, however.)
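The RFC4963 concern can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not anything taken from sprite-mtu or IPTM. It assumes every outer packet is fragmented and consumes one value of the 16-bit IPv4 Identification field for a single (source, destination, protocol) tuple. If the ID space wraps faster than the ETR's reassembly timeout (commonly tens of seconds; Linux, for example, defaults to 30 s), fragments from different packets can be misassociated.

```python
IP_ID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits wide


def id_wrap_seconds(rate_bps: float, packet_bytes: int) -> float:
    """Seconds until a sender emitting packet_bytes-sized packets at
    rate_bps reuses an IPv4 ID value for the same tuple, assuming one
    ID per packet."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    return IP_ID_SPACE / packets_per_second
```

For 1500-byte packets at 1 Gb/s, `id_wrap_seconds(1e9, 1500)` gives about 0.79 s, far shorter than typical reassembly timeouts.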

> A "long" inner packet is one which, once encapsulated, would exceed
> the ITR's current best estimate of the PMTU.  This would initially
> be a default such as 1280 bytes.
>
> If this default value is above the
> minimum for the protocol, e.g. 576 for IPv4, then this value of PMTU
> to the "core" of the net must be available to every ITR and ETR and
> would be part of the specification of the ITR-ETR scheme.

The pathMTU cannot be known a priori; all that can be known
a priori is the EMTU_R of the ETR if there is a specification
for the minimum size.

>   (Did you mean "1280" instead of "1500" in:
>    "For IPv6, the minimum EMTU_R is 1500 bytes ..."
>    http://psg.com/lists/rrg/2007/msg00623.html ?)

1500 per RFC2460, Section 5.
 
> The default value would be replaced by a higher value once the probe
> process was complete.  The pattern would be something like this,
> assuming the SH's initial idea of PMTU to the Destination Host was
> 1500.  I will assume an ENCAPS overhead of 20 bytes (as with IPv4
> Ivip, though other ITR-ETR schemes have higher overheads) and that
> all ITRs and ETRs are located so they have an MTU of at least 1280
> from the DFZ.

Do you mean to say "pathMTU" or "linkMTU"? In terms of
"pathMTU", do we need to consider links with configurable
linkMTUs that might have either mis-configured or overly-
conservative values? 
 
>    Inner      ITR action on           SH's idea  DH gets
>    packet     outer packet            of PMTU
>    length     following encapsul-     to DH
>               ation of inner packet
> 
>                                       1500
> 
> 
>    200       Send outer packet -      1500       The packet
>              the length is less than
>              1280.
> 
>   1500       Fragment packet and      1500       The packet
>              commence probing                    (Less efficient
>                                                  and more error-
>                                                  prone tunnel with
>                                                  2 packets instead
>                                                  of 1, but this is
>                                                  only for a few
>                                                  seconds, I hope.) 
>   1500       Fragment packet and      1500       The packet
>              continue probing
> 
>              ... etc.

This could black hole and appear as congestion-related
loss to the SH.
 
>              Probing complete:
>              PMTU to ETR decided
>              to be 1460.
> 
>   1400       Send outer packet -      1500       The packet
>              the length is <= 1460.
>              (This length would not
>              necessarily be sent - it
>              is just to show that the
>              ITR will now send longer
>              packets without frag-
>              mentation than before.)
> 
>   1500       Drop the packet and send
>              the SH a PTB message
>              with value 1440.         1440       Nothing, but the
>                                                  ITR is usually
>                                                  close to the SH,
>                                                  and it doesn't
>                                                  take long for...

Depending on the placement of the ITR, this PTB might not
make it back to the SH.
 
> 
>   1440      Send the outer packet -  1440        The packet
>             the length is <= 1460                (Now the tunnel
>                                                  is handling
>                                                  optimal length
>                                                  packets.)
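Robin's IPTM table can be condensed into a per-packet decision. This is a minimal sketch under the assumptions stated in the text (an ENCAPS of 20 bytes, a default PMTU of 1280 for an unprobed ETR); `IptmTunnelState` and `iptm_action` are hypothetical names, and the probing process itself is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ENCAPS = 20          # assumed encapsulation overhead (IPv4 Ivip, per the text)
DEFAULT_PMTU = 1280  # assumed default PMTU for an ETR not yet probed


@dataclass
class IptmTunnelState:
    pmtu: int = DEFAULT_PMTU  # ITR's current best estimate of PMTU to the ETR
    probed: bool = False      # True once probing has reliably set pmtu


def iptm_action(state: IptmTunnelState,
                inner_len: int) -> Tuple[str, Optional[int]]:
    """Return (action, ptb_value) for one inner packet."""
    outer_len = inner_len + ENCAPS
    if outer_len <= state.pmtu:
        return ("send", None)
    if not state.probed:
        # "Long" packet before the PMTU is known: fragment the outer packet,
        # whatever the inner DF flag, and start or continue probing.
        return ("fragment_and_probe", None)
    # PMTU reliably known: drop and tell the SH the largest inner length.
    return ("drop_send_ptb", state.pmtu - ENCAPS)
```

Feeding it the inner lengths from the table (200 and 1500 before probing; 1400, 1500 and 1440 after the PMTU is set to 1460) reproduces the ITR actions shown, including the PTB value of 1440.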
>
> This pattern would continue unless the ITR, with periodic probing,
> decides that the PMTU is less than 1460 (it might do this quickly if
> it got a PTB message from a router in a new, more MTU-challenged,
> path to the ETR), and if the SH sends a packet which would be too

There is not strictly any periodic probing needed to detect
pathMTU reductions, since the data packets serve as virtual
probes. The data packets will be lost and might be considered
by the SH as congestion-related loss if the PTB can't be
translated by the ITR and sent back to the SH. But, the ITR
will be able to return the correct PTB when the SH retransmits.
This is the same as for sprite-mtu. 
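A minimal sketch of the "data packets as virtual probes" behavior, assuming an ENCAPS of 20 bytes and a hypothetical per-ETR `TunnelPmtu` object (a real ITR would also need to validate incoming PTBs, which is not shown): the ITR lowers its estimate when a PTB arrives from inside the tunnel, and answers the SH's retransmission with a translated PTB.

```python
from typing import Optional

ENCAPS = 20  # assumed encapsulation overhead


class TunnelPmtu:
    """Hypothetical ITR-side PMTU state for one ETR."""

    def __init__(self, pmtu: int) -> None:
        self.pmtu = pmtu

    def on_tunnel_ptb(self, reported_mtu: int) -> None:
        # A router inside the tunnel reported a smaller MTU for an outer
        # packet; the data packet itself acted as the probe and was lost.
        if reported_mtu < self.pmtu:
            self.pmtu = reported_mtu

    def ptb_for_inner(self, inner_len: int) -> Optional[int]:
        # On the SH's retransmission, return the MTU to place in a PTB
        # sent to the SH, or None if the packet now fits the tunnel.
        if inner_len + ENCAPS > self.pmtu:
            return self.pmtu - ENCAPS
        return None
```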
 
> big for the new lower value of PMTU.  Then the ITR would send
> another PTB message to the SH, with a lower value than 1440.
> 
> Alternatively, occasional probing by the ITR might discover a higher
> value of PMTU to this ETR, and the SH could discover this increase
> by trying its luck with a larger packet - and either having it
> accepted, or rejected with a PTB containing the new higher value,
> minus 20.

SHs that don't implement RFC4821 will have to wait for
a long time before trying a larger packet (RFCs 1191 and
1981 say 10 minutes, I believe). SHs that implement RFC4821
can retry end-to-end probing more frequently than that,
since loss of a probe does not expose data to silent loss.
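To illustrate why RFC4821 hosts can probe more aggressively, here is a minimal sketch of a packetization-layer search. RFC4821 leaves the search strategy to the implementation; the binary search below is one simple choice, not the RFC's algorithm, and `probe_succeeds` is a hypothetical stand-in for sending a probe and waiting for end-to-end confirmation. A failed probe costs only the probe, not user data.

```python
from typing import Callable


def search_pmtu(probe_succeeds: Callable[[int], bool],
                known_good: int, max_size: int) -> int:
    """Return the largest size in [known_good, max_size] for which
    probe_succeeds() is True, assuming known_good already works and that
    any size below a working size also works."""
    lo, hi = known_good, max_size
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always progresses
        if probe_succeeds(mid):
            lo = mid              # probe delivered: PMTU is at least mid
        else:
            hi = mid - 1          # probe lost: only the probe, not data
    return lo
```

For a path whose true PMTU is 1460, `search_pmtu(lambda n: n <= 1460, 1280, 1500)` converges on 1460 after a handful of probes.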

> Sprite, as I understand it for IPv4 ...
> 
>   (Fred, the "Section 5.5.x" numbers in Figure 2 should be "5.6.x"
>    except for 5.5.6, which should be 5.6.5.  Other references
>    in the ID to "5.5.4" etc. may also need correction.)

I'll fix these editorials.
 
> ... actually, I don't understand it clearly enough to describe it.
> 
> I think Sprite will fragment outer packets, as does IPTM,
> irrespective of the DF flag of the inner packet.

That is true.

> The criteria for
> which IPv4 outer packets are fragmentable is complex (5.6.4).

That text borrows from the tunnel MTU and fragmentation
discipline set forth in RFC4213, Section 3.2. But I'm not
sure what you mean by "complex".
 
> I am not sure how Sprite handles "large" outer packets while it is
> probing.  Does it fragment them as IPTM does?  Or does it do the
> following, which is the same as what it does for a "long" packet
> once the PMTU has been reliably ascertained:
> 
>   (5.6.5) "... admits the packet but also sends a PTB message ..."

It does the latter; this borrows from RFC2003, Section 5. 
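The "admit the packet but also send a PTB" behavior can be sketched as follows. This is illustrative only: `inner_limit` is a hypothetical parameter for the size the ITR is currently advertising (1260 in Robin's example, or 1280 under the proposed Section 5.6.4 rewording), and whether the admitted outer packet may be fragmented is decided separately and is not modeled here.

```python
from typing import List, Tuple


def sprite_mtu_action(inner_len: int, inner_limit: int) -> List[Tuple[str, int]]:
    """Return the actions the ITR takes for one inner packet."""
    # The packet is always admitted into the tunnel (per RFC2003, Section 5).
    actions = [("admit", inner_len)]
    if inner_len > inner_limit:
        # Also advise the SH, even though the packet may still arrive.
        actions.append(("send_ptb_to_sh", inner_limit))
    return actions
```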
 
> It seems strange to me to send the packet (unfragmented, I assume)
> while also sending back a PTB message to the sending host.  Wouldn't
> this cause needless traffic and/or confusing signals to the SH if
> the outer packet does in fact arrive at the ETR and therefore the
> inner packet is delivered to the destination host?

To the SH, it would appear that there is a router on the
path returning inaccurate information. This can happen
already today, since routers can be misconfigured, and
spoofed PTBs can be sent from any node in the network.

SHs that implement RFC4821 should not have a problem
deconflicting the (suspect) PTB information from (authentic)
end-to-end feedback from the DH, but should benefit from the
PTB info when the actual data is not delivered to the DH.

ITRs can help the situation by sending sprites of, e.g.,
1500 bytes into the tunnel early in the process so that
most if not all SHs that use the tunnel will see a 1500
byte or larger MTU. 
 
> Here I will assume IPv4 only, with 1280 bytes for the default PMTU
> for every ETR the ITR has not yet probed.  I will also assume an
> encapsulation overhead of 20, although this would typically be
> higher for Sprite and non-Ivip ITR-ETR schemes.

I don't understand why the overhead would be "higher for sprite-mtu".
 
> If the ITR sends a PTB message to the SH when the first packet (or
> multiple packets) length exceeds the default PMTU value and then,
> after probing, decides the PMTU is 1480, then I am concerned that
> the SH would get contradictory values in these PTB messages.
> 
> At first the SH would be told to send packets no longer than (1280 -
> 20 = 1260) and later, it would be told to send packets no longer
> than (1480 - 20 = 1460).

Note: in the next draft version I would like to rewrite
the second bullet of Section 5.6.4 as:

   o  for IPv4/*/IPv4 tunnels, 'pathMTU' is less than
      MIN(EMTU_R, 1280+ENCAPS) bytes and the inner IPv4
      packet is no larger than MIN(EMTU_R-ENCAPS, 1280).
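Read as a predicate, the proposed bullet looks like the sketch below (IPv4/*/IPv4 only; the other bullets of Section 5.6.4 are not modeled, and the function names are hypothetical). The largest inner packet it allows is MIN(EMTU_R-ENCAPS, 1280), so with a 2KB EMTU_R and a 20-byte ENCAPS the limit is 1280 rather than 1260.

```python
def ipv4_outer_fragmentable(path_mtu: int, emtu_r: int, encaps: int,
                            inner_len: int) -> bool:
    """Proposed second bullet of Section 5.6.4 (IPv4/*/IPv4 tunnels)."""
    return (path_mtu < min(emtu_r, 1280 + encaps)
            and inner_len <= min(emtu_r - encaps, 1280))


def max_fragmentable_inner(emtu_r: int, encaps: int) -> int:
    """Largest inner IPv4 packet the bullet allows."""
    return min(emtu_r - encaps, 1280)
```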
 
> But if the SH took notice of the first PTB message, it probably
> wouldn't send any longer packets which would trigger the second.
> So, if my understanding of Sprite is correct, the SH would
> experience something like this:
> 
>    Inner      ITR action on           SH's idea  DH gets
>    packet     outer packet            of PMTU
>    length     following encapsul-     to DH
>               ation of inner packet
> 
>                                       1500
> 
> 
>    200       Send outer packet -      1500       The packet
>              the length is less than
>              1280.

Probing can start here also.
 
>   1500       Send packet and                     Probably nothing
>              commence probing                    - however, if the
>              Send PTB with value                 packet was a little
>              1260                     1260       shorter, such as
>                                                  1440, then it may
>                                                  arrive at the ETR
>                                                  in one piece,
>                                                  despite the SH
>                                                  being told it was
>                                                  too big.

With the note above, the returned size would be 1280;
not 1260. Also, I'm not sure as to the "probably nothing"
as linkMTUs increase above 1500 (perhaps someone could
send the IEEE reference that proposes the increase for
802 linkMTUs).

>              SH breaks the message
>              into smaller packets
>              and retries:
> 
>   1260       Send packet and          1260       The packet
>              continue probing
> 
>              ... etc.
> 
>              Probing complete:
>              PMTU to ETR decided
>              to be 1460.

By probing, do you mean by the ITR or by the SH? I am
assuming that SHs will begin using RFC4821 and will probe
the path for themselves independent of any probing done
by the ITR.

>   1260       Send outer packet -      1260       The packet
>              the length is <= 1460.
> 
>   1260       SH would probably keep              The packet -
>              sending packets of                  but more and
>              length <= 1260.  Unless             shorter
>               the SH was pushy, it               packets than
>              would never discover the            the ITR-ETR
>              PMTU it could use was               tunnel can
>              in fact 1440.                       handle.

IMHO, SHs that use RFC4821 can be "pushy" within reason.

I would like to add one other note about the 1280. That
number comes from the SHOULD in RFC4213, Section 3.2.1.
The reason I am taking the SHOULD is that there can be
additional encapsulations on the path between the ITR
and ETR (e.g., tunnel-mode IPsec) and we don't want to
cause fragmentation for those, either. If the ITR and
ETR are arranged such that there will be no additional
encapsulations on the path (and the ITR has a way of
knowing this) then the spec could use the RFC4213,
Section 3.2.1 "configuration knob" to push the 1280 up
to as much as 1480 or perhaps even 1500. I would not
want to go any higher than this, since it could involve
excessive fragmentation resulting in undetected data
corruption. Maybe I should add something about this
to the spec?
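One way to express the knob policy described above, as a hedged sketch: `knob` and `no_extra_encaps` are hypothetical names, the default and ceiling come from this message, and clamping the knob to no lower than 1280 is an added assumption (the text only discusses raising it).

```python
DEFAULT_LIMIT = 1280  # the SHOULD value taken from RFC4213, Section 3.2.1
CEILING = 1500        # "I would not want to go any higher than this"


def inner_size_limit(knob: int, no_extra_encaps: bool) -> int:
    """Inner packet size limit the ITR would use for one tunnel."""
    if not no_extra_encaps:
        # Additional encapsulations (e.g. tunnel-mode IPsec) may exist on
        # the path, so keep the conservative default.
        return DEFAULT_LIMIT
    # Assumption: the knob only raises the limit, never above CEILING.
    return max(DEFAULT_LIMIT, min(knob, CEILING))
```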

 
Fred
fred.l.templin@boeing.com

--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg