
Re: Status of Operational issues with Tiny Fragments in IPv6



Fred Baker wrote:
On May 26, 2006, at 1:21 AM, Elwyn Davies wrote:

I would suggest that rather than making this into a separate draft we can look at improving the text in the security overview (if necessary) given that we are probably going to have to do *another* round on this one.


That seems rational to me.

Me too. As Pekka said any normative changes to the IPv6 protocol would probably have to go elsewhere anyway.


A quick comment:

- no overlapping fragments and

Overlapping fragments is the main problem IMO. I would have liked this to be banned so that implementations can just drop such packets. Of course some hosts might still accept them, and e.g. a firewall would need to keep some state to detect overlap.
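To make the state requirement concrete, here is a minimal sketch of the check a firewall or receiver might run per datagram. It assumes each fragment has already been reduced to a hypothetical (offset, length) pair in bytes; the function name and data layout are illustrative only and not taken from any implementation or draft.

```python
def find_overlaps(fragments):
    """Return pairs of fragments whose byte ranges overlap.

    `fragments` is an iterable of (offset, length) tuples in bytes
    (a hypothetical representation chosen for this sketch).
    """
    overlaps = []
    ordered = sorted(fragments)  # sort by offset, then by length
    for i, (off_a, len_a) in enumerate(ordered):
        end_a = off_a + len_a
        for off_b, len_b in ordered[i + 1:]:
            if off_b >= end_a:
                # Sorted by offset, so no later fragment can overlap this one.
                break
            overlaps.append(((off_a, len_a), (off_b, len_b)))
    return overlaps


# Contiguous fragments: nothing to report.
clean = find_overlaps([(0, 1232), (1232, 500)])

# A small second fragment that lands inside the first one.
suspicious = find_overlaps([(0, 1232), (40, 8)])
```

Note that an exact duplicate of a fragment is also reported as an overlap here; whether duplicates should be tolerated is a separate policy question.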

- non-last fragments to be close to the guaranteed minimum MTU.


I think the latter point wants to read "non-last fragments close to the PATH MTU, and therefore at least as large as the largest fragment size less than or equal in size to the minimum MTU."

The draft doesn't talk about overlap, but talks about this latter point. Unless overlap is forbidden, a minimum size alone is not sufficient. You might have an initial fragment containing most of the datagram (of sufficient size), and then have a second overlapping one that overwrites parts of the header.

Whatever minimum size we set, one may add so many extension headers that e.g. the transport header does not go into the first fragment anyway.

If a minimum fragment size is specified, then I think it should be about half the minimum MTU. Rather than using the MTU for all fragments but the last, it might make sense to distribute the data evenly so that all fragments are roughly the same size. See below.

There is a discussion of fragmentation and reassembly in RFC 1812 section 4.2.2.7 that may be useful to reference, or at least learn lessons from. In part, this results from the behavior of NIC cards in the late 1980s that couldn't reliably receive datagrams back to back for very long due to chip or memory issues, and partly this is due to brain-dead behaviors in early end-station OSes. One issue was that many systems sent packets at the minimum MTU size (then 576 bytes, derived from the memory structure of the Fuzzball and BBN Routers) rather than at the path MTU, which meant that there were greater opportunities for per-datagram errors because there were more of them. There were also various strategies on how to fragment - should the smallest fragment be the first, the last, or should the fragmenting system try to make them all approximately the same size? It makes some specific recommendations:

4.2.2.7 Fragmentation: RFC 791 Section 3.2

   Fragmentation, as described in [INTERNET:1], MUST be supported by a
   router.

   When a router fragments an IP datagram, it SHOULD minimize the number
   of fragments.  When a router fragments an IP datagram, it SHOULD send
   the fragments in order.  A fragmentation method that may generate one
   IP fragment that is significantly smaller than the other MAY cause
   the first IP fragment to be the smaller one.

   DISCUSSION
      There are several fragmentation techniques in common use in the
      Internet.  One involves splitting the IP datagram into IP
      fragments with the first being MTU sized, and the others being
      approximately the same size, smaller than the MTU.  The reason for
      this is twofold.  The first IP fragment in the sequence will be
      the effective MTU of the current path between the hosts, and the
      following IP fragments are sized to minimize the further
      fragmentation of the IP datagram.  Another technique is to split
      the IP datagram into MTU sized IP fragments, with the last
      fragment being the only one smaller, as described in [INTERNET:1].

      A common trick used by some implementations of TCP/IP is to
      fragment an IP datagram into IP fragments that are no larger than
      576 bytes when the IP datagram is to travel through a router.
      This is intended to allow the resulting IP fragments to pass the
      rest of the path without further fragmentation.  This would,
      though, create more of a load on the destination host, since it
      would have a larger number of IP fragments to reassemble into one
      IP datagram.  It would also not be efficient on networks where the
      MTU only changes once and stays much larger than 576 bytes.
      Examples include LAN networks such as an IEEE 802.5 network with a
      MTU of 2048 or an Ethernet network with an MTU of 1500).

      One other fragmentation technique discussed was splitting the IP
      datagram into approximately equal sized IP fragments, with the
      size less than or equal to the next hop network's MTU.  This is
      intended to minimize the number of fragments that would result
      from additional fragmentation further down the path, and assure
      equal delay for each fragment.

This latter point is what I'm referring to above.
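As a rough illustration of the equal-size approach, the sketch below splits a fragmentable payload into the minimum number of fragments and then evens out their sizes. It assumes IPv6 rules, where every fragment except the last carries a multiple of 8 octets. The function name and parameters are hypothetical; max_frag_payload stands for the MTU minus the unfragmentable headers and the 8-byte Fragment header.

```python
def even_fragment_sizes(payload_len, max_frag_payload):
    """Split payload_len bytes into the fewest fragments of roughly equal size.

    Every fragment except the last is a multiple of 8 bytes and no fragment
    exceeds max_frag_payload. Both parameters are assumptions of this sketch.
    """
    if payload_len <= 0:
        return []
    limit = max_frag_payload - max_frag_payload % 8  # round down to 8
    if limit <= 0:
        raise ValueError("max_frag_payload must be at least 8 bytes")
    count = -(-payload_len // limit)   # ceiling division: minimum fragment count
    per = -(-payload_len // count)     # even share per fragment, rounded up
    per += -per % 8                    # round up to a multiple of 8
    sizes = [per] * (count - 1)
    sizes.append(payload_len - per * (count - 1))  # remainder goes last
    return sizes


# 3000 bytes of payload with 1232 bytes available per fragment
# (1280-byte minimum MTU minus 40-byte IPv6 header minus 8-byte Fragment header).
example = even_fragment_sizes(3000, 1232)
```

For that example, filling every fragment to the MTU would give 1232, 1232 and 536 bytes, whereas the even split gives three fragments of 1000 bytes. The fragment count is the same either way, so this does not conflict with generating the least possible number of fragments.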

Also note that as mentioned above, some implementations send fragments out of order (e.g. Linux has been known to do this). The reason is that you don't know the total size of the datagram until you receive the last fragment. Receiving the last fragment first means that you can allocate the right amount of memory immediately.
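The memory argument can be sketched as follows. This is a toy reassembly buffer, not how Linux or any other stack does it: the class name and interface are hypothetical, and it assumes well-formed, non-overlapping, non-duplicated fragments. The total size only becomes known when the fragment with the "more fragments" flag cleared arrives, so anything seen earlier has to be held separately.

```python
class Reassembly:
    """Toy reassembly buffer (hypothetical; assumes well-formed input)."""

    def __init__(self):
        self.total = None    # datagram payload size, unknown at first
        self.buffer = None   # allocated once the total is known
        self.pending = []    # fragments held until the buffer exists
        self.received = 0    # bytes copied into the buffer so far

    def _copy(self, offset, data):
        self.buffer[offset:offset + len(data)] = data
        self.received += len(data)

    def add(self, offset, data, more_fragments):
        """Add one fragment; return True once the datagram is complete."""
        if not more_fragments:
            # Last fragment: its offset plus its length is the total size.
            self.total = offset + len(data)
            self.buffer = bytearray(self.total)
            for off, held in self.pending:
                self._copy(off, held)
            self.pending.clear()
        if self.buffer is None:
            self.pending.append((offset, data))
        else:
            self._copy(offset, data)
        return self.total is not None and self.received == self.total


# Last fragment first: the buffer is sized exactly on the first call.
r = Reassembly()
first_call = r.add(8, b"tail", False)
second_call = r.add(0, b"12345678", True)
```

When the last fragment arrives first, the pending list is never used, which is the saving described above.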

Stig


      Routers SHOULD generate the least possible number of IP fragments.

      Work with slow machines leads us to believe that if it is
      necessary to fragment messages, sending the small IP fragment
      first maximizes the chance of a host with a slow interface of
      receiving all the fragments.


I think the NIC card issue (where should the smallest fragment be?) is historical and not to be worried about. But the matter of generating (by whatever algorithm) the least number of fragments that can represent a transport-level message is, I think, important.