RFC 792 – Internet Control Message Protocol


Network Working Group                                          J. Postel
Request for Comments:  792                                           ISI
                                                          September 1981
Updates:  RFCs 777, 760
Updates:  IENs 109, 128

                   INTERNET CONTROL MESSAGE PROTOCOL

                         DARPA INTERNET PROGRAM
                         PROTOCOL SPECIFICATION

Introduction

   The Internet Protocol (IP) [1] is used for host-to-host datagram
   service in a system of interconnected networks called the
   Catenet [2].  The network connecting devices are called Gateways.
   These gateways communicate between themselves for control purposes
   via a Gateway to Gateway Protocol (GGP) [3,4].  Occasionally a
   gateway or destination host will communicate with a source host, for
   example, to report an error in datagram processing.  For such
   purposes this protocol, the Internet Control Message Protocol (ICMP),
   is used.  ICMP uses the basic support of IP as if it were a higher
   level protocol; however, ICMP is actually an integral part of IP, and
   must be implemented by every IP module.

   ICMP messages are sent in several situations:  for example, when a
   datagram cannot reach its destination, when the gateway does not have
   the buffering capacity to forward a datagram, and when the gateway
   can direct the host to send traffic on a shorter route.

   The Internet Protocol is not designed to be absolutely reliable.  The
   purpose of these control messages is to provide feedback about
   problems in the communication environment, not to make IP reliable.
   There are still no guarantees that a datagram will be delivered or a
   control message will be returned.  Some datagrams may still be
   undelivered without any report of their loss.  The higher level
   protocols that use IP must implement their own reliability procedures
   if reliable communication is required.

   The ICMP messages typically report errors in the processing of
   datagrams.  To avoid the infinite regress of messages about messages
   etc., no ICMP messages are sent about ICMP messages.  Also, ICMP
   messages are only sent about errors in handling fragment zero of
   fragmented datagrams.  (Fragment zero has the fragment offset equal
   to zero.)

Message Formats

   ICMP messages are sent using the basic IP header.  The first octet of
   the data portion of the datagram is an ICMP type field; the value of
   this field determines the format of the remaining data.  Any field
   labeled "unused" is reserved for later extensions and must be zero
   when sent, but receivers should not use these fields (except to
   include them in the checksum).  Unless otherwise noted under the
   individual format descriptions, the values of the internet header
   fields are as follows:

   Version

      4

   IHL

      Internet header length in 32-bit words.

   Type of Service

      0

   Total Length

      Length of internet header and data in octets.

   Identification, Flags, Fragment Offset

      Used in fragmentation, see [1].

   Time to Live

      Time to live in seconds; as this field is decremented at each
      machine in which the datagram is processed, the value in this
      field should be at least as great as the number of gateways which
      this datagram will traverse.

   Protocol

      ICMP = 1

   Header Checksum

      The 16 bit one's complement of the one's complement sum of all 16
      bit words in the header.  For computing the checksum, the checksum
      field should be zero.  This checksum may be replaced in the
      future.

   Source Address

      The address of the gateway or host that composes the ICMP message.
      Unless otherwise noted, this can be any of a gateway's addresses.

   Destination Address

      The address of the gateway or host to which the message should be
      sent.
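
   As an illustrative sketch only (it is not part of the specification),
   the Header Checksum field described above can be computed as shown
   here in Python.  The function name internet_checksum and the sample
   header octets are assumptions made for this example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero octet
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


# Hypothetical 20-octet internet header: Protocol = 1 (ICMP), and the
# checksum field (octets 10-11) is zero while the checksum is computed.
header = bytearray.fromhex("4500001c0000000040010000c0000201c0000202")
header[10:12] = struct.pack("!H", internet_checksum(bytes(header)))

# A header that carries a correct checksum re-checks to zero.
header_is_valid = internet_checksum(bytes(header)) == 0
```

   A receiver can run the same function over the header as received; a
   result of zero indicates that the checksum matches.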

Destination Unreachable Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             unused                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Destination Address

      The source network and address from the original datagram's data.

   ICMP Fields:

   Type

      3

   Code

      0 = net unreachable;

      1 = host unreachable;

      2 = protocol unreachable;

      3 = port unreachable;

      4 = fragmentation needed and DF set;

      5 = source route failed.

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Internet Header + 64 bits of Data Datagram

      The internet header plus the first 64 bits of the original
      datagram's data.  This data is used by the host to match the
      message to the appropriate process.  If a higher level protocol
      uses port numbers, they are assumed to be in the first 64 data
      bits of the original datagram's data.

   Description

      If, according to the information in the gateway's routing tables,
      the network specified in the internet destination field of a
      datagram is unreachable, e.g., the distance to the network is
      infinity, the gateway may send a destination unreachable message
      to the internet source host of the datagram.  In addition, in some
      networks, the gateway may be able to determine if the internet
      destination host is unreachable.  Gateways in these networks may
      send destination unreachable messages to the source host when the
      destination host is unreachable.

      If, in the destination host, the IP module cannot deliver the
      datagram  because the indicated protocol module or process port is
      not active, the destination host may send a destination
      unreachable message to the source host.

      Another case is when a datagram must be fragmented to be forwarded
      by a gateway yet the Don't Fragment flag is on.  In this case the
      gateway must discard the datagram and may return a destination
      unreachable message.

      Codes 0, 1, 4, and 5 may be received from a gateway.  Codes 2 and
      3 may be received from a host.
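
      The following Python sketch suggests how a receiving host might
      decode a destination unreachable message and recover the port
      numbers from the quoted data.  The function name, the returned
      dictionary, and the sample octets are assumptions for this
      example; the sketch also assumes, as stated above, that the
      higher level protocol keeps its ports in the first 64 data bits.

```python
import struct

CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
}


def parse_dest_unreachable(icmp: bytes) -> dict:
    """Decode a type 3 message; `icmp` starts at the ICMP Type octet."""
    icmp_type, code, _checksum, _unused = struct.unpack("!BBHI", icmp[:8])
    if icmp_type != 3:
        raise ValueError("not a destination unreachable message")
    original = icmp[8:]                  # internet header + 64 bits of data
    ihl = (original[0] & 0x0F) * 4       # original header length in octets
    protocol = original[9]
    src_port, dst_port = struct.unpack("!HH", original[ihl:ihl + 4])
    return {"code": CODES.get(code, "unknown"), "protocol": protocol,
            "src_port": src_port, "dst_port": dst_port}


# Hypothetical quoted datagram: a 20-octet header with Protocol = 17,
# followed by 8 data octets whose first 4 octets are ports 12345 and 53.
original = bytes.fromhex(
    "45000024000000004011" "0000c0000201c0000202" "3039003500100000"
)
message = struct.pack("!BBHI", 3, 3, 0, 0) + original
info = parse_dest_unreachable(message)
```

      The checksum is not verified in this sketch; a real receiver
      would check it before trusting the contents.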

Time Exceeded Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             unused                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Destination Address

      The source network and address from the original datagram's data.

   ICMP Fields:

   Type

      11

   Code

      0 = time to live exceeded in transit;

      1 = fragment reassembly time exceeded.

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Internet Header + 64 bits of Data Datagram

      The internet header plus the first 64 bits of the original
      datagram's data.  This data is used by the host to match the
      message to the appropriate process.  If a higher level protocol
      uses port numbers, they are assumed to be in the first 64 data
      bits of the original datagram's data.

   Description

      If the gateway processing a datagram finds the time to live field
      is zero it must discard the datagram.  The gateway may also notify
      the source host via the time exceeded message.

      If a host reassembling a fragmented datagram cannot complete the
      reassembly due to missing fragments within its time limit it
      discards the datagram, and it may send a time exceeded message.

      If fragment zero is not available then no time exceeded need be
      sent at all.

      Code 0 may be received from a gateway.  Code 1 may be received
      from a host.
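
      The two cases above can be sketched in Python as a pair of
      decision helpers.  The function names and the representation of
      received fragments as a set of fragment offsets are assumptions
      for this example.

```python
from typing import Optional

TTL_EXCEEDED_IN_TRANSIT = 0
REASSEMBLY_TIME_EXCEEDED = 1


def gateway_check_ttl(ttl: int) -> Optional[int]:
    """Return the Time Exceeded code a gateway may report, or None.

    A time to live of zero means the datagram must be discarded.
    """
    return TTL_EXCEEDED_IN_TRANSIT if ttl == 0 else None


def host_reassembly_timeout(fragment_offsets: set) -> Optional[int]:
    """Return the code a host may report when reassembly times out.

    Nothing is reported unless fragment zero (offset 0) was received.
    """
    return REASSEMBLY_TIME_EXCEEDED if 0 in fragment_offsets else None
```

      In both cases sending the message is optional; the helpers only
      say which code would apply if a message is sent.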

Parameter Problem Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Pointer    |                   unused                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Destination Address

      The source network and address from the original datagram's data.

   ICMP Fields:

   Type

      12

   Code

      0 = pointer indicates the error.

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Pointer

      If code = 0, identifies the octet where an error was detected.

   Internet Header + 64 bits of Data Datagram

      The internet header plus the first 64 bits of the original
      datagram's data.  This data is used by the host to match the
      message to the appropriate process.  If a higher level protocol
      uses port numbers, they are assumed to be in the first 64 data
      bits of the original datagram's data.

   Description

      If the gateway or host processing a datagram finds a problem with
      the header parameters such that it cannot complete processing the
      datagram it must discard the datagram.  One potential source of
      such a problem is with incorrect arguments in an option.  The
      gateway or host may also notify the source host via the parameter
      problem message.  This message is only sent if the error caused
      the datagram to be discarded.

      The pointer identifies the octet of the original datagram's header
      where the error was detected (it may be in the middle of an
      option).  For example, 1 indicates something is wrong with the
      Type of Service, and (if there are options present) 20 indicates
      something is wrong with the type code of the first option.

      Code 0 may be received from a gateway or a host.
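
      To make the pointer values concrete, the Python sketch below maps
      a pointer to the internet header field that contains that octet.
      The table follows the fixed 20-octet header layout of [1]; the
      names HEADER_FIELDS and describe_pointer are assumptions for this
      example.

```python
# (first octet, last octet, field name) for the fixed internet header.
HEADER_FIELDS = [
    (0, 0, "Version/IHL"),
    (1, 1, "Type of Service"),
    (2, 3, "Total Length"),
    (4, 5, "Identification"),
    (6, 7, "Flags/Fragment Offset"),
    (8, 8, "Time to Live"),
    (9, 9, "Protocol"),
    (10, 11, "Header Checksum"),
    (12, 15, "Source Address"),
    (16, 19, "Destination Address"),
]


def describe_pointer(pointer: int, ihl_octets: int = 20) -> str:
    """Name the header field that a Parameter Problem pointer falls in."""
    for first, last, name in HEADER_FIELDS:
        if first <= pointer <= last:
            return name
    if 20 <= pointer < ihl_octets:
        return "Options (octet %d of the option area)" % (pointer - 20)
    return "outside the internet header"
```

      With this table, describe_pointer(1) names the Type of Service,
      and describe_pointer(20, ihl_octets=24) names the first octet of
      the option area, matching the two examples in the text.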

Source Quench Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             unused                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Destination Address

      The source network and address of the original datagram's data.

   ICMP Fields:

   Type

      4

   Code

      0

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Internet Header + 64 bits of Data Datagram

      The internet header plus the first 64 bits of the original
      datagram's data.  This data is used by the host to match the
      message to the appropriate process.  If a higher level protocol
      uses port numbers, they are assumed to be in the first 64 data
      bits of the original datagram's data.

   Description

      A gateway may discard internet datagrams if it does not have the
      buffer space needed to queue the datagrams for output to the next
      network on the route to the destination network.  If a gateway
      discards a datagram, it may send a source quench message to the
      internet source host of the datagram.  A destination host may also
      send a source quench message if datagrams arrive too fast to be
      processed.  The source quench message is a request to the host to
      cut back the rate at which it is sending traffic to the internet
      destination.  The gateway may send a source quench message for
      every message that it discards.  On receipt of a source quench
      message, the source host should cut back the rate at which it is
      sending traffic to the specified destination until it no longer
      receives source quench messages from the gateway.  The source host
      can then gradually increase the rate at which it sends traffic to
      the destination until it again receives source quench messages.

      The gateway or host may send the source quench message when it
      approaches its capacity limit rather than waiting until the
      capacity is exceeded.  This means that the data datagram which
      triggered the source quench message may be delivered.

      Code 0 may be received from a gateway or a host.
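
      This document does not say by how much a source host should cut
      back or how quickly it should recover.  The Python sketch below
      is therefore purely illustrative: halving the rate on each source
      quench and adding one datagram per second after a quiet interval
      are assumptions, as are the class and method names.

```python
class QuenchedSender:
    """Toy per-destination rate controller (all constants are assumed)."""

    def __init__(self, rate: float, floor: float = 1.0,
                 ceiling: float = 1000.0) -> None:
        self.rate = rate          # datagrams per second to one destination
        self.floor = floor        # never slow down below this rate
        self.ceiling = ceiling    # never speed up beyond this rate

    def on_source_quench(self) -> None:
        # Cut back: halve the sending rate, but not below the floor.
        self.rate = max(self.floor, self.rate / 2)

    def on_quiet_interval(self) -> None:
        # No source quench seen recently: increase the rate gradually.
        self.rate = min(self.ceiling, self.rate + 1.0)


sender = QuenchedSender(rate=100.0)
sender.on_source_quench()     # rate is now 50.0
sender.on_quiet_interval()    # rate is now 51.0
```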

Redirect Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Gateway Internet Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Internet Header + 64 bits of Original Data Datagram      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Destination Address

      The source network and address of the original datagram's data.

   ICMP Fields:

   Type

      5

   Code

      0 = Redirect datagrams for the Network.

      1 = Redirect datagrams for the Host.

      2 = Redirect datagrams for the Type of Service and Network.

      3 = Redirect datagrams for the Type of Service and Host.

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Gateway Internet Address

      Address of the gateway to which traffic for the network specified
      in the internet destination network field of the original
      datagram's data should be sent.

   Internet Header + 64 bits of Data Datagram

      The internet header plus the first 64 bits of the original
      datagram's data.  This data is used by the host to match the
      message to the appropriate process.  If a higher level protocol
      uses port numbers, they are assumed to be in the first 64 data
      bits of the original datagram's data.

   Description

      The gateway sends a redirect message to a host in the following
      situation.  A gateway, G1, receives an internet datagram from a
      host on a network to which the gateway is attached.  The gateway,
      G1, checks its routing table and obtains the address of the next
      gateway, G2, on the route to the datagram's internet destination
      network, X.  If G2 and the host identified by the internet source
      address of the datagram are on the same network, a redirect
      message is sent to the host.  The redirect message advises the
      host to send its traffic for network X directly to gateway G2 as
      this is a shorter path to the destination.  The gateway forwards
      the original datagram's data to its internet destination.

      For datagrams with the IP source route options and the gateway
      address in the destination address field, a redirect message is
      not sent even if there is a better route to the ultimate
      destination than the next address in the source route.

      Codes 0, 1, 2, and 3 may be received from a gateway.
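
      The decision made by gateway G1 can be sketched in Python as
      follows.  The function name and parameters are assumptions for
      this example, and the attached network is written in prefix
      notation only because that is what the standard ipaddress module
      accepts.

```python
import ipaddress


def should_redirect(source: str, next_gateway: str, attached_network: str,
                    source_routed: bool = False) -> bool:
    """True when G1 should advise the source host to use G2 directly.

    `attached_network` is the network on which G1 received the datagram.
    """
    if source_routed:
        return False  # no redirect for source-routed datagrams
    net = ipaddress.ip_network(attached_network)
    return (ipaddress.ip_address(source) in net
            and ipaddress.ip_address(next_gateway) in net)


# Hypothetical addresses: the host and G2 share the network 192.0.2.0/24.
same_net = should_redirect("192.0.2.10", "192.0.2.254", "192.0.2.0/24")
other_net = should_redirect("192.0.2.10", "198.51.100.1", "192.0.2.0/24")
```

      Whether or not a redirect is sent, G1 still forwards the original
      datagram toward its destination.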

Echo or Echo Reply Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Identifier          |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Data ...
   +-+-+-+-+-

   IP Fields:

   Addresses

      The address of the source in an echo message will be the
      destination of the echo reply message.  To form an echo reply
      message, the source and destination addresses are simply reversed,
      the type changed to 0, and the checksum recomputed.

   ICMP Fields:

   Type

      8 for echo message;

      0 for echo reply message.

   Code

      0

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      If the total length is odd, the received data is padded with one
      octet of zeros for computing the checksum.  This checksum may be
      replaced in the future.

   Identifier

      If code = 0, an identifier to aid in matching echos and replies,
      may be zero.

   Sequence Number

      If code = 0, a sequence number to aid in matching echos and
      replies, may be zero.

   Description

      The data received in the echo message must be returned in the echo
      reply message.

      The identifier and sequence number may be used by the echo sender
      to aid in matching the replies with the echo requests.  For
      example, the identifier might be used like a port in TCP or UDP to
      identify a session, and the sequence number might be incremented
      on each echo request sent.  The echoer returns these same values
      in the echo reply.

      Code 0 may be received from a gateway or a host.
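
      Forming an echo reply as described under Addresses can be
      sketched in Python as follows.  The function names, the use of
      address strings, and the sample echo message are assumptions for
      this example.

```python
import struct


def checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length data with one zero octet
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_reply(src: str, dst: str, echo: bytes):
    """Return (source, destination, ICMP message) for the echo reply."""
    if echo[0] != 8:
        raise ValueError("not an echo message")
    zeroed = bytes([0, echo[1], 0, 0]) + echo[4:]   # type 0, checksum field 0
    reply = zeroed[:2] + struct.pack("!H", checksum(zeroed)) + zeroed[4:]
    return dst, src, reply                          # addresses reversed


# Hypothetical echo: identifier 0x1234, sequence number 1, data b"ping".
echo = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1) + b"ping"
reply_src, reply_dst, reply = echo_reply("192.0.2.1", "192.0.2.2", echo)
```

      The identifier, sequence number, and data are copied unchanged,
      so the echo sender can match the reply to its request.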

Timestamp or Timestamp Reply Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |      Code     |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Identifier          |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Originate Timestamp                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Receive Timestamp                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Transmit Timestamp                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Addresses

      The address of the source in a timestamp message will be the
      destination of the timestamp reply message.  To form a timestamp
      reply message, the source and destination addresses are simply
      reversed, the type changed to 14, and the checksum
      recomputed.

   ICMP Fields:

   Type

      13 for timestamp message;

      14 for timestamp reply message.

   Code

      0

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Identifier

      If code = 0, an identifier to aid in matching timestamp and
      replies, may be zero.

   Sequence Number

      If code = 0, a sequence number to aid in matching timestamp and
      replies, may be zero.

   Description

      The data received (a timestamp) in the message is returned in the
      reply together with an additional timestamp.  The timestamp is 32
      bits of milliseconds since midnight UT.  One use of these
      timestamps is described by Mills [5].

      The Originate Timestamp is the time the sender last touched the
      message before sending it, the Receive Timestamp is the time the
      echoer first touched it on receipt, and the Transmit Timestamp is
      the time the echoer last touched the message on sending it.

      If the time is not available in milliseconds or cannot be provided
      with respect to midnight UT then any time can be inserted in a
      timestamp provided the high order bit of the timestamp is also set
      to indicate this non-standard value.

      The identifier and sequence number may be used by the sender
      to aid in matching the replies with the requests.  For example,
      the identifier might be used like a port in TCP or UDP to identify
      a session, and the sequence number might be incremented on each
      request sent.  The destination returns these same values in the
      reply.

      Code 0 may be received from a gateway or a host.
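
      The timestamp encoding can be sketched in Python as follows.  The
      function names are assumptions for this example, and the caller
      is assumed to pass a timezone-aware datetime.  Because a day has
      86,400,000 milliseconds, a standard value never needs the high
      order bit, which leaves it free to flag non-standard values.

```python
from datetime import datetime, timezone

NONSTANDARD_BIT = 0x80000000   # high order bit of the 32-bit timestamp


def icmp_timestamp(now: datetime) -> int:
    """Milliseconds since midnight UT for a timezone-aware datetime."""
    utc = now.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = utc - midnight
    return elapsed.seconds * 1000 + elapsed.microseconds // 1000


def nonstandard_timestamp(value: int) -> int:
    """Flag an arbitrary clock value as a non-standard timestamp."""
    return (value & 0x7FFFFFFF) | NONSTANDARD_BIT


# 01:02:03.004 UT is 3,723,004 milliseconds after midnight.
sample = icmp_timestamp(
    datetime(1981, 9, 1, 1, 2, 3, 4000, tzinfo=timezone.utc)
)
```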

Information Request or Information Reply Message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |      Code     |          Checksum             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Identifier          |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IP Fields:

   Addresses

      The address of the source in an information request message will be
      the destination of the information reply message.  To form an
      information reply message, the source and destination addresses
      are simply reversed, the type changed to 16, and the checksum
      recomputed.

   ICMP Fields:

   Type

      15 for information request message;

      16 for information reply message.

   Code

      0

   Checksum

      The checksum is the 16-bit one's complement of the one's
      complement sum of the ICMP message starting with the ICMP Type.
      For computing the checksum, the checksum field should be zero.
      This checksum may be replaced in the future.

   Identifier

      If code = 0, an identifier to aid in matching request and replies,
      may be zero.

   Sequence Number

      If code = 0, a sequence number to aid in matching request and
      replies, may be zero.

   Description

      This message may be sent with the source network in the IP header
      source and destination address fields zero (which means "this"
      network).  The replying IP module should send the reply with the
      addresses fully specified.  This message is a way for a host to
      find out the number of the network it is on.

      The identifier and sequence number may be used by the sender
      to aid in matching the replies with the requests.  For example,
      the identifier might be used like a port in TCP or UDP to identify
      a session, and the sequence number might be incremented on each
      request sent.  The destination returns these same values in the
      reply.

      Code 0 may be received from a gateway or a host.

Summary of Message Types

    0  Echo Reply

    3  Destination Unreachable

    4  Source Quench

    5  Redirect

    8  Echo

   11  Time Exceeded

   12  Parameter Problem

   13  Timestamp

   14  Timestamp Reply

   15  Information Request

   16  Information Reply
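
   For implementers, the table above can be carried into code as an
   enumeration.  This Python sketch is one possible rendering; the
   names IcmpType and REPLY_FOR are assumptions for this example.

```python
from enum import IntEnum


class IcmpType(IntEnum):
    ECHO_REPLY = 0
    DESTINATION_UNREACHABLE = 3
    SOURCE_QUENCH = 4
    REDIRECT = 5
    ECHO = 8
    TIME_EXCEEDED = 11
    PARAMETER_PROBLEM = 12
    TIMESTAMP = 13
    TIMESTAMP_REPLY = 14
    INFORMATION_REQUEST = 15
    INFORMATION_REPLY = 16


# Request types and the reply type each one is answered with.
REPLY_FOR = {
    IcmpType.ECHO: IcmpType.ECHO_REPLY,
    IcmpType.TIMESTAMP: IcmpType.TIMESTAMP_REPLY,
    IcmpType.INFORMATION_REQUEST: IcmpType.INFORMATION_REPLY,
}
```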

References

   [1]   Postel, J. (ed.), "Internet Protocol - DARPA Internet Program
         Protocol Specification," RFC 791, USC/Information Sciences
         Institute, September 1981.

   [2]   Cerf, V., "The Catenet Model for Internetworking," IEN 48,
         Information Processing Techniques Office, Defense Advanced
         Research Projects Agency, July 1978.

   [3]   Strazisar, V., "Gateway Routing:  An Implementation
         Specification," IEN 30, Bolt Beranek and Newman, April 1979.

   [4]   Strazisar, V., "How to Build a Gateway," IEN 109, Bolt Beranek
         and Newman, August 1979.

   [5]   Mills, D., "DCNET Internet Clock Service," RFC 778, COMSAT
         Laboratories, April 1981.

RFC 2373 – IP Version 6 Addressing Architecture

 
Network Working Group                                          R. Hinden
Request for Comments: 2373                                         Nokia
Obsoletes: 1884                                               S. Deering
Category: Standards Track                                  Cisco Systems
                                                               July 1998

IP Version 6 Addressing Architecture

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Abstract

This specification defines the addressing architecture of the IP
Version 6 protocol [IPV6]. The document includes the IPv6 addressing
model, text representations of IPv6 addresses, definition of IPv6
unicast addresses, anycast addresses, and multicast addresses, and an
IPv6 node's required addresses.

Table of Contents

1. Introduction
2. IPv6 Addressing
   2.1 Addressing Model
   2.2 Text Representation of Addresses
   2.3 Text Representation of Address Prefixes
   2.4 Address Type Representation
   2.5 Unicast Addresses
      2.5.1 Interface Identifiers
      2.5.2 The Unspecified Address
      2.5.3 The Loopback Address
      2.5.4 IPv6 Addresses with Embedded IPv4 Addresses
      2.5.5 NSAP Addresses
      2.5.6 IPX Addresses
      2.5.7 Aggregatable Global Unicast Addresses
      2.5.8 Local-use IPv6 Unicast Addresses
   2.6 Anycast Addresses
      2.6.1 Required Anycast Address
   2.7 Multicast Addresses
      2.7.1 Pre-Defined Multicast Addresses
      2.7.2 Assignment of New IPv6 Multicast Addresses
   2.8 A Node's Required Addresses
3. Security Considerations
APPENDIX A: Creating EUI-64 based Interface Identifiers
APPENDIX B: ABNF Description of Text Representations
APPENDIX C: CHANGES FROM RFC-1884
REFERENCES
AUTHORS' ADDRESSES
FULL COPYRIGHT STATEMENT

1.0 INTRODUCTION

This specification defines the addressing architecture of the IP
Version 6 protocol. It includes a detailed description of the
currently defined address formats for IPv6 [IPV6].

The authors would like to acknowledge the contributions of Paul
Francis, Scott Bradner, Jim Bound, Brian Carpenter, Matt Crawford,
Deborah Estrin, Roger Fajman, Bob Fink, Peter Ford, Bob Gilligan,
Dimitry Haskin, Tom Harsch, Christian Huitema, Tony Li, Greg
Minshall, Thomas Narten, Erik Nordmark, Yakov Rekhter, Bill Simpson,
and Sue Thomson.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119].

2.0 IPv6 ADDRESSING

IPv6 addresses are 128-bit identifiers for interfaces and sets of
interfaces. There are three types of addresses:

Unicast: An identifier for a single interface. A packet sent to
a unicast address is delivered to the interface
identified by that address.

Anycast: An identifier for a set of interfaces (typically
belonging to different nodes). A packet sent to an
anycast address is delivered to one of the interfaces
identified by that address (the "nearest" one, according
to the routing protocols' measure of distance).

Multicast: An identifier for a set of interfaces (typically
belonging to different nodes). A packet sent to a
multicast address is delivered to all interfaces
identified by that address.

There are no broadcast addresses in IPv6, their function being
superseded by multicast addresses.

In this document, fields in addresses are given a specific name, for
example "subscriber". When this name is used with the term "ID" for
identifier after the name (e.g., "subscriber ID"), it refers to the
contents of the named field. When it is used with the term "prefix"
(e.g., "subscriber prefix"), it refers to all of the address up to and
including this field.

In IPv6, all zeros and all ones are legal values for any field,
unless specifically excluded. Specifically, prefixes may contain
zero-valued fields or end in zeros.

2.1 Addressing Model

IPv6 addresses of all types are assigned to interfaces, not nodes.
An IPv6 unicast address refers to a single interface. Since each
interface belongs to a single node, any of that node's interfaces'
unicast addresses may be used as an identifier for the node.

All interfaces are required to have at least one link-local unicast
address (see section 2.8 for additional required addresses). A
single interface may also be assigned multiple IPv6 addresses of any
type (unicast, anycast, and multicast) or scope. Unicast addresses
with scope greater than link-scope are not needed for interfaces that
are not used as the origin or destination of any IPv6 packets to or
from non-neighbors. This is sometimes convenient for point-to-point
interfaces. There is one exception to this addressing model:

A unicast address or a set of unicast addresses may be assigned to
multiple physical interfaces if the implementation treats the
multiple physical interfaces as one interface when presenting it to
the internet layer. This is useful for load-sharing over multiple
physical interfaces.

Currently IPv6 continues the IPv4 model that a subnet prefix is
associated with one link. Multiple subnet prefixes may be assigned
to the same link.

2.2 Text Representation of Addresses

There are three conventional forms for representing IPv6 addresses as
text strings:

1. The preferred form is x:x:x:x:x:x:x:x, where the 'x's are the
hexadecimal values of the eight 16-bit pieces of the address.
Examples:

FEDC:BA98:7654:3210:FEDC:BA98:7654:3210

1080:0:0:0:8:800:200C:417A

Note that it is not necessary to write the leading zeros in an
individual field, but there must be at least one numeral in every
field (except for the case described in 2.).

2. Due to some methods of allocating certain styles of IPv6
addresses, it will be common for addresses to contain long strings
of zero bits. In order to make writing addresses containing zero
bits easier a special syntax is available to compress the zeros.
The use of "::" indicates multiple groups of 16-bits of zeros.
The "::" can only appear once in an address. The "::" can also be
used to compress the leading and/or trailing zeros in an address.

For example, the following addresses:

1080:0:0:0:8:800:200C:417A   a unicast address
FF01:0:0:0:0:0:0:101         a multicast address
0:0:0:0:0:0:0:1              the loopback address
0:0:0:0:0:0:0:0              the unspecified address

may be represented as:

1080::8:800:200C:417A        a unicast address
FF01::101                    a multicast address
::1                          the loopback address
::                           the unspecified address

3. An alternative form that is sometimes more convenient when dealing
with a mixed environment of IPv4 and IPv6 nodes is
x:x:x:x:x:x:d.d.d.d, where the 'x's are the hexadecimal values of
the six high-order 16-bit pieces of the address, and the 'd's are
the decimal values of the four low-order 8-bit pieces of the
address (standard IPv4 representation). Examples:

0:0:0:0:0:0:13.1.68.3

0:0:0:0:0:FFFF:129.144.52.38

or in compressed form:

::13.1.68.3

::FFFF:129.144.52.38
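
As an illustration of the three text forms, the following sketch
parses the examples above and checks that they denote the same 128-bit
values. It assumes Python's standard "ipaddress" module, which is a
convenience chosen for this example and not something this
specification prescribes; note that the module emits lowercase
hexadecimal digits.

```python
import ipaddress

# Form 1 (full) and form 2 ("::" compression) denote the same 128-bit value.
full = ipaddress.IPv6Address("1080:0:0:0:8:800:200C:417A")
short = ipaddress.IPv6Address("1080::8:800:200C:417A")
same_value = (full == short)

# The module can regenerate either spelling of the address.
compressed = full.compressed
exploded = full.exploded

# Form 3: the four low-order octets may be written in dotted decimal.
mixed = ipaddress.IPv6Address("0:0:0:0:0:0:13.1.68.3")
low32 = int(mixed) & 0xFFFFFFFF
```

Comparing parsed values rather than strings sidesteps the fact that
one address has several legal spellings.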

2.3 Text Representation of Address Prefixes

The text representation of IPv6 address prefixes is similar to the
way IPv4 address prefixes are written in CIDR notation. An IPv6
address prefix is represented by the notation:

ipv6-address/prefix-length

where

ipv6-address is an IPv6 address in any of the notations listed
in section 2.2.

prefix-length is a decimal value specifying how many of the
leftmost contiguous bits of the address comprise
the prefix.

For example, the following are legal representations of the 60-bit
prefix 12AB00000000CD3 (hexadecimal):

12AB:0000:0000:CD30:0000:0000:0000:0000/60
12AB::CD30:0:0:0:0/60
12AB:0:0:CD30::/60

The following are NOT legal representations of the above prefix:

12AB:0:0:CD3/60   may drop leading zeros, but not trailing zeros,
                  within any 16-bit chunk of the address

12AB::CD30/60     address to left of "/" expands to
                  12AB:0000:0000:0000:0000:0000:0000:CD30

12AB::CD3/60      address to left of "/" expands to
                  12AB:0000:0000:0000:0000:0000:0000:0CD3

When writing both a node address and a prefix of that node address
(e.g., the node's subnet prefix), the two can be combined as follows:

the node address 12AB:0:0:CD30:123:4567:89AB:CDEF
and its subnet number 12AB:0:0:CD30::/60

can be abbreviated as 12AB:0:0:CD30:123:4567:89AB:CDEF/60
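
The prefix examples above can be checked mechanically. The sketch
below again assumes Python's standard "ipaddress" module; the helper
name is_legal_prefix is hypothetical. With strict=True the module
rejects any text whose bits to the right of the prefix length are
non-zero, which is how the second and third illegal examples are
caught; the first is rejected because it is not a complete address.

```python
import ipaddress

# The three legal spellings of the 60-bit prefix parse to one network.
legal = [
    "12AB:0000:0000:CD30:0000:0000:0000:0000/60",
    "12AB::CD30:0:0:0:0/60",
    "12AB:0:0:CD30::/60",
]
networks = {ipaddress.IPv6Network(text) for text in legal}

def is_legal_prefix(text: str) -> bool:
    """Return True if text parses as a prefix with no bits set past its length."""
    try:
        ipaddress.IPv6Network(text, strict=True)
        return True
    except ValueError:
        return False

# A node address and its subnet prefix written in the combined form.
iface = ipaddress.IPv6Interface("12AB:0:0:CD30:123:4567:89AB:CDEF/60")
```

Here iface.ip yields the node address and iface.network yields the
subnet prefix 12AB:0:0:CD30::/60.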

2.4 Address Type Representation

The specific type of an IPv6 address is indicated by the leading bits
in the address. The variable-length field comprising these leading
bits is called the Format Prefix (FP). The initial allocation of
these prefixes is as follows:

Allocation                              Prefix         Fraction of
                                        (binary)       Address Space
-------------------------------------   ------------   -------------
Reserved                                0000 0000      1/256
Unassigned                              0000 0001      1/256

Reserved for NSAP Allocation            0000 001       1/128
Reserved for IPX Allocation             0000 010       1/128

Unassigned                              0000 011       1/128
Unassigned                              0000 1         1/32
Unassigned                              0001           1/16

Aggregatable Global Unicast Addresses   001            1/8
Unassigned                              010            1/8
Unassigned                              011            1/8
Unassigned                              100            1/8
Unassigned                              101            1/8
Unassigned                              110            1/8

Unassigned                              1110           1/16
Unassigned                              1111 0         1/32
Unassigned                              1111 10        1/64
Unassigned                              1111 110       1/128
Unassigned                              1111 1110 0    1/512

Link-Local Unicast Addresses            1111 1110 10   1/1024
Site-Local Unicast Addresses            1111 1110 11   1/1024

Multicast Addresses                     1111 1111      1/256

Notes:

(1) The "unspecified address" (see section 2.5.2), the loopback
address (see section 2.5.3), and the IPv6 Addresses with
Embedded IPv4 Addresses (see section 2.5.4), are assigned out
of the 0000 0000 format prefix space.

(2) The format prefixes 001 through 111, except for Multicast
Addresses (1111 1111), are all required to have 64-bit
interface identifiers in EUI-64 format. See section 2.5.1 for
definitions.

This allocation supports the direct allocation of aggregation
addresses, local use addresses, and multicast addresses. Space is
reserved for NSAP addresses and IPX addresses. The remainder of the
address space is unassigned for future use. This can be used for
expansion of existing use (e.g., additional aggregatable addresses,
etc.) or new uses (e.g., separate locators and identifiers). Fifteen
percent of the address space is initially allocated. The remaining
85% is reserved for future use.
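
The fifteen percent figure can be reproduced from the allocation
table. The sketch below sums the fractions exactly; it assumes, as an
interpretation made for this example, that every entry not marked
"Unassigned" (including the Reserved 0000 0000 block) counts as
allocated.

```python
from fractions import Fraction

# Entries of the allocation table that are not marked "Unassigned".
allocated = {
    "Reserved (0000 0000)": Fraction(1, 256),
    "NSAP (0000 001)": Fraction(1, 128),
    "IPX (0000 010)": Fraction(1, 128),
    "Aggregatable Global Unicast (001)": Fraction(1, 8),
    "Link-Local Unicast (1111 1110 10)": Fraction(1, 1024),
    "Site-Local Unicast (1111 1110 11)": Fraction(1, 1024),
    "Multicast (1111 1111)": Fraction(1, 256),
}

# The "Unassigned" rows, in table order.
unassigned = [Fraction(1, d) for d in
              (256, 128, 32, 16, 8, 8, 8, 8, 8, 16, 32, 64, 128, 512)]

total = sum(allocated.values())
percent = float(total) * 100
covers_whole_space = (total + sum(unassigned) == 1)
```

Under that assumption the allocated share is 77/512 of the address
space, and the two lists together account for all of it.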

Unicast addresses are distinguished from multicast addresses by the
value of the high-order octet of the addresses: a value of FF
(11111111) identifies an address as a multicast address; any other
value identifies an address as a unicast address. Anycast addresses
are taken from the unicast address space, and are not syntactically
distinguishable from unicast addresses.
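
A format prefix lookup can be sketched as a comparison of the leading
bits of the address against the table above. The names
FORMAT_PREFIXES, allocation_of, and is_multicast are hypothetical, and
Python's standard "ipaddress" module is assumed only for parsing.

```python
import ipaddress

# (prefix bits, prefix length, allocation) for the entries of the table
# that are not marked "Unassigned". The prefixes do not overlap.
FORMAT_PREFIXES = [
    (0b00000000, 8, "Reserved"),
    (0b0000001, 7, "Reserved for NSAP Allocation"),
    (0b0000010, 7, "Reserved for IPX Allocation"),
    (0b001, 3, "Aggregatable Global Unicast Addresses"),
    (0b1111111010, 10, "Link-Local Unicast Addresses"),
    (0b1111111011, 10, "Site-Local Unicast Addresses"),
    (0b11111111, 8, "Multicast Addresses"),
]

def allocation_of(text: str) -> str:
    """Return the allocation whose format prefix matches the address."""
    value = int(ipaddress.IPv6Address(text))
    for bits, length, name in FORMAT_PREFIXES:
        if value >> (128 - length) == bits:
            return name
    return "Unassigned"

def is_multicast(text: str) -> bool:
    """A high-order octet of FF identifies a multicast address."""
    return int(ipaddress.IPv6Address(text)) >> 120 == 0xFF
```

Because anycast addresses are drawn from the unicast space, no such
lookup can tell an anycast address from a unicast one.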

2.5 Unicast Addresses

IPv6 unicast addresses are aggregatable with contiguous bit-wise
masks similar to IPv4 addresses under Class-less Interdomain Routing
[CIDR].

There are several forms of unicast address assignment in IPv6,
including the aggregatable global unicast address, the NSAP
address, the IPX hierarchical address, the site-local address, the
link-local address, and the IPv4-capable host address. Additional
address types can be defined in the future.

IPv6 nodes may have considerable or little knowledge of the internal
structure of the IPv6 address, depending on the role the node plays
(for instance, host versus router). At a minimum, a node may
consider that unicast addresses (including its own) have no internal
structure:

|                            128 bits                             |
+-----------------------------------------------------------------+
|                          node address                           |
+-----------------------------------------------------------------+

A slightly sophisticated host (but still rather simple) may
additionally be aware of subnet prefix(es) for the link(s) it is
attached to, where different addresses may have different values for
n:

|                     n bits                     |   128-n bits   |
+------------------------------------------------+----------------+
|                 subnet prefix                  |  interface ID  |
+------------------------------------------------+----------------+

Still more sophisticated hosts may be aware of other hierarchical
boundaries in the unicast address. Though a very simple router may
have no knowledge of the internal structure of IPv6 unicast
addresses, routers will more generally have knowledge of one or more
of the hierarchical boundaries for the operation of routing
protocols. The known boundaries will differ from router to router,
depending on what positions the router holds in the routing
hierarchy.

2.5.1 Interface Identifiers

Interface identifiers in IPv6 unicast addresses are used to identify
interfaces on a link. They are required to be unique on that link.
They may also be unique over a broader scope. In many cases an
interface's identifier will be the same as that interface's link-
layer address. The same interface identifier may be used on multiple
interfaces on a single node.

Note that the use of the same interface identifier on multiple
interfaces of a single node does not affect the interface
identifier's global uniqueness or the global uniqueness of each IPv6
address created using that interface identifier.

In a number of the format prefixes (see section 2.4) Interface IDs
are required to be 64 bits long and to be constructed in IEEE EUI-64
format [EUI64]. EUI-64 based Interface identifiers may have global
scope when a global token is available (e.g., IEEE 48bit MAC) or may
have local scope where a global token is not available (e.g., serial
links, tunnel end-points, etc.). It is required that the "u" bit
(universal/local bit in IEEE EUI-64 terminology) be inverted when
forming the interface identifier from the EUI-64. The "u" bit is set
to one (1) to indicate global scope, and it is set to zero (0) to
indicate local scope. The first three octets in binary of an EUI-64
identifier are as follows:

 0       0 0       1 1       2
|0       7 8       5 6       3|
+----+----+----+----+----+----+
|cccc|ccug|cccc|cccc|cccc|cccc|
+----+----+----+----+----+----+

written in Internet standard bit-order, where "u" is the
universal/local bit, "g" is the individual/group bit, and "c" are the
bits of the company_id. Appendix A: "Creating EUI-64 based Interface
Identifiers" provides examples on the creation of different EUI-64
based interface identifiers.

The motivation for inverting the "u" bit when forming the interface
identifier is to make it easy for system administrators to hand
configure local scope identifiers when hardware tokens are not
available. This is expected to be the case for serial links, tunnel
end-points, etc. The alternative would have been for these to be of
the form 0200:0:0:1, 0200:0:0:2, etc., instead of the much simpler
::1, ::2, etc.

The use of the universal/local bit in the IEEE EUI-64 identifier is
to allow development of future technology that can take advantage of
interface identifiers with global scope.

The details of forming interface identifiers are defined in the
appropriate "IPv6 over <link>" specification such as "IPv6 over
Ethernet" [ETHER], "IPv6 over FDDI" [FDDI], etc.

2.5.2 The Unspecified Address

The address 0:0:0:0:0:0:0:0 is called the unspecified address. It
must never be assigned to any node. It indicates the absence of an
address. One example of its use is in the Source Address field of
any IPv6 packets sent by an initializing host before it has learned
its own address.

The unspecified address must not be used as the destination address
of IPv6 packets or in IPv6 Routing Headers.

2.5.3 The Loopback Address

The unicast address 0:0:0:0:0:0:0:1 is called the loopback address.
It may be used by a node to send an IPv6 packet to itself. It may
never be assigned to any physical interface. It may be thought of as
being associated with a virtual interface (e.g., the loopback
interface).

The loopback address must not be used as the source address in IPv6
packets that are sent outside of a single node. An IPv6 packet with
a destination address of loopback must never be sent outside of a
single node and must never be forwarded by an IPv6 router.
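
The rules for the unspecified and loopback addresses can be sketched
as a check applied to a packet about to leave the node. The function
name valid_on_the_wire is hypothetical, and the check covers only the
source and destination address rules of sections 2.5.2 and 2.5.3 (it
does not inspect Routing Headers). Python's standard "ipaddress"
module is assumed.

```python
import ipaddress

unspecified = ipaddress.IPv6Address("0:0:0:0:0:0:0:0")
loopback = ipaddress.IPv6Address("0:0:0:0:0:0:0:1")

def valid_on_the_wire(source: str, destination: str) -> bool:
    """Apply the 2.5.2 and 2.5.3 address rules to an outgoing packet."""
    src = ipaddress.IPv6Address(source)
    dst = ipaddress.IPv6Address(destination)
    if dst.is_unspecified:
        # The unspecified address is never a destination.
        return False
    if src.is_loopback or dst.is_loopback:
        # Loopback traffic must stay inside a single node.
        return False
    return True
```

An unspecified source address remains acceptable here, matching the
case of an initializing host that has not yet learned its own address.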

2.5.4 IPv6 Addresses with Embedded IPv4 Addresses

The IPv6 transition mechanisms [TRAN] include a technique for hosts
and routers to dynamically tunnel IPv6 packets over IPv4 routing
infrastructure. IPv6 nodes that utilize this technique are assigned
special IPv6 unicast addresses that carry an IPv4 address in the low-
order 32-bits. This type of address is termed an "IPv4-compatible
IPv6 address" and has the format:

|                80 bits               | 16 |       32 bits       |
+--------------------------------------+----+---------------------+
|0000..............................0000|0000|    IPv4 address     |
+--------------------------------------+----+---------------------+

A second type of IPv6 address which holds an embedded IPv4 address is
also defined. This address is used to represent the addresses of
IPv4-only nodes (those that *do not* support IPv6) as IPv6 addresses.
This type of address is termed an "IPv4-mapped IPv6 address" and has
the format:

|                80 bits               | 16 |       32 bits       |
+--------------------------------------+----+---------------------+
|0000..............................0000|FFFF|    IPv4 address     |
+--------------------------------------+----+---------------------+
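
Both embedded formats amount to placing the 32-bit IPv4 address in the
low-order bits of the IPv6 address. A minimal sketch follows; the
function names are hypothetical and Python's standard "ipaddress"
module is assumed.

```python
import ipaddress

def ipv4_compatible(ipv4: str) -> ipaddress.IPv6Address:
    """96 zero bits followed by the 32-bit IPv4 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(ipv4)))

def ipv4_mapped(ipv4: str) -> ipaddress.IPv6Address:
    """80 zero bits, 16 one bits (FFFF), then the 32-bit IPv4 address."""
    low32 = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address((0xFFFF << 32) | low32)
```

The results equal the text forms ::13.1.68.3 and ::FFFF:129.144.52.38
used as examples in section 2.2.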

2.5.5 NSAP Addresses

The mapping of NSAP addresses into IPv6 addresses is defined in
[NSAP]. That document recommends that network implementors who have
planned or deployed an OSI NSAP addressing plan, and who wish to
deploy or transition to IPv6, should redesign a native IPv6
addressing plan to meet their needs. However, it also defines a set
of mechanisms for the support of OSI NSAP addressing in an IPv6
network. These mechanisms are the ones that must be used if such
support is required. That document also defines a mapping of IPv6
addresses within the OSI address format, should this be required.

2.5.6 IPX Addresses

The mapping of IPX addresses into IPv6 addresses is as follows:

|   7   |                        121 bits                         |
+-------+---------------------------------------------------------+
|0000010|                      to be defined                      |
+-------+---------------------------------------------------------+

The draft definition, motivation, and usage are under study.

2.5.7 Aggregatable Global Unicast Addresses

The aggregatable global unicast address is defined in [AGGR]. This
address format is designed to support both the current
provider-based aggregation and a new type of aggregation called
exchanges. The combination will allow efficient routing aggregation
both for sites which connect directly to providers and for sites
which connect to exchanges. Sites will have the choice to connect to
either type of aggregation point.

The IPv6 aggregatable global unicast address format is as follows:

| 3|  13 | 8 |   24   |   16   |            64 bits             |
+--+-----+---+--------+--------+--------------------------------+
|FP| TLA |RES|  NLA   |  SLA   |          Interface ID          |
|  |  ID |   |   ID   |   ID   |                                |
+--+-----+---+--------+--------+--------------------------------+

Where

001            Format Prefix (3 bit) for Aggregatable Global
               Unicast Addresses
TLA ID         Top-Level Aggregation Identifier
RES            Reserved for future use
NLA ID         Next-Level Aggregation Identifier
SLA ID         Site-Level Aggregation Identifier
INTERFACE ID   Interface Identifier

The contents, field sizes, and assignment rules are defined in
[AGGR].
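
The field boundaries in the diagram above translate directly into
shifts and masks. The following sketch splits an address into the six
fields; the type and function names are hypothetical, the sample
address used with it is arbitrary, and the assignment rules themselves
remain those of [AGGR]. Python's standard "ipaddress" module is
assumed.

```python
import ipaddress
from typing import NamedTuple

class AggregatableFields(NamedTuple):
    fp: int            # 3 bits
    tla_id: int        # 13 bits
    res: int           # 8 bits
    nla_id: int        # 24 bits
    sla_id: int        # 16 bits
    interface_id: int  # 64 bits

def split_aggregatable(text: str) -> AggregatableFields:
    """Split an address along the 3/13/8/24/16/64 bit boundaries."""
    value = int(ipaddress.IPv6Address(text))
    return AggregatableFields(
        fp=value >> 125,
        tla_id=(value >> 112) & 0x1FFF,
        res=(value >> 104) & 0xFF,
        nla_id=(value >> 80) & 0xFFFFFF,
        sla_id=(value >> 64) & 0xFFFF,
        interface_id=value & 0xFFFFFFFFFFFFFFFF,
    )
```

For example, split_aggregatable("3FFE:0012:3456:0001::1") yields a
format prefix of 001 (binary), a TLA ID of 0x1FFE, an NLA ID of
0x123456, an SLA ID of 1, and an interface ID of 1.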

2.5.8 Local-Use IPv6 Unicast Addresses

There are two types of local-use unicast addresses defined. These
are Link-Local and Site-Local. The Link-Local is for use on a single
link and the Site-Local is for use in a single site. Link-Local
addresses have the following format:

|    10    |
|   bits   |         54 bits         |          64 bits           |
+----------+-------------------------+----------------------------+
|1111111010|            0            |        interface ID        |
+----------+-------------------------+----------------------------+

Link-Local addresses are designed to be used for addressing on a
single link for purposes such as auto-address configuration, neighbor
discovery, or when no routers are present.

Routers must not forward any packets with link-local source or
destination addresses to other links.

Site-Local addresses have the following format:

|    10    |
|   bits   |   38 bits   |  16 bits  |          64 bits           |
+----------+-------------+-----------+----------------------------+
|1111111011|      0      | subnet ID |        interface ID        |
+----------+-------------+-----------+----------------------------+

Site-Local addresses are designed to be used for addressing inside of
a site without the need for a global prefix.

Routers must not forward any packets with site-local source or
destination addresses outside of the site.
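
Forming a local-use address from its fields is a matter of placing the
10-bit format prefix, the zero fill, and the remaining fields at the
positions shown in the two diagrams above. A minimal sketch, with
hypothetical function names and assuming Python's standard
"ipaddress" module:

```python
import ipaddress

def link_local(interface_id: int) -> ipaddress.IPv6Address:
    """1111111010, 54 zero bits, then the 64-bit interface ID."""
    if not 0 <= interface_id < 2**64:
        raise ValueError("interface ID must fit in 64 bits")
    return ipaddress.IPv6Address((0b1111111010 << 118) | interface_id)

def site_local(subnet_id: int, interface_id: int) -> ipaddress.IPv6Address:
    """1111111011, 38 zero bits, 16-bit subnet ID, 64-bit interface ID."""
    if not 0 <= subnet_id < 2**16:
        raise ValueError("subnet ID must fit in 16 bits")
    if not 0 <= interface_id < 2**64:
        raise ValueError("interface ID must fit in 64 bits")
    return ipaddress.IPv6Address(
        (0b1111111011 << 118) | (subnet_id << 64) | interface_id
    )
```

With an interface ID of 1 the link-local result is FE80::1; with a
subnet ID of 1 and an interface ID of 2 the site-local result is
FEC0:0:0:1::2.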

2.6 Anycast Addresses

An IPv6 anycast address is an address that is assigned to more than
one interface (typically belonging to different nodes), with the
property that a packet sent to an anycast address is routed to the
"nearest" interface having that address, according to the routing
protocols' measure of distance.

Anycast addresses are allocated from the unicast address space, using
any of the defined unicast address formats. Thus, anycast addresses
are syntactically indistinguishable from unicast addresses. When a
unicast address is assigned to more than one interface, thus turning
it into an anycast address, the nodes to which the address is
assigned must be explicitly configured to know that it is an anycast
address.

For any assigned anycast address, there is a longest address prefix P
that identifies the topological region in which all interfaces
belonging to that anycast address reside. Within the region
identified by P, each member of the anycast set must be advertised as
a separate entry in the routing system (commonly referred to as a
"host route"); outside the region identified by P, the anycast
address may be aggregated into the routing advertisement for prefix
P.

Note that, in the worst case, the prefix P of an anycast set may be
the null prefix, i.e., the members of the set may have no topological
locality. In that case, the anycast address must be advertised as a
separate routing entry throughout the entire internet, which presents
a severe scaling limit on how many such "global" anycast sets may be
supported. Therefore, it is expected that support for global anycast
sets may be unavailable or very restricted.

One expected use of anycast addresses is to identify the set of
routers belonging to an organization providing internet service.
Such addresses could be used as intermediate addresses in an IPv6
Routing header, to cause a packet to be delivered via a particular
aggregation or sequence of aggregations. Some other possible uses
are to identify the set of routers attached to a particular subnet,
or the set of routers providing entry into a particular routing
domain.

There is little experience with widespread, arbitrary use of internet
anycast addresses, and some known complications and hazards when
using them in their full generality [ANYCST]. Until more experience
has been gained and solutions agreed upon for those problems, the
following restrictions are imposed on IPv6 anycast addresses:

o An anycast address must not be used as the source address of an
IPv6 packet.

o An anycast address must not be assigned to an IPv6 host, that
is, it may be assigned to an IPv6 router only.

2.6.1 Required Anycast Address

The Subnet-Router anycast address is predefined. Its format is as
follows:

|                     n bits                     |   128-n bits   |
+------------------------------------------------+----------------+
|                 subnet prefix                  | 00000000000000 |
+------------------------------------------------+----------------+

The "subnet prefix" in an anycast address is the prefix which
identifies a specific link. This anycast address is syntactically
the same as a unicast address for an interface on the link with the
interface identifier set to zero.

Packets sent to the Subnet-Router anycast address will be delivered
to one router on the subnet. All routers are required to support the
Subnet-Router anycast addresses for the subnets on which they have
interfaces.

The subnet-router anycast address is intended to be used for
applications where a node needs to communicate with one of a set of
routers on a remote subnet, for example when a mobile host needs to
communicate with one of the mobile agents on its "home" subnet.
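
Since the Subnet-Router anycast address is the subnet prefix followed
by an all-zero interface identifier, it can be derived from any
address on the link together with the prefix length. A minimal sketch,
with a hypothetical function name and assuming Python's standard
"ipaddress" module:

```python
import ipaddress

def subnet_router_anycast(address_with_prefix: str) -> ipaddress.IPv6Address:
    """Keep the subnet prefix and set the remaining bits to zero."""
    network = ipaddress.IPv6Interface(address_with_prefix).network
    return network.network_address
```

For the node address 12AB:0:0:CD30:123:4567:89AB:CDEF/60 used in
section 2.3, the result is 12AB:0:0:CD30::.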

2.7 Multicast Addresses

An IPv6 multicast address is an identifier for a group of nodes. A
node may belong to any number of multicast groups. Multicast
addresses have the following format:

|   8    |  4 |  4 |                  112 bits                   |
+--------+----+----+---------------------------------------------+
|11111111|flgs|scop|                  group ID                   |
+--------+----+----+---------------------------------------------+

11111111 at the start of the address identifies the address as
being a multicast address.

                              +-+-+-+-+
flgs is a set of 4 flags:     |0|0|0|T|
                              +-+-+-+-+

The high-order 3 flags are reserved, and must be initialized to
0.

T = 0 indicates a permanently-assigned ("well-known") multicast
address, assigned by the global internet numbering authority.

T = 1 indicates a non-permanently-assigned ("transient")
multicast address.

scop is a 4-bit multicast scope value used to limit the scope of
the multicast group. The values are:

0  reserved
1  node-local scope
2  link-local scope
3  (unassigned)
4  (unassigned)
5  site-local scope
6  (unassigned)
7  (unassigned)
8  organization-local scope
9  (unassigned)
A  (unassigned)
B  (unassigned)
C  (unassigned)
D  (unassigned)
E  global scope
F  reserved

group ID identifies the multicast group, either permanent or
transient, within the given scope.

The "meaning" of a permanently-assigned multicast address is
independent of the scope value. For example, if the "NTP servers
group" is assigned a permanent multicast address with a group ID of
101 (hex), then:

FF01:0:0:0:0:0:0:101 means all NTP servers on the same node as the
sender.

FF02:0:0:0:0:0:0:101 means all NTP servers on the same link as the
sender.

FF05:0:0:0:0:0:0:101 means all NTP servers at the same site as the
sender.

FF0E:0:0:0:0:0:0:101 means all NTP servers in the internet.

Non-permanently-assigned multicast addresses are meaningful only
within a given scope. For example, a group identified by the non-
permanent, site-local multicast address FF15:0:0:0:0:0:0:101 at one
site bears no relationship to a group using the same address at a
different site, nor to a non-permanent group using the same group ID
with different scope, nor to a permanent group with the same group
ID.

Multicast addresses must not be used as source addresses in IPv6
packets or appear in any routing header.
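
The multicast fields can be extracted with shifts and masks that
follow the format diagram at the start of this section. In the sketch
below the names SCOPES, MulticastFields, and split_multicast are
hypothetical, only the assigned scope values are listed, and Python's
standard "ipaddress" module is assumed.

```python
import ipaddress
from typing import NamedTuple

# Assigned scope values; all others are unassigned or reserved.
SCOPES = {0x1: "node-local", 0x2: "link-local", 0x5: "site-local",
          0x8: "organization-local", 0xE: "global"}

class MulticastFields(NamedTuple):
    transient: bool  # the T flag
    scope: int       # 4-bit scop value
    group_id: int    # 112-bit group ID

def split_multicast(text: str) -> MulticastFields:
    """Split a multicast address into T flag, scope, and group ID."""
    value = int(ipaddress.IPv6Address(text))
    if value >> 120 != 0xFF:
        raise ValueError("not a multicast address")
    flags = (value >> 116) & 0xF
    return MulticastFields(
        transient=bool(flags & 0x1),
        scope=(value >> 112) & 0xF,
        group_id=value & ((1 << 112) - 1),
    )
```

Applied to the NTP examples above, FF01:0:0:0:0:0:0:101 and
FF0E:0:0:0:0:0:0:101 share group ID 101 (hex) and differ only in
scope, while FF15:0:0:0:0:0:0:101 has the T flag set.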

2.7.1 Pre-Defined Multicast Addresses

The following well-known multicast addresses are pre-defined:

Reserved Multicast Addresses:   FF00:0:0:0:0:0:0:0
                                FF01:0:0:0:0:0:0:0
                                FF02:0:0:0:0:0:0:0
                                FF03:0:0:0:0:0:0:0
                                FF04:0:0:0:0:0:0:0
                                FF05:0:0:0:0:0:0:0
                                FF06:0:0:0:0:0:0:0
                                FF07:0:0:0:0:0:0:0
                                FF08:0:0:0:0:0:0:0
                                FF09:0:0:0:0:0:0:0
                                FF0A:0:0:0:0:0:0:0
                                FF0B:0:0:0:0:0:0:0
                                FF0C:0:0:0:0:0:0:0
                                FF0D:0:0:0:0:0:0:0
                                FF0E:0:0:0:0:0:0:0
                                FF0F:0:0:0:0:0:0:0

The above multicast addresses are reserved and shall never be
assigned to any multicast group.

All Nodes Addresses:            FF01:0:0:0:0:0:0:1
                                FF02:0:0:0:0:0:0:1

The above multicast addresses identify the group of all IPv6 nodes,
within scope 1 (node-local) or 2 (link-local).

All Routers Addresses:          FF01:0:0:0:0:0:0:2
                                FF02:0:0:0:0:0:0:2
                                FF05:0:0:0:0:0:0:2

The above multicast addresses identify the group of all IPv6 routers,
within scope 1 (node-local), 2 (link-local), or 5 (site-local).

Solicited-Node Address: FF02:0:0:0:0:1:FFXX:XXXX

The above multicast address is computed as a function of a node's
unicast and anycast addresses. The solicited-node multicast address
is formed by taking the low-order 24 bits of the address (unicast or
anycast) and appending those bits to the prefix
FF02:0:0:0:0:1:FF00::/104 resulting in a multicast address in the
range

FF02:0:0:0:0:1:FF00:0000

to

FF02:0:0:0:0:1:FFFF:FFFF

For example, the solicited node multicast address corresponding to
the IPv6 address 4037::01:800:200E:8C6C is FF02::1:FF0E:8C6C. IPv6
addresses that differ only in the high-order bits, e.g., due to
multiple high-order prefixes associated with different aggregations,
will map to the same solicited-node address thereby reducing the
number of multicast addresses a node must join.

A node is required to compute and join the associated Solicited-Node
multicast addresses for every unicast and anycast address it is
assigned.
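
The computation of the solicited-node multicast address can be
sketched as follows. The names SOLICITED_NODE_BASE and solicited_node
are hypothetical, and Python's standard "ipaddress" module is assumed.

```python
import ipaddress

# The 104-bit prefix FF02:0:0:0:0:1:FF with the low-order 24 bits zero.
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("FF02:0:0:0:0:1:FF00:0"))

def solicited_node(text: str) -> ipaddress.IPv6Address:
    """Append the low-order 24 bits of the address to the prefix."""
    low24 = int(ipaddress.IPv6Address(text)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)
```

For 4037::01:800:200E:8C6C this gives FF02::1:FF0E:8C6C, and any
other address ending in the same 24 bits gives the same result.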

2.7.2 Assignment of New IPv6 Multicast Addresses

The current approach [ETHER] to map IPv6 multicast addresses into
IEEE 802 MAC addresses takes the low order 32 bits of the IPv6
multicast address and uses it to create a MAC address. Note that
Token Ring networks are handled differently. This is defined in
[TOKEN]. Group IDs less than or equal to 32 bits will generate
unique MAC addresses. Because of this, new IPv6 multicast addresses
should be assigned so that the group identifier is always in the
low-order 32 bits, as shown in the following:

|   8    |  4 |  4 |          80 bits          |     32 bits     |
+--------+----+----+---------------------------+-----------------+
|11111111|flgs|scop|   reserved must be zero   |    group ID     |
+--------+----+----+---------------------------+-----------------+

While this limits the number of permanent IPv6 multicast groups to
2^32, this is unlikely to be a limitation in the future. If it
becomes necessary to exceed this limit in the future, multicast will
still work but the processing will be slightly slower.
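
The reason for keeping the group ID within 32 bits can be seen from a
sketch of the mapping. The two leading octets 33-33 used below come
from the "IPv6 over Ethernet" specification [ETHER], not from this
document; the function name is hypothetical and Python's standard
"ipaddress" module is assumed.

```python
import ipaddress

def ethernet_multicast_mac(text: str) -> str:
    """Map an IPv6 multicast address to an Ethernet multicast address."""
    low32 = int(ipaddress.IPv6Address(text)) & 0xFFFFFFFF
    # Octets 33-33 followed by the four low-order octets of the address.
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return "-".join(f"{octet:02X}" for octet in octets)
```

Two multicast addresses that differ only above the low-order 32 bits,
such as FF02::101 and FF02:0:0:0:0:1:0:101, map to the same MAC
address, so a receiver has to filter them apart at the IPv6 layer.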

Additional IPv6 multicast addresses are defined and registered by the
IANA [MASGN].

2.8 A Node's Required Addresses

A host is required to recognize the following addresses as
identifying itself:

o Its Link-Local Address for each interface
o Assigned Unicast Addresses
o Loopback Address
o All-Nodes Multicast Addresses
o Solicited-Node Multicast Address for each of its assigned
unicast and anycast addresses
o Multicast Addresses of all other groups to which the host
belongs.

A router is required to recognize all addresses that a host is
required to recognize, plus the following addresses as identifying
itself:

o The Subnet-Router anycast addresses for the interfaces it is
configured to act as a router on.
o All other Anycast addresses with which the router has been
configured.
o All-Routers Multicast Addresses
o Multicast Addresses of all other groups to which the router
belongs.

The only address prefixes which should be predefined in an
implementation are the:

o Unspecified Address
o Loopback Address
o Multicast Prefix (FF)
o Local-Use Prefixes (Link-Local and Site-Local)
o Pre-Defined Multicast Addresses
o IPv4-Compatible Prefixes

Implementations should assume all other addresses are unicast unless
specifically configured (e.g., anycast addresses).

3. Security Considerations

IPv6 addressing documents do not have any direct impact on Internet
infrastructure security. Authentication of IPv6 packets is defined
in [AUTH].

APPENDIX A: Creating EUI-64 based Interface Identifiers
-------------------------------------------------------

Depending on the characteristics of a specific link or node there are
a number of approaches for creating EUI-64 based interface
identifiers. This appendix describes some of these approaches.

Links or Nodes with EUI-64 Identifiers

The only change needed to transform an EUI-64 identifier to an
interface identifier is to invert the "u" (universal/local) bit. For
example, a globally unique EUI-64 identifier of the form:

|0              1|1              3|3              4|4              6|
|0              5|6              1|2              7|8              3|
+----------------+----------------+----------------+----------------+
|cccccc0gcccccccc|ccccccccmmmmmmmm|mmmmmmmmmmmmmmmm|mmmmmmmmmmmmmmmm|
+----------------+----------------+----------------+----------------+

where "c" are the bits of the assigned company_id, "0" is the value
of the universal/local bit to indicate global scope, "g" is the
individual/group bit, and "m" are the bits of the
manufacturer-selected extension identifier. The IPv6 interface
identifier would be of the form:

|0              1|1              3|3              4|4              6|
|0              5|6              1|2              7|8              3|
+----------------+----------------+----------------+----------------+
|cccccc1gcccccccc|ccccccccmmmmmmmm|mmmmmmmmmmmmmmmm|mmmmmmmmmmmmmmmm|
+----------------+----------------+----------------+----------------+

The only change is inverting the value of the universal/local bit.
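
As a minimal sketch of this transformation, the "u" bit is the seventh
bit of the first octet in the diagrams above, which corresponds to the
mask 0x02. The function name is hypothetical and the sample values
used with it are arbitrary.

```python
def eui64_to_interface_id(eui64: bytes) -> bytes:
    """Invert the universal/local bit of an 8-octet EUI-64 identifier."""
    if len(eui64) != 8:
        raise ValueError("an EUI-64 identifier is 8 octets long")
    # "u" is bit 6 of the first octet in Internet standard bit-order.
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]
```

Applying the function twice returns the original identifier, and the
identifier 02-00-00-00-00-00-00-01 becomes the interface identifier
written ::1 in section 2.5.1.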

Links or Nodes with IEEE 802 48 bit MAC's

[EUI64] defines a method to create an EUI-64 identifier from an IEEE
48bit MAC identifier. This is to insert two octets, with hexadecimal
values of 0xFF and 0xFE, in the middle of the 48 bit MAC (between the
company_id and vendor supplied id). For example the 48 bit MAC with
global scope:

|0              1|1              3|3              4|
|0              5|6              1|2              7|
+----------------+----------------+----------------+
|cccccc0gcccccccc|ccccccccmmmmmmmm|mmmmmmmmmmmmmmmm|
+----------------+----------------+----------------+

where "c" are the bits of the assigned company_id, "0" is the value
of the universal/local bit to indicate global scope, "g" is the
individual/group bit, and "m" are the bits of the
manufacturer-selected extension identifier. The interface identifier
would be of the form:

|0              1|1              3|3              4|4              6|
|0              5|6              1|2              7|8              3|
+----------------+----------------+----------------+----------------+
|cccccc1gcccccccc|cccccccc11111111|11111110mmmmmmmm|mmmmmmmmmmmmmmmm|
+----------------+----------------+----------------+----------------+

When IEEE 802 48bit MAC addresses are available (on an interface or a
node), an implementation should use them to create interface
identifiers due to their availability and uniqueness properties.
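
The two steps (insert 0xFF and 0xFE between the company_id and the
vendor supplied id, then invert the "u" bit) can be sketched as
follows. The function name is hypothetical, the accepted separators
are a choice made for this example, and the sample MAC value is
arbitrary.

```python
def mac48_to_interface_id(mac: str) -> bytes:
    """Build a 64-bit interface identifier from a 48-bit MAC."""
    octets = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(octets) != 6:
        raise ValueError("a 48-bit MAC is 6 octets long")
    # Insert FF FE between the company_id and the vendor supplied id.
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    # Invert the universal/local bit (mask 0x02 of the first octet).
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]
```

For the MAC 34-56-78-9A-BC-DE the resulting interface identifier is
36-56-78-FF-FE-9A-BC-DE.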

Links with Non-Global Identifiers

There are a number of types of links that, while multi-access, do not
have globally unique link identifiers. Examples include LocalTalk
and Arcnet. The method to create an EUI-64 formatted identifier is
to take the link identifier (e.g., the LocalTalk 8 bit node
identifier) and zero fill it to the left. For example a LocalTalk 8
bit node identifier of hexadecimal value 0x4F results in the
following interface identifier:

|0              1|1              3|3              4|4              6|
|0              5|6              1|2              7|8              3|
+----------------+----------------+----------------+----------------+
|0000000000000000|0000000000000000|0000000000000000|0000000001001111|
+----------------+----------------+----------------+----------------+

Note that this results in the universal/local bit set to "0" to
indicate local scope.
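
Zero filling to the left is equivalent to writing the link identifier
as a 64-bit big-endian integer. A minimal sketch, with a hypothetical
function name:

```python
def zero_filled_interface_id(link_identifier: int) -> bytes:
    """Left-pad a short link identifier with zeros to 64 bits."""
    if not 0 <= link_identifier < 2**64:
        raise ValueError("link identifier must fit in 64 bits")
    return link_identifier.to_bytes(8, "big")
```

For the LocalTalk node identifier 0x4F the result ends in the bits
01001111, and the universal/local bit of the first octet is zero.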

Links without Identifiers

There are a number of links that do not have any type of built-in
identifier. The most common of these are serial links and configured
tunnels. Interface identifiers must be chosen that are unique for
the link.

When no built-in identifier is available on a link the preferred
approach is to use a global interface identifier from another
interface or one which is assigned to the node itself. To use this
approach no other interface connecting the same node to the same link
may use the same identifier.

If there is no global interface identifier available for use on the
link the implementation needs to create a local scope interface
identifier. The only requirement is that it be unique on the link.
There are many possible approaches to select a link-unique interface
identifier. They include:

Manual Configuration
Generated Random Number
Node Serial Number (or other node-specific token)

The link-unique interface identifier should be generated in a manner
that it does not change after a reboot of a node or if interfaces are
added or deleted from the node.

The selection of the appropriate algorithm is link and implementation
dependent. The details on forming interface identifiers are defined
in the appropriate "IPv6 over <link>" specification. It is strongly
recommended that a collision detection algorithm be implemented as
part of any automatic algorithm.
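One possible automatic algorithm is sketched below. It is not taken from
any "IPv6 over <link>" specification: deriving the identifier from a
hash of a node-specific token, the retry counter, and the in_use()
callback that stands in for collision detection are all assumptions of
this example. The sketch only shows the three properties the text asks
for: the result is stable across reboots (it depends only on the token),
it has local scope (universal/local bit cleared), and a detected
collision leads to a different candidate.

```python
import hashlib
from typing import Callable


def local_interface_id(token: bytes,
                       in_use: Callable[[bytes], bool],
                       max_tries: int = 16) -> bytes:
    """Derive a local-scope, link-unique 64-bit interface identifier."""
    for attempt in range(max_tries):
        # Deterministic: the same token and attempt give the same candidate.
        digest = hashlib.sha256(token + bytes([attempt])).digest()
        candidate = bytearray(digest[:8])
        candidate[0] &= 0xFD  # clear the universal/local bit: local scope
        if not in_use(bytes(candidate)):
            return bytes(candidate)
    raise RuntimeError("no unused interface identifier found")


first = local_interface_id(b"SN-0001", lambda ident: False)
again = local_interface_id(b"SN-0001", lambda ident: False)
print(first == again)  # True: the identifier does not change after a reboot
```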

APPENDIX B: ABNF Description of Text Representations
----------------------------------------------------

This appendix defines the text representation of IPv6 addresses and
prefixes in Augmented BNF [ABNF] for reference purposes.

IPv6address = hexpart [ ":" IPv4address ]
IPv4address = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT

IPv6prefix = hexpart "/" 1*2DIGIT

hexpart = hexseq | hexseq "::" [ hexseq ] | "::" [ hexseq ]
hexseq = hex4 *( ":" hex4)
hex4 = 1*4HEXDIG
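
For readers who want to experiment with the grammar, the rules above can
be transcribed rule by rule into Python regular expressions. This is a
literal transcription for reference only: like the grammar itself it
does not limit the number of 16-bit groups, so it accepts some strings
that are not valid 128-bit addresses. The function names are assumptions
of this example.

```python
import re

HEX4 = r"[0-9A-Fa-f]{1,4}"                    # hex4 = 1*4HEXDIG
HEXSEQ = rf"{HEX4}(?::{HEX4})*"               # hexseq = hex4 *( ":" hex4)
# hexpart = hexseq | hexseq "::" [ hexseq ] | "::" [ hexseq ]
HEXPART = rf"(?:{HEXSEQ}::(?:{HEXSEQ})?|::(?:{HEXSEQ})?|{HEXSEQ})"
IPV4 = r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}"       # IPv4address
IPV6_ADDRESS = re.compile(rf"{HEXPART}(?::{IPV4})?")
IPV6_PREFIX = re.compile(rf"{HEXPART}/[0-9]{{1,2}}")


def is_ipv6_address(text: str) -> bool:
    """True if text matches the IPv6address rule as written."""
    return IPV6_ADDRESS.fullmatch(text) is not None


def is_ipv6_prefix(text: str) -> bool:
    """True if text matches the IPv6prefix rule as written."""
    return IPV6_PREFIX.fullmatch(text) is not None


print(is_ipv6_address("1080::8:800:200C:417A"))  # True
print(is_ipv6_prefix("12AB:0:0:CD30::/60"))      # True
print(is_ipv6_address("1::2::3"))                # False, two "::"
```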

APPENDIX C: CHANGES FROM RFC-1884
---------------------------------

The following changes were made from RFC-1884 "IP Version 6
Addressing Architecture":

- Added an appendix providing an ABNF description of text
representations.
- Clarification that link unique identifiers should not change after
reboot or other interface reconfigurations.
- Clarification of Address Model based on comments.
- Changed aggregation format terminology to be consistent with
aggregation draft.
- Added text to allow interface identifier to be used on more than
one interface on same node.
- Added rules for defining new multicast addresses.
- Added appendix describing procedures for creating EUI-64 based
interface ID's.
- Added notation for defining IPv6 prefixes.
- Changed solicited node multicast definition to use a longer
prefix.
- Added site scope all routers multicast address.
- Defined Aggregatable Global Unicast Addresses to use "001" Format
Prefix.
- Changed "010" (Provider-Based Unicast) and "100" (Reserved for
Geographic) Format Prefixes to Unassigned.
- Added section on Interface ID definition for unicast addresses.
Requires use of EUI-64 in range of format prefixes and rules for
setting global/local scope bit in EUI-64.
- Updated NSAP text to reflect working in RFC1888.
- Removed protocol specific IPv6 multicast addresses (e.g., DHCP)
and referenced the IANA definitions.
- Removed section "Unicast Address Example". Had become OBE.
- Added new and updated references.
- Minor text clarifications and improvements.

REFERENCES

[ABNF] Crocker, D., and P. Overell, "Augmented BNF for
Syntax Specifications: ABNF", RFC 2234, November 1997.

[AGGR] Hinden, R., O'Dell, M., and S. Deering, "An
Aggregatable Global Unicast Address Format", RFC 2374, July
1998.

[AUTH] Atkinson, R., "IP Authentication Header", RFC 1826, August
1995.

[ANYCST] Partridge, C., Mendez, T., and W. Milliken, "Host
Anycasting Service", RFC 1546, November 1993.

[CIDR] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless
Inter-Domain Routing (CIDR): An Address Assignment and
Aggregation Strategy", RFC 1519, September 1993.

[ETHER] Crawford, M., "Transmission of IPv6 Packets over Ethernet
Networks", Work in Progress.

[EUI64] IEEE, "Guidelines for 64-bit Global Identifier (EUI-64)
Registration Authority",
http://standards.ieee.org/db/oui/tutorials/EUI64.html,
March 1997.

[FDDI] Crawford, M., "Transmission of IPv6 Packets over FDDI
Networks", Work in Progress.

[IPV6] Deering, S., and R. Hinden, Editors, "Internet Protocol,
Version 6 (IPv6) Specification", RFC 1883, December 1995.

[MASGN] Hinden, R., and S. Deering, "IPv6 Multicast Address
Assignments", RFC 2375, July 1998.

[NSAP] Bound, J., Carpenter, B., Harrington, D., Houldsworth, J.,
and A. Lloyd, "OSI NSAPs and IPv6", RFC 1888, August 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.

[TOKEN] Thomas, S., "Transmission of IPv6 Packets over Token Ring
Networks", Work in Progress.

[TRAN] Gilligan, R., and E. Nordmark, "Transition Mechanisms for
IPv6 Hosts and Routers", RFC 1933, April 1996.

AUTHORS' ADDRESSES

Robert M. Hinden
Nokia
232 Java Drive
Sunnyvale, CA 94089
USA

Phone: +1 408 990-2004
Fax: +1 408 743-5677
EMail: hinden@iprg.nokia.com

Stephen E. Deering
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA

Phone: +1 408 527-8213
Fax: +1 408 527-8254
EMail: deering@cisco.com

Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

RFC 1981 – Path MTU Discovery for IP version 6

 
Network Working Group                                          J. McCann
Request for Comments: 1981                 Digital Equipment Corporation
Category: Standards Track                                     S. Deering
                                                              Xerox PARC
                                                                J. Mogul
                                           Digital Equipment Corporation
                                                             August 1996

Path MTU Discovery for IP version 6

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

This document describes Path MTU Discovery for IP version 6. It is
largely derived from RFC 1191, which describes Path MTU Discovery for
IP version 4.

Table of Contents

1. Introduction
2. Terminology
3. Protocol overview
4. Protocol Requirements
5. Implementation Issues
5.1. Layering
5.2. Storing PMTU information
5.3. Purging stale PMTU information
5.4. TCP layer actions
5.5. Issues for other transport protocols
5.6. Management interface
6. Security Considerations
Acknowledgements
Appendix A - Comparison to RFC 1191
References
Authors' Addresses

1. Introduction

When one IPv6 node has a large amount of data to send to another
node, the data is transmitted in a series of IPv6 packets. It is
usually preferable that these packets be of the largest size that can
successfully traverse the path from the source node to the
destination node. This packet size is referred to as the Path MTU
(PMTU), and it is equal to the minimum link MTU of all the links in a
path. IPv6 defines a standard mechanism for a node to discover the
PMTU of an arbitrary path.

IPv6 nodes SHOULD implement Path MTU Discovery in order to discover
and take advantage of paths with PMTU greater than the IPv6 minimum
link MTU [IPv6-SPEC]. A minimal IPv6 implementation (e.g., in a boot
ROM) may choose to omit implementation of Path MTU Discovery.

Nodes not implementing Path MTU Discovery use the IPv6 minimum link
MTU defined in [IPv6-SPEC] as the maximum packet size. In most
cases, this will result in the use of smaller packets than necessary,
because most paths have a PMTU greater than the IPv6 minimum link
MTU. A node sending packets much smaller than the Path MTU allows is
wasting network resources and probably getting suboptimal throughput.

2. Terminology

node - a device that implements IPv6.

router - a node that forwards IPv6 packets not explicitly
addressed to itself.

host - any node that is not a router.

upper layer - a protocol layer immediately above IPv6. Examples are
transport protocols such as TCP and UDP, control
protocols such as ICMP, routing protocols such as OSPF,
and internet or lower-layer protocols being "tunneled"
over (i.e., encapsulated in) IPv6 such as IPX,
AppleTalk, or IPv6 itself.

link - a communication facility or medium over which nodes can
communicate at the link layer, i.e., the layer
immediately below IPv6. Examples are Ethernets (simple
or bridged); PPP links; X.25, Frame Relay, or ATM
networks; and internet (or higher) layer "tunnels",
such as tunnels over IPv4 or IPv6 itself.

interface - a node's attachment to a link.

address - an IPv6-layer identifier for an interface or a set of
interfaces.

packet - an IPv6 header plus payload.

link MTU - the maximum transmission unit, i.e., maximum packet
size in octets, that can be conveyed in one piece over
a link.

path - the set of links traversed by a packet between a source
node and a destination node.

path MTU - the minimum link MTU of all the links in a path between
a source node and a destination node.

PMTU - path MTU

Path MTU Discovery - process by which a node learns the PMTU of a
path.

flow - a sequence of packets sent from a particular source
to a particular (unicast or multicast) destination for
which the source desires special handling by the
intervening routers.

flow id - a combination of a source address and a non-zero
flow label.

3. Protocol overview

This memo describes a technique to dynamically discover the PMTU of a
path. The basic idea is that a source node initially assumes that
the PMTU of a path is the (known) MTU of the first hop in the path.
If any of the packets sent on that path are too large to be forwarded
by some node along the path, that node will discard them and return
ICMPv6 Packet Too Big messages [ICMPv6]. Upon receipt of such a
message, the source node reduces its assumed PMTU for the path based
on the MTU of the constricting hop as reported in the Packet Too Big
message.

The Path MTU Discovery process ends when the node's estimate of the
PMTU is less than or equal to the actual PMTU. Note that several
iterations of the packet-sent/Packet-Too-Big-message-received cycle
may occur before the Path MTU Discovery process ends, as there may be
links with smaller MTUs further along the path.
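
The iteration can be illustrated with a toy simulation. The model is an
assumption of this example, not part of the protocol: a path is a list
of link MTUs in order, the source always sends packets as large as its
current estimate, and the first link too small for the packet answers
with a Packet Too Big message carrying its own MTU.

```python
from typing import List, Tuple


def discover_pmtu(link_mtus: List[int]) -> Tuple[int, List[int]]:
    """Return (final PMTU estimate, MTUs reported by Packet Too Big)."""
    estimate = link_mtus[0]  # initial assumption: MTU of the first hop
    reports = []
    while True:
        # The first link whose MTU is below the packet size discards the
        # packet and reports its MTU back to the source.
        constricting = next((m for m in link_mtus if m < estimate), None)
        if constricting is None:
            return estimate, reports  # estimate <= actual PMTU: done
        reports.append(constricting)
        estimate = constricting


print(discover_pmtu([1500, 1400, 1300]))  # (1300, [1400, 1300])
```

Two Packet Too Big messages are needed here because the 1300-octet link
lies beyond the 1400-octet link and is only reached once the packets fit
through the first constriction.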

Alternatively, the node may elect to end the discovery process by
ceasing to send packets larger than the IPv6 minimum link MTU.

The PMTU of a path may change over time, due to changes in the
routing topology. Reductions of the PMTU are detected by Packet Too
Big messages. To detect increases in a path's PMTU, a node
periodically increases its assumed PMTU. This will almost always
result in packets being discarded and Packet Too Big messages being
generated, because in most cases the PMTU of the path will not have
changed. Therefore, attempts to detect increases in a path's PMTU
should be done infrequently.

Path MTU Discovery supports multicast as well as unicast
destinations. In the case of a multicast destination, copies of a
packet may traverse many different paths to many different nodes.
Each path may have a different PMTU, and a single multicast packet
may result in multiple Packet Too Big messages, each reporting a
different next-hop MTU. The minimum PMTU value across the set of
paths in use determines the size of subsequent packets sent to the
multicast destination.

Note that Path MTU Discovery must be performed even in cases where a
node "thinks" a destination is attached to the same link as itself.
In a situation such as when a neighboring router acts as proxy [ND]
for some destination, the destination can appear to be directly
connected but is in fact more than one hop away.

4. Protocol Requirements

As discussed in section 1, IPv6 nodes are not required to implement
Path MTU Discovery. The requirements in this section apply only to
those implementations that include Path MTU Discovery.

When a node receives a Packet Too Big message, it MUST reduce its
estimate of the PMTU for the relevant path, based on the value of the
MTU field in the message. The precise behavior of a node in this
circumstance is not specified, since different applications may have
different requirements, and since different implementation
architectures may favor different strategies.

After receiving a Packet Too Big message, a node MUST attempt to
avoid eliciting more such messages in the near future. The node MUST
reduce the size of the packets it is sending along the path. Using a
PMTU estimate larger than the IPv6 minimum link MTU may continue to
elicit Packet Too Big messages. Since each of these messages (and
the dropped packets they respond to) consume network resources, the
node MUST force the Path MTU Discovery process to end.

Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast
as possible. Nodes MAY detect increases in PMTU, but because doing
so requires sending packets larger than the current estimated PMTU,
and because the likelihood is that the PMTU will not have increased,
this MUST be done at infrequent intervals. An attempt to detect an
increase (by sending a packet larger than the current estimate) MUST
NOT be done less than 5 minutes after a Packet Too Big message has
been received for the given path. The recommended setting for this
timer is twice its minimum value (10 minutes).

A node MUST NOT reduce its estimate of the Path MTU below the IPv6
minimum link MTU.

Note: A node may receive a Packet Too Big message reporting a
next-hop MTU that is less than the IPv6 minimum link MTU. In that
case, the node is not required to reduce the size of subsequent
packets sent on the path to less than the IPv6 minimum link MTU,
but rather must include a Fragment header in those packets
[IPv6-SPEC].

A node MUST NOT increase its estimate of the Path MTU in response to
the contents of a Packet Too Big message. A message purporting to
announce an increase in the Path MTU might be a stale packet that has
been floating around in the network, a false packet injected as part
of a denial-of-service attack, or the result of having multiple paths
to the destination, each with a different PMTU.
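
The rules of this section can be summarized in a short Python sketch.
The function names are hypothetical, and the minimum link MTU is passed
as a parameter because this memo takes the value from [IPv6-SPEC]; the
default of 1280 octets is an assumption of the example (it is the value
given in RFC 2460).

```python
from typing import Tuple

IPV6_MIN_LINK_MTU = 1280  # assumption: value from RFC 2460, not this memo
MIN_PROBE_DELAY = 5 * 60  # seconds; minimum wait after a Packet Too Big


def apply_packet_too_big(current_pmtu: int, reported_mtu: int,
                         min_link_mtu: int = IPV6_MIN_LINK_MTU
                         ) -> Tuple[int, bool]:
    """Return (new PMTU estimate, whether a Fragment header is needed)."""
    if reported_mtu >= current_pmtu:
        # Never increase the estimate in response to a Packet Too Big.
        return current_pmtu, False
    if reported_mtu < min_link_mtu:
        # Never go below the minimum link MTU; add a Fragment header.
        return min_link_mtu, True
    return reported_mtu, False


def may_probe_for_increase(now: float, last_packet_too_big: float,
                           delay: float = MIN_PROBE_DELAY) -> bool:
    """True once enough time has passed to try a larger packet."""
    return now - last_packet_too_big >= delay


print(apply_packet_too_big(1500, 1400))  # (1400, False)
print(apply_packet_too_big(1500, 9000))  # (1500, False)
print(apply_packet_too_big(1500, 1000))  # (1280, True)
```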

5. Implementation Issues

This section discusses a number of issues related to the
implementation of Path MTU Discovery. This is not a specification,
but rather a set of notes provided as an aid for implementors.

The issues include:

- What layer or layers implement Path MTU Discovery?

- How is the PMTU information cached?

- How is stale PMTU information removed?

- What must transport and higher layers do?

5.1. Layering

In the IP architecture, the choice of what size packet to send is
made by a protocol at a layer above IP. This memo refers to such a
protocol as a "packetization protocol". Packetization protocols are
usually transport protocols (for example, TCP) but can also be
higher-layer protocols (for example, protocols built on top of UDP).

Implementing Path MTU Discovery in the packetization layers
simplifies some of the inter-layer issues, but has several drawbacks:
the implementation may have to be redone for each packetization
protocol, it becomes hard to share PMTU information between different
packetization layers, and the connection-oriented state maintained by
some packetization layers may not easily extend to save PMTU
information for long periods.

It is therefore suggested that the IP layer store PMTU information
and that the ICMP layer process received Packet Too Big messages.
The packetization layers may respond to changes in the PMTU, by
changing the size of the messages they send. To support this
layering, packetization layers require a way to learn of changes in
the value of MMS_S, the "maximum send transport-message size". The
MMS_S is derived from the Path MTU by subtracting the size of the
IPv6 header plus space reserved by the IP layer for additional
headers (if any).
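
As a worked example of this derivation, the sketch below computes MMS_S
from a PMTU value. The fixed IPv6 header is 40 octets; the function name
and the reserved_octets parameter are assumptions of the example.

```python
IPV6_HEADER_OCTETS = 40  # size of the fixed IPv6 header


def mms_s(pmtu: int, reserved_octets: int = 0) -> int:
    """Maximum send transport-message size for a given Path MTU."""
    return pmtu - IPV6_HEADER_OCTETS - reserved_octets


print(mms_s(1500))     # 1460
print(mms_s(1280, 8))  # 1232, with 8 octets reserved for extra headers
```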

It is possible that a packetization layer, perhaps a UDP application
outside the kernel, is unable to change the size of messages it
sends. This may result in a packet size that exceeds the Path MTU.
To accommodate such situations, IPv6 defines a mechanism that allows
large payloads to be divided into fragments, with each fragment sent
in a separate packet (see [IPv6-SPEC] section "Fragment Header").
However, packetization layers are encouraged to avoid sending
messages that will require fragmentation (for the case against
fragmentation, see [FRAG]).

5.2. Storing PMTU information

Ideally, a PMTU value should be associated with a specific path
traversed by packets exchanged between the source and destination
nodes. However, in most cases a node will not have enough
information to completely and accurately identify such a path.
Rather, a node must associate a PMTU value with some local
representation of a path. It is left to the implementation to select
the local representation of a path.

In the case of a multicast destination address, copies of a packet
may traverse many different paths to reach many different nodes. The
local representation of the "path" to a multicast destination must in
fact represent a potentially large set of paths.

Minimally, an implementation could maintain a single PMTU value to be
used for all packets originated from the node. This PMTU value would
be the minimum PMTU learned across the set of all paths in use by the
node. This approach is likely to result in the use of smaller
packets than is necessary for many paths.

An implementation could use the destination address as the local
representation of a path. The PMTU value associated with a
destination would be the minimum PMTU learned across the set of all
paths in use to that destination. The set of paths in use to a
particular destination is expected to be small, in many cases
consisting of a single path. This approach will result in the use of
optimally sized packets on a per-destination basis. This approach
integrates nicely with the conceptual model of a host as described in
[ND]: a PMTU value could be stored with the corresponding entry in
the destination cache.

If flows [IPv6-SPEC] are in use, an implementation could use the flow
id as the local representation of a path. Packets sent to a
particular destination but belonging to different flows may use
different paths, with the choice of path depending on the flow id.
This approach will result in the use of optimally sized packets on a
per-flow basis, providing finer granularity than PMTU values
maintained on a per-destination basis.

For source routed packets (i.e. packets containing an IPv6 Routing
header [IPv6-SPEC]), the source route may further qualify the local
representation of a path. In particular, a packet containing a type
0 Routing header in which all bits in the Strict/Loose Bit Map are
equal to 1 contains a complete path specification. An implementation
could use source route information in the local representation of a
path.

Note: Some paths may be further distinguished by different
security classifications. The details of such classifications are
beyond the scope of this memo.

Initially, the PMTU value for a path is assumed to be the (known) MTU
of the first-hop link.

When a Packet Too Big message is received, the node determines which
path the message applies to based on the contents of the Packet Too
Big message. For example, if the destination address is used as the
local representation of a path, the destination address from the
original packet would be used to determine which path the message
applies to.

Note: if the original packet contained a Routing header, the
Routing header should be used to determine the location of the
destination address within the original packet. If Segments Left
is equal to zero, the destination address is in the Destination
Address field in the IPv6 header. If Segments Left is greater
than zero, the destination address is the last address
(Address[n]) in the Routing header.
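
The rule in this note can be written as a small helper. The argument
names are hypothetical; they stand for fields already parsed from the
original packet carried in the Packet Too Big message.

```python
from typing import Sequence


def path_destination(header_destination: str, segments_left: int,
                     routing_addresses: Sequence[str]) -> str:
    """Pick the address that identifies the path of the original packet."""
    if segments_left == 0:
        # The packet already carries its final destination in the header.
        return header_destination
    # Otherwise the final destination is Address[n] in the Routing header.
    return routing_addresses[-1]


print(path_destination("2001:db8::9", 0, ["2001:db8::5", "2001:db8::9"]))
print(path_destination("2001:db8::5", 1, ["2001:db8::5", "2001:db8::9"]))
```

Both calls print 2001:db8::9, the address under which the PMTU value
would be stored.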

The node then uses the value in the MTU field in the Packet Too Big
message as a tentative PMTU value, and compares the tentative PMTU to
the existing PMTU. If the tentative PMTU is less than the existing
PMTU estimate, the tentative PMTU replaces the existing PMTU as the
PMTU value for the path.

The packetization layers must be notified about decreases in the
PMTU. Any packetization layer instance (for example, a TCP
connection) that is actively using the path must be notified if the
PMTU estimate is decreased.
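
A minimal sketch of a per-destination PMTU store with notification
follows. It assumes the destination address is used as the local
representation of a path; the class name, the callback signature and the
minimum link MTU argument are inventions of this example.

```python
from typing import Callable, Dict, List


class DestinationPmtuCache:
    """PMTU estimates keyed by destination address."""

    def __init__(self, first_hop_mtu: int, min_link_mtu: int):
        self.first_hop_mtu = first_hop_mtu
        self.min_link_mtu = min_link_mtu
        self._pmtu: Dict[str, int] = {}
        self._listeners: List[Callable[[str, int], None]] = []

    def subscribe(self, listener: Callable[[str, int], None]) -> None:
        """Register a packetization layer instance for notifications."""
        self._listeners.append(listener)

    def lookup(self, destination: str) -> int:
        # Initially the PMTU is assumed to be the first-hop link MTU.
        return self._pmtu.get(destination, self.first_hop_mtu)

    def packet_too_big(self, destination: str, mtu: int) -> bool:
        """Process a Packet Too Big; return True if the PMTU decreased."""
        tentative = max(mtu, self.min_link_mtu)
        if tentative >= self.lookup(destination):
            return False  # the estimate is only ever decreased here
        self._pmtu[destination] = tentative
        for listener in self._listeners:
            listener(destination, tentative)
        return True


cache = DestinationPmtuCache(first_hop_mtu=1500, min_link_mtu=1280)
cache.subscribe(lambda dst, pmtu: print("PMTU for", dst, "is now", pmtu))
cache.packet_too_big("2001:db8::1", 1400)
```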

Note: even if the Packet Too Big message contains an Original
Packet Header that refers to a UDP packet, the TCP layer must be
notified if any of its connections use the given path.

Also, the instance that sent the packet that elicited the Packet Too
Big message should be notified that its packet has been dropped, even
if the PMTU estimate has not changed, so that it may retransmit the
dropped data.

Note: An implementation can avoid the use of an asynchronous
notification mechanism for PMTU decreases by postponing
notification until the next attempt to send a packet larger than
the PMTU estimate. In this approach, when an attempt is made to
SEND a packet that is larger than the PMTU estimate, the SEND
function should fail and return a suitable error indication. This
approach may be more suitable to a connectionless packetization
layer (such as one using UDP), which (in some implementations) may
be hard to "notify" from the ICMP layer. In this case, the normal
timeout-based retransmission mechanisms would be used to recover
from the dropped packets.
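
The deferred-notification approach might look as follows. The function
name and the transmit callback are hypothetical, and reporting the
failure as an OSError with errno.EMSGSIZE is a choice made for this
example; the memo only asks for "a suitable error indication".

```python
import errno
from typing import Callable


def send_packet(packet: bytes, pmtu_estimate: int,
                transmit: Callable[[bytes], None]) -> None:
    """Send a packet, failing if it exceeds the current PMTU estimate."""
    if len(packet) > pmtu_estimate:
        # The caller learns about the smaller PMTU here, at send time,
        # rather than through an asynchronous notification.
        raise OSError(errno.EMSGSIZE, "packet larger than PMTU estimate")
    transmit(packet)


sent = []
send_packet(b"x" * 1000, 1280, sent.append)
try:
    send_packet(b"x" * 2000, 1280, sent.append)
except OSError as err:
    print("send failed:", err.errno == errno.EMSGSIZE)  # send failed: True
```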

It is important to understand that the notification of the
packetization layer instances using the path about the change in the
PMTU is distinct from the notification of a specific instance that a
packet has been dropped. The latter should be done as soon as
practical (i.e., asynchronously from the point of view of the
packetization layer instance), while the former may be delayed until
a packetization layer instance wants to create a packet.
Retransmission should be done only for those packets that are
known to be dropped, as indicated by a Packet Too Big message.

5.3. Purging stale PMTU information

Internetwork topology is dynamic; routes change over time. While the
local representation of a path may remain constant, the actual
path(s) in use may change. Thus, PMTU information cached by a node
can become stale.

If the stale PMTU value is too large, this will be discovered almost
immediately once a large enough packet is sent on the path. No such
mechanism exists for realizing that a stale PMTU value is too small,
so an implementation should "age" cached values. When a PMTU value
has not been decreased for a while (on the order of 10 minutes), the
PMTU estimate should be set to the MTU of the first-hop link, and the
packetization layers should be notified of the change. This will
cause the complete Path MTU Discovery process to take place again.

Note: an implementation should provide a means for changing the
timeout duration, including setting it to "infinity". For
example, nodes attached to an FDDI link which is then attached to
the rest of the Internet via a small MTU serial line are never
going to discover a new non-local PMTU, so they should not have to
put up with dropped packets every 10 minutes.

An upper layer must not retransmit data in response to an increase in
the PMTU estimate, since this increase never comes in response to an
indication of a dropped packet.

One approach to implementing PMTU aging is to associate a timestamp
field with a PMTU value. This field is initialized to a "reserved"
value, indicating that the PMTU is equal to the MTU of the first hop
link. Whenever the PMTU is decreased in response to a Packet Too Big
message, the timestamp is set to the current time.

Once a minute, a timer-driven procedure runs through all cached PMTU
values, and for each PMTU whose timestamp is not "reserved" and is
older than the timeout interval:

- The PMTU estimate is set to the MTU of the first hop link.

- The timestamp is set to the "reserved" value.

- Packetization layers using this path are notified of the increase.
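
The timestamp approach described above can be sketched as follows. The
use of None as the "reserved" value, the entry type and the notify
callback are assumptions of this example; times are plain numbers in
seconds so that the procedure can be driven by any timer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

RESERVED = None  # PMTU currently equals the MTU of the first hop link


@dataclass
class PmtuEntry:
    pmtu: int
    timestamp: Optional[float] = RESERVED


def record_decrease(entry: PmtuEntry, new_pmtu: int, now: float) -> None:
    """Called when a Packet Too Big message lowers the estimate."""
    entry.pmtu = new_pmtu
    entry.timestamp = now


def age_entries(entries: Dict[str, PmtuEntry], now: float, timeout: float,
                first_hop_mtu: int,
                notify: Callable[[str, int], None]) -> None:
    """Timer-driven pass over all cached PMTU values."""
    for path, entry in entries.items():
        if entry.timestamp is RESERVED:
            continue
        if now - entry.timestamp > timeout:
            entry.pmtu = first_hop_mtu
            entry.timestamp = RESERVED
            notify(path, first_hop_mtu)  # tell users of the increase


entries = {"a": PmtuEntry(1500), "b": PmtuEntry(1500)}
record_decrease(entries["a"], 1400, now=0.0)
record_decrease(entries["b"], 1300, now=500.0)
age_entries(entries, now=700.0, timeout=600.0, first_hop_mtu=1500,
            notify=lambda path, pmtu: print(path, "reset to", pmtu))
```

Only path "a" is reset in this run, because its last decrease is older
than the 600-second timeout.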

5.4. TCP layer actions

The TCP layer must track the PMTU for the path(s) in use by a
connection; it should not send segments that would result in packets
larger than the PMTU. A simple implementation could ask the IP layer
for this value each time it created a new segment, but this could be
inefficient. Moreover, TCP implementations that follow the "slow-
start" congestion-avoidance algorithm [CONG] typically calculate and
cache several other values derived from the PMTU. It may be simpler
to receive asynchronous notification when the PMTU changes, so that
these variables may be updated.

A TCP implementation must also store the MSS value received from its
peer, and must not send any segment larger than this MSS, regardless
of the PMTU. In 4.xBSD-derived implementations, this may require
adding an additional field to the TCP state record.

The value sent in the TCP MSS option is independent of the PMTU.
This MSS option value is used by the other end of the connection,
which may be using an unrelated PMTU value. See [IPv6-SPEC] sections
"Packet Size Issues" and "Maximum Upper-Layer Payload Size" for
information on selecting a value for the TCP MSS option.

When a Packet Too Big message is received, it implies that a packet
was dropped by the node that sent the ICMP message. It is sufficient
to treat this as any other dropped segment, and wait until the
retransmission timer expires to cause retransmission of the segment.
If the Path MTU Discovery process requires several steps to find the
PMTU of the full path, this could delay the connection by many
round-trip times.

Alternatively, the retransmission could be done in immediate response
to a notification that the Path MTU has changed, but only for the
specific connection specified by the Packet Too Big message. The
packet size used in the retransmission should be no larger than the
new PMTU.

Note: A packetization layer must not retransmit in response to
every Packet Too Big message, since a burst of several oversized
segments will give rise to several such messages and hence several
retransmissions of the same data. If the new estimated PMTU is
still wrong, the process repeats, and there is an exponential
growth in the number of superfluous segments sent.

This means that the TCP layer must be able to recognize when a
Packet Too Big notification actually decreases the PMTU that it
has already used to send a packet on the given connection, and
should ignore any other notifications.
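
One way to read this requirement is sketched below: a connection
remembers the PMTU it last used for sending and reacts only to
notifications that lower it. The class and attribute names are
hypothetical, and actual retransmission is reduced to a counter.

```python
class Connection:
    """Tracks the PMTU a connection has used to send packets."""

    def __init__(self, pmtu: int):
        self.pmtu_in_use = pmtu
        self.retransmissions = 0

    def on_packet_too_big(self, new_pmtu: int) -> bool:
        """Return True if this notification triggers a retransmission."""
        if new_pmtu >= self.pmtu_in_use:
            return False  # no decrease relative to what was used: ignore
        self.pmtu_in_use = new_pmtu
        self.retransmissions += 1
        return True


conn = Connection(pmtu=1500)
# A burst of three oversized segments elicits three identical messages.
for reported in (1400, 1400, 1400):
    conn.on_packet_too_big(reported)
print(conn.retransmissions)  # 1
```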

Many TCP implementations incorporate "congestion avoidance" and
"slow-start" algorithms to improve performance [CONG]. Unlike a
retransmission caused by a TCP retransmission timeout, a
retransmission caused by a Packet Too Big message should not change
the congestion window. It should, however, trigger the slow-start
mechanism (i.e., only one segment should be retransmitted until
acknowledgements begin to arrive again).

TCP performance can be reduced if the sender's maximum window size is
not an exact multiple of the segment size in use (this is not the
congestion window size, which is always a multiple of the segment
size). In many systems (such as those derived from 4.2BSD), the
segment size is often set to 1024 octets, and the maximum window size
(the "send space") is usually a multiple of 1024 octets, so the
proper relationship holds by default. If Path MTU Discovery is used,
however, the segment size may not be a submultiple of the send space,
and it may change during a connection; this means that the TCP layer
may need to change the transmission window size when Path MTU
Discovery changes the PMTU value. The maximum window size should be
set to the greatest multiple of the segment size that is less than or
equal to the sender's buffer space size.
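
The last rule is simple integer arithmetic, shown here as a hedged
example; the function name and the sample sizes are illustrative only.

```python
def max_window(buffer_size: int, segment_size: int) -> int:
    """Greatest multiple of segment_size not exceeding buffer_size."""
    return (buffer_size // segment_size) * segment_size


print(max_window(16384, 1024))  # 16384
print(max_window(16384, 1440))  # 15840, i.e. 11 segments of 1440 octets
```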

5.5. Issues for other transport protocols

Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to
repacketize when doing a retransmission. That is, once an attempt is
made to transmit a segment of a certain size, the transport cannot
split the contents of the segment into smaller segments for
retransmission. In such a case, the original segment can be
fragmented by the IP layer during retransmission. Subsequent
segments, when transmitted for the first time, should be no larger
than allowed by the Path MTU.

The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
protocol [RPC] that, when used over UDP, in many cases will generate
payloads that must be fragmented even for the first-hop link. This
might improve performance in certain cases, but it is known to cause
reliability and performance problems, especially when the client and
server are separated by routers.

It is recommended that NFS implementations use Path MTU Discovery
whenever routers are involved. Most NFS implementations allow the
RPC datagram size to be changed at mount-time (indirectly, by
changing the effective file system block size), but might require
some modification to support changes later on.

Also, since a single NFS operation cannot be split across several UDP
datagrams, certain operations (primarily, those operating on file
names and directories) require a minimum payload size that if sent in
a single packet would exceed the PMTU. NFS implementations should
not reduce the payload size below this threshold, even if Path MTU
Discovery suggests a lower value. In this case the payload will be
fragmented by the IP layer.

5.6. Management interface

It is suggested that an implementation provide a way for a system
utility program to:

- Specify that Path MTU Discovery not be done on a given path.

- Change the PMTU value associated with a given path.

The former can be accomplished by associating a flag with the path;
when a packet is sent on a path with this flag set, the IP layer does
not send packets larger than the IPv6 minimum link MTU.

These features might be used to work around an anomalous situation,
or by a routing protocol implementation that is able to obtain Path
MTU values.

The implementation should also provide a way to change the timeout
period for aging stale PMTU information.

6. Security Considerations

This Path MTU Discovery mechanism makes possible two denial-of-
service attacks, both based on a malicious party sending false Packet
Too Big messages to a node.

In the first attack, the false message indicates a PMTU much smaller
than reality. This should not entirely stop data flow, since the
victim node should never set its PMTU estimate below the IPv6 minimum
link MTU. It will, however, result in suboptimal performance.

In the second attack, the false message indicates a PMTU larger than
reality. If believed, this could cause temporary blockage as the
victim sends packets that will be dropped by some router. Within one
round-trip time, the node would discover its mistake (receiving
Packet Too Big messages from that router), but frequent repetition of
this attack could cause lots of packets to be dropped. A node,
however, should never raise its estimate of the PMTU based on a
Packet Too Big message, so should not be vulnerable to this attack.
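
The two defensive rules above (never lower the estimate below the IPv6
minimum link MTU, never raise it because of a Packet Too Big message)
can be sketched as follows. This is a minimal illustration, not part of
the specification. The function name is hypothetical, and the caller
must supply min_link_mtu from [IPv6-SPEC]; the value 1280 in the usage
note is only an example input.

```python
def update_pmtu(current_pmtu: int, reported_mtu: int, min_link_mtu: int) -> int:
    """Return the new PMTU estimate after receiving a Packet Too Big message.

    Hypothetical helper: all three values are in octets.
    """
    # Rule 1: a Packet Too Big message never raises the estimate.
    if reported_mtu >= current_pmtu:
        return current_pmtu
    # Rule 2: the estimate never drops below the IPv6 minimum link MTU.
    return max(reported_mtu, min_link_mtu)
```

For example, update_pmtu(1500, 9000, 1280) leaves the estimate at 1500,
while update_pmtu(1500, 68, 1280) clamps it to 1280.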

A malicious party could also cause problems if it could stop a victim
from receiving legitimate Packet Too Big messages, but in this case
there are simpler denial-of-service attacks available.

Acknowledgements

We would like to acknowledge the authors of and contributors to
[RFC-1191], from which the majority of this document was derived. We
would also like to acknowledge the members of the IPng working group
for their careful review and constructive criticisms.

Appendix A - Comparison to RFC 1191

This document is based in large part on RFC 1191, which describes
Path MTU Discovery for IPv4. Certain portions of RFC 1191 were not
needed in this document:

router specification - Packet Too Big messages and corresponding
                       router behavior are defined in [ICMPv6]

Don't Fragment bit   - there is no DF bit in IPv6 packets

TCP MSS discussion   - selecting a value to send in the TCP MSS
                       option is discussed in [IPv6-SPEC]

old-style messages   - all Packet Too Big messages report the
                       MTU of the constricting link

MTU plateau tables   - not needed because there are no old-style
                       messages

References

[CONG] Van Jacobson. Congestion Avoidance and Control. Proc.
SIGCOMM '88 Symposium on Communications Architectures and
Protocols, pages 314-329. Stanford, CA, August, 1988.

[FRAG] C. Kent and J. Mogul. Fragmentation Considered Harmful.
In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
Communications Technology. August, 1987.

[ICMPv6] Conta, A., and S. Deering, "Internet Control Message
Protocol (ICMPv6) for the Internet Protocol Version 6
(IPv6) Specification", RFC 1885, December 1995.

[IPv6-SPEC] Deering, S., and R. Hinden, "Internet Protocol, Version
6 (IPv6) Specification", RFC 1883, December 1995.

[ISOTP] ISO. ISO Transport Protocol Specification: ISO DP 8073.
RFC 905, SRI Network Information Center, April, 1984.

[ND] Narten, T., Nordmark, E., and W. Simpson, "Neighbor
Discovery for IP Version 6 (IPv6)", Work in Progress.

[RFC-1191] Mogul, J., and S. Deering, "Path MTU Discovery",
RFC 1191, November 1990.

[RPC] Sun Microsystems, Inc., "RPC: Remote Procedure Call
Protocol", RFC 1057, SRI Network Information Center,
June, 1988.

Authors' Addresses

Jack McCann
Digital Equipment Corporation
110 Spitbrook Road, ZKO3-3/U14
Nashua, NH 03062
Phone: +1 603 881 2608
Fax: +1 603 881 0120
Email: mccann@zk3.dec.com

Stephen E. Deering
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Phone: +1 415 812 4839
Fax: +1 415 812 4471
EMail: deering@parc.xerox.com

Jeffrey Mogul
Digital Equipment Corporation Western Research Laboratory
250 University Avenue
Palo Alto, CA 94301
Phone: +1 415 617 3304
EMail: mogul@pa.dec.com


RFC 2406 – IP Encapsulating Security Payload (ESP)

 
Network Working Group                                            S. Kent
Request for Comments: 2406                                      BBN Corp
Obsoletes: 1827                                              R. Atkinson
Category: Standards Track                                  @Home Network
                                                           November 1998

IP Encapsulating Security Payload (ESP)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction
2. Encapsulating Security Payload Packet Format
   2.1 Security Parameters Index
   2.2 Sequence Number
   2.3 Payload Data
   2.4 Padding (for Encryption)
   2.5 Pad Length
   2.6 Next Header
   2.7 Authentication Data
3. Encapsulating Security Protocol Processing
   3.1 ESP Header Location
   3.2 Algorithms
      3.2.1 Encryption Algorithms
      3.2.2 Authentication Algorithms
   3.3 Outbound Packet Processing
      3.3.1 Security Association Lookup
      3.3.2 Packet Encryption
      3.3.3 Sequence Number Generation
      3.3.4 Integrity Check Value Calculation
      3.3.5 Fragmentation
   3.4 Inbound Packet Processing
      3.4.1 Reassembly
      3.4.2 Security Association Lookup
      3.4.3 Sequence Number Verification
      3.4.4 Integrity Check Value Verification
      3.4.5 Packet Decryption
4. Auditing
5. Conformance Requirements
6. Security Considerations
7. Differences from RFC 1827
Acknowledgements
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

The Encapsulating Security Payload (ESP) header is designed to
provide a mix of security services in IPv4 and IPv6. ESP may be
applied alone, in combination with the IP Authentication Header (AH)
[KA97b], or in a nested fashion, e.g., through the use of tunnel mode
(see "Security Architecture for the Internet Protocol" [KA97a],
hereafter referred to as the Security Architecture document).
Security services can be provided between a pair of communicating
hosts, between a pair of communicating security gateways, or between
a security gateway and a host. For more details on how to use ESP
and AH in various network environments, see the Security Architecture
document [KA97a].

The ESP header is inserted after the IP header and before the upper
layer protocol header (transport mode) or before an encapsulated IP
header (tunnel mode). These modes are described in more detail
below.

ESP is used to provide confidentiality, data origin authentication,
connectionless integrity, an anti-replay service (a form of partial
sequence integrity), and limited traffic flow confidentiality. The
set of services provided depends on options selected at the time of
Security Association establishment and on the placement of the
implementation. Confidentiality may be selected independent of all
other services. However, use of confidentiality without
integrity/authentication (either in ESP or separately in AH) may
subject traffic to certain forms of active attacks that could
undermine the confidentiality service (see [Bel96]). Data origin
authentication and connectionless integrity are joint services
(hereafter referred to jointly as "authentication") and are offered as
an option in conjunction with (optional) confidentiality. The anti-
replay service may be selected only if data origin authentication is
selected, and its election is solely at the discretion of the
receiver. (Although the default calls for the sender to increment
the Sequence Number used for anti-replay, the service is effective
only if the receiver checks the Sequence Number.) Traffic flow
confidentiality requires selection of tunnel mode, and is most
effective if implemented at a security gateway, where traffic
aggregation may be able to mask true source-destination patterns.
Note that although both confidentiality and authentication are
optional, at least one of them MUST be selected.

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by ESP and AH, the concept of Security Associations, the ways
in which ESP can be used in conjunction with the Authentication
Header (AH), and the different key management options available for
ESP and AH. (With regard to the last topic, the current key
management options required for both AH and ESP are manual keying and
automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Encapsulating Security Payload Packet Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the ESP header will contain the value 50 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ----
|                Security Parameters Index (SPI)                | ^Auth.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
|                        Sequence Number                        | |erage
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ----
|                   Payload Data* (variable)                    | |   ^
~                                                               ~ |   |
|                                                               | |Conf.
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
|               |             Padding (0-255 bytes)             | |erage*
+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |   |
|                               |  Pad Length   |  Next Header  | v   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                Authentication Data (variable)                 |
~                                                               ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* If included in the Payload field, cryptographic
  synchronization data, e.g., an Initialization Vector (IV, see
  Section 2.3), usually is not encrypted per se, although it
  often is referred to as being part of the ciphertext.

The following subsections define the fields in the header format.
"Optional" means that the field is omitted if the option is not
selected, i.e., it is present in neither the packet as transmitted
nor as formatted for computation of an Integrity Check Value (ICV,
see Section 2.7). Whether or not an option is selected is defined as
part of Security Association (SA) establishment. Thus the format of
ESP packets for a given SA is fixed, for the duration of the SA. In
contrast, "mandatory" fields are always present in the ESP packet
format, for all SAs.
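
As an illustration of the layout above, the two mandatory leading
fields can be read with a fixed-size unpack in network byte order. This
is a sketch under stated assumptions, not a normative parser: the
function name is hypothetical, and the bytes after the Sequence Number
are returned unparsed because their structure depends on the SA.

```python
import struct

ESP_PROTOCOL = 50  # value in the preceding Protocol / Next Header field


def parse_esp_fixed_header(esp_packet: bytes):
    """Split an ESP packet into (SPI, sequence number, remaining bytes)."""
    if len(esp_packet) < 8:
        raise ValueError("ESP packet shorter than SPI + Sequence Number")
    # Two unsigned 32-bit integers, big-endian (network byte order).
    spi, seq = struct.unpack("!II", esp_packet[:8])
    return spi, seq, esp_packet[8:]
```

The remainder holds the Payload Data, Padding, Pad Length, Next Header,
and (if selected for the SA) the Authentication Data.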

2.1 Security Parameters Index

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (ESP), uniquely
identifies the Security Association for this datagram. The set of
SPI values in the range 1 through 255 are reserved by the Internet
Assigned Numbers Authority (IANA) for future use; a reserved SPI
value will not normally be assigned by IANA unless the use of the
assigned SPI value is specified in an RFC. The SPI is ordinarily
selected by the destination system upon establishment of an SA (see
the Security Architecture document for more details). The SPI field
is mandatory.

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.
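
The SPI ranges described in this section can be summarized in a small
classifier. This is only a sketch; the function name and the returned
labels are hypothetical and are not defined by this document.

```python
def classify_spi(spi: int) -> str:
    """Classify a 32-bit SPI value according to the ranges in Section 2.1."""
    if not 0 <= spi <= 0xFFFFFFFF:
        raise ValueError("SPI must fit in 32 bits")
    if spi == 0:
        return "local-use"       # MUST NOT be sent on the wire
    if spi <= 255:
        return "iana-reserved"   # 1 through 255, reserved for future use
    return "assignable"          # ordinarily chosen by the destination
```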

2.2 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.3 for more details
on how the Sequence Number is generated.) If anti-replay is enabled
(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.

2.3 Payload Data

Payload Data is a variable-length field containing data described by
the Next Header field. The Payload Data field is mandatory and is an
integral number of bytes in length. If the algorithm used to encrypt
the payload requires cryptographic synchronization data, e.g., an
Initialization Vector (IV), then this data MAY be carried explicitly
in the Payload field. Any encryption algorithm that requires such
explicit, per-packet synchronization data MUST indicate the length,
any structure for such data, and the location of this data as part of
an RFC specifying how the algorithm is used with ESP. If such
synchronization data is implicit, the algorithm for deriving the data
MUST be part of the RFC.

Note that with regard to ensuring the alignment of the (real)
ciphertext in the presence of an IV:

o For some IV-based modes of operation, the receiver treats
  the IV as the start of the ciphertext, feeding it into the
  algorithm directly. In these modes, alignment of the start
  of the (real) ciphertext is not an issue at the receiver.

o In some cases, the receiver reads the IV in separately from
  the ciphertext. In these cases, the algorithm
  specification MUST address how alignment of the (real)
  ciphertext is to be achieved.

2.4 Padding (for Encryption)

Several factors require or motivate use of the Padding field.

o If an encryption algorithm is employed that requires the
  plaintext to be a multiple of some number of bytes, e.g.,
  the block size of a block cipher, the Padding field is used
  to fill the plaintext (consisting of the Payload Data, Pad
  Length and Next Header fields, as well as the Padding) to
  the size required by the algorithm.

o Padding also may be required, irrespective of encryption
  algorithm requirements, to ensure that the resulting
  ciphertext terminates on a 4-byte boundary. Specifically,
  the Pad Length and Next Header fields must be right aligned
  within a 4-byte word, as illustrated in the ESP packet
  format figure above, to ensure that the Authentication Data
  field (if present) is aligned on a 4-byte boundary.

o Padding beyond that required for the algorithm or alignment
  reasons cited above, may be used to conceal the actual
  length of the payload, in support of (partial) traffic flow
  confidentiality. However, inclusion of such additional
  padding has adverse bandwidth implications and thus its use
  should be undertaken with care.

The sender MAY add 0-255 bytes of padding. Inclusion of the Padding
field in an ESP packet is optional, but all implementations MUST
support generation and consumption of padding.

a. For the purpose of ensuring that the bits to be encrypted
   are a multiple of the algorithm's blocksize (first bullet
   above), the padding computation applies to the Payload
   Data exclusive of the IV, the Pad Length, and Next Header
   fields.

b. For the purposes of ensuring that the Authentication Data
   is aligned on a 4-byte boundary (second bullet above), the
   padding computation applies to the Payload Data inclusive
   of the IV, the Pad Length, and Next Header fields.

If Padding bytes are needed but the encryption algorithm does not
specify the padding contents, then the following default processing
MUST be used. The Padding bytes are initialized with a series of
(unsigned, 1-byte) integer values. The first padding byte appended
to the plaintext is numbered 1, with subsequent padding bytes making
up a monotonically increasing sequence: 1, 2, 3, ... When this
padding scheme is employed, the receiver SHOULD inspect the Padding
field. (This scheme was selected because of its relative simplicity,
ease of implementation in hardware, and because it offers limited
protection against certain forms of "cut and paste" attacks in the
absence of other integrity measures, if the receiver checks the
padding values upon decryption.)
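
The padding computation (items a and b above) and the default padding
contents can be sketched as follows. This is an illustration under
stated assumptions: the function names are hypothetical, data_len is
the length of the Payload Data excluding any IV, and the smallest pad
length that satisfies both constraints is chosen (a sender MAY add
more).

```python
def pad_length(data_len: int, iv_len: int, block_size: int) -> int:
    """Smallest pad length (0-255) meeting the blocksize and 4-byte rules."""
    for pad in range(256):
        # Item a: Payload Data (without IV) + Padding + Pad Length + Next Header.
        encrypted_len = data_len + pad + 2
        # Item b: the same bytes plus the IV must end on a 4-byte boundary.
        aligned_len = iv_len + encrypted_len
        if encrypted_len % block_size == 0 and aligned_len % 4 == 0:
            return pad
    raise ValueError("no pad length in 0-255 satisfies both constraints")


def default_padding(pad_len: int) -> bytes:
    """Default contents: the monotonically increasing bytes 1, 2, 3, ..."""
    return bytes(range(1, pad_len + 1))


def padding_is_default(padding: bytes) -> bool:
    """Receiver-side check that the padding follows the default scheme."""
    return padding == default_padding(len(padding))
```

For a 10-byte payload, an 8-byte IV, and an 8-byte block cipher,
pad_length(10, 8, 8) yields 4, so the encrypted portion is 16 bytes.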

Any encryption algorithm that requires Padding other than the default
described above, MUST define the Padding contents (e.g., zeros or
random data) and any required receiver processing of these Padding
bytes in an RFC specifying how the algorithm is used with ESP. In
such circumstances, the content of the Padding field will be
determined by the encryption algorithm and mode selected and defined
in the corresponding algorithm RFC. The relevant algorithm RFC MAY
specify that a receiver MUST inspect the Padding field or that a
receiver MUST inform senders of how the receiver will handle the
Padding field.

2.5 Pad Length

The Pad Length field indicates the number of pad bytes immediately
preceding it. The range of valid values is 0-255, where a value of
zero indicates that no Padding bytes are present. The Pad Length
field is mandatory.

2.6 Next Header

The Next Header is an 8-bit field that identifies the type of data
contained in the Payload Data field, e.g., an extension header in
IPv6 or an upper layer protocol identifier. The value of this field
is chosen from the set of IP Protocol Numbers defined in the most
recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA). The Next Header field is mandatory.
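
Because Pad Length and Next Header are the last two bytes of the
decrypted data, a receiver can locate the end of the Payload Data by
working backwards. The following sketch assumes the input is the
already decrypted plaintext with any IV removed; the function name is
hypothetical.

```python
def split_trailer(plaintext: bytes):
    """Return (payload data, padding, next header) from decrypted ESP data."""
    if len(plaintext) < 2:
        raise ValueError("missing Pad Length and Next Header fields")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    if pad_len > len(plaintext) - 2:
        raise ValueError("Pad Length exceeds the available data")
    # The padding immediately precedes the Pad Length field.
    end_of_payload = len(plaintext) - 2 - pad_len
    return plaintext[:end_of_payload], plaintext[end_of_payload:-2], next_header
```

A Next Header value of 6 identifies TCP and 17 identifies UDP in the
IP Protocol Numbers registry.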

2.7 Authentication Data

The Authentication Data is a variable-length field containing an
Integrity Check Value (ICV) computed over the ESP packet minus the
Authentication Data. The length of the field is specified by the
authentication function selected. The Authentication Data field is
optional, and is included only if the authentication service has been
selected for the SA in question. The authentication algorithm
specification MUST specify the length of the ICV and the comparison
rules and processing steps for validation.

3. Encapsulating Security Protocol Processing

3.1 ESP Header Location

Like AH, ESP may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, but not the IP header.
(In this mode, note that for "bump-in-the-stack" or "bump-in-the-
wire" implementations, as defined in the Security Architecture
document, inbound and outbound IP fragments may require an IPsec
implementation to perform extra IP reassembly/fragmentation in order
to both conform to this specification and provide transparent IPsec
support. Special care is required to perform such operations within
these implementations when multiple interfaces are in use.)

In transport mode, ESP is inserted after the IP header and before an
upper layer protocol, e.g., TCP, UDP, ICMP, etc. or before any other
IPsec headers that have already been inserted. In the context of
IPv4, this translates to placing ESP after the IP header (and any
options that it contains), but before the upper layer protocol.
(Note that the term "transport" mode should not be misconstrued as
restricting its use to TCP and UDP. For example, an ICMP message MAY
be sent using either "transport" mode or "tunnel" mode.) The
following diagram illustrates ESP transport mode positioning for a
typical IPv4 packet, on a "before and after" basis. (The "ESP
trailer" encompasses any Padding, plus the Pad Length, and Next
Header fields.)

           BEFORE APPLYING ESP
      ----------------------------
IPv4  |orig IP hdr  |     |      |
      |(any options)| TCP | Data |
      ----------------------------

           AFTER APPLYING ESP
      -------------------------------------------------
IPv4  |orig IP hdr  | ESP |     |      |   ESP   | ESP|
      |(any options)| Hdr | TCP | Data | Trailer |Auth|
      -------------------------------------------------
                          |<----- encrypted ---->|
                    |<------ authenticated ----->|

In the IPv6 context, ESP is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the ESP header depending on the semantics
desired. However, since ESP protects only fields after the ESP
header, it generally may be desirable to place the destination
options header(s) after the ESP header. The following diagram
illustrates ESP transport mode positioning for a typical IPv6 packet.

                BEFORE APPLYING ESP
      ---------------------------------------
IPv6  |             | ext hdrs |     |      |
      | orig IP hdr |if present| TCP | Data |
      ---------------------------------------

                AFTER APPLYING ESP
      ---------------------------------------------------------
IPv6  | orig |hop-by-hop,dest*,|   |dest|   |    | ESP   | ESP|
      |IP hdr|routing,fragment.|ESP|opt*|TCP|Data|Trailer|Auth|
      ---------------------------------------------------------
                                   |<---- encrypted ---->|
                               |<---- authenticated ---->|

          * = if present, could be before ESP, after ESP, or both

ESP and AH headers can be combined in a variety of modes. The IPsec
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode ESP may be employed in either hosts or security gateways.
When ESP is implemented in a security gateway (to protect subscriber
transit traffic), tunnel mode must be used. In tunnel mode, the
"inner" IP header carries the ultimate source and destination
addresses, while an "outer" IP header may contain distinct IP
addresses, e.g., addresses of security gateways. In tunnel mode, ESP
protects the entire inner IP packet, including the entire inner IP
header. The position of ESP in tunnel mode, relative to the outer IP
header, is the same as for ESP in transport mode. The following
diagram illustrates ESP tunnel mode positioning for typical IPv4 and
IPv6 packets.

      -----------------------------------------------------------
IPv4  | new IP hdr* |     | orig IP hdr*  |   |    | ESP   | ESP|
      |(any options)| ESP | (any options) |TCP|Data|Trailer|Auth|
      -----------------------------------------------------------
                          |<--------- encrypted ---------->|
                    |<----------- authenticated ---------->|

      ------------------------------------------------------------
IPv6  | new* |new ext |   | orig*|orig ext |   |    | ESP   | ESP|
      |IP hdr| hdrs*  |ESP|IP hdr| hdrs *  |TCP|Data|Trailer|Auth|
      ------------------------------------------------------------
                          |<--------- encrypted ----------->|
                      |<---------- authenticated ---------->|

          * = if present, construction of outer IP hdr/extensions
              and modification of inner IP hdr/extensions is
              discussed below.

3.2 Algorithms

The mandatory-to-implement algorithms are described in Section 5,
"Conformance Requirements". Other algorithms MAY be supported. Note
that although both confidentiality and authentication are optional,
at least one of these services MUST be selected; hence both algorithms
MUST NOT be simultaneously NULL.

3.2.1 Encryption Algorithms

The encryption algorithm employed is specified by the SA. ESP is
designed for use with symmetric encryption algorithms. Because IP
packets may arrive out of order, each packet must carry any data
required to allow the receiver to establish cryptographic
synchronization for decryption. This data may be carried explicitly
in the payload field, e.g., as an IV (as described above), or the
data may be derived from the packet header. Since ESP makes
provision for padding of the plaintext, encryption algorithms
employed with ESP may exhibit either block or stream mode
characteristics. Note that since encryption (confidentiality) is
optional, this algorithm may be "NULL".

3.2.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. Note that
since authentication is optional, this algorithm may be "NULL".

3.3 Outbound Packet Processing

In transport mode, the sender encapsulates the upper layer protocol
information in the ESP header/trailer, and retains the specified IP
header (and any IP extension headers in the IPv6 context). In tunnel
mode, the outer and inner IP header/extensions can be inter-related
in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document. If there is more than one IPsec
header/extension required by security policy, the order of the
application of the security headers MUST be defined by security
policy.

3.3.1 Security Association Lookup

ESP is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for ESP processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Packet Encryption

In this section, we speak in terms of encryption always being applied
because of the formatting implications. This is done with the
understanding that "no confidentiality" is offered by using the NULL
encryption algorithm. Accordingly, the sender:

1. encapsulates (into the ESP Payload field):
   - for transport mode -- just the original upper layer
     protocol information.
   - for tunnel mode -- the entire original IP datagram.
2. adds any necessary padding.
3. encrypts the result (Payload Data, Padding, Pad Length, and
   Next Header) using the key, encryption algorithm, algorithm
   mode indicated by the SA and cryptographic synchronization
   data (if any).
   - If explicit cryptographic synchronization data, e.g.,
     an IV, is indicated, it is input to the encryption
     algorithm per the algorithm specification and placed
     in the Payload field.
   - If implicit cryptographic synchronization data, e.g.,
     an IV, is indicated, it is constructed and input to
     the encryption algorithm as per the algorithm
     specification.

The exact steps for constructing the outer IP header depend on the
mode (transport or tunnel) and are described in the Security
Architecture document.

If authentication is selected, encryption is performed first, before
the authentication, and the encryption does not encompass the
Authentication Data field. This order of processing facilitates
rapid detection and rejection of replayed or bogus packets by the
receiver, prior to decrypting the packet, hence potentially reducing
the impact of denial of service attacks. It also allows for the
possibility of parallel processing of packets at the receiver, i.e.,
decryption can take place in parallel with authentication. Note that
since the Authentication Data is not protected by encryption, a keyed
authentication algorithm must be employed to compute the ICV.
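
The encrypt-then-authenticate ordering described above can be sketched
as follows. Everything here is an illustrative assumption rather than
the required construction: the function names are hypothetical, the
encrypt argument is any callable (the identity function stands in for
the NULL encryption algorithm), and HMAC-SHA-1 truncated to 96 bits is
used only as an example of a keyed authentication algorithm.

```python
import hashlib
import hmac
import struct


def build_esp_packet(spi, seq, plaintext, encrypt, auth_key):
    """Assemble SPI | Seq | ciphertext | ICV, encrypting before authenticating.

    plaintext is Payload Data + Padding + Pad Length + Next Header.
    """
    ciphertext = encrypt(plaintext)              # step 1: encryption
    header = struct.pack("!II", spi, seq)
    # Step 2: keyed ICV over everything except the Authentication Data.
    icv = hmac.new(auth_key, header + ciphertext, hashlib.sha1).digest()[:12]
    return header + ciphertext + icv


def icv_is_valid(packet, auth_key, icv_len=12):
    """Check the ICV before any decryption is attempted."""
    body, icv = packet[:-icv_len], packet[-icv_len:]
    expected = hmac.new(auth_key, body, hashlib.sha1).digest()[:icv_len]
    return hmac.compare_digest(icv, expected)
```

Since the receiver can run icv_is_valid on the ciphertext alone, a
bogus packet is rejected without spending any decryption effort.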

3.3.3 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter and
when it reaches the maximum value, the counter rolls over back to
zero.
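
The sender-side rules in this section can be sketched as a small
counter class. The class and exception names are hypothetical, and
raising an exception is only one possible way to represent the
auditable overflow event.

```python
class SequenceOverflow(Exception):
    """Sending another packet would make the Sequence Number cycle."""


class SenderCounter:
    MAX = 0xFFFFFFFF  # largest value of the unsigned 32-bit field

    def __init__(self, anti_replay: bool = True):
        self.value = 0               # initialized to 0 when the SA is established
        self.anti_replay = anti_replay

    def next(self) -> int:
        """Increment the counter and return the value for the next packet."""
        if self.value == self.MAX:
            if self.anti_replay:
                # Auditable event: a new SA (and key) is needed first.
                raise SequenceOverflow("sequence number would cycle")
            self.value = 0           # anti-replay disabled: roll over to zero
            return self.value
        self.value += 1
        return self.value
```

The first call on a new SenderCounter returns 1, matching the first
packet sent on an SA. No modular arithmetic is involved; the overflow
case is a plain comparison against the maximum value.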

3.3.4 Integrity Check Value Calculation

If authentication is selected for the SA, the sender computes the ICV
over the ESP packet minus the Authentication Data. Thus the SPI,
Sequence Number, Payload Data, Padding (if present), Pad Length, and
Next Header are all encompassed by the ICV computation. Note that
the last 4 fields will be in ciphertext form, since encryption is
performed prior to authentication.

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the length of this byte string does
not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the ESP packet (after the
Next Header field) prior to ICV computation. The padding octets MUST
have a value of zero. The blocksize (and hence the length of the
padding) is specified by the algorithm specification. This padding
is not transmitted with the packet. Note that MD5 and SHA-1 are
viewed as having a 1-byte blocksize because of their internal padding
conventions.
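
The implicit padding rule can be sketched as a helper that prepares the
byte string for the ICV computation. The function name is hypothetical;
the zero bytes it appends are used only as input to the authentication
algorithm and are never transmitted.

```python
def icv_input(esp_without_auth: bytes, block_size: int) -> bytes:
    """Append implicit zero padding so the length is a multiple of block_size."""
    remainder = len(esp_without_auth) % block_size
    if remainder == 0:
        return esp_without_auth
    # bytes(n) produces n zero-valued octets.
    return esp_without_auth + bytes(block_size - remainder)
```

With a blocksize of 1, as for MD5 and SHA-1, the input is returned
unchanged.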

3.3.5 Fragmentation

If necessary, fragmentation is performed after ESP processing within
an IPsec implementation. Thus, transport mode ESP is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which ESP
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to ESP processing at a
receiver. In tunnel mode, ESP is applied to an IP packet, the
payload of which may be a fragmented IP packet. For example, a
security gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (as defined in the Security Architecture document) may
apply tunnel mode ESP to such fragments.

NOTE: For transport mode -- As mentioned at the beginning of Section
3.1, bump-in-the-stack and bump-in-the-wire implementations may have
to first reassemble a packet fragmented by the local IP layer, then
apply IPsec, and then fragment the resulting packet.

NOTE: For IPv6 -- For bump-in-the-stack and bump-in-the-wire
implementations, it will be necessary to walk through all the
extension headers to determine if there is a fragmentation header and
hence that the packet needs reassembling prior to IPsec processing.

3.4 Inbound Packet Processing

3.4.1 Reassembly

If required, reassembly is performed prior to ESP processing. If a
packet offered to ESP for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time received, Source Address, Destination Address, Sequence
Number, and (in IPv6) the Flow ID.
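
As an illustration of the fragment test above, the following sketch
checks an IPv4 header only. It assumes the standard IPv4 layout, in which
the flags and the 13-bit fragment offset share the 16-bit word at byte
offset 6; the function and constant names are hypothetical.

```python
import struct

MORE_FRAGMENTS = 0x2000  # MF flag within the flags/offset word
OFFSET_MASK = 0x1FFF     # low 13 bits: fragment offset in 8-byte units


def is_ipv4_fragment(ip_header: bytes) -> bool:
    """True if the header looks like a fragment (offset != 0 or MF set)."""
    if len(ip_header) < 20:
        raise ValueError("truncated IPv4 header")
    (flags_and_offset,) = struct.unpack_from("!H", ip_header, 6)
    return bool(flags_and_offset & (MORE_FRAGMENTS | OFFSET_MASK))
```

A receiver following the rule above would discard the packet and record
an audit entry whenever this returns True. The Don't Fragment bit alone
does not mark a packet as a fragment.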

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zero'ing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.

3.4.2 Security Association Lookup

Upon receipt of a (reassembled) packet containing an ESP Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (ESP), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will
be checked, whether the Authentication Data field should be present,
and it will specify the algorithms and keys to be employed for
decryption and ICV computations (if applicable).

If no valid Security Association exists for this session (for
example, the receiver has no key), the receiver MUST discard the
packet; this is an auditable event. The audit log entry for this
event SHOULD include the SPI value, date/time received, Source
Address, Destination Address, Sequence Number, and (in IPv6) the
cleartext Flow ID.
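
The lookup can be pictured as a table keyed by the triple named above.
This is only a sketch: the table layout, the SecurityAssociation fields,
and the audit record shape are assumptions made for illustration, and a
fuller audit entry would carry all of the fields listed above.

```python
from typing import Dict, NamedTuple, Optional, Tuple

ESP_PROTOCOL = 50  # IP protocol number assigned to ESP


class SecurityAssociation(NamedTuple):
    check_sequence: bool           # anti-replay enabled by the receiver?
    auth_algorithm: Optional[str]  # None: no Authentication Data expected
    enc_algorithm: str


SAKey = Tuple[str, int, int]  # (destination address, protocol, SPI)


def lookup_sa(sad: Dict[SAKey, SecurityAssociation], dst: str,
              protocol: int, spi: int,
              audit_log: list) -> Optional[SecurityAssociation]:
    """Return the matching SA, or log an auditable event and return None."""
    sa = sad.get((dst, protocol, spi))
    if sa is None:
        # The caller is expected to discard the packet.
        audit_log.append({"event": "no valid SA", "spi": spi, "dst": dst})
    return sa
```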

3.4.3 Sequence Number Verification

All ESP implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
This service MUST NOT be enabled unless the authentication service
also is enabled for the SA, since otherwise the Sequence Number field
has not been integrity protected. (Note that there are no provisions
for managing transmitted Sequence Number values among multiple
senders directing traffic to a single SA (irrespective of whether the
destination address is unicast, broadcast, or multicast). Thus the
anti-replay service SHOULD NOT be used in a multi-sender environment
that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.3), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receive packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first ESP check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.

Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time received, Source Address, Destination Address, the
Sequence Number, and (in IPv6) the Flow ID. The receive window is
updated only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.
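
The window behaviour described in this section can be sketched with an
integer used as a bit mask. This is one possible implementation, not the
one given in the Security Architecture document; the class and method
names are hypothetical. The check and the update are kept separate so
that the window is modified only after ICV verification succeeds.

```python
class ReplayWindow:
    """Sliding receive window for anti-replay checks on one SA."""

    def __init__(self, size: int = 64):
        if size < 32:
            raise ValueError("window size must be at least 32")
        self.size = size
        self.highest = 0  # "right" edge: highest validated Sequence Number
        self.bitmap = 0   # bit i set means (highest - i) was received

    def check(self, seq: int) -> bool:
        """True if seq may proceed to ICV verification. Does not update."""
        if seq == 0:
            return False  # the first packet on an SA carries 1
        if seq > self.highest:
            return True   # to the right of the window
        offset = self.highest - seq
        if offset >= self.size:
            return False  # left of the window: too old
        return not (self.bitmap >> offset) & 1  # reject duplicates

    def update(self, seq: int) -> None:
        """Record seq. Call only after the ICV has been verified."""
        if seq > self.highest:
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
        else:
            self.bitmap |= 1 << (self.highest - seq)
```

A receiver would call check() first, verify the ICV, and call update()
only on success.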

3.4.4 Integrity Check Value Verification

If authentication has been selected, the receiver computes the ICV
over the ESP packet minus the Authentication Data using the specified
authentication algorithm and verifies that it is the same as the ICV
included in the Authentication Data field of the packet. Details of
the computation are provided below.

If the computed and received ICVs match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The log data SHOULD include the SPI value, date/time
received, Source Address, Destination Address, the Sequence Number,
and (in IPv6) the cleartext Flow ID.

DISCUSSION:

Begin by removing and saving the ICV value (Authentication Data
field). Next check the overall length of the ESP packet minus the
Authentication Data. If implicit padding is required, based on
the blocksize of the authentication algorithm, append zero-filled
bytes to the end of the ESP packet directly after the Next Header
field. Perform the ICV computation and compare the result with
the saved value, using the comparison rules defined by the
algorithm specification. (For example, if a digital signature and
one-way hash are used for the ICV computation, the matching
process is more complex.)
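
The procedure above might look as follows for HMAC with SHA-1, one of the
mandatory-to-implement algorithms. This sketch assumes the ICV is the
HMAC output truncated to 96 bits (12 bytes), as the name HMAC-SHA-1-96
suggests, and that no implicit padding is needed because SHA-1 is viewed
as having a 1-byte blocksize. The function names are hypothetical.

```python
import hashlib
import hmac

ICV_LEN = 12  # assumed: HMAC-SHA-1 output truncated to 96 bits


def compute_icv(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA-1 over data, truncated to ICV_LEN bytes."""
    return hmac.new(key, data, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(key: bytes, esp_packet: bytes) -> bool:
    """Remove and save the received ICV, recompute it, and compare."""
    if len(esp_packet) <= ICV_LEN:
        return False
    body, received = esp_packet[:-ICV_LEN], esp_packet[-ICV_LEN:]
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(compute_icv(key, body), received)
```

If verify_icv returns False, the receiver discards the datagram and
records an auditable event as described above.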

3.4.5 Packet Decryption

As in section 3.3.2, "Packet Encryption", we speak here in terms of
encryption always being applied because of the formatting
implications. This is done with the understanding that "no
confidentiality" is offered by using the NULL encryption algorithm.
Accordingly, the receiver:

1. decrypts the ESP Payload Data, Padding, Pad Length, and Next
   Header using the key, encryption algorithm, algorithm mode,
   and cryptographic synchronization data (if any), indicated by
   the SA.
     - If explicit cryptographic synchronization data, e.g.,
       an IV, is indicated, it is taken from the Payload
       field and input to the decryption algorithm as per the
       algorithm specification.
     - If implicit cryptographic synchronization data, e.g.,
       an IV, is indicated, a local version of the IV is
       constructed and input to the decryption algorithm as
       per the algorithm specification.
2. processes any padding as specified in the encryption
   algorithm specification. If the default padding scheme (see
   Section 2.4) has been employed, the receiver SHOULD inspect
   the Padding field before removing the padding prior to
   passing the decrypted data to the next layer.
3. reconstructs the original IP datagram from:
     - for transport mode -- original IP header plus the
       original upper layer protocol information in the ESP
       Payload field
     - for tunnel mode -- tunnel IP header + the entire IP
       datagram in the ESP Payload field.

The exact steps for reconstructing the original datagram depend on
the mode (transport or tunnel) and are described in the Security
Architecture document. At a minimum, in an IPv6 context, the
receiver SHOULD ensure that the decrypted data is 8-byte aligned, to
facilitate processing by the protocol identified in the Next Header
field.
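
Step 2 above can be sketched as follows. The code assumes the decrypted
data is laid out as Payload Data, Padding, Pad Length, Next Header, and
that the default padding scheme of Section 2.4 fills the Padding field
with the byte values 1, 2, 3, and so on. The function name is
hypothetical.

```python
from typing import Tuple


def strip_esp_trailer(plaintext: bytes) -> Tuple[bytes, int]:
    """Check and remove the padding; return (payload, next_header)."""
    if len(plaintext) < 2:
        raise ValueError("missing Pad Length / Next Header")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    end = len(plaintext) - 2  # index just past the Padding field
    if pad_len > end:
        raise ValueError("Pad Length exceeds decrypted data")
    padding = plaintext[end - pad_len:end]
    # Assumed default scheme: padding bytes count up from 1.
    if padding != bytes(range(1, pad_len + 1)):
        raise ValueError("unexpected padding values")
    return plaintext[:end - pad_len], next_header
```

Errors raised here correspond to the bad pad length or pad value case,
which can be detected whether or not authentication is in use.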

If authentication has been selected, verification and decryption MAY
be performed serially or in parallel. If performed serially, then
ICV verification SHOULD be performed first. If performed in
parallel, verification MUST be completed before the decrypted packet
is passed on for further processing. This order of processing
facilitates rapid detection and rejection of replayed or bogus
packets by the receiver, prior to decrypting the packet, hence
potentially reducing the impact of denial of service attacks.

Note: If the receiver performs decryption in parallel with authentication,
care must be taken to avoid possible race conditions with regard to
packet access and reconstruction of the decrypted packet.

Note that there are several ways in which the decryption can "fail":

a. The selected SA may not be correct -- The SA may be
mis-selected due to tampering with the SPI, destination
address, or IPsec protocol type fields. Such errors, if they
map the packet to another extant SA, will be
indistinguishable from a corrupted packet (case c).
Tampering with the SPI can be detected by use of
authentication. However, an SA mismatch might still occur
due to tampering with the IP Destination Address or the IPsec
protocol type field.

b. The pad length or pad values could be erroneous -- Bad pad
lengths or pad values can be detected irrespective of the use
of authentication.

c. The encrypted ESP packet could be corrupted -- This can be
detected if authentication is selected for the SA.

In case (a) or (c), the erroneous result of the decryption operation
(an invalid IP datagram or transport-layer frame) will not
necessarily be detected by IPsec, and is the responsibility of later
protocol processing.

4. Auditing

Not all systems that implement ESP will implement auditing. However,
if ESP is incorporated into a system that supports auditing, then the
ESP implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for ESP. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional
events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST implement the ESP syntax and processing described
here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant ESP implementation MUST
support the following mandatory-to-implement algorithms:

- DES in CBC mode [MD97]
- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]
- NULL Authentication algorithm
- NULL Encryption algorithm

Since ESP encryption and authentication are optional, support for the
2 "NULL" algorithms is required to maintain consistency with the way
these services are negotiated. NOTE that while authentication and
encryption can each be "NULL", they MUST NOT both be "NULL".

6. Security Considerations

Security is central to the design of this protocol, and thus security
considerations permeate the specification. Additional security-
relevant aspects of using the IPsec protocol are discussed in the
Security Architecture document.

7. Differences from RFC 1827

This document differs from RFC 1827 [ATK95] in several significant
ways. The major difference is that this document attempts to
specify a complete framework and context for ESP, whereas RFC 1827
provided a "shell" that was completed through the definition of
transforms. The combinatorial growth of transforms motivated the
reformulation of the ESP specification as a more complete document,
with options for security services that may be offered in the context
of ESP. Thus, fields previously defined in transform documents are
now part of this base ESP specification. For example, the fields
necessary to support authentication (and anti-replay) are now defined
here, even though the provision of this service is an option. The
fields used to support padding for encryption, and for next protocol
identification, are now defined here as well. Packet processing
consistent with the definition of these fields also is included in
the document.

Acknowledgements

Many of the concepts embodied in this specification were derived from
or influenced by the US Government's SP3 security protocol, ISO/IEC's
NLSP, or from the proposed swIPe security protocol [SDNS89, ISO92,
IB93].

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Phil Karn, Perry Metzger, David
Mihelcic, Hilarie Orman, Norman Shulman, William Simpson and Nina
Yuan.

References

[ATK95] Atkinson, R., "IP Encapsulating Security Payload (ESP)",
RFC 1827, August 1995.

[Bel96] Bellovin, S., "Problem Areas for the IP Security
Protocols", Proceedings of the Sixth Usenix Unix Security
Symposium, July 1996.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[IB93] Ioannidis, J., and M. Blaze, "Architecture and
Implementation of Network-layer Security Under Unix",
Proceedings of the USENIX Security Symposium, Santa Clara,
CA, October 1993.

[ISO92] ISO/IEC JTC1/SC6, Network Layer Security Protocol, ISO-IEC
DIS 11577, International Standards Organisation, Geneva,
Switzerland, 29 November 1992.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Authentication Header", RFC
2402, November 1998.

[MD97] Madson, C., and N. Doraswamy, "The ESP DES-CBC Cipher
Algorithm With Explicit IV", RFC 2405, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

[SDNS89] SDNS Secure Data Network System, Security Protocol 3, SP3,
Document SDN.301, Revision 1.5, 15 May 1989, as published
in NIST Publication NIST-IR-90-4250, February 1990.

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


RFC 2402 – IP Authentication Header


Network Working Group S. Kent
Request for Comments: 2402 BBN Corp
Obsoletes: 1826 R. Atkinson
Category: Standards Track @Home Network
November 1998

IP Authentication Header

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction
2. Authentication Header Format
   2.1 Next Header
   2.2 Payload Length
   2.3 Reserved
   2.4 Security Parameters Index (SPI)
   2.5 Sequence Number
   2.6 Authentication Data
3. Authentication Header Processing
   3.1 Authentication Header Location
   3.2 Authentication Algorithms
   3.3 Outbound Packet Processing
      3.3.1 Security Association Lookup
      3.3.2 Sequence Number Generation
      3.3.3 Integrity Check Value Calculation
         3.3.3.1 Handling Mutable Fields
            3.3.3.1.1 ICV Computation for IPv4
               3.3.3.1.1.1 Base Header Fields
               3.3.3.1.1.2 Options
            3.3.3.1.2 ICV Computation for IPv6
               3.3.3.1.2.1 Base Header Fields
               3.3.3.1.2.2 Extension Headers Containing Options
               3.3.3.1.2.3 Extension Headers Not Containing Options
         3.3.3.2 Padding
            3.3.3.2.1 Authentication Data Padding
            3.3.3.2.2 Implicit Packet Padding
      3.3.4 Fragmentation
   3.4 Inbound Packet Processing
      3.4.1 Reassembly
      3.4.2 Security Association Lookup
      3.4.3 Sequence Number Verification
      3.4.4 Integrity Check Value Verification
4. Auditing
5. Conformance Requirements
6. Security Considerations
7. Differences from RFC 1826
Acknowledgements
Appendix A -- Mutability of IP Options/Extension Headers
   A1. IPv4 Options
   A2. IPv6 Extension Headers
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

The IP Authentication Header (AH) is used to provide connectionless
integrity and data origin authentication for IP datagrams (hereafter
referred to as just "authentication"), and to provide protection
against replays. This latter, optional service may be selected, by
the receiver, when a Security Association is established. (Although
the default calls for the sender to increment the Sequence Number
used for anti-replay, the service is effective only if the receiver
checks the Sequence Number.) AH provides authentication for as much
of the IP header as possible, as well as for upper level protocol
data. However, some IP header fields may change in transit and the
value of these fields, when the packet arrives at the receiver, may
not be predictable by the sender. The values of such fields cannot
be protected by AH. Thus the protection provided to the IP header by
AH is somewhat piecemeal.

AH may be applied alone, in combination with the IP Encapsulating
Security Payload (ESP) [KA97b], or in a nested fashion through the
use of tunnel mode (see "Security Architecture for the Internet
Protocol" [KA97a], hereafter referred to as the Security Architecture
document). Security services can be provided between a pair of
communicating hosts, between a pair of communicating security
gateways, or between a security gateway and a host. ESP may be used
to provide the same security services, and it also provides a
confidentiality (encryption) service. The primary difference between
the authentication provided by ESP and AH is the extent of the
coverage. Specifically, ESP does not protect any IP header fields
unless those fields are encapsulated by ESP (tunnel mode). For more
details on how to use AH and ESP in various network environments, see
the Security Architecture document [KA97a].

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by AH and ESP, the concept of Security Associations, the ways
in which AH can be used in conjunction with ESP, and the different
key management options available for AH and ESP. (With regard to the
last topic, the current key management options required for both AH
and ESP are manual keying and automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Authentication Header Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the AH header will contain the value 51 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Payload Len  |           RESERVED            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Security Parameters Index (SPI)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Sequence Number Field                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                Authentication Data (variable)                 |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The following subsections define the fields that comprise the AH
format. All the fields described here are mandatory, i.e., they are
always present in the AH format and are included in the Integrity
Check Value (ICV) computation (see Sections 2.6 and 3.3.3).
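
The fixed portion of the format maps naturally onto a packed structure.
The following sketch parses an AH header from the front of a byte string.
The names are hypothetical, and the total header length is derived from
the Payload Len field as (Payload Len + 2) 32-bit words, following the
definition in Section 2.2.

```python
import struct

# Next Header, Payload Len, RESERVED, SPI, Sequence Number (network order)
AH_FIXED = struct.Struct("!BBHII")


def parse_ah(data: bytes) -> dict:
    """Split an AH header into its fields; data starts at the AH header."""
    if len(data) < AH_FIXED.size:
        raise ValueError("truncated AH header")
    next_header, payload_len, reserved, spi, seq = AH_FIXED.unpack_from(data)
    header_len = (payload_len + 2) * 4  # whole AH length in bytes
    if header_len < AH_FIXED.size or len(data) < header_len:
        raise ValueError("inconsistent Payload Len")
    return {
        "next_header": next_header,
        "reserved": reserved,
        "spi": spi,
        "sequence": seq,
        "icv": data[AH_FIXED.size:header_len],  # Authentication Data
        "header_len": header_len,
    }
```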

2.1 Next Header

The Next Header is an 8-bit field that identifies the type of the
next payload after the Authentication Header. The value of this
field is chosen from the set of IP Protocol Numbers defined in the
most recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA).

2.2 Payload Length

This 8-bit field specifies the length of AH in 32-bit words (4-byte
units), minus "2". (All IPv6 extension headers, as per RFC 1883,
encode the "Hdr Ext Len" field by first subtracting 1 (64-bit word)
from the header length (measured in 64-bit words). AH is an IPv6
extension header. However, since its length is measured in 32-bit
words, the "Payload Length" is calculated by subtracting 2 (32 bit
words).) In the "standard" case of a 96-bit authentication value
plus the 3 32-bit word fixed portion, this length field will be "4".
A "null" authentication algorithm may be used only for debugging
purposes. Its use would result in a "1" value for this field for
IPv4 or a "2" for IPv6, as there would be no corresponding
Authentication Data field (see Section 3.3.3.2.1 on "Authentication
Data Padding").
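
The arithmetic above can be checked with a short sketch. The function
name is hypothetical. It assumes a 12-byte fixed portion and pads the
header to a multiple of 32 bits for IPv4 or 64 bits for IPv6 before
converting to 32-bit words.

```python
def ah_payload_len(icv_len: int, ipv6: bool = False) -> int:
    """Payload Length value for an ICV of icv_len bytes."""
    unit = 8 if ipv6 else 4   # required header alignment in bytes
    total = 12 + icv_len      # fixed portion plus Authentication Data
    total += -total % unit    # explicit padding, if any
    return total // 4 - 2     # length in 32-bit words, minus 2
```

For a 96-bit ICV this yields 4. With no Authentication Data it yields 1
for IPv4 and 2 for IPv6, matching the values given above.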

2.3 Reserved

This 16-bit field is reserved for future use. It MUST be set to
"zero." (Note that the value is included in the Authentication Data
calculation, but is otherwise ignored by the recipient.)

2.4 Security Parameters Index (SPI)

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (AH), uniquely
identifies the Security Association for this datagram. The set of
SPI values in the range 1 through 255 are reserved by the Internet
Assigned Numbers Authority (IANA) for future use; a reserved SPI
value will not normally be assigned by IANA unless the use of the
assigned SPI value is specified in an RFC. It is ordinarily selected
by the destination system upon establishment of an SA (see the
Security Architecture document for more details).

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.
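
The SPI ranges described above can be summarized in a small helper. The
function name and the returned labels are hypothetical and serve only to
illustrate the three cases.

```python
def classify_spi(spi: int) -> str:
    """Classify a 32-bit SPI value according to the ranges above."""
    if not 0 <= spi <= 0xFFFFFFFF:
        raise ValueError("SPI must fit in 32 bits")
    if spi == 0:
        return "local"      # local use only; never sent on the wire
    if spi <= 255:
        return "reserved"   # reserved by IANA for future use
    return "assignable"     # may be selected by the destination system
```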

2.5 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.2 for more details
on how the Sequence Number is generated.) If anti-replay is enabled
(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.
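
A sender-side counter for the default case (anti-replay enabled) might be
sketched as follows. The class name is hypothetical, and raising an
exception stands in for whatever mechanism triggers establishment of a
new SA.

```python
class SequenceCounter:
    """Per-SA sender counter, assuming anti-replay is enabled."""

    MAX = 2**32 - 1  # largest value the 32-bit field can carry

    def __init__(self):
        self.value = 0  # initialized to 0 when the SA is established

    def next(self) -> int:
        """Return the Sequence Number for the next outbound packet."""
        if self.value == self.MAX:
            # The counter must never cycle: a new SA is needed first.
            raise OverflowError("sequence space exhausted; establish new SA")
        self.value += 1
        return self.value
```

The first call returns 1, consistent with the first packet on an SA
carrying a Sequence Number of 1.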

2.6 Authentication Data

This is a variable-length field that contains the Integrity Check
Value (ICV) for this packet. The field must be an integral multiple
of 32 bits in length. The details of the ICV computation are
described in Section 3.3.3 below. This field may include explicit
padding. This padding is included to ensure that the length of the
AH header is an integral multiple of 32 bits (IPv4) or 64 bits
(IPv6). All implementations MUST support such padding. Details of
how to compute the required padding length are provided below. The
authentication algorithm specification MUST specify the length of the
ICV and the comparison rules and processing steps for validation.

3. Authentication Header Processing

3.1 Authentication Header Location

Like ESP, AH may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, in addition to
selected IP header fields. (In this mode, note that for "bump-in-
the-stack" or "bump-in-the-wire" implementations, as defined in the
Security Architecture document, inbound and outbound IP fragments may
require an IPsec implementation to perform extra IP
reassembly/fragmentation in order to both conform to this
specification and provide transparent IPsec support. Special care is
required to perform such operations within these implementations when
multiple interfaces are in use.)

In transport mode, AH is inserted after the IP header and before an
upper layer protocol, e.g., TCP, UDP, or ICMP, or before any other
IPsec headers that have already been inserted. In the context of
IPv4, this calls for placing AH after the IP header (and any options
that it contains), but before the upper layer protocol. (Note that
the term "transport" mode should not be misconstrued as restricting
its use to TCP and UDP. For example, an ICMP message MAY be sent
using either "transport" mode or "tunnel" mode.) The following
diagram illustrates AH transport mode positioning for a typical IPv4
packet, on a "before and after" basis.

           BEFORE APPLYING AH
      ----------------------------
IPv4  |orig IP hdr  |     |      |
      |(any options)| TCP | Data |
      ----------------------------

           AFTER APPLYING AH
      ---------------------------------
IPv4  |orig IP hdr  |    |     |      |
      |(any options)| AH | TCP | Data |
      ---------------------------------
      |<------- authenticated ------->|
          except for mutable fields

In the IPv6 context, AH is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the AH header depending on the semantics
desired. The following diagram illustrates AH transport mode
positioning for a typical IPv6 packet.

                BEFORE APPLYING AH
      ---------------------------------------
IPv6  |             | ext hdrs |     |      |
      | orig IP hdr |if present| TCP | Data |
      ---------------------------------------

                           AFTER APPLYING AH
      ------------------------------------------------------------
IPv6  |             |hop-by-hop, dest*, |    | dest |     |      |
      |orig IP hdr  |routing, fragment. | AH | opt* | TCP | Data |
      ------------------------------------------------------------
      |<---- authenticated except for mutable fields ----------->|

      * = if present, could be before AH, after AH, or both

ESP and AH headers can be combined in a variety of modes. The Security
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode AH may be employed in either hosts or security gateways
(or in so-called "bump-in-the-stack" or "bump-in-the-wire"
implementations, as defined in the Security Architecture document).
When AH is implemented in a security gateway (to protect transit
traffic), tunnel mode must be used. In tunnel mode, the "inner" IP
header carries the ultimate source and destination addresses, while
an "outer" IP header may contain distinct IP addresses, e.g.,
addresses of security gateways. In tunnel mode, AH protects the
entire inner IP packet, including the entire inner IP header. The
position of AH in tunnel mode, relative to the outer IP header, is
the same as for AH in transport mode. The following diagram
illustrates AH tunnel mode positioning for typical IPv4 and IPv6
packets.

            ------------------------------------------------
      IPv4  | new IP hdr* |    | orig IP hdr*  |    |      |
            |(any options)| AH | (any options) |TCP | Data |
            ------------------------------------------------
            |<- authenticated except for mutable fields -->|
            |           in the new IP hdr                  |

            --------------------------------------------------------------
      IPv6  |           | ext hdrs*|    |            | ext hdrs*|   |    |
            |new IP hdr*|if present| AH |orig IP hdr*|if present|TCP|Data|
            --------------------------------------------------------------
            |<-- authenticated except for mutable fields in new IP hdr ->|

             * = construction of outer IP hdr/extensions and modification
                 of inner IP hdr/extensions is discussed below.

3.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. The
mandatory-to-implement authentication algorithms are described in
Section 5 "Conformance Requirements". Other algorithms MAY be
supported.

3.3 Outbound Packet Processing

In transport mode, the sender inserts the AH header after the IP
header and before an upper layer protocol header, as described above.
In tunnel mode, the outer and inner IP header/extensions can be
inter-related in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document.

If there is more than one IPsec header/extension required, the order
of the application of the security headers MUST be defined by
security policy. For simplicity of processing, each IPsec header
SHOULD ignore the existence (i.e., not zero the contents or try to
predict the contents) of IPsec headers to be applied later. (While a
native IP or bump-in-the-stack implementation could predict the
contents of later IPsec headers that it applies itself, it won't be
possible for it to predict any IPsec headers added by a bump-in-the-
wire implementation between the host and the network.)

3.3.1 Security Association Lookup

AH is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for AH processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number Field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter, and when
it reaches the maximum value, the counter rolls over back to zero.
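
The sender-side rules above can be illustrated with a minimal sketch.
It is not part of the specification; the names OutboundCounter,
SequenceOverflow, and next_sequence_number are hypothetical, and the
32-bit field width is taken from the Sequence Number field definition
in this document.

```python
MAX_SEQ = 0xFFFFFFFF  # largest value of the 32-bit Sequence Number field


class SequenceOverflow(Exception):
    """Sending would cycle the Sequence Number (an auditable event)."""


class OutboundCounter:
    """Hypothetical per-SA sender counter; illustration only."""

    def __init__(self, anti_replay: bool = True):
        self.value = 0                  # initialized to 0 at SA setup
        self.anti_replay = anti_replay  # sender assumes True by default

    def next_sequence_number(self) -> int:
        if self.value == MAX_SEQ:
            if self.anti_replay:
                # MUST NOT send; a new SA and key are needed instead.
                raise SequenceOverflow("Sequence Number would cycle")
            self.value = 0              # anti-replay disabled: roll over
            return self.value
        self.value += 1                 # first packet on the SA carries 1
        return self.value
```

No modular arithmetic is needed: the counter is compared against the
maximum value before it is incremented.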

3.3.3 Integrity Check Value Calculation

The AH ICV is computed over:
   o IP header fields that are either immutable in transit or
     that are predictable in value upon arrival at the endpoint
     for the AH SA
   o the AH header (Next Header, Payload Len, Reserved, SPI,
     Sequence Number, and the Authentication Data (which is set
     to zero for this computation), and explicit padding bytes
     (if any))
   o the upper level protocol data, which is assumed to be
     immutable in transit

3.3.3.1 Handling Mutable Fields

If a field may be modified during transit, the value of the field is
set to zero for purposes of the ICV computation. If a field is
mutable, but its value at the (IPsec) receiver is predictable, then
that value is inserted into the field for purposes of the ICV
calculation. The Authentication Data field is also set to zero in
preparation for this computation. Note that by replacing each
field's value with zero, rather than omitting the field, alignment is
preserved for the ICV calculation. Also, the zero-fill approach
ensures that the length of the fields that are so handled cannot be
changed during transit, even though their contents are not explicitly
covered by the ICV.

As a new extension header or IPv4 option is created, it will be
defined in its own RFC and SHOULD include (in the Security
Considerations section) directions for how it should be handled when
calculating the AH ICV. If the IP (v4 or v6) implementation
encounters an extension header that it does not recognize, it will
discard the packet and send an ICMP message. IPsec will never see
the packet. If the IPsec implementation encounters an IPv4 option
that it does not recognize, it should zero the whole option, using
the second byte of the option as the length. IPv6 options (in
Destination extension headers or Hop by Hop extension header) contain
a flag indicating mutability, which determines appropriate processing
for such options.

3.3.3.1.1 ICV Computation for IPv4

3.3.3.1.1.1 Base Header Fields

The IPv4 base header fields are classified as follows:

   Immutable
      Version
      Internet Header Length
      Total Length
      Identification
      Protocol (This should be the value for AH.)
      Source Address
      Destination Address (without loose or strict source routing)

   Mutable but predictable
      Destination Address (with loose or strict source routing)

   Mutable (zeroed prior to ICV calculation)
      Type of Service (TOS)
      Flags
      Fragment Offset
      Time to Live (TTL)
      Header Checksum

TOS -- This field is excluded because some routers are known to
change the value of this field, even though the IP
specification does not consider TOS to be a mutable header
field.

Flags -- This field is excluded since an intermediate router might
set the DF bit, even if the source did not select it.

Fragment Offset -- Since AH is applied only to non-fragmented IP
packets, the Offset Field must always be zero, and thus it
is excluded (even though it is predictable).

TTL -- This is changed en-route as a normal course of processing
by routers, and thus its value at the receiver is not
predictable by the sender.

Header Checksum -- This will change if any of these other fields
changes, and thus its value upon reception cannot be
predicted by the sender.
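
As an illustration of the classification above, the following sketch
zeroes the mutable fields of a 20-byte IPv4 base header before ICV
computation. The function name is hypothetical; the byte offsets follow
the IPv4 header layout in RFC 791. Options and the mutable but
predictable Destination Address case are not handled here.

```python
def zero_ipv4_mutable_fields(header: bytes) -> bytes:
    """Return a copy of an IPv4 base header with mutable fields zeroed."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 base header")
    h = bytearray(header)
    h[1] = 0                   # Type of Service
    h[6:8] = b"\x00\x00"       # Flags (3 bits) + Fragment Offset (13 bits)
    h[8] = 0                   # Time to Live
    h[10:12] = b"\x00\x00"     # Header Checksum
    return bytes(h)
```

The fields are overwritten rather than removed, so the header keeps its
length and alignment for the ICV calculation.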

3.3.3.1.1.2 Options

For IPv4 (unlike IPv6), there is no mechanism for tagging options as
mutable in transit. Hence the IPv4 options are explicitly listed in
Appendix A and classified as immutable, mutable but predictable, or
mutable. For IPv4, the entire option is viewed as a unit; so even
though the type and length fields within most options are immutable
in transit, if an option is classified as mutable, the entire option
is zeroed for ICV computation purposes.
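
A minimal sketch of this whole-option treatment follows. It walks the
options area of an IPv4 header and zeroes every option that is not
classified as immutable, using the second byte of the option as its
length. The function and constant names are hypothetical. The type
values are derived from the Copy, Class, and Number columns of
Appendix A as (copy << 7) | (class << 5) | number.

```python
# Immutable option types per Appendix A: End of Options List, No
# Operation, Security, Extended Security, Commercial Security,
# Router Alert, Sender Directed Multi-Destination Delivery.
IMMUTABLE_OPTION_TYPES = {0, 1, 130, 133, 134, 148, 149}


def zero_ipv4_mutable_options(options: bytes) -> bytes:
    """Zero mutable or unrecognized options in an IPv4 options area."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        opt_type = out[i]
        if opt_type == 0:          # End of Options List: stop walking
            break
        if opt_type == 1:          # No Operation: single byte, no length
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        length = out[i + 1]        # length covers type and length bytes
        if length < 2 or i + length > len(out):
            raise ValueError("bad option length")
        if opt_type not in IMMUTABLE_OPTION_TYPES:
            out[i:i + length] = bytes(length)   # zero the entire option
        i += length
    return bytes(out)
```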

3.3.3.1.2 ICV Computation for IPv6

3.3.3.1.2.1 Base Header Fields

The IPv6 base header fields are classified as follows:

   Immutable
      Version
      Payload Length
      Next Header (This should be the value for AH.)
      Source Address
      Destination Address (without Routing Extension Header)

   Mutable but predictable
      Destination Address (with Routing Extension Header)

   Mutable (zeroed prior to ICV calculation)
      Class
      Flow Label
      Hop Limit
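
The corresponding preparation of a 40-byte IPv6 base header might look
like the sketch below. The function name is hypothetical. Class and
Flow Label together occupy every bit of the first four bytes after the
4-bit Version field, and Hop Limit is the eighth byte of the header.

```python
def zero_ipv6_mutable_fields(header: bytes) -> bytes:
    """Return a copy of an IPv6 base header with mutable fields zeroed."""
    if len(header) < 40 or header[0] >> 4 != 6:
        raise ValueError("not an IPv6 base header")
    h = bytearray(header)
    h[0] &= 0xF0               # keep Version, clear the Class bits
    h[1:4] = b"\x00\x00\x00"   # remaining Class bits and Flow Label
    h[7] = 0                   # Hop Limit
    return bytes(h)
```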

3.3.3.1.2.2 Extension Headers Containing Options

IPv6 options in the Hop-by-Hop and Destination Extension Headers
contain a bit that indicates whether the option might change
(unpredictably) during transit. For any option for which contents
may change en-route, the entire "Option Data" field must be treated
as zero-valued octets when computing or verifying the ICV. The
Option Type and Opt Data Len are included in the ICV calculation.
All options for which the bit indicates immutability are included in
the ICV calculation. See the IPv6 specification [DH95] for more
information.
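
A sketch of this rule is shown below. It operates on the options area
of a Hop-by-Hop or Destination Options header, i.e., the bytes that
follow the Next Header and Hdr Ext Len fields. The function name is
hypothetical. In the IPv6 specification, the bit in question is the
third-highest-order bit of the Option Type (mask 0x20), and Pad1
(type 0) is a single byte with no length field.

```python
MUTABLE_BIT = 0x20  # third-highest-order bit of the Option Type


def zero_mutable_ipv6_options(options: bytes) -> bytes:
    """Zero the Option Data of every option marked as mutable."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        opt_type = out[i]
        if opt_type == 0:                  # Pad1: one byte, no length
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        data_len = out[i + 1]              # Opt Data Len
        end = i + 2 + data_len
        if end > len(out):
            raise ValueError("bad Opt Data Len")
        if opt_type & MUTABLE_BIT:
            # Option Type and Opt Data Len stay; only the data is zeroed.
            out[i + 2:end] = bytes(data_len)
        i = end
    return bytes(out)
```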

3.3.3.1.2.3 Extension Headers Not Containing Options

The IPv6 extension headers that do not contain options are explicitly
listed in Appendix A and classified as immutable, mutable but
predictable, or mutable.

3.3.3.2 Padding

3.3.3.2.1 Authentication Data Padding

As mentioned in section 2.6, the Authentication Data field explicitly
includes padding to ensure that the AH header is a multiple of 32
bits (IPv4) or 64 bits (IPv6). If padding is required, its length is
determined by two factors:

- the length of the ICV
- the IP protocol version (v4 or v6)

For example, if the output of the selected algorithm is 96 bits, no
padding is required for either IPv4 or for IPv6. However, if a
different length ICV is generated, due to use of a different
algorithm, then padding may be required depending on the length and
IP protocol version. The content of the padding field is arbitrarily
selected by the sender. (The padding is arbitrary, but need not be
random to achieve security.) These padding bytes are included in the
Authentication Data calculation, counted as part of the Payload
Length, and transmitted at the end of the Authentication Data field
to enable the receiver to perform the ICV calculation.
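
The arithmetic can be sketched as follows. The function names are
hypothetical. The 12-byte fixed portion covers Next Header, Payload
Len, Reserved, SPI, and Sequence Number, and the Payload Len rule (AH
length in 32-bit words, minus 2) is the one given in section 2.2.

```python
AH_FIXED_LEN = 12  # bytes before the Authentication Data field


def auth_data_padding(icv_len: int, ip_version: int) -> int:
    """Padding bytes needed after an ICV of icv_len bytes."""
    if ip_version == 4:
        align = 4          # AH must be a multiple of 32 bits
    elif ip_version == 6:
        align = 8          # AH must be a multiple of 64 bits
    else:
        raise ValueError("ip_version must be 4 or 6")
    return -(AH_FIXED_LEN + icv_len) % align


def payload_len_field(icv_len: int, ip_version: int) -> int:
    """Value of the Payload Len field for the padded AH header."""
    total = AH_FIXED_LEN + icv_len + auth_data_padding(icv_len, ip_version)
    return total // 4 - 2
```

For a 96-bit (12-byte) ICV the header is 24 bytes, so no padding is
needed for either IP version and Payload Len is 4. A 128-bit ICV would
need no padding for IPv4 but 4 bytes for IPv6.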

3.3.3.2.2 Implicit Packet Padding

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the IP packet length (including AH)
does not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the packet, prior to ICV
computation. The padding octets MUST have a value of zero. The
blocksize (and hence the length of the padding) is specified by the
algorithm specification. This padding is not transmitted with the
packet. Note that MD5 and SHA-1 are viewed as having a 1-byte
blocksize because of their internal padding conventions.
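
A minimal sketch of implicit padding, with a hypothetical function
name, is:

```python
def with_implicit_padding(packet: bytes, blocksize: int) -> bytes:
    """Append zero octets so len(result) is a multiple of blocksize.

    The result is used only as input to the ICV computation; the
    padding is never transmitted.
    """
    if blocksize < 1:
        raise ValueError("blocksize must be positive")
    return packet + bytes(-len(packet) % blocksize)
```

With a blocksize of 1, as for MD5 and SHA-1, the packet is returned
unchanged.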

3.3.4 Fragmentation

If required, IP fragmentation occurs after AH processing within an
IPsec implementation. Thus, transport mode AH is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which AH
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to AH processing at a
receiver. In tunnel mode, AH is applied to an IP packet, the payload
of which may be a fragmented IP packet. For example, a security
gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (see the Security Architecture document for details)
may apply tunnel mode AH to such fragments.

3.4 Inbound Packet Processing

If there is more than one IPsec header/extension present, the
processing for each one ignores (does not zero, does not use) any
IPsec headers applied subsequent to the header being processed.

3.4.1 Reassembly

If required, reassembly is performed prior to AH processing. If a
packet offered to AH for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time, Source Address, Destination Address, and (in IPv6) the
Flow ID.

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zeroing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.
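
For IPv4, the fragment test described above could be sketched as
follows. The function name is hypothetical. The MORE FRAGMENTS flag and
the 13-bit OFFSET share bytes 6 and 7 of the IPv4 header (RFC 791).

```python
MORE_FRAGMENTS = 0x2000   # MF bit within the 16-bit flags/offset word
OFFSET_MASK = 0x1FFF      # low 13 bits carry the Fragment Offset


def looks_like_ipv4_fragment(header: bytes) -> bool:
    """True if the packet must be discarded as an apparent fragment."""
    if len(header) < 20:
        raise ValueError("truncated IPv4 header")
    word = int.from_bytes(header[6:8], "big")
    return bool(word & MORE_FRAGMENTS) or (word & OFFSET_MASK) != 0
```

The DF bit (0x4000) alone does not mark a packet as a fragment.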

3.4.2 Security Association Lookup

Upon receipt of a packet containing an IP Authentication Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (AH), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will
be checked, specifies the algorithm(s) employed for ICV computation,
and indicates the key(s) required to validate the ICV.

If no valid Security Association exists for this session (e.g., the
receiver has no key), the receiver MUST discard the packet; this is
an auditable event. The audit log entry for this event SHOULD
include the SPI value, date/time, Source Address, Destination
Address, and (in IPv6) the Flow ID.
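
As a sketch only, the lookup can be pictured as a table keyed by the
triple named above. The dictionary-based database, the function name,
and the exception are hypothetical; 51 is the IP protocol number
assigned to AH.

```python
AH_PROTOCOL = 51  # IP protocol number for AH


class NoSecurityAssociation(Exception):
    """No valid SA: the packet MUST be discarded (auditable event)."""


def lookup_inbound_sa(sad: dict, dst_addr: str, spi: int):
    """Return the SA for (destination address, AH, SPI)."""
    try:
        return sad[(dst_addr, AH_PROTOCOL, spi)]
    except KeyError:
        raise NoSecurityAssociation(
            f"no SA for dst={dst_addr} spi={spi:#x}") from None
```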

3.4.3 Sequence Number Verification

All AH implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
(Note that there are no provisions for managing transmitted Sequence
Number values among multiple senders directing traffic to a single SA
(irrespective of whether the destination address is unicast,
broadcast, or multicast). Thus the anti-replay service SHOULD NOT be
used in a multi-sender environment that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.2), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receiver packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first AH check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.
Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time, Source Address, Destination Address, the Sequence
Number, and (in IPv6) the Flow ID. The receive window is updated
only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.
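
The window behavior described in this section can be sketched with a
bit mask. This is an illustration under stated assumptions, not the
code from the Security Architecture document. The class and method
names are hypothetical, and rejecting Sequence Number 0 is an
assumption based on the first packet of an SA carrying the value 1.

```python
class ReplayWindow:
    """Hypothetical sliding receive window for one SA."""

    def __init__(self, size: int = 64):
        if size < 32:
            raise ValueError("minimum window size is 32")
        self.size = size
        self.right = 0     # highest validated Sequence Number so far
        self.bitmap = 0    # bit i set => (right - i) already received

    def check(self, seq: int) -> bool:
        """True if the packet may proceed to ICV verification."""
        if seq == 0:
            return False                   # assumption: 0 is never sent
        if seq > self.right:
            return True                    # right of the window
        diff = self.right - seq
        if diff >= self.size:
            return False                   # left of the window
        return not (self.bitmap >> diff) & 1   # inside: reject duplicates

    def update(self, seq: int) -> None:
        """Record seq; call only after ICV verification succeeds."""
        if seq > self.right:
            shift = seq - self.right
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.right = seq
        else:
            self.bitmap |= 1 << (self.right - seq)
```

Keeping check and update separate mirrors the requirement that the
window is updated only if the ICV verification succeeds.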

3.4.4 Integrity Check Value Verification

The receiver computes the ICV over the appropriate fields of the
packet, using the specified authentication algorithm, and verifies
that it is the same as the ICV included in the Authentication Data
field of the packet. Details of the computation are provided below.

If the computed and received ICVs match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry SHOULD include the SPI value, date/time
received, Source Address, Destination Address, and (in IPv6) the Flow
ID.

DISCUSSION:

Begin by saving the ICV value and replacing it (but not any
Authentication Data padding) with zero. Zero all other fields
that may have been modified during transit. (See section 3.3.3.1
for a discussion of which fields are zeroed before performing the
ICV calculation.) Check the overall length of the packet, and if
it requires implicit padding based on the requirements of the
authentication algorithm, append zero-filled bytes to the end of
the packet as required. Perform the ICV computation and compare
the result with the saved value, using the comparison rules
defined by the algorithm specification. (For example, if a
digital signature and one-way hash are used for the ICV
computation, the matching process is more complex.)
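
For the simple MAC case, the save, zero, compute, and compare steps can
be sketched as below, using HMAC with SHA-1 truncated to 96 bits as in
[MG97b]. The function names and the icv_offset parameter are
hypothetical, and the caller is assumed to have already zeroed the
mutable fields and appended any implicit padding. The constant-time
comparison is a design choice of this sketch and is not required by
the text.

```python
import hashlib
import hmac

ICV_LEN = 12  # HMAC-SHA-1-96: first 96 bits of the HMAC output


def compute_icv(key: bytes, prepared_packet: bytes) -> bytes:
    """ICV over a packet whose ICV field and mutable fields are zero."""
    return hmac.new(key, prepared_packet, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(key: bytes, packet: bytes, icv_offset: int) -> bool:
    """Save the received ICV, zero it, recompute, and compare."""
    saved = packet[icv_offset:icv_offset + ICV_LEN]
    prepared = (packet[:icv_offset] + bytes(ICV_LEN)
                + packet[icv_offset + ICV_LEN:])
    return hmac.compare_digest(compute_icv(key, prepared), saved)
```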

4. Auditing

Not all systems that implement AH will implement auditing. However,
if AH is incorporated into a system that supports auditing, then the
AH implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for AH. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional
events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST fully implement the AH syntax and processing
described here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant AH implementation MUST
support the following mandatory-to-implement algorithms:

- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]

6. Security Considerations

Security is central to the design of this protocol, and these
security considerations permeate the specification. Additional
security-relevant aspects of using the IPsec protocol are discussed
in the Security Architecture document.

7. Differences from RFC 1826

This specification of AH differs from RFC 1826 [ATK95] in several
important respects, but the fundamental features of AH remain intact.
One goal of the revision of RFC 1826 was to provide a complete
framework for AH, with ancillary RFCs required only for algorithm
specification. For example, the anti-replay service is now an
integral, mandatory part of AH, not a feature of a transform defined
in another RFC. Carriage of a sequence number to support this
service is now required at all times. The default algorithms
required for interoperability have been changed to HMAC with MD5 or
SHA-1 (vs. keyed MD5), for security reasons. The list of IPv4 header
fields excluded from the ICV computation has been expanded to include
the OFFSET and FLAGS fields.

Another motivation for revision was to provide additional detail and
clarification of subtle points. This specification provides
rationale for exclusion of selected IPv4 header fields from AH
coverage and provides examples on positioning of AH in both the IPv4
and v6 contexts. Auditing requirements have been clarified in this
version of the specification. Tunnel mode AH was mentioned only in
passing in RFC 1826, but now is a mandatory feature of AH.
Discussion of interactions with key management and with security
labels has been moved to the Security Architecture document.

Acknowledgements

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Francis Dupont, Phil Karn, Frank
Kastenholz, Perry Metzger, David Mihelcic, Hilarie Orman, Norman
Shulman, William Simpson, and Nina Yuan.

Appendix A -- Mutability of IP Options/Extension Headers

A1. IPv4 Options

This table shows how the IPv4 options are classified with regard to
"mutability". Where two references are provided, the second one
supersedes the first. This table is based in part on information
provided in RFC1700, "ASSIGNED NUMBERS", (October 1994).

           Opt.
Copy Class  #   Name                       Reference
---- ----- ---  -------------------------  ---------
IMMUTABLE -- included in ICV calculation
  0    0     0  End of Options List        [RFC791]
  0    0     1  No Operation               [RFC791]
  1    0     2  Security                   [RFC1108(historic but in use)]
  1    0     5  Extended Security          [RFC1108(historic but in use)]
  1    0     6  Commercial Security        [expired I-D, now US MIL STD]
  1    0    20  Router Alert               [RFC2113]
  1    0    21  Sender Directed Multi-     [RFC1770]
                Destination Delivery
MUTABLE -- zeroed
  1    0     3  Loose Source Route         [RFC791]
  0    2     4  Time Stamp                 [RFC791]
  0    0     7  Record Route               [RFC791]
  1    0     9  Strict Source Route        [RFC791]
  0    2    18  Traceroute                 [RFC1393]

EXPERIMENTAL, SUPERSEDED -- zeroed
  1    0     8  Stream ID                  [RFC791, RFC1122 (Host Req)]
  0    0    11  MTU Probe                  [RFC1063, RFC1191 (PMTU)]
  0    0    12  MTU Reply                  [RFC1063, RFC1191 (PMTU)]
  1    0    17  Extended Internet Proto    [RFC1385, RFC1883 (IPv6)]
  0    0    10  Experimental Measurement   [ZSu]
  1    2    13  Experimental Flow Control  [Finn]
  1    0    14  Experimental Access Ctl    [Estrin]
  0    0    15  ???                        [VerSteeg]
  1    0    16  IMI Traffic Descriptor     [Lee]
  1    0    19  Address Extension          [Ullmann IPv7]

NOTE: Use of the Router Alert option is potentially incompatible with
use of IPsec. Although the option is immutable, its use implies that
each router along a packet's path will "process" the packet and
consequently might change the packet. This would happen on a hop by
hop basis as the packet goes from router to router. Prior to being
processed by the application to which the option contents are
directed, e.g., RSVP/IGMP, the packet should encounter AH processing.

However, AH processing would require that each router along the path
is a member of a multicast-SA defined by the SPI. This might pose
problems for packets that are not strictly source routed, and it
requires multicast support techniques not currently available.

NOTE: Addition or removal of any security labels (BSO, ESO, CIPSO) by
systems along a packet's path conflicts with the classification of
these IP Options as immutable and is incompatible with the use of
IPsec.

NOTE: End of Options List options SHOULD be repeated as necessary to
ensure that the IP header ends on a 4 byte boundary in order to
ensure that there are no unspecified bytes which could be used for a
covert channel.

A2. IPv6 Extension Headers

This table shows how the IPv6 Extension Headers are classified with
regard to "mutability".

Option/Extension Name                  Reference
-----------------------------------    ---------
MUTABLE BUT PREDICTABLE -- included in ICV calculation
  Routing (Type 0)                     [RFC1883]

BIT INDICATES IF OPTION IS MUTABLE (CHANGES UNPREDICTABLY DURING TRANSIT)
  Hop by Hop options                   [RFC1883]
  Destination options                  [RFC1883]

NOT APPLICABLE
  Fragmentation                        [RFC1883]

Options -- IPv6 options in the Hop-by-Hop and Destination
Extension Headers contain a bit that indicates whether the
option might change (unpredictably) during transit. For
any option for which contents may change en-route, the
entire "Option Data" field must be treated as zero-valued
octets when computing or verifying the ICV. The Option
Type and Opt Data Len are included in the ICV calculation.
All options for which the bit indicates immutability are
included in the ICV calculation. See the IPv6
specification [DH95] for more information.

Routing (Type 0) -- The IPv6 Routing Header "Type 0" will
rearrange the address fields within the packet during
transit from source to destination. However, the contents
of the packet as it will appear at the receiver are known
to the sender and to all intermediate hops. Hence, the
IPv6 Routing Header "Type 0" is included in the
Authentication Data calculation as mutable but predictable.
The sender must order the field so that it appears as it
will at the receiver, prior to performing the ICV
computation.

Fragmentation -- Fragmentation occurs after outbound IPsec
processing (section 3.3) and reassembly occurs before
inbound IPsec processing (section 3.4). So the
Fragmentation Extension Header, if it exists, is not seen
by IPsec.

Note that on the receive side, the IP implementation could
leave a Fragmentation Extension Header in place when it
does re-assembly. If this happens, then when AH receives
the packet, before doing ICV processing, AH MUST "remove"
(or skip over) this header and change the previous header's
"Next Header" field to be the "Next Header" field in the
Fragmentation Extension Header.

Note that on the send side, the IP implementation could
give the IPsec code a packet with a Fragmentation Extension
Header with Offset of 0 (first fragment) and a More
Fragments Flag of 0 (last fragment). If this happens, then
before doing ICV processing, AH MUST first "remove" (or
skip over) this header and change the previous header's
"Next Header" field to be the "Next Header" field in the
Fragmentation Extension Header.

References

[ATK95] Atkinson, R., "The IP Authentication Header", RFC 1826,
August 1995.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[DH95] Deering, S., and B. Hinden, "Internet Protocol version 6
(IPv6) Specification", RFC 1883, December 1995.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Encapsulating Security
Payload (ESP)", RFC 2406, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.