ipv6 resources

RFC 1981 – Path MTU Discovery for IP version 6

Network Working Group                                          J. McCann
Request for Comments: 1981                 Digital Equipment Corporation
Category: Standards Track                                     S. Deering
                                                              Xerox PARC
                                                                J. Mogul
                                           Digital Equipment Corporation
                                                             August 1996

Path MTU Discovery for IP version 6

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

This document describes Path MTU Discovery for IP version 6. It is
largely derived from RFC 1191, which describes Path MTU Discovery for
IP version 4.

Table of Contents

1. Introduction
2. Terminology
3. Protocol overview
4. Protocol Requirements
5. Implementation Issues
5.1. Layering
5.2. Storing PMTU information
5.3. Purging stale PMTU information
5.4. TCP layer actions
5.5. Issues for other transport protocols
5.6. Management interface
6. Security Considerations
Acknowledgements
Appendix A - Comparison to RFC 1191
References
Authors' Addresses

1. Introduction

When one IPv6 node has a large amount of data to send to another
node, the data is transmitted in a series of IPv6 packets. It is
usually preferable that these packets be of the largest size that can
successfully traverse the path from the source node to the
destination node. This packet size is referred to as the Path MTU
(PMTU), and it is equal to the minimum link MTU of all the links in a
path. IPv6 defines a standard mechanism for a node to discover the
PMTU of an arbitrary path.

IPv6 nodes SHOULD implement Path MTU Discovery in order to discover
and take advantage of paths with PMTU greater than the IPv6 minimum
link MTU [IPv6-SPEC]. A minimal IPv6 implementation (e.g., in a boot
ROM) may choose to omit implementation of Path MTU Discovery.

Nodes not implementing Path MTU Discovery use the IPv6 minimum link
MTU defined in [IPv6-SPEC] as the maximum packet size. In most
cases, this will result in the use of smaller packets than necessary,
because most paths have a PMTU greater than the IPv6 minimum link
MTU. A node sending packets much smaller than the Path MTU allows is
wasting network resources and probably getting suboptimal throughput.

2. Terminology

node - a device that implements IPv6.

router - a node that forwards IPv6 packets not explicitly
addressed to itself.

host - any node that is not a router.

upper layer - a protocol layer immediately above IPv6. Examples are
transport protocols such as TCP and UDP, control
protocols such as ICMP, routing protocols such as OSPF,
and internet or lower-layer protocols being "tunneled"
over (i.e., encapsulated in) IPv6 such as IPX,
AppleTalk, or IPv6 itself.

link - a communication facility or medium over which nodes can
communicate at the link layer, i.e., the layer
immediately below IPv6. Examples are Ethernets (simple
or bridged); PPP links; X.25, Frame Relay, or ATM
networks; and internet (or higher) layer "tunnels",
such as tunnels over IPv4 or IPv6 itself.

interface - a node's attachment to a link.

address - an IPv6-layer identifier for an interface or a set of
interfaces.

packet - an IPv6 header plus payload.

link MTU - the maximum transmission unit, i.e., maximum packet
size in octets, that can be conveyed in one piece over
a link.

path - the set of links traversed by a packet between a source
node and a destination node.

path MTU - the minimum link MTU of all the links in a path between
a source node and a destination node.

PMTU - path MTU.

Path MTU Discovery - process by which a node learns the PMTU of a
path.

flow - a sequence of packets sent from a particular source
to a particular (unicast or multicast) destination for
which the source desires special handling by the
intervening routers.

flow id - a combination of a source address and a non-zero
flow label.

3. Protocol overview

This memo describes a technique to dynamically discover the PMTU of a
path. The basic idea is that a source node initially assumes that
the PMTU of a path is the (known) MTU of the first hop in the path.
If any of the packets sent on that path are too large to be forwarded
by some node along the path, that node will discard them and return
ICMPv6 Packet Too Big messages [ICMPv6]. Upon receipt of such a
message, the source node reduces its assumed PMTU for the path based
on the MTU of the constricting hop as reported in the Packet Too Big
message.

The Path MTU Discovery process ends when the node's estimate of the
PMTU is less than or equal to the actual PMTU. Note that several
iterations of the packet-sent/Packet-Too-Big-message-received cycle
may occur before the Path MTU Discovery process ends, as there may be
links with smaller MTUs further along the path.
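
The iteration described above can be sketched as a small simulation. This is only an illustration, not part of the protocol: the function name discover_pmtu and the representation of a path as a list of link MTUs are assumptions made for the example.

```python
def discover_pmtu(link_mtus):
    """Simulate Path MTU Discovery over a path given as a list of link MTUs.

    Returns (final_estimate, number_of_packet_too_big_messages).
    Hypothetical helper: the path model is an assumption of this sketch.
    """
    # The source starts from the (known) MTU of the first hop.
    estimate = link_mtus[0]
    too_big_count = 0
    while True:
        # Find the first link along the path that cannot carry the packet.
        constricting = next((mtu for mtu in link_mtus if mtu < estimate), None)
        if constricting is None:
            # The packet got through: the estimate is <= the actual PMTU.
            return estimate, too_big_count
        # The node in front of that link drops the packet and reports the
        # MTU of the constricting hop in a Packet Too Big message.
        too_big_count += 1
        estimate = constricting
```

With link MTUs of 1500, 4352, 1400 and 1280 octets, the source lowers its estimate twice (to 1400, then to 1280) before the process ends, which matches the remark that smaller MTUs may lie further along the path.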

Alternatively, the node may elect to end the discovery process by
ceasing to send packets larger than the IPv6 minimum link MTU.

The PMTU of a path may change over time, due to changes in the
routing topology. Reductions of the PMTU are detected by Packet Too
Big messages. To detect increases in a path's PMTU, a node
periodically increases its assumed PMTU. This will almost always
result in packets being discarded and Packet Too Big messages being
generated, because in most cases the PMTU of the path will not have
changed. Therefore, attempts to detect increases in a path's PMTU
should be done infrequently.

Path MTU Discovery supports multicast as well as unicast
destinations. In the case of a multicast destination, copies of a
packet may traverse many different paths to many different nodes.
Each path may have a different PMTU, and a single multicast packet
may result in multiple Packet Too Big messages, each reporting a
different next-hop MTU. The minimum PMTU value across the set of
paths in use determines the size of subsequent packets sent to the
multicast destination.

Note that Path MTU Discovery must be performed even in cases where a
node "thinks" a destination is attached to the same link as itself.
In a situation such as when a neighboring router acts as proxy [ND]
for some destination, the destination can appear to be directly
connected but is in fact more than one hop away.

4. Protocol Requirements

As discussed in section 1, IPv6 nodes are not required to implement
Path MTU Discovery. The requirements in this section apply only to
those implementations that include Path MTU Discovery.

When a node receives a Packet Too Big message, it MUST reduce its
estimate of the PMTU for the relevant path, based on the value of the
MTU field in the message. The precise behavior of a node in this
circumstance is not specified, since different applications may have
different requirements, and since different implementation
architectures may favor different strategies.

After receiving a Packet Too Big message, a node MUST attempt to
avoid eliciting more such messages in the near future. The node MUST
reduce the size of the packets it is sending along the path. Using a
PMTU estimate larger than the IPv6 minimum link MTU may continue to
elicit Packet Too Big messages. Since each of these messages (and
the dropped packets they respond to) consume network resources, the
node MUST force the Path MTU Discovery process to end.

Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast
as possible. Nodes MAY detect increases in PMTU, but because doing
so requires sending packets larger than the current estimated PMTU,
and because the likelihood is that the PMTU will not have increased,
this MUST be done at infrequent intervals. An attempt to detect an
increase (by sending a packet larger than the current estimate) MUST
NOT be done less than 5 minutes after a Packet Too Big message has
been received for the given path. The recommended setting for this
timer is twice its minimum value (10 minutes).
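
As an illustration of the timer rule, the following sketch checks whether a probe for a larger PMTU is permitted yet. The function and constant names are hypothetical, and times are assumed to be seconds from any monotonic clock.

```python
MIN_PROBE_DELAY_S = 5 * 60                        # minimum: 5 minutes
RECOMMENDED_PROBE_DELAY_S = 2 * MIN_PROBE_DELAY_S  # recommended: 10 minutes


def may_probe_for_increase(now_s, last_too_big_s,
                           delay_s=RECOMMENDED_PROBE_DELAY_S):
    """Return True if enough time has passed since the last Packet Too Big.

    Hypothetical helper; `last_too_big_s` is when the most recent
    Packet Too Big message for this path was received.
    """
    if delay_s < MIN_PROBE_DELAY_S:
        raise ValueError("probe delay must be at least 5 minutes")
    return now_s - last_too_big_s >= delay_s
```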

A node MUST NOT reduce its estimate of the Path MTU below the IPv6
minimum link MTU.

Note: A node may receive a Packet Too Big message reporting a
next-hop MTU that is less than the IPv6 minimum link MTU. In that
case, the node is not required to reduce the size of subsequent
packets sent on the path to less than the IPv6 minimum link MTU,
but rather must include a Fragment header in those packets
[IPv6-SPEC].

A node MUST NOT increase its estimate of the Path MTU in response to
the contents of a Packet Too Big message. A message purporting to
announce an increase in the Path MTU might be a stale packet that has
been floating around in the network, a false packet injected as part
of a denial-of-service attack, or the result of having multiple paths
to the destination, each with a different PMTU.
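
The requirements of this section can be combined into one update rule, sketched below. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the default of 1280 octets is the minimum link MTU of the later IPv6 specification (RFC 2460); the value actually in force comes from [IPv6-SPEC], so it is left as a parameter.

```python
IPV6_MIN_LINK_MTU = 1280  # assumption: take the real value from the IPv6 spec in use


def apply_packet_too_big(current_pmtu, reported_mtu,
                         min_link_mtu=IPV6_MIN_LINK_MTU):
    """Return (new_pmtu, needs_fragment_header) for one Packet Too Big message."""
    if reported_mtu >= current_pmtu:
        # Never raise the estimate on the strength of a Packet Too Big
        # message; it may be stale, forged, or from another path.
        return current_pmtu, False
    if reported_mtu < min_link_mtu:
        # Do not go below the minimum link MTU; subsequent packets carry
        # a Fragment header instead.
        return min_link_mtu, True
    return reported_mtu, False
```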

5. Implementation Issues

This section discusses a number of issues related to the
implementation of Path MTU Discovery. This is not a specification,
but rather a set of notes provided as an aid for implementors.

The issues include:

- What layer or layers implement Path MTU Discovery?

- How is the PMTU information cached?

- How is stale PMTU information removed?

- What must transport and higher layers do?

5.1. Layering

In the IP architecture, the choice of what size packet to send is
made by a protocol at a layer above IP. This memo refers to such a
protocol as a "packetization protocol". Packetization protocols are
usually transport protocols (for example, TCP) but can also be
higher-layer protocols (for example, protocols built on top of UDP).

Implementing Path MTU Discovery in the packetization layers
simplifies some of the inter-layer issues, but has several drawbacks:
the implementation may have to be redone for each packetization
protocol, it becomes hard to share PMTU information between different
packetization layers, and the connection-oriented state maintained by
some packetization layers may not easily extend to save PMTU
information for long periods.

It is therefore suggested that the IP layer store PMTU information
and that the ICMP layer process received Packet Too Big messages.
The packetization layers may respond to changes in the PMTU, by
changing the size of the messages they send. To support this
layering, packetization layers require a way to learn of changes in
the value of MMS_S, the "maximum send transport-message size". The
MMS_S is derived from the Path MTU by subtracting the size of the
IPv6 header plus space reserved by the IP layer for additional
headers (if any).
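
The MMS_S derivation is simple arithmetic, sketched here. The function name and the reserved_header_octets parameter are assumptions of the example; the fixed IPv6 header is 40 octets.

```python
IPV6_HEADER_LEN = 40  # size of the fixed IPv6 header in octets


def mms_s(path_mtu, reserved_header_octets=0):
    """Maximum send transport-message size for a given Path MTU.

    `reserved_header_octets` is space the IP layer sets aside for
    additional (extension) headers, if any.
    """
    return path_mtu - IPV6_HEADER_LEN - reserved_header_octets
```

For example, a Path MTU of 1500 octets with no additional headers gives an MMS_S of 1460 octets.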

It is possible that a packetization layer, perhaps a UDP application
outside the kernel, is unable to change the size of messages it
sends. This may result in a packet size that exceeds the Path MTU.
To accommodate such situations, IPv6 defines a mechanism that allows
large payloads to be divided into fragments, with each fragment sent
in a separate packet (see [IPv6-SPEC] section "Fragment Header").
However, packetization layers are encouraged to avoid sending
messages that will require fragmentation (for the case against
fragmentation, see [FRAG]).

5.2. Storing PMTU information

Ideally, a PMTU value should be associated with a specific path
traversed by packets exchanged between the source and destination
nodes. However, in most cases a node will not have enough
information to completely and accurately identify such a path.
Rather, a node must associate a PMTU value with some local
representation of a path. It is left to the implementation to select
the local representation of a path.

In the case of a multicast destination address, copies of a packet
may traverse many different paths to reach many different nodes. The
local representation of the "path" to a multicast destination must in
fact represent a potentially large set of paths.

Minimally, an implementation could maintain a single PMTU value to be
used for all packets originated from the node. This PMTU value would
be the minimum PMTU learned across the set of all paths in use by the
node. This approach is likely to result in the use of smaller
packets than is necessary for many paths.

An implementation could use the destination address as the local
representation of a path. The PMTU value associated with a
destination would be the minimum PMTU learned across the set of all
paths in use to that destination. The set of paths in use to a
particular destination is expected to be small, in many cases
consisting of a single path. This approach will result in the use of
optimally sized packets on a per-destination basis. This approach
integrates nicely with the conceptual model of a host as described in
[ND]: a PMTU value could be stored with the corresponding entry in
the destination cache.

If flows [IPv6-SPEC] are in use, an implementation could use the flow
id as the local representation of a path. Packets sent to a
particular destination but belonging to different flows may use
different paths, with the choice of path depending on the flow id.
This approach will result in the use of optimally sized packets on a
per-flow basis, providing finer granularity than PMTU values
maintained on a per-destination basis.

For source routed packets (i.e., packets containing an IPv6 Routing
header [IPv6-SPEC]), the source route may further qualify the local
representation of a path. In particular, a packet containing a type
0 Routing header in which all bits in the Strict/Loose Bit Map are
equal to 1 contains a complete path specification. An implementation
could use source route information in the local representation of a
path.

Note: Some paths may be further distinguished by different
security classifications. The details of such classifications are
beyond the scope of this memo.

Initially, the PMTU value for a path is assumed to be the (known) MTU
of the first-hop link.

When a Packet Too Big message is received, the node determines which
path the message applies to based on the contents of the Packet Too
Big message. For example, if the destination address is used as the
local representation of a path, the destination address from the
original packet would be used to determine which path the message
applies to.

Note: if the original packet contained a Routing header, the
Routing header should be used to determine the location of the
destination address within the original packet. If Segments Left
is equal to zero, the destination address is in the Destination
Address field in the IPv6 header. If Segments Left is greater
than zero, the destination address is the last address
(Address[n]) in the Routing header.
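
The rule in the note above might be sketched as follows. The function and its arguments are hypothetical; a real implementation would parse these fields from the original packet carried in the Packet Too Big message.

```python
def final_destination(ipv6_dst, routing_addresses=None, segments_left=0):
    """Pick the destination address used to identify the path.

    `ipv6_dst` is the Destination Address field of the IPv6 header;
    `routing_addresses` is the address list Address[1..n] of the Routing
    header, or None if the packet carried no Routing header.
    """
    if not routing_addresses or segments_left == 0:
        # No Routing header, or the route is exhausted: the IPv6 header
        # already holds the final destination.
        return ipv6_dst
    # Segments remain: the final destination is the last listed address.
    return routing_addresses[-1]
```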

The node then uses the value in the MTU field in the Packet Too Big
message as a tentative PMTU value, and compares the tentative PMTU to
the existing PMTU. If the tentative PMTU is less than the existing
PMTU estimate, the tentative PMTU replaces the existing PMTU as the
PMTU value for the path.
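
A per-destination cache with this tentative-PMTU comparison could look like the sketch below. The class and method names are assumptions, and the floor at the IPv6 minimum link MTU is left out for brevity.

```python
class PmtuCache:
    """Per-destination PMTU cache (illustrative sketch only)."""

    def __init__(self, first_hop_mtu):
        self.first_hop_mtu = first_hop_mtu
        self._pmtu = {}  # destination address -> PMTU estimate

    def lookup(self, destination):
        # Until something is learned, assume the first-hop link MTU.
        return self._pmtu.get(destination, self.first_hop_mtu)

    def on_packet_too_big(self, destination, mtu_field):
        """Apply a Packet Too Big message; return True if the PMTU decreased."""
        tentative = mtu_field
        if tentative < self.lookup(destination):
            self._pmtu[destination] = tentative
            return True
        return False
```

The boolean result lets the caller decide when the estimate has actually decreased.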

The packetization layers must be notified about decreases in the
PMTU. Any packetization layer instance (for example, a TCP
connection) that is actively using the path must be notified if the
PMTU estimate is decreased.

Note: even if the Packet Too Big message contains an Original
Packet Header that refers to a UDP packet, the TCP layer must be
notified if any of its connections use the given path.

Also, the instance that sent the packet that elicited the Packet Too
Big message should be notified that its packet has been dropped, even
if the PMTU estimate has not changed, so that it may retransmit the
dropped data.

Note: An implementation can avoid the use of an asynchronous
notification mechanism for PMTU decreases by postponing
notification until the next attempt to send a packet larger than
the PMTU estimate. In this approach, when an attempt is made to
SEND a packet that is larger than the PMTU estimate, the SEND
function should fail and return a suitable error indication. This
approach may be more suitable to a connectionless packetization
layer (such as one using UDP), which (in some implementations) may
be hard to "notify" from the ICMP layer. In this case, the normal
timeout-based retransmission mechanisms would be used to recover
from the dropped packets.

It is important to understand that the notification of the
packetization layer instances using the path about the change in the
PMTU is distinct from the notification of a specific instance that a
packet has been dropped. The latter should be done as soon as
practical (i.e., asynchronously from the point of view of the
packetization layer instance), while the former may be delayed until
a packetization layer instance wants to create a packet.
Retransmission should be done only for those packets that are
known to be dropped, as indicated by a Packet Too Big message.

5.3. Purging stale PMTU information

Internetwork topology is dynamic; routes change over time. While the
local representation of a path may remain constant, the actual
path(s) in use may change. Thus, PMTU information cached by a node
can become stale.

If the stale PMTU value is too large, this will be discovered almost
immediately once a large enough packet is sent on the path. No such
mechanism exists for realizing that a stale PMTU value is too small,
so an implementation should "age" cached values. When a PMTU value
has not been decreased for a while (on the order of 10 minutes), the
PMTU estimate should be set to the MTU of the first-hop link, and the
packetization layers should be notified of the change. This will
cause the complete Path MTU Discovery process to take place again.

Note: an implementation should provide a means for changing the
timeout duration, including setting it to "infinity". For
example, nodes attached to an FDDI link which is then attached to
the rest of the Internet via a small MTU serial line are never
going to discover a new non-local PMTU, so they should not have to
put up with dropped packets every 10 minutes.

An upper layer must not retransmit data in response to an increase in
the PMTU estimate, since this increase never comes in response to an
indication of a dropped packet.

One approach to implementing PMTU aging is to associate a timestamp
field with a PMTU value. This field is initialized to a "reserved"
value, indicating that the PMTU is equal to the MTU of the first hop
link. Whenever the PMTU is decreased in response to a Packet Too Big
message, the timestamp is set to the current time.

Once a minute, a timer-driven procedure runs through all cached PMTU
values, and for each PMTU whose timestamp is not "reserved" and is
older than the timeout interval:

- The PMTU estimate is set to the MTU of the first hop link.

- The timestamp is set to the "reserved" value.

- Packetization layers using this path are notified of the increase.
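
The timestamp approach above can be sketched as follows. All names are hypothetical; None stands in for the "reserved" timestamp value, times are seconds, and the caller is assumed to invoke age() from a periodic timer and to notify packetization layers for the paths it returns.

```python
RESERVED = None  # timestamp meaning "PMTU equals the first-hop link MTU"


class AgingPmtuTable:
    """PMTU values with timestamps for aging (illustrative sketch only)."""

    def __init__(self, first_hop_mtu, timeout_s=600.0):
        self.first_hop_mtu = first_hop_mtu
        self.timeout_s = timeout_s  # float("inf") disables aging
        self.entries = {}           # path key -> (pmtu, timestamp)

    def decrease(self, path, new_pmtu, now_s):
        """Record a decrease caused by a Packet Too Big message."""
        pmtu, _ = self.entries.get(path, (self.first_hop_mtu, RESERVED))
        if new_pmtu < pmtu:
            self.entries[path] = (new_pmtu, now_s)

    def age(self, now_s):
        """Reset stale entries; return the paths whose PMTU was raised."""
        raised = []
        for path, (_, stamp) in list(self.entries.items()):
            if stamp is not RESERVED and now_s - stamp > self.timeout_s:
                self.entries[path] = (self.first_hop_mtu, RESERVED)
                raised.append(path)
        return raised
```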

5.4. TCP layer actions

The TCP layer must track the PMTU for the path(s) in use by a
connection; it should not send segments that would result in packets
larger than the PMTU. A simple implementation could ask the IP layer
for this value each time it created a new segment, but this could be
inefficient. Moreover, TCP implementations that follow the
"slow-start" congestion-avoidance algorithm [CONG] typically calculate
and cache several other values derived from the PMTU. It may be simpler
to receive asynchronous notification when the PMTU changes, so that
these variables may be updated.

A TCP implementation must also store the MSS value received from its
peer, and must not send any segment larger than this MSS, regardless
of the PMTU. In 4.xBSD-derived implementations, this may require
adding an additional field to the TCP state record.

The value sent in the TCP MSS option is independent of the PMTU.
This MSS option value is used by the other end of the connection,
which may be using an unrelated PMTU value. See [IPv6-SPEC] sections
"Packet Size Issues" and "Maximum Upper-Layer Payload Size" for
information on selecting a value for the TCP MSS option.

When a Packet Too Big message is received, it implies that a packet
was dropped by the node that sent the ICMP message. It is sufficient
to treat this as any other dropped segment, and wait until the
retransmission timer expires to cause retransmission of the segment.
If the Path MTU Discovery process requires several steps to find the
PMTU of the full path, this could delay the connection by many
round-trip times.

Alternatively, the retransmission could be done in immediate response
to a notification that the Path MTU has changed, but only for the
specific connection specified by the Packet Too Big message. The
packet size used in the retransmission should be no larger than the
new PMTU.

Note: A packetization layer must not retransmit in response to
every Packet Too Big message, since a burst of several oversized
segments will give rise to several such messages and hence several
retransmissions of the same data. If the new estimated PMTU is
still wrong, the process repeats, and there is an exponential
growth in the number of superfluous segments sent.

This means that the TCP layer must be able to recognize when a
Packet Too Big notification actually decreases the PMTU that it
has already used to send a packet on the given connection, and
should ignore any other notifications.
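
One way to filter notifications as described is to remember, per connection, the PMTU already used for sending. The sketch below is an assumption about how such state might be kept, not a description of any particular TCP implementation.

```python
class ConnectionPmtuState:
    """Per-connection record of the PMTU used for sending (sketch only)."""

    def __init__(self, pmtu):
        self.pmtu_used = pmtu

    def should_retransmit(self, notified_pmtu):
        """Return True only if the notification lowers the PMTU in use."""
        if notified_pmtu < self.pmtu_used:
            self.pmtu_used = notified_pmtu
            return True
        # Duplicate or irrelevant notification: ignore it.
        return False
```

A burst of three oversized segments that each elicit a Packet Too Big message reporting 1400 octets would then trigger a single retransmission rather than three.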

Many TCP implementations incorporate "congestion avoidance" and
"slow-start" algorithms to improve performance [CONG]. Unlike a
retransmission caused by a TCP retransmission timeout, a
retransmission caused by a Packet Too Big message should not change
the congestion window. It should, however, trigger the slow-start
mechanism (i.e., only one segment should be retransmitted until
acknowledgements begin to arrive again).

TCP performance can be reduced if the sender's maximum window size is
not an exact multiple of the segment size in use (this is not the
congestion window size, which is always a multiple of the segment
size). In many systems (such as those derived from 4.2BSD), the
segment size is often set to 1024 octets, and the maximum window size
(the "send space") is usually a multiple of 1024 octets, so the
proper relationship holds by default. If Path MTU Discovery is used,
however, the segment size may not be a submultiple of the send space,
and it may change during a connection; this means that the TCP layer
may need to change the transmission window size when Path MTU
Discovery changes the PMTU value. The maximum window size should be
set to the greatest multiple of the segment size that is less than or
equal to the sender's buffer space size.
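
The window adjustment is a matter of rounding down, as this small sketch shows (the function name is an assumption of the example):

```python
def max_window(segment_size, buffer_size):
    """Greatest multiple of segment_size that is <= buffer_size."""
    return (buffer_size // segment_size) * segment_size
```

With a 4096-octet send space, a segment size of 1024 octets uses the whole buffer, while a segment size of 1440 octets gives a maximum window of 2880 octets.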

5.5. Issues for other transport protocols

Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to
repacketize when doing a retransmission. That is, once an attempt is
made to transmit a segment of a certain size, the transport cannot
split the contents of the segment into smaller segments for
retransmission. In such a case, the original segment can be
fragmented by the IP layer during retransmission. Subsequent
segments, when transmitted for the first time, should be no larger
than allowed by the Path MTU.

The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
protocol [RPC] that, when used over UDP, in many cases will generate
payloads that must be fragmented even for the first-hop link. This
might improve performance in certain cases, but it is known to cause
reliability and performance problems, especially when the client and
server are separated by routers.

It is recommended that NFS implementations use Path MTU Discovery
whenever routers are involved. Most NFS implementations allow the
RPC datagram size to be changed at mount-time (indirectly, by
changing the effective file system block size), but might require
some modification to support changes later on.

Also, since a single NFS operation cannot be split across several UDP
datagrams, certain operations (primarily, those operating on file
names and directories) require a minimum payload size that if sent in
a single packet would exceed the PMTU. NFS implementations should
not reduce the payload size below this threshold, even if Path MTU
Discovery suggests a lower value. In this case the payload will be
fragmented by the IP layer.

5.6. Management interface

It is suggested that an implementation provide a way for a system
utility program to:

- Specify that Path MTU Discovery not be done on a given path.

- Change the PMTU value associated with a given path.

The former can be accomplished by associating a flag with the path;
when a packet is sent on a path with this flag set, the IP layer does
not send packets larger than the IPv6 minimum link MTU.

These features might be used to work around an anomalous situation,
or by a routing protocol implementation that is able to obtain Path
MTU values.

The implementation should also provide a way to change the timeout
period for aging stale PMTU information.

6. Security Considerations

This Path MTU Discovery mechanism makes possible two
denial-of-service attacks, both based on a malicious party sending
false Packet Too Big messages to a node.

In the first attack, the false message indicates a PMTU much smaller
than reality. This should not entirely stop data flow, since the
victim node should never set its PMTU estimate below the IPv6 minimum
link MTU. It will, however, result in suboptimal performance.

In the second attack, the false message indicates a PMTU larger than
reality. If believed, this could cause temporary blockage as the
victim sends packets that will be dropped by some router. Within one
round-trip time, the node would discover its mistake (receiving
Packet Too Big messages from that router), but frequent repetition of
this attack could cause lots of packets to be dropped. A node,
however, should never raise its estimate of the PMTU based on a
Packet Too Big message, so should not be vulnerable to this attack.

A malicious party could also cause problems if it could stop a victim
from receiving legitimate Packet Too Big messages, but in this case
there are simpler denial-of-service attacks available.

Acknowledgements

We would like to acknowledge the authors of and contributors to
[RFC-1191], from which the majority of this document was derived. We
would also like to acknowledge the members of the IPng working group
for their careful review and constructive criticisms.

Appendix A - Comparison to RFC 1191

This document is based in large part on RFC 1191, which describes
Path MTU Discovery for IPv4. Certain portions of RFC 1191 were not
needed in this document:

router specification - Packet Too Big messages and corresponding
router behavior are defined in [ICMPv6]

Don't Fragment bit - there is no DF bit in IPv6 packets

TCP MSS discussion - selecting a value to send in the TCP MSS
option is discussed in [IPv6-SPEC]

old-style messages - all Packet Too Big messages report the
MTU of the constricting link

MTU plateau tables - not needed because there are no old-style
messages

References

[CONG] Van Jacobson. Congestion Avoidance and Control. Proc.
SIGCOMM '88 Symposium on Communications Architectures and
Protocols, pages 314-329. Stanford, CA, August, 1988.

[FRAG] C. Kent and J. Mogul. Fragmentation Considered Harmful.
In Proc. SIGCOMM '87 Workshop on Frontiers in Computer
Communications Technology. August, 1987.

[ICMPv6] Conta, A., and S. Deering, "Internet Control Message
Protocol (ICMPv6) for the Internet Protocol Version 6
(IPv6) Specification", RFC 1885, December 1995.

[IPv6-SPEC] Deering, S., and R. Hinden, "Internet Protocol, Version
6 (IPv6) Specification", RFC 1883, December 1995.

[ISOTP] ISO. ISO Transport Protocol Specification: ISO DP 8073.
RFC 905, SRI Network Information Center, April, 1984.

[ND] Narten, T., Nordmark, E., and W. Simpson, "Neighbor
Discovery for IP Version 6 (IPv6)", Work in Progress.

[RFC-1191] Mogul, J., and S. Deering, "Path MTU Discovery",
RFC 1191, November 1990.

[RPC] Sun Microsystems, Inc., "RPC: Remote Procedure Call
Protocol", RFC 1057, SRI Network Information Center,
June, 1988.

Authors' Addresses

Jack McCann
Digital Equipment Corporation
110 Spitbrook Road, ZKO3-3/U14
Nashua, NH 03062
Phone: +1 603 881 2608
Fax: +1 603 881 0120
Email: mccann@zk3.dec.com

Stephen E. Deering
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Phone: +1 415 812 4839
Fax: +1 415 812 4471
EMail: deering@parc.xerox.com

Jeffrey Mogul
Digital Equipment Corporation Western Research Laboratory
250 University Avenue
Palo Alto, CA 94301
Phone: +1 415 617 3304
EMail: mogul@pa.dec.com


RFC 1661 – The Point-to-Point Protocol (PPP)

Network Working Group                                 W. Simpson, Editor
Request for Comments: 1661                                    Daydreamer
STD: 51                                                        July 1994
Obsoletes: 1548
Category: Standards Track

The Point-to-Point Protocol (PPP)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

The Point-to-Point Protocol (PPP) provides a standard method for
transporting multi-protocol datagrams over point-to-point links. PPP
is comprised of three main components:

1. A method for encapsulating multi-protocol datagrams.

2. A Link Control Protocol (LCP) for establishing, configuring,
and testing the data-link connection.

3. A family of Network Control Protocols (NCPs) for establishing
and configuring different network-layer protocols.

This document defines the PPP organization and methodology, and the
PPP encapsulation, together with an extensible option negotiation
mechanism which is able to negotiate a rich assortment of
configuration parameters and provides additional management
functions. The PPP Link Control Protocol (LCP) is described in terms
of this mechanism.

Table of Contents

1. Introduction
1.1 Specification of Requirements
1.2 Terminology

2. PPP Encapsulation

3. PPP Link Operation
3.1 Overview
3.2 Phase Diagram
3.3 Link Dead (physical-layer not ready)
3.4 Link Establishment Phase
3.5 Authentication Phase
3.6 Network-Layer Protocol Phase
3.7 Link Termination Phase

4. The Option Negotiation Automaton
4.1 State Transition Table
4.2 States
4.3 Events
4.4 Actions
4.5 Loop Avoidance
4.6 Counters and Timers

5. LCP Packet Formats
5.1 Configure-Request
5.2 Configure-Ack
5.3 Configure-Nak
5.4 Configure-Reject
5.5 Terminate-Request and Terminate-Ack
5.6 Code-Reject
5.7 Protocol-Reject
5.8 Echo-Request and Echo-Reply
5.9 Discard-Request

6. LCP Configuration Options
6.1 Maximum-Receive-Unit (MRU)
6.2 Authentication-Protocol
6.3 Quality-Protocol
6.4 Magic-Number
6.5 Protocol-Field-Compression (PFC)
6.6 Address-and-Control-Field-Compression (ACFC)

SECURITY CONSIDERATIONS
REFERENCES
ACKNOWLEDGEMENTS
CHAIR'S ADDRESS
EDITOR'S ADDRESS

1. Introduction

The Point-to-Point Protocol is designed for simple links which
transport packets between two peers. These links provide full-duplex
simultaneous bi-directional operation, and are assumed to deliver
packets in order. It is intended that PPP provide a common solution
for easy connection of a wide variety of hosts, bridges and routers
[1].

Encapsulation

The PPP encapsulation provides for multiplexing of different
network-layer protocols simultaneously over the same link. The
PPP encapsulation has been carefully designed to retain
compatibility with most commonly used supporting hardware.

Only 8 additional octets are necessary to form the encapsulation
when used within the default HDLC-like framing. In environments
where bandwidth is at a premium, the encapsulation and framing may
be shortened to 2 or 4 octets.

To support high speed implementations, the default encapsulation
uses only simple fields, only one of which needs to be examined
for demultiplexing. The default header and information fields
fall on 32-bit boundaries, and the trailer may be padded to an
arbitrary boundary.

Link Control Protocol

In order to be sufficiently versatile to be portable to a wide
variety of environments, PPP provides a Link Control Protocol
(LCP). The LCP is used to automatically agree upon the
encapsulation format options, handle varying limits on sizes of
packets, detect a looped-back link and other common
misconfiguration errors, and terminate the link. Other optional
facilities provided are authentication of the identity of its peer
on the link, and determination when a link is functioning properly
and when it is failing.

Network Control Protocols

Point-to-Point links tend to exacerbate many problems with the
current family of network protocols. For instance, assignment and
management of IP addresses, which is a problem even in LAN
environments, is especially difficult over circuit-switched
point-to-point links (such as dial-up modem servers). These
problems are handled by a family of Network Control Protocols
(NCPs), which each manage the specific needs required by their
respective network-layer protocols. These NCPs are defined in
companion documents.

Configuration

It is intended that PPP links be easy to configure. By design,
the standard defaults handle all common configurations. The
implementor can specify improvements to the default configuration,
which are automatically communicated to the peer without operator
intervention. Finally, the operator may explicitly configure
options for the link which enable the link to operate in
environments where it would otherwise be impossible.

This self-configuration is implemented through an extensible
option negotiation mechanism, wherein each end of the link
describes to the other its capabilities and requirements.
Although the option negotiation mechanism described in this
document is specified in terms of the Link Control Protocol (LCP),
the same facilities are designed to be used by other control
protocols, especially the family of NCPs.

1.1. Specification of Requirements

In this document, several words are used to signify the requirements
of the specification. These words are often capitalized.

MUST      This word, or the adjective "required", means that the
          definition is an absolute requirement of the specification.

MUST NOT  This phrase means that the definition is an absolute
          prohibition of the specification.

SHOULD    This word, or the adjective "recommended", means that there
          may exist valid reasons in particular circumstances to
          ignore this item, but the full implications must be
          understood and carefully weighed before choosing a
          different course.

MAY       This word, or the adjective "optional", means that this
          item is one of an allowed set of alternatives. An
          implementation which does not include this option MUST be
          prepared to interoperate with another implementation which
          does include the option.

1.2. Terminology

This document frequently uses the following terms:

datagram  The unit of transmission in the network layer (such as IP).
          A datagram may be encapsulated in one or more packets
          passed to the data link layer.

frame     The unit of transmission at the data link layer. A frame
          may include a header and/or a trailer, along with some
          number of units of data.

packet    The basic unit of encapsulation, which is passed across the
          interface between the network layer and the data link
          layer. A packet is usually mapped to a frame; the
          exceptions are when data link layer fragmentation is being
          performed, or when multiple packets are incorporated into a
          single frame.

peer      The other end of the point-to-point link.

silently discard
          The implementation discards the packet without further
          processing. The implementation SHOULD provide the
          capability of logging the error, including the contents of
          the silently discarded packet, and SHOULD record the event
          in a statistics counter.

2. PPP Encapsulation

The PPP encapsulation is used to disambiguate multiprotocol
datagrams. This encapsulation requires framing to indicate the
beginning and end of the encapsulation. Methods of providing framing
are specified in companion documents.

A summary of the PPP encapsulation is shown below. The fields are
transmitted from left to right.

+----------+-------------+---------+
| Protocol | Information | Padding |
| 8/16 bits|      *      |    *    |
+----------+-------------+---------+

Protocol Field

The Protocol field is one or two octets, and its value identifies
the datagram encapsulated in the Information field of the packet.
The field is transmitted and received most significant octet
first.

The structure of this field is consistent with the ISO 3309
extension mechanism for address fields. All Protocols MUST be
odd; the least significant bit of the least significant octet MUST
equal "1". Also, all Protocols MUST be assigned such that the
least significant bit of the most significant octet equals "0".
Frames received which don't comply with these rules MUST be
treated as having an unrecognized Protocol.

Protocol field values in the "0***" to "3***" range identify the
network-layer protocol of specific packets, and values in the
"8***" to "b***" range identify packets belonging to the
associated Network Control Protocols (NCPs), if any.

Protocol field values in the "4***" to "7***" range are used for
protocols with low volume traffic which have no associated NCP.
Protocol field values in the "c***" to "f***" range identify
packets as link-layer Control Protocols (such as LCP).
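The Protocol field rules above can be sketched as a small check. This is only an illustration, assuming a two-octet (uncompressed) Protocol field; the function name `classify_protocol` and the returned labels are hypothetical and not part of the specification.

```python
def classify_protocol(value: int) -> str:
    """Classify a two-octet PPP Protocol field value (hypothetical helper)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("Protocol field must fit in two octets")
    low_octet = value & 0xFF
    high_octet = (value >> 8) & 0xFF
    # The least significant octet must be odd, and the least significant
    # bit of the most significant octet must be 0; anything else is
    # treated as an unrecognized Protocol.
    if (low_octet & 1) != 1 or (high_octet & 1) != 0:
        return "unrecognized"
    top_nibble = value >> 12
    if top_nibble <= 0x3:
        return "network-layer"       # "0***" to "3***"
    if top_nibble <= 0x7:
        return "low-volume"          # "4***" to "7***", no associated NCP
    if top_nibble <= 0xB:
        return "ncp"                 # "8***" to "b***"
    return "link-layer-control"      # "c***" to "f***"
```

For example, the value c021 (Link Control Protocol) falls in the link-layer control range, while a value with an even low octet such as c020 would be reported as unrecognized.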

Up-to-date values of the Protocol field are specified in the most
recent "Assigned Numbers" RFC [2]. This specification reserves
the following values:

Value (in hex)  Protocol Name

0001            Padding Protocol
0003 to 001f    reserved (transparency inefficient)
007d            reserved (Control Escape)
00cf            reserved (PPP NLPID)
00ff            reserved (compression inefficient)

8001 to 801f    unused
807d            unused
80cf            unused
80ff            unused

c021            Link Control Protocol
c023            Password Authentication Protocol
c025            Link Quality Report
c223            Challenge Handshake Authentication Protocol

Developers of new protocols MUST obtain a number from the Internet
Assigned Numbers Authority (IANA), at IANA@isi.edu.

Information Field

The Information field is zero or more octets. The Information
field contains the datagram for the protocol specified in the
Protocol field.

The maximum length for the Information field, including Padding,
but not including the Protocol field, is termed the Maximum
Receive Unit (MRU), which defaults to 1500 octets. By
negotiation, consenting PPP implementations may use other values
for the MRU.

Padding

On transmission, the Information field MAY be padded with an
arbitrary number of octets up to the MRU. It is the
responsibility of each protocol to distinguish padding octets from
real information.
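As a rough sketch of the encapsulation described in this section, the following builds a packet from a Protocol value and an Information field and enforces the MRU limit. The function name `encapsulate`, the use of zero octets for padding, and the always-two-octet Protocol field are assumptions made for illustration; framing is outside its scope.

```python
DEFAULT_MRU = 1500  # default Maximum Receive Unit, in octets


def encapsulate(protocol: int, information: bytes, padding: int = 0,
                mru: int = DEFAULT_MRU) -> bytes:
    """Return Protocol + Information + Padding (hypothetical helper)."""
    if not 0 <= protocol <= 0xFFFF:
        raise ValueError("Protocol field must fit in two octets")
    if padding < 0:
        raise ValueError("padding must not be negative")
    # The MRU covers Information plus Padding, but not the Protocol field.
    if len(information) + padding > mru:
        raise ValueError("Information plus Padding exceeds the MRU")
    # The Protocol field is sent most significant octet first.
    # Zero-valued padding octets are an assumption of this sketch.
    return protocol.to_bytes(2, "big") + information + bytes(padding)
```

Because each protocol is responsible for distinguishing padding from real information, a receiver cannot strip the padding using this layer alone.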

3. PPP Link Operation

3.1. Overview

In order to establish communications over a point-to-point link, each
end of the PPP link MUST first send LCP packets to configure and test
the data link. After the link has been established, the peer MAY be
authenticated.

Then, PPP MUST send NCP packets to choose and configure one or more
network-layer protocols. Once each of the chosen network-layer
protocols has been configured, datagrams from each network-layer
protocol can be sent over the link.

The link will remain configured for communications until explicit LCP
or NCP packets close the link down, or until some external event
occurs (an inactivity timer expires or network administrator
intervention).

3.2. Phase Diagram

In the process of configuring, maintaining and terminating the
point-to-point link, the PPP link goes through several distinct
phases which are specified in the following simplified state diagram:

+------+        +-----------+           +--------------+
|      | UP     |           | OPENED    |              | SUCCESS/NONE
| Dead |------->| Establish |---------->| Authenticate |--+
|      |        |           |           |              |  |
+------+        +-----------+           +--------------+  |
   ^               |                        |             |
   |          FAIL |                   FAIL |             |
   +<--------------+             +----------+             |
   |                             |                        |
   |            +-----------+    |           +---------+  |
   |       DOWN |           |    |   CLOSING |         |  |
   +------------| Terminate |<---+<----------| Network |<-+
                |           |                |         |
                +-----------+                +---------+

Not all transitions are specified in this diagram. The following
semantics MUST be followed.
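The transitions drawn in the simplified phase diagram can be written down as a lookup table. This is a minimal sketch that covers only the arrows shown in the diagram; the names `PHASE_TRANSITIONS` and `next_phase` are hypothetical, and returning `None` for an unlisted pair is a choice of this sketch, since the diagram does not specify every transition.

```python
# (current phase, event) -> next phase, taken from the simplified diagram.
PHASE_TRANSITIONS = {
    ("Dead", "UP"): "Establish",
    ("Establish", "OPENED"): "Authenticate",
    ("Establish", "FAIL"): "Dead",
    ("Authenticate", "SUCCESS/NONE"): "Network",
    ("Authenticate", "FAIL"): "Terminate",
    ("Network", "CLOSING"): "Terminate",
    ("Terminate", "DOWN"): "Dead",
}


def next_phase(phase: str, event: str):
    """Return the next phase, or None if the diagram shows no such arrow."""
    return PHASE_TRANSITIONS.get((phase, event))
```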

3.3. Link Dead (physical-layer not ready)

The link necessarily begins and ends with this phase. When an
external event (such as carrier detection or network administrator
configuration) indicates that the physical-layer is ready to be used,
PPP will proceed to the Link Establishment phase.

During this phase, the LCP automaton (described later) will be in the
Initial or Starting states. The transition to the Link Establishment
phase will signal an Up event to the LCP automaton.

Implementation Note:

Typically, a link will return to this phase automatically after
the disconnection of a modem. In the case of a hard-wired link,
this phase may be extremely short -- merely long enough to detect
the presence of the device.

3.4. Link Establishment Phase

The Link Control Protocol (LCP) is used to establish the connection
through an exchange of Configure packets. This exchange is complete,
and the LCP Opened state entered, once a Configure-Ack packet
(described later) has been both sent and received.

All Configuration Options are assumed to be at default values unless
altered by the configuration exchange. See the chapter on LCP
Configuration Options for further discussion.

It is important to note that only Configuration Options which are
independent of particular network-layer protocols are configured by
LCP. Configuration of individual network-layer protocols is handled
by separate Network Control Protocols (NCPs) during the Network-Layer
Protocol phase.

Any non-LCP packets received during this phase MUST be silently
discarded.

The receipt of the LCP Configure-Request causes a return to the Link
Establishment phase from the Network-Layer Protocol phase or
Authentication phase.

3.5. Authentication Phase

On some links it may be desirable to require a peer to authenticate
itself before allowing network-layer protocol packets to be
exchanged.

By default, authentication is not mandatory. If an implementation
desires that the peer authenticate with some specific authentication
protocol, then it MUST request the use of that authentication
protocol during Link Establishment phase.

Authentication SHOULD take place as soon as possible after link
establishment. However, link quality determination MAY occur
concurrently. An implementation MUST NOT allow the exchange of link
quality determination packets to delay authentication indefinitely.

Advancement from the Authentication phase to the Network-Layer
Protocol phase MUST NOT occur until authentication has completed. If
authentication fails, the authenticator SHOULD proceed instead to the
Link Termination phase.

Only Link Control Protocol, authentication protocol, and link quality
monitoring packets are allowed during this phase. All other packets
received during this phase MUST be silently discarded.

Implementation Notes:

An implementation SHOULD NOT fail authentication simply due to
timeout or lack of response. The authentication SHOULD allow some
method of retransmission, and proceed to the Link Termination
phase only after a number of authentication attempts has been
exceeded.

The implementation responsible for commencing Link Termination
phase is the implementation which has refused authentication to
its peer.

3.6. Network-Layer Protocol Phase

Once PPP has finished the previous phases, each network-layer
protocol (such as IP, IPX, or AppleTalk) MUST be separately
configured by the appropriate Network Control Protocol (NCP).

Each NCP MAY be Opened and Closed at any time.

Implementation Note:

Because an implementation may initially use a significant amount
of time for link quality determination, implementations SHOULD
avoid fixed timeouts when waiting for their peers to configure a
NCP.

After a NCP has reached the Opened state, PPP will carry the
corresponding network-layer protocol packets. Any supported
network-layer protocol packets received when the corresponding NCP is
not in the Opened state MUST be silently discarded.

Implementation Note:

While LCP is in the Opened state, any protocol packet which is
unsupported by the implementation MUST be returned in a Protocol-
Reject (described later). Only protocols which are supported are
silently discarded.

During this phase, link traffic consists of any possible combination
of LCP, NCP, and network-layer protocol packets.

3.7. Link Termination Phase

PPP can terminate the link at any time. This might happen because of
the loss of carrier, authentication failure, link quality failure,
the expiration of an idle-period timer, or the administrative closing
of the link.

LCP is used to close the link through an exchange of Terminate
packets. When the link is closing, PPP informs the network-layer
protocols so that they may take appropriate action.

After the exchange of Terminate packets, the implementation SHOULD
signal the physical-layer to disconnect in order to enforce the
termination of the link, particularly in the case of an
authentication failure. The sender of the Terminate-Request SHOULD
disconnect after receiving a Terminate-Ack, or after the Restart
counter expires. The receiver of a Terminate-Request SHOULD wait for
the peer to disconnect, and MUST NOT disconnect until at least one
Restart time has passed after sending a Terminate-Ack. PPP SHOULD
proceed to the Link Dead phase.

Any non-LCP packets received during this phase MUST be silently
discarded.

Implementation Note:

The closing of the link by LCP is sufficient. There is no need
for each NCP to send a flurry of Terminate packets. Conversely,
the fact that one NCP has Closed is not sufficient reason to cause
the termination of the PPP link, even if that NCP was the only NCP
currently in the Opened state.

4. The Option Negotiation Automaton

The finite-state automaton is defined by events, actions and state
transitions. Events include reception of external commands such as
Open and Close, expiration of the Restart timer, and reception of
packets from a peer. Actions include the starting of the Restart
timer and transmission of packets to the peer.

Some types of packets -- Configure-Naks and Configure-Rejects, or
Code-Rejects and Protocol-Rejects, or Echo-Requests, Echo-Replies and
Discard-Requests -- are not differentiated in the automaton
descriptions. As will be described later, these packets do indeed
serve different functions. However, they always cause the same
transitions.

Events                                   Actions

Up   = lower layer is Up                 tlu = This-Layer-Up
Down = lower layer is Down               tld = This-Layer-Down
Open = administrative Open               tls = This-Layer-Started
Close= administrative Close              tlf = This-Layer-Finished

TO+  = Timeout with counter > 0          irc = Initialize-Restart-Count
TO-  = Timeout with counter expired      zrc = Zero-Restart-Count

RCR+ = Receive-Configure-Request (Good)  scr = Send-Configure-Request
RCR- = Receive-Configure-Request (Bad)
RCA  = Receive-Configure-Ack             sca = Send-Configure-Ack
RCN  = Receive-Configure-Nak/Rej         scn = Send-Configure-Nak/Rej

RTR  = Receive-Terminate-Request         str = Send-Terminate-Request
RTA  = Receive-Terminate-Ack             sta = Send-Terminate-Ack

RUC  = Receive-Unknown-Code              scj = Send-Code-Reject
RXJ+ = Receive-Code-Reject (permitted)
    or Receive-Protocol-Reject
RXJ- = Receive-Code-Reject (catastrophic)
    or Receive-Protocol-Reject
RXR  = Receive-Echo-Request              ser = Send-Echo-Reply
    or Receive-Echo-Reply
    or Receive-Discard-Request

4.1. State Transition Table

The complete state transition table follows. States are indicated
horizontally, and events are read vertically. State transitions and
actions are represented in the form action/new-state. Multiple
actions are separated by commas, and may continue on succeeding lines
as space requires; multiple actions may be implemented in any
convenient order. The state may be followed by a letter, which
indicates an explanatory footnote. The dash ('-') indicates an
illegal transition.

      | State
      |    0         1         2         3         4         5
Events| Initial   Starting  Closed    Stopped   Closing   Stopping
------+-----------------------------------------------------------
 Up   |    2     irc,scr/6     -         -         -         -
 Down |    -         -         0       tls/1       0         1
 Open |  tls/1       1     irc,scr/6     3r        5r        5r
 Close|    0       tlf/0       2         2         4         4
      |
  TO+ |    -         -         -         -       str/4     str/5
  TO- |    -         -         -         -       tlf/2     tlf/3
      |
 RCR+ |    -         -       sta/2 irc,scr,sca/8   4         5
 RCR- |    -         -       sta/2 irc,scr,scn/6   4         5
 RCA  |    -         -       sta/2     sta/3       4         5
 RCN  |    -         -       sta/2     sta/3       4         5
      |
 RTR  |    -         -       sta/2     sta/3     sta/4     sta/5
 RTA  |    -         -         2         3       tlf/2     tlf/3
      |
 RUC  |    -         -       scj/2     scj/3     scj/4     scj/5
 RXJ+ |    -         -         2         3         4         5
 RXJ- |    -         -       tlf/2     tlf/3     tlf/2     tlf/3
      |
 RXR  |    -         -         2         3         4         5

      | State
      |    6         7         8           9
Events| Req-Sent  Ack-Rcvd  Ack-Sent    Opened
------+-----------------------------------------
 Up   |    -         -         -           -
 Down |    1         1         1         tld/1
 Open |    6         7         8           9r
 Close|irc,str/4 irc,str/4 irc,str/4 tld,irc,str/4
      |
  TO+ |  scr/6     scr/6     scr/8         -
  TO- |  tlf/3p    tlf/3p    tlf/3p        -
      |
 RCR+ |  sca/8   sca,tlu/9   sca/8   tld,scr,sca/8
 RCR- |  scn/6     scn/7     scn/6   tld,scr,scn/6
 RCA  |  irc/7     scr/6x  irc,tlu/9   tld,scr/6x
 RCN  |irc,scr/6   scr/6x  irc,scr/8   tld,scr/6x
      |
 RTR  |  sta/6     sta/6     sta/6   tld,zrc,sta/5
 RTA  |    6         6         8       tld,scr/6
      |
 RUC  |  scj/6     scj/7     scj/8       scj/9
 RXJ+ |    6         6         8           9
 RXJ- |  tlf/3     tlf/3     tlf/3   tld,irc,str/5
      |
 RXR  |    6         7         8         ser/9

The states in which the Restart timer is running are identifiable by
the presence of TO events. Only the Send-Configure-Request, Send-
Terminate-Request and Zero-Restart-Count actions start or re-start
the Restart timer. The Restart timer is stopped when transitioning
from any state where the timer is running to a state where the timer
is not running.

The events and actions are defined according to a message passing
architecture, rather than a signalling architecture. If an action is
desired to control specific signals (such as DTR), additional actions
are likely to be required.

[p] Passive option; see Stopped state discussion.

[r] Restart option; see Open event discussion.

[x] Crossed connection; see RCA event discussion.
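One way to make the state transition table executable is to store each event row as a string of ten cells (states 0 through 9) and parse the action/new-state notation on lookup. The sketch below is an illustration only: the names `TABLE` and `step` are hypothetical, footnote letters (p, r, x) are simply stripped rather than acted upon, and the actions are returned as names rather than performed.

```python
# One row per event; cells are listed for states 0 (Initial) to 9 (Opened).
TABLE = {
    "Up":    "2 irc,scr/6 - - - - - - - -",
    "Down":  "- - 0 tls/1 0 1 1 1 1 tld/1",
    "Open":  "tls/1 1 irc,scr/6 3r 5r 5r 6 7 8 9r",
    "Close": "0 tlf/0 2 2 4 4 irc,str/4 irc,str/4 irc,str/4 tld,irc,str/4",
    "TO+":   "- - - - str/4 str/5 scr/6 scr/6 scr/8 -",
    "TO-":   "- - - - tlf/2 tlf/3 tlf/3p tlf/3p tlf/3p -",
    "RCR+":  "- - sta/2 irc,scr,sca/8 4 5 sca/8 sca,tlu/9 sca/8 tld,scr,sca/8",
    "RCR-":  "- - sta/2 irc,scr,scn/6 4 5 scn/6 scn/7 scn/6 tld,scr,scn/6",
    "RCA":   "- - sta/2 sta/3 4 5 irc/7 scr/6x irc,tlu/9 tld,scr/6x",
    "RCN":   "- - sta/2 sta/3 4 5 irc,scr/6 scr/6x irc,scr/8 tld,scr/6x",
    "RTR":   "- - sta/2 sta/3 sta/4 sta/5 sta/6 sta/6 sta/6 tld,zrc,sta/5",
    "RTA":   "- - 2 3 tlf/2 tlf/3 6 6 8 tld,scr/6",
    "RUC":   "- - scj/2 scj/3 scj/4 scj/5 scj/6 scj/7 scj/8 scj/9",
    "RXJ+":  "- - 2 3 4 5 6 6 8 9",
    "RXJ-":  "- - tlf/2 tlf/3 tlf/2 tlf/3 tlf/3 tlf/3 tlf/3 tld,irc,str/5",
    "RXR":   "- - 2 3 4 5 6 7 8 ser/9",
}


def step(state: int, event: str):
    """Return (actions, new_state), or None for an illegal transition."""
    cell = TABLE[event].split()[state]
    if cell == "-":
        return None
    # "irc,scr/6" -> actions "irc,scr", target "6"; "3r" -> no actions.
    actions, _, target = cell.rpartition("/")
    new_state = int(target.rstrip("prx"))  # drop footnote letters
    return (actions.split(",") if actions else []), new_state
```

For instance, `step(2, "Open")` yields the actions `irc` and `scr` with new state 6 (Req-Sent), matching the Closed column of the Open row.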

4.2. States

Following is a more detailed description of each automaton state.

Initial

In the Initial state, the lower layer is unavailable (Down), and
no Open has occurred. The Restart timer is not running in the
Initial state.

Starting

The Starting state is the Open counterpart to the Initial state.
An administrative Open has been initiated, but the lower layer is
still unavailable (Down). The Restart timer is not running in the
Starting state.

When the lower layer becomes available (Up), a Configure-Request
is sent.

Closed

In the Closed state, the link is available (Up), but no Open has
occurred. The Restart timer is not running in the Closed state.

Upon reception of Configure-Request packets, a Terminate-Ack is
sent. Terminate-Acks are silently discarded to avoid creating a
loop.

Stopped

The Stopped state is the Open counterpart to the Closed state. It
is entered when the automaton is waiting for a Down event after
the This-Layer-Finished action, or after sending a Terminate-Ack.
The Restart timer is not running in the Stopped state.

Upon reception of Configure-Request packets, an appropriate
response is sent. Upon reception of other packets, a Terminate-
Ack is sent. Terminate-Acks are silently discarded to avoid
creating a loop.

Rationale:

The Stopped state is a junction state for link termination,
link configuration failure, and other automaton failure modes.
These potentially separate states have been combined.

There is a race condition between the Down event response (from
the This-Layer-Finished action) and the Receive-Configure-
Request event. When a Configure-Request arrives before the
Down event, the Down event will supersede by returning the
automaton to the Starting state. This prevents attack by
repetition.

Implementation Option:

After the peer fails to respond to Configure-Requests, an
implementation MAY wait passively for the peer to send
Configure-Requests. In this case, the This-Layer-Finished
action is not used for the TO- event in states Req-Sent, Ack-
Rcvd and Ack-Sent.

This option is useful for dedicated circuits, or circuits which
have no status signals available, but SHOULD NOT be used for
switched circuits.

Closing

In the Closing state, an attempt is made to terminate the
connection. A Terminate-Request has been sent and the Restart
timer is running, but a Terminate-Ack has not yet been received.

Upon reception of a Terminate-Ack, the Closed state is entered.
Upon the expiration of the Restart timer, a new Terminate-Request
is transmitted, and the Restart timer is restarted. After the
Restart timer has expired Max-Terminate times, the Closed state is
entered.

Stopping

The Stopping state is the Open counterpart to the Closing state.
A Terminate-Request has been sent and the Restart timer is
running, but a Terminate-Ack has not yet been received.

Rationale:

The Stopping state provides a well defined opportunity to
terminate a link before allowing new traffic. After the link
has terminated, a new configuration may occur via the Stopped
or Starting states.

Request-Sent

In the Request-Sent state an attempt is made to configure the
connection. A Configure-Request has been sent and the Restart
timer is running, but a Configure-Ack has not yet been received
nor has one been sent.

Ack-Received

In the Ack-Received state, a Configure-Request has been sent and a
Configure-Ack has been received. The Restart timer is still
running, since a Configure-Ack has not yet been sent.

Ack-Sent

In the Ack-Sent state, a Configure-Request and a Configure-Ack
have both been sent, but a Configure-Ack has not yet been
received. The Restart timer is running, since a Configure-Ack has
not yet been received.

Opened

In the Opened state, a Configure-Ack has been both sent and
received. The Restart timer is not running.

When entering the Opened state, the implementation SHOULD signal
the upper layers that it is now Up. Conversely, when leaving the
Opened state, the implementation SHOULD signal the upper layers
that it is now Down.

4.3. Events

Transitions and actions in the automaton are caused by events.

Up

This event occurs when a lower layer indicates that it is ready to
carry packets.

Typically, this event is used by a modem handling or calling
process, or by some other coupling of the PPP link to the physical
media, to signal LCP that the link is entering Link Establishment
phase.

It also can be used by LCP to signal each NCP that the link is
entering Network-Layer Protocol phase. That is, the This-Layer-Up
action from LCP triggers the Up event in the NCP.

Down

This event occurs when a lower layer indicates that it is no
longer ready to carry packets.

Typically, this event is used by a modem handling or calling
process, or by some other coupling of the PPP link to the physical
media, to signal LCP that the link is entering Link Dead phase.

It also can be used by LCP to signal each NCP that the link is
leaving Network-Layer Protocol phase. That is, the This-Layer-
Down action from LCP triggers the Down event in the NCP.

Open

This event indicates that the link is administratively available
for traffic; that is, the network administrator (human or program)
has indicated that the link is allowed to be Opened. When this
event occurs, and the link is not in the Opened state, the
automaton attempts to send configuration packets to the peer.

If the automaton is not able to begin configuration (the lower
layer is Down, or a previous Close event has not completed), the
establishment of the link is automatically delayed.

When a Terminate-Request is received, or other events occur which
cause the link to become unavailable, the automaton will progress
to a state where the link is ready to re-open. No additional
administrative intervention is necessary.

Implementation Option:

Experience has shown that users will execute an additional Open
command when they want to renegotiate the link. This might
indicate that new values are to be negotiated.

Since this is not the meaning of the Open event, it is
suggested that when an Open user command is executed in the
Opened, Closing, Stopping, or Stopped states, the
implementation issue a Down event, immediately followed by an
Up event. Care must be taken that an intervening Down event
cannot occur from another source.

The Down followed by an Up will cause an orderly renegotiation
of the link, by progressing through the Starting to the
Request-Sent state. This will cause the renegotiation of the
link, without any harmful side effects.

Close

This event indicates that the link is not available for traffic;
that is, the network administrator (human or program) has
indicated that the link is not allowed to be Opened. When this
event occurs, and the link is not in the Closed state, the
automaton attempts to terminate the connection. Further attempts
to re-configure the link are denied until a new Open event occurs.

Implementation Note:

When authentication fails, the link SHOULD be terminated, to
prevent attack by repetition and denial of service to other
users. Since the link is administratively available (by
definition), this can be accomplished by simulating a Close
event to the LCP, immediately followed by an Open event. Care
must be taken that an intervening Close event cannot occur from
another source.

The Close followed by an Open will cause an orderly termination
of the link, by progressing through the Closing to the Stopping
state, and the This-Layer-Finished action can disconnect the
link. The automaton waits in the Stopped or Starting states
for the next connection attempt.

Timeout (TO+,TO-)

This event indicates the expiration of the Restart timer. The
Restart timer is used to time responses to Configure-Request and
Terminate-Request packets.

The TO+ event indicates that the Restart counter continues to be
greater than zero, which triggers the corresponding Configure-
Request or Terminate-Request packet to be retransmitted.

The TO- event indicates that the Restart counter is not greater
than zero, and no more packets need to be retransmitted.
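The distinction between TO+ and TO- can be sketched with a small counter object. This is a hedged illustration: the class name `RestartCounter` and its methods are hypothetical, and it assumes the counter is decremented once for each Configure-Request or Terminate-Request transmitted, a detail belonging to the later discussion of counters and timers rather than to this section.

```python
class RestartCounter:
    """Tracks remaining retransmissions for the Restart timer (sketch)."""

    def __init__(self, initial: int):
        self.initial = initial  # e.g. a configured maximum number of tries
        self.count = 0

    def initialize(self) -> None:
        """Initialize-Restart-Count (irc)."""
        self.count = self.initial

    def zero(self) -> None:
        """Zero-Restart-Count (zrc)."""
        self.count = 0

    def on_send(self) -> None:
        """Assumed: each request transmitted uses up one attempt."""
        if self.count > 0:
            self.count -= 1

    def on_timeout(self) -> str:
        """Map a Restart timer expiry to the TO+ or TO- event."""
        return "TO+" if self.count > 0 else "TO-"
```

With an initial value of 2, the first expiry after one transmission reports TO+ and triggers a retransmission; the expiry after the second transmission reports TO-.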

Receive-Configure-Request (RCR+,RCR-)

This event occurs when a Configure-Request packet is received from
the peer. The Configure-Request packet indicates the desire to
open a connection and may specify Configuration Options. The
Configure-Request packet is more fully described in a later
section.

The RCR+ event indicates that the Configure-Request was
acceptable, and triggers the transmission of a corresponding
Configure-Ack.

The RCR- event indicates that the Configure-Request was
unacceptable, and triggers the transmission of a corresponding
Configure-Nak or Configure-Reject.

Implementation Note:

These events may occur on a connection which is already in the
Opened state. The implementation MUST be prepared to
immediately renegotiate the Configuration Options.

Receive-Configure-Ack (RCA)

This event occurs when a valid Configure-Ack packet is received
from the peer. The Configure-Ack packet is a positive response to
a Configure-Request packet. An out of sequence or otherwise
invalid packet is silently discarded.

Implementation Note:

Since the correct packet has already been received before
reaching the Ack-Rcvd or Opened states, it is extremely
unlikely that another such packet will arrive. As specified,
all invalid Ack/Nak/Rej packets are silently discarded, and do
not affect the transitions of the automaton.

However, it is not impossible that a correctly formed packet
will arrive through a coincidentally-timed cross-connection.
It is more likely to be the result of an implementation error.
At the very least, this occurrence SHOULD be logged.

Receive-Configure-Nak/Rej (RCN)

This event occurs when a valid Configure-Nak or Configure-Reject
packet is received from the peer. The Configure-Nak and
Configure-Reject packets are negative responses to a Configure-
Request packet. An out of sequence or otherwise invalid packet is
silently discarded.

Implementation Note:

Although the Configure-Nak and Configure-Reject cause the same
state transition in the automaton, these packets have
significantly different effects on the Configuration Options
sent in the resulting Configure-Request packet.

Receive-Terminate-Request (RTR)

This event occurs when a Terminate-Request packet is received.
The Terminate-Request packet indicates the desire of the peer to
close the connection.

Implementation Note:

This event is not identical to the Close event (see above), and
does not override the Open commands of the local network
administrator. The implementation MUST be prepared to receive
a new Configure-Request without network administrator
intervention.

Receive-Terminate-Ack (RTA)

This event occurs when a Terminate-Ack packet is received from the
peer. The Terminate-Ack packet is usually a response to a
Terminate-Request packet. The Terminate-Ack packet may also
indicate that the peer is in Closed or Stopped states, and serves
to re-synchronize the link configuration.

Receive-Unknown-Code (RUC)

This event occurs when an un-interpretable packet is received from
the peer. A Code-Reject packet is sent in response.

Receive-Code-Reject, Receive-Protocol-Reject (RXJ+,RXJ-)

This event occurs when a Code-Reject or a Protocol-Reject packet
is received from the peer.

The RXJ+ event arises when the rejected value is acceptable, such
as a Code-Reject of an extended code, or a Protocol-Reject of a
NCP. These are within the scope of normal operation. The
implementation MUST stop sending the offending packet type.

The RXJ- event arises when the rejected value is catastrophic,
such as a Code-Reject of Configure-Request, or a Protocol-Reject
of LCP! This event communicates an unrecoverable error that
terminates the connection.

Receive-Echo-Request, Receive-Echo-Reply, Receive-Discard-Request
(RXR)

This event occurs when an Echo-Request, Echo-Reply or Discard-
Request packet is received from the peer. The Echo-Reply packet
is a response to an Echo-Request packet. There is no reply to an
Echo-Reply or Discard-Request packet.

4.4. Actions

Actions in the automaton are caused by events and typically indicate
the transmission of packets and/or the starting or stopping of the
Restart timer.

Illegal-Event (-)

This indicates an event that cannot occur in a properly
implemented automaton. The implementation has an internal error,
which should be reported and logged. No transition is taken, and
the implementation SHOULD NOT reset or freeze.

This-Layer-Up (tlu)

This action indicates to the upper layers that the automaton is
entering the Opened state.

Typically, this action is used by the LCP to signal the Up event
to a NCP, Authentication Protocol, or Link Quality Protocol, or
MAY be used by a NCP to indicate that the link is available for
its network layer traffic.

This-Layer-Down (tld)

This action indicates to the upper layers that the automaton is
leaving the Opened state.

Typically, this action is used by the LCP to signal the Down event
to a NCP, Authentication Protocol, or Link Quality Protocol, or
MAY be used by a NCP to indicate that the link is no longer
available for its network layer traffic.

This-Layer-Started (tls)

This action indicates to the lower layers that the automaton is
entering the Starting state, and the lower layer is needed for the
link. The lower layer SHOULD respond with an Up event when the
lower layer is available.

The results of this action are highly implementation dependent.

This-Layer-Finished (tlf)

This action indicates to the lower layers that the automaton is
entering the Initial, Closed or Stopped states, and the lower
layer is no longer needed for the link. The lower layer SHOULD
respond with a Down event when the lower layer has terminated.

Typically, this action MAY be used by the LCP to advance to the
Link Dead phase, or MAY be used by a NCP to indicate to the LCP
that the link may terminate when there are no other NCPs open.

The results of this action are highly implementation dependent.

Initialize-Restart-Count (irc)

This action sets the Restart counter to the appropriate value
(Max-Terminate or Max-Configure). The counter is decremented for
each transmission, including the first.

Implementation Note:

In addition to setting the Restart counter, the implementation
MUST set the timeout period to the initial value when Restart
timer backoff is used.

Zero-Restart-Count (zrc)

This action sets the Restart counter to zero.

Implementation Note:

This action enables the FSA to pause before proceeding to the
desired final state, allowing traffic to be processed by the
peer. In addition to zeroing the Restart counter, the
implementation MUST set the timeout period to an appropriate
value.

Send-Configure-Request (scr)

A Configure-Request packet is transmitted. This indicates the
desire to open a connection with a specified set of Configuration
Options. The Restart timer is started when the Configure-Request
packet is transmitted, to guard against packet loss. The Restart
counter is decremented each time a Configure-Request is sent.

Send-Configure-Ack (sca)

A Configure-Ack packet is transmitted. This acknowledges the
reception of a Configure-Request packet with an acceptable set of
Configuration Options.

Send-Configure-Nak (scn)

A Configure-Nak or Configure-Reject packet is transmitted, as
appropriate. This negative response reports the reception of a
Configure-Request packet with an unacceptable set of Configuration
Options.

Configure-Nak packets are used to refuse a Configuration Option
value, and to suggest a new, acceptable value. Configure-Reject
packets are used to refuse all negotiation about a Configuration
Option, typically because it is not recognized or implemented.
The use of Configure-Nak versus Configure-Reject is more fully
described in the chapter on LCP Packet Formats.

Send-Terminate-Request (str)

A Terminate-Request packet is transmitted. This indicates the
desire to close a connection. The Restart timer is started when
the Terminate-Request packet is transmitted, to guard against
packet loss. The Restart counter is decremented each time a
Terminate-Request is sent.

Send-Terminate-Ack (sta)

A Terminate-Ack packet is transmitted. This acknowledges the
reception of a Terminate-Request packet or otherwise serves to
synchronize the automatons.

Send-Code-Reject (scj)

A Code-Reject packet is transmitted. This indicates the reception
of an unknown type of packet.

Send-Echo-Reply (ser)

An Echo-Reply packet is transmitted. This acknowledges the
reception of an Echo-Request packet.

4.5. Loop Avoidance

The protocol makes a reasonable attempt at avoiding Configuration
Option negotiation loops. However, the protocol does NOT guarantee
that loops will not happen. As with any negotiation, it is possible
to configure two PPP implementations with conflicting policies that
will never converge. It is also possible to configure policies which
do converge, but which take significant time to do so. Implementors
should keep this in mind and SHOULD implement loop detection
mechanisms or higher level timeouts.

4.6. Counters and Timers

Restart Timer

There is one special timer used by the automaton. The Restart
timer is used to time transmissions of Configure-Request and
Terminate-Request packets. Expiration of the Restart timer causes
a Timeout event, and retransmission of the corresponding
Configure-Request or Terminate-Request packet. The Restart timer
MUST be configurable, but SHOULD default to three (3) seconds.

Implementation Note:

The Restart timer SHOULD be based on the speed of the link.
The default value is designed for low speed (2,400 to 9,600
bps), high switching latency links (typical telephone lines).
Higher speed links, or links with low switching latency, SHOULD
have correspondingly faster retransmission times.

Instead of a constant value, the Restart timer MAY begin at an
initial small value and increase to the configured final value.
Each successive value less than the final value SHOULD be at
least twice the previous value. The initial value SHOULD be
large enough to account for the size of the packets, twice the
round trip time for transmission at the link speed, and at
least an additional 100 milliseconds to allow the peer to
process the packets before responding. Some circuits add
another 200 milliseconds of satellite delay. Round trip times
for modems operating at 14,400 bps have been measured in the
range of 160 to more than 600 milliseconds.
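
The backoff rule in the note above can be illustrated with a short
sketch. The function name, its parameters and the choice of exactly
doubling are assumptions made for illustration; the note only asks
that each value below the final value be at least twice the previous
one.

```python
def restart_timeouts(initial, final, count):
    """Return `count` Restart timer periods in seconds.

    Illustrative helper (not part of the specification): the period
    starts at `initial`, doubles after each transmission, and is
    capped at the configured `final` value.
    """
    if initial <= 0 or final < initial:
        raise ValueError("need 0 < initial <= final")
    periods = []
    current = initial
    for _ in range(count):
        periods.append(current)
        # Double the period, but never exceed the configured final value.
        current = min(current * 2, final)
    return periods
```

With an initial value of 0.5 seconds and the default final value of
3 seconds, five transmissions would wait 0.5, 1, 2, 3 and 3 seconds.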

Max-Terminate

There is one required restart counter for Terminate-Requests.
Max-Terminate indicates the number of Terminate-Request packets
sent without receiving a Terminate-Ack before assuming that the
peer is unable to respond. Max-Terminate MUST be configurable,
but SHOULD default to two (2) transmissions.

Max-Configure

A similar counter is recommended for Configure-Requests. Max-
Configure indicates the number of Configure-Request packets sent
without receiving a valid Configure-Ack, Configure-Nak or
Configure-Reject before assuming that the peer is unable to
respond. Max-Configure MUST be configurable, but SHOULD default
to ten (10) transmissions.

Max-Failure

A related counter is recommended for Configure-Nak. Max-Failure
indicates the number of Configure-Nak packets sent without sending
a Configure-Ack before assuming that configuration is not
converging. Any further Configure-Nak packets for peer requested
options are converted to Configure-Reject packets, and locally
desired options are no longer appended. Max-Failure MUST be
configurable, but SHOULD default to five (5) transmissions.
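
The defaults above, and the Max-Failure conversion rule, might be
captured as in the following sketch. The class name, field names and
the helper function are hypothetical; only the default values and the
packet codes (3 for Configure-Nak, 4 for Configure-Reject) come from
this document.

```python
from dataclasses import dataclass


@dataclass
class LcpLimits:
    """Configurable counters and timer, with the recommended defaults."""
    restart_timer_s: float = 3.0   # Restart timer, seconds
    max_terminate: int = 2         # Terminate-Request transmissions
    max_configure: int = 10        # Configure-Request transmissions
    max_failure: int = 5           # Configure-Nak transmissions


def negative_reply_code(naks_sent, limits):
    """Pick the Code for the next negative reply.

    `naks_sent` is the number of Configure-Naks sent since the last
    Configure-Ack was sent. Once Max-Failure is reached, further Naks
    are converted to Configure-Rejects.
    """
    return 3 if naks_sent < limits.max_failure else 4
```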

5. LCP Packet Formats

There are three classes of LCP packets:

1. Link Configuration packets used to establish and configure a
link (Configure-Request, Configure-Ack, Configure-Nak and
Configure-Reject).

2. Link Termination packets used to terminate a link (Terminate-
Request and Terminate-Ack).

3. Link Maintenance packets used to manage and debug a link
(Code-Reject, Protocol-Reject, Echo-Request, Echo-Reply, and
Discard-Request).

In the interest of simplicity, there is no version field in the LCP
packet. A correctly functioning LCP implementation will always
respond to unknown Protocols and Codes with an easily recognizable
LCP packet, thus providing a deterministic fallback mechanism for
implementations of other versions.

Regardless of which Configuration Options are enabled, all LCP Link
Configuration, Link Termination, and Code-Reject packets (codes 1
through 7) are always sent as if no Configuration Options were
negotiated. In particular, each Configuration Option specifies a
default value. This ensures that such LCP packets are always
recognizable, even when one end of the link mistakenly believes the
link to be open.

Exactly one LCP packet is encapsulated in the PPP Information field,
where the PPP Protocol field indicates type hex c021 (Link Control
Protocol).

A summary of the Link Control Protocol packet format is shown below.
The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Data ...
+-+-+-+-+

Code

The Code field is one octet, and identifies the kind of LCP
packet. When a packet is received with an unknown Code field, a
Code-Reject packet is transmitted.

Up-to-date values of the LCP Code field are specified in the most
recent "Assigned Numbers" RFC [2]. This document concerns the
following values:

1 Configure-Request
2 Configure-Ack
3 Configure-Nak
4 Configure-Reject
5 Terminate-Request
6 Terminate-Ack
7 Code-Reject
8 Protocol-Reject
9 Echo-Request
10 Echo-Reply
11 Discard-Request

Identifier

The Identifier field is one octet, and aids in matching requests
and replies. When a packet is received with an invalid Identifier
field, the packet is silently discarded without affecting the
automaton.

Length

The Length field is two octets, and indicates the length of the
LCP packet, including the Code, Identifier, Length and Data
fields. The Length MUST NOT exceed the MRU of the link.

Octets outside the range of the Length field are treated as
padding and are ignored on reception. When a packet is received
with an invalid Length field, the packet is silently discarded
without affecting the automaton.

Data

The Data field is zero or more octets, as indicated by the Length
field. The format of the Data field is determined by the Code
field.
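
As an illustration of the field rules above, the following sketch
splits a received PPP Information field into its LCP fields. The
function name and the use of None to represent "silently discarded"
are assumptions of this sketch, as is the most-significant-octet-first
encoding of the Length field, which this section does not restate.

```python
import struct

# Code (1 octet), Identifier (1 octet), Length (2 octets, big-endian).
LCP_HEADER = struct.Struct("!BBH")


def parse_lcp(information):
    """Return (code, identifier, data), or None for an invalid Length."""
    if len(information) < LCP_HEADER.size:
        return None
    code, identifier, length = LCP_HEADER.unpack_from(information)
    # Length covers the header itself and must fit in what was received.
    if length < LCP_HEADER.size or length > len(information):
        return None
    # Octets beyond Length are padding and are ignored.
    return code, identifier, bytes(information[LCP_HEADER.size:length])
```

The check against the link MRU is omitted here, since the MRU is
state held outside the packet.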

5.1. Configure-Request

Description

An implementation wishing to open a connection MUST transmit a
Configure-Request. The Options field is filled with any desired
changes to the link defaults. Configuration Options SHOULD NOT be
included with default values.

Upon reception of a Configure-Request, an appropriate reply MUST
be transmitted.

A summary of the Configure-Request packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

1 for Configure-Request.

Identifier

The Identifier field MUST be changed whenever the contents of the
Options field changes, and whenever a valid reply has been
received for a previous request. For retransmissions, the
Identifier MAY remain unchanged.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender desires to
negotiate. All Configuration Options are always negotiated
simultaneously. The format of Configuration Options is further
described in a later chapter.

5.2. Configure-Ack

Description

If every Configuration Option received in a Configure-Request is
recognizable and all values are acceptable, then the
implementation MUST transmit a Configure-Ack. The acknowledged
Configuration Options MUST NOT be reordered or modified in any
way.

On reception of a Configure-Ack, the Identifier field MUST match
that of the last transmitted Configure-Request. Additionally, the
Configuration Options in a Configure-Ack MUST exactly match those
of the last transmitted Configure-Request. Invalid packets are
silently discarded.

A summary of the Configure-Ack packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

2 for Configure-Ack.

Identifier

The Identifier field is a copy of the Identifier field of the
Configure-Request which caused this Configure-Ack.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender is
acknowledging. All Configuration Options are always acknowledged
simultaneously.
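
The reception rule for Configure-Ack can be sketched as a simple
comparison. The function and parameter names are hypothetical, and
the options are assumed to be held as the raw octets of the Options
field, so that any reordering or modification shows up as a mismatch.

```python
def is_valid_configure_ack(ack_identifier, ack_options,
                           last_identifier, last_options):
    """Return True when a received Configure-Ack should be accepted.

    The Identifier must match the last transmitted Configure-Request,
    and the Options octets must be identical to those that were sent.
    A False result corresponds to silently discarding the packet.
    """
    return (ack_identifier == last_identifier
            and bytes(ack_options) == bytes(last_options))
```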

5.3. Configure-Nak

Description

If every instance of the received Configuration Options is
recognizable, but some values are not acceptable, then the
implementation MUST transmit a Configure-Nak. The Options field
is filled with only the unacceptable Configuration Options from
the Configure-Request. All acceptable Configuration Options are
filtered out of the Configure-Nak, but otherwise the Configuration
Options from the Configure-Request MUST NOT be reordered.

Options which have no value fields (boolean options) MUST use the
Configure-Reject reply instead.

Each Configuration Option which is allowed only a single instance
MUST be modified to a value acceptable to the Configure-Nak
sender. The default value MAY be used, when this differs from the
requested value.

When a particular type of Configuration Option can be listed more
than once with different values, the Configure-Nak MUST include a
list of all values for that option which are acceptable to the
Configure-Nak sender. This includes acceptable values that were
present in the Configure-Request.

Finally, an implementation may be configured to request the
negotiation of a specific Configuration Option. If that option is
not listed, then that option MAY be appended to the list of Nak'd
Configuration Options, in order to prompt the peer to include that
option in its next Configure-Request packet. Any value fields for
the option MUST indicate values acceptable to the Configure-Nak
sender.

On reception of a Configure-Nak, the Identifier field MUST match
that of the last transmitted Configure-Request. Invalid packets
are silently discarded.

Reception of a valid Configure-Nak indicates that when a new
Configure-Request is sent, the Configuration Options MAY be
modified as specified in the Configure-Nak. When multiple
instances of a Configuration Option are present, the peer SHOULD
select a single value to include in its next Configure-Request
packet.

Some Configuration Options have a variable length. Since the
Nak'd Option has been modified by the peer, the implementation
MUST be able to handle an Option length which is different from
the original Configure-Request.

A summary of the Configure-Nak packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

3 for Configure-Nak.

Identifier

The Identifier field is a copy of the Identifier field of the
Configure-Request which caused this Configure-Nak.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender is Nak'ing.
All Configuration Options are always Nak'd simultaneously.

5.4. Configure-Reject

Description

If some Configuration Options received in a Configure-Request are
not recognizable or are not acceptable for negotiation (as
configured by a network administrator), then the implementation
MUST transmit a Configure-Reject. The Options field is filled
with only the unacceptable Configuration Options from the
Configure-Request. All recognizable and negotiable Configuration
Options are filtered out of the Configure-Reject, but otherwise
the Configuration Options MUST NOT be reordered or modified in any
way.

On reception of a Configure-Reject, the Identifier field MUST
match that of the last transmitted Configure-Request.
Additionally, the Configuration Options in a Configure-Reject MUST
be a proper subset of those in the last transmitted Configure-
Request. Invalid packets are silently discarded.

Reception of a valid Configure-Reject indicates that when a new
Configure-Request is sent, it MUST NOT include any of the
Configuration Options listed in the Configure-Reject.

A summary of the Configure-Reject packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

4 for Configure-Reject.

Identifier

The Identifier field is a copy of the Identifier field of the
Configure-Request which caused this Configure-Reject.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender is rejecting.
All Configuration Options are always rejected simultaneously.
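
Taken together, the Configure-Ack, Configure-Nak and Configure-Reject
descriptions suggest a selection procedure along the following lines.
This is a minimal sketch: the function name and the two policy hooks
are hypothetical, options are assumed to be already parsed into
(type, data) pairs, and the sketch leaves out Max-Failure handling
and the optional appending of locally desired options. Boolean
options that are not wanted are expected to be reported as
non-negotiable by the policy hook, so that they are rejected rather
than Nak'd.

```python
CONFIGURE_ACK, CONFIGURE_NAK, CONFIGURE_REJECT = 2, 3, 4


def choose_reply(options, is_negotiable, suggest):
    """Pick the reply Code and option list for a Configure-Request.

    options       -- list of (type, data) pairs, in received order
    is_negotiable -- hypothetical hook: type -> bool
    suggest       -- hypothetical hook: (type, data) -> None when the
                     value is acceptable, else the data to propose
    """
    rejected = [(t, d) for t, d in options if not is_negotiable(t)]
    if rejected:
        # Returned unmodified and in the order they were received.
        return CONFIGURE_REJECT, rejected
    naks = []
    for opt_type, data in options:
        proposal = suggest(opt_type, data)
        if proposal is not None:
            # Only unacceptable options appear, with acceptable values.
            naks.append((opt_type, proposal))
    if naks:
        return CONFIGURE_NAK, naks
    # Everything is recognizable and acceptable: echo the list as is.
    return CONFIGURE_ACK, list(options)
```

List comprehensions and a single forward pass are used so that the
received order of the options is preserved in every reply.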

5.5. Terminate-Request and Terminate-Ack

Description

LCP includes Terminate-Request and Terminate-Ack Codes in order to
provide a mechanism for closing a connection.

An implementation wishing to close a connection SHOULD transmit a
Terminate-Request. Terminate-Request packets SHOULD continue to
be sent until Terminate-Ack is received, the lower layer indicates
that it has gone down, or a sufficiently large number have been
transmitted such that the peer is down with reasonable certainty.

Upon reception of a Terminate-Request, a Terminate-Ack MUST be
transmitted.

Reception of an unsolicited Terminate-Ack indicates that the peer
is in the Closed or Stopped states, or is otherwise in need of
re-negotiation.

A summary of the Terminate-Request and Terminate-Ack packet formats
is shown below. The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Data ...
+-+-+-+-+

Code

5 for Terminate-Request;

6 for Terminate-Ack.

Identifier

On transmission, the Identifier field MUST be changed whenever the
content of the Data field changes, and whenever a valid reply has
been received for a previous request. For retransmissions, the
Identifier MAY remain unchanged.

On reception, the Identifier field of the Terminate-Request is
copied into the Identifier field of the Terminate-Ack packet.

Data

The Data field is zero or more octets, and contains uninterpreted
data for use by the sender. The data may consist of any binary
value. The end of the field is indicated by the Length.

5.6. Code-Reject

Description

Reception of a LCP packet with an unknown Code indicates that the
peer is operating with a different version. This MUST be reported
back to the sender of the unknown Code by transmitting a Code-
Reject.

Upon reception of the Code-Reject of a code which is fundamental
to this version of the protocol, the implementation SHOULD report
the problem and drop the connection, since it is unlikely that the
situation can be rectified automatically.

A summary of the Code-Reject packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Rejected-Packet ...
+-+-+-+-+-+-+-+-+

Code

7 for Code-Reject.

Identifier

The Identifier field MUST be changed for each Code-Reject sent.

Rejected-Packet

The Rejected-Packet field contains a copy of the LCP packet which
is being rejected. It begins with the Information field, and does
not include any Data Link Layer headers nor an FCS. The
Rejected-Packet MUST be truncated to comply with the peer's
established MRU.
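
A Code-Reject could be assembled as in the following sketch. The
function name and parameters are hypothetical, and the sketch assumes
that the whole Code-Reject, header included, has to fit within the
peer's MRU, which defaults to 1500 octets when none was negotiated.

```python
import struct

CODE_REJECT = 7
HEADER_LEN = 4  # Code + Identifier + Length


def build_code_reject(identifier, rejected_packet, peer_mru=1500):
    """Wrap an LCP packet with an unknown Code in a Code-Reject."""
    if peer_mru < HEADER_LEN:
        raise ValueError("MRU too small for an LCP header")
    # Truncate the copy so the resulting packet fits the peer's MRU.
    payload = bytes(rejected_packet[:peer_mru - HEADER_LEN])
    header = struct.pack("!BBH", CODE_REJECT, identifier & 0xFF,
                         HEADER_LEN + len(payload))
    return header + payload
```

The caller remains responsible for choosing a new Identifier for
each Code-Reject sent.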

5.7. Protocol-Reject

Description

Reception of a PPP packet with an unknown Protocol field indicates
that the peer is attempting to use a protocol which is
unsupported. This usually occurs when the peer attempts to
configure a new protocol. If the LCP automaton is in the Opened
state, then this MUST be reported back to the peer by transmitting
a Protocol-Reject.

Upon reception of a Protocol-Reject, the implementation MUST stop
sending packets of the indicated protocol at the earliest
opportunity.

Protocol-Reject packets can only be sent in the LCP Opened state.
Protocol-Reject packets received in any state other than the LCP
Opened state SHOULD be silently discarded.

A summary of the Protocol-Reject packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Rejected-Protocol       |  Rejected-Information ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Code

8 for Protocol-Reject.

Identifier

The Identifier field MUST be changed for each Protocol-Reject
sent.

Rejected-Protocol

The Rejected-Protocol field is two octets, and contains the PPP
Protocol field of the packet which is being rejected.

Rejected-Information

The Rejected-Information field contains a copy of the packet which
is being rejected. It begins with the Information field, and does
not include any Data Link Layer headers nor an FCS. The
Rejected-Information MUST be truncated to comply with the peer's
established MRU.
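
The Protocol-Reject rules might look as follows in code. The function
name, its parameters and the use of None for "nothing is sent" are
assumptions of this sketch, as is applying the MRU limit to the whole
LCP packet.

```python
import struct

PROTOCOL_REJECT = 8
FIXED_LEN = 6  # Code + Identifier + Length + Rejected-Protocol


def build_protocol_reject(identifier, protocol, information,
                          lcp_opened, peer_mru=1500):
    """Return a Protocol-Reject packet, or None outside the Opened state."""
    if not lcp_opened:
        # Protocol-Reject can only be sent in the LCP Opened state.
        return None
    if peer_mru < FIXED_LEN:
        raise ValueError("MRU too small for a Protocol-Reject")
    payload = bytes(information[:peer_mru - FIXED_LEN])
    header = struct.pack("!BBHH", PROTOCOL_REJECT, identifier & 0xFF,
                         FIXED_LEN + len(payload), protocol)
    return header + payload
```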

5.8. Echo-Request and Echo-Reply

Description

LCP includes Echo-Request and Echo-Reply Codes in order to provide
a Data Link Layer loopback mechanism for use in exercising both
directions of the link. This is useful as an aid in debugging,
link quality determination, performance testing, and for numerous
other functions.

Upon reception of an Echo-Request in the LCP Opened state, an
Echo-Reply MUST be transmitted.

Echo-Request and Echo-Reply packets MUST only be sent in the LCP
Opened state. Echo-Request and Echo-Reply packets received in any
state other than the LCP Opened state SHOULD be silently
discarded.

A summary of the Echo-Request and Echo-Reply packet formats is shown
below. The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Magic-Number                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Data ...
+-+-+-+-+

Code

9 for Echo-Request;

10 for Echo-Reply.

Identifier

On transmission, the Identifier field MUST be changed whenever the
content of the Data field changes, and whenever a valid reply has
been received for a previous request. For retransmissions, the
Identifier MAY remain unchanged.

On reception, the Identifier field of the Echo-Request is copied
into the Identifier field of the Echo-Reply packet.

Magic-Number

The Magic-Number field is four octets, and aids in detecting links
which are in the looped-back condition. Until the Magic-Number
Configuration Option has been successfully negotiated, the Magic-
Number MUST be transmitted as zero. See the Magic-Number
Configuration Option for further explanation.

Data

The Data field is zero or more octets, and contains uninterpreted
data for use by the sender. The data may consist of any binary
value. The end of the field is indicated by the Length.
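
The handling of a received Echo-Request can be sketched as below.
The function name and parameters are hypothetical. Copying the
request's Data field into the reply is an assumption of this sketch
and is not required by the text above; loop detection through the
Magic-Number is also left out.

```python
import struct

ECHO_REQUEST, ECHO_REPLY = 9, 10
FIXED_LEN = 8  # Code + Identifier + Length + Magic-Number


def build_echo_reply(request, local_magic=0, lcp_opened=True):
    """Return an Echo-Reply for `request`, or None when it is discarded.

    `local_magic` stays zero until the Magic-Number Configuration
    Option has been successfully negotiated.
    """
    if not lcp_opened or len(request) < FIXED_LEN:
        return None
    code, identifier, length = struct.unpack_from("!BBH", request)
    if code != ECHO_REQUEST or length < FIXED_LEN or length > len(request):
        return None
    data = bytes(request[FIXED_LEN:length])
    # The Identifier of the request is copied into the reply.
    header = struct.pack("!BBHI", ECHO_REPLY, identifier,
                         FIXED_LEN + len(data), local_magic)
    return header + data
```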

5.9. Discard-Request

Description

LCP includes a Discard-Request Code in order to provide a Data
Link Layer sink mechanism for use in exercising the local to
remote direction of the link. This is useful as an aid in
debugging, performance testing, and for numerous other functions.

Discard-Request packets MUST only be sent in the LCP Opened state.
On reception, the receiver MUST silently discard any Discard-
Request that it receives.

A summary of the Discard-Request packet format is shown below. The
fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Code      |  Identifier   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Magic-Number                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Data ...
+-+-+-+-+

Code

11 for Discard-Request.

Identifier

The Identifier field MUST be changed for each Discard-Request
sent.

Magic-Number

The Magic-Number field is four octets, and aids in detecting links
which are in the looped-back condition. Until the Magic-Number
Configuration Option has been successfully negotiated, the Magic-
Number MUST be transmitted as zero. See the Magic-Number
Configuration Option for further explanation.

Data

The Data field is zero or more octets, and contains uninterpreted
data for use by the sender. The data may consist of any binary
value. The end of the field is indicated by the Length.

6. LCP Configuration Options

LCP Configuration Options allow negotiation of modifications to the
default characteristics of a point-to-point link. If a Configuration
Option is not included in a Configure-Request packet, the default
value for that Configuration Option is assumed.

Some Configuration Options MAY be listed more than once. The effect
of this is Configuration Option specific, and is specified by each
such Configuration Option description. (None of the Configuration
Options in this specification can be listed more than once.)

The end of the list of Configuration Options is indicated by the
Length field of the LCP packet.

Unless otherwise specified, all Configuration Options apply in a
half-duplex fashion; typically, in the receive direction of the link
from the point of view of the Configure-Request sender.

Design Philosophy

The options indicate additional capabilities or requirements of
the implementation that is requesting the option. An
implementation which does not understand any option SHOULD
interoperate with one which implements every option.

A default is specified for each option which allows the link to
correctly function without negotiation of the option, although
perhaps with less than optimal performance.

Except where explicitly specified, acknowledgement of an option
does not require the peer to take any additional action other than
the default.

It is not necessary to send the default values for the options in
a Configure-Request.

A summary of the Configuration Option format is shown below. The
fields are transmitted from left to right.

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |    Data ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

The Type field is one octet, and indicates the type of
Configuration Option. Up-to-date values of the LCP Option Type
field are specified in the most recent "Assigned Numbers" RFC [2].
This document concerns the following values:

0 RESERVED
1 Maximum-Receive-Unit
3 Authentication-Protocol
4 Quality-Protocol
5 Magic-Number
7 Protocol-Field-Compression
8 Address-and-Control-Field-Compression

Length

The Length field is one octet, and indicates the length of this
Configuration Option including the Type, Length and Data fields.

If a negotiable Configuration Option is received in a Configure-
Request, but with an invalid or unrecognized Length, a Configure-
Nak SHOULD be transmitted which includes the desired Configuration
Option with an appropriate Length and Data.

Data

The Data field is zero or more octets, and contains information
specific to the Configuration Option. The format and length of
the Data field is determined by the Type and Length fields.

When the Data field is indicated by the Length to extend beyond
the end of the Information field, the entire packet is silently
discarded without affecting the automaton.
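
The option format above lends itself to a simple type-length-value
walk, sketched here. The function name is hypothetical, and None is
used to represent the case where the whole packet is discarded. As a
simplification, a Length below 2 (which cannot cover the Type and
Length octets themselves) is treated the same way, although the text
above suggests a Configure-Nak for negotiable options with an invalid
Length.

```python
def parse_options(field):
    """Split an Options field into a list of (type, data) pairs.

    Returns None when an option claims to extend beyond the end of
    the field.
    """
    options = []
    offset = 0
    while offset < len(field):
        if len(field) - offset < 2:
            return None  # not enough room for Type and Length
        opt_type, opt_len = field[offset], field[offset + 1]
        # Length includes the Type and Length octets themselves.
        if opt_len < 2 or offset + opt_len > len(field):
            return None
        options.append((opt_type, bytes(field[offset + 2:offset + opt_len])))
        offset += opt_len
    return options
```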

6.1. Maximum-Receive-Unit (MRU)

Description

This Configuration Option may be sent to inform the peer that the
implementation can receive larger packets, or to request that the
peer send smaller packets.

The default value is 1500 octets. If smaller packets are
requested, an implementation MUST still be able to receive the
full 1500 octet information field in case link synchronization is
lost.

Implementation Note:

This option is used to indicate an implementation capability.
The peer is not required to maximize the use of the capacity.
For example, when a MRU is indicated which is 2048 octets, the
peer is not required to send any packet with 2048 octets. The
peer need not Configure-Nak to indicate that it will only send
smaller packets, since the implementation will always require
support for at least 1500 octets.

A summary of the Maximum-Receive-Unit Configuration Option format is
shown below. The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |      Maximum-Receive-Unit     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

1

Length

4

Maximum-Receive-Unit

The Maximum-Receive-Unit field is two octets, and specifies the
maximum number of octets in the Information and Padding fields.
It does not include the framing, Protocol field, FCS, nor any
transparency bits or bytes.
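
For illustration, the Maximum-Receive-Unit option could be encoded
and decoded as follows. The function names are hypothetical, and the
two-octet value is assumed to be sent most significant octet first.

```python
import struct

MRU_TYPE, MRU_LENGTH = 1, 4
MRU_OPTION = struct.Struct("!BBH")  # Type, Length, Maximum-Receive-Unit


def encode_mru(mru):
    """Return the four octets of a Maximum-Receive-Unit option."""
    if not 0 <= mru <= 0xFFFF:
        raise ValueError("MRU must fit in two octets")
    return MRU_OPTION.pack(MRU_TYPE, MRU_LENGTH, mru)


def decode_mru(option):
    """Return the MRU carried in a four-octet option."""
    if len(option) != MRU_LENGTH:
        raise ValueError("MRU option must be exactly four octets")
    opt_type, length, mru = MRU_OPTION.unpack(option)
    if opt_type != MRU_TYPE or length != MRU_LENGTH:
        raise ValueError("not a Maximum-Receive-Unit option")
    return mru
```

The default of 1500 octets would normally not be sent at all, since
default values are omitted from a Configure-Request.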

6.2. Authentication-Protocol

Description

On some links it may be desirable to require a peer to
authenticate itself before allowing network-layer protocol packets
to be exchanged.

This Configuration Option provides a method to negotiate the use
of a specific protocol for authentication. By default,
authentication is not required.

An implementation MUST NOT include multiple Authentication-
Protocol Configuration Options in its Configure-Request packets.
Instead, it SHOULD attempt to configure the most desirable
protocol first. If that protocol is Configure-Nak'd, then the
implementation SHOULD attempt the next most desirable protocol in
the next Configure-Request.

The implementation sending the Configure-Request is indicating
that it expects authentication from its peer. If an
implementation sends a Configure-Ack, then it is agreeing to
authenticate with the specified protocol. An implementation
receiving a Configure-Ack SHOULD expect the peer to authenticate
with the acknowledged protocol.

There is no requirement that authentication be full-duplex or that
the same protocol be used in both directions. It is perfectly
acceptable for different protocols to be used in each direction.
This will, of course, depend on the specific protocols negotiated.

A summary of the Authentication-Protocol Configuration Option format
is shown below. The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |     Authentication-Protocol   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Data ...
+-+-+-+-+

Type

3

Length

>= 4

Authentication-Protocol

The Authentication-Protocol field is two octets, and indicates the
authentication protocol desired. Values for this field are always
the same as the PPP Protocol field values for that same
authentication protocol.

Up-to-date values of the Authentication-Protocol field are
specified in the most recent "Assigned Numbers" RFC [2]. Current
values are assigned as follows:

Value (in hex)  Protocol

c023            Password Authentication Protocol
c223            Challenge Handshake Authentication Protocol

Data

The Data field is zero or more octets, and contains additional
data as determined by the particular protocol.
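
As an illustration of the option layout summarized above, the
following Python sketch packs and unpacks an Authentication-Protocol
Configuration Option. The function names are hypothetical and not
part of this specification, and the sketch assumes that the Length
field counts the Type, Length, Authentication-Protocol and Data
fields together, which is consistent with the minimum Length of 4.

```python
import struct

AUTH_PROTOCOL_OPTION = 3   # Type value from the format summary
PAP = 0xC023               # Password Authentication Protocol
CHAP = 0xC223              # Challenge Handshake Authentication Protocol


def encode_auth_option(protocol, data=b""):
    """Build Type, Length, Authentication-Protocol and Data, left to right."""
    length = 4 + len(data)           # assumed to cover the whole option
    if length > 255:
        raise ValueError("option does not fit in a one-octet Length")
    return struct.pack("!BBH", AUTH_PROTOCOL_OPTION, length, protocol) + data


def decode_auth_option(buf):
    """Return (protocol, data) for an option at the start of buf."""
    if len(buf) < 4:
        raise ValueError("truncated option")
    opt_type, length, protocol = struct.unpack("!BBH", buf[:4])
    if opt_type != AUTH_PROTOCOL_OPTION:
        raise ValueError("not an Authentication-Protocol option")
    if length < 4 or length > len(buf):
        raise ValueError("bad Length field")
    return protocol, buf[4:length]
```

The Data octets are opaque here; their meaning is determined by the
negotiated authentication protocol.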

6.3. Quality-Protocol

Description

On some links it may be desirable to determine when, and how
often, the link is dropping data. This process is called link
quality monitoring.

This Configuration Option provides a method to negotiate the use
of a specific protocol for link quality monitoring. By default,
link quality monitoring is disabled.

The implementation sending the Configure-Request is indicating
that it expects to receive monitoring information from its peer.
If an implementation sends a Configure-Ack, then it is agreeing to
send the specified protocol. An implementation receiving a
Configure-Ack SHOULD expect the peer to send the acknowledged
protocol.

There is no requirement that quality monitoring be full-duplex or
that the same protocol be used in both directions. It is
perfectly acceptable for different protocols to be used in each
direction. This will, of course, depend on the specific protocols
negotiated.

A summary of the Quality-Protocol Configuration Option format is
shown below. The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |        Quality-Protocol       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Data ...
+-+-+-+-+

Type

4

Length

>= 4

Quality-Protocol

The Quality-Protocol field is two octets, and indicates the link
quality monitoring protocol desired. Values for this field are
always the same as the PPP Protocol field values for that same
monitoring protocol.

Up-to-date values of the Quality-Protocol field are specified in
the most recent "Assigned Numbers" RFC [2]. Current values are
assigned as follows:

Value (in hex)  Protocol

c025            Link Quality Report

Data

The Data field is zero or more octets, and contains additional
data as determined by the particular protocol.

6.4. Magic-Number

Description

This Configuration Option provides a method to detect looped-back
links and other Data Link Layer anomalies. This Configuration
Option MAY be required by some other Configuration Options such as
the Quality-Protocol Configuration Option. By default, the
Magic-Number is not negotiated, and zero is inserted where a
Magic-Number might otherwise be used.

Before this Configuration Option is requested, an implementation
MUST choose its Magic-Number. It is recommended that the Magic-
Number be chosen in the most random manner possible in order to
guarantee with very high probability that an implementation will
arrive at a unique number. A good way to choose a unique random
number is to start with a unique seed. Suggested sources of
uniqueness include machine serial numbers, other network hardware
addresses, time-of-day clocks, etc. Particularly good random
number seeds are precise measurements of the inter-arrival time of
physical events such as packet reception on other connected
networks, server response time, or the typing rate of a human
user. It is also suggested that as many sources as possible be
used simultaneously.

When a Configure-Request is received with a Magic-Number
Configuration Option, the received Magic-Number is compared with
the Magic-Number of the last Configure-Request sent to the peer.
If the two Magic-Numbers are different, then the link is not
looped-back, and the Magic-Number SHOULD be acknowledged. If the
two Magic-Numbers are equal, then it is possible, but not certain,
that the link is looped-back and that this Configure-Request is
actually the one last sent. To determine this, a Configure-Nak
MUST be sent specifying a different Magic-Number value. A new
Configure-Request SHOULD NOT be sent to the peer until normal
processing would cause it to be sent (that is, until a Configure-
Nak is received or the Restart timer runs out).

Reception of a Configure-Nak with a Magic-Number different from
that of the last Configure-Nak sent to the peer proves that a link
is not looped-back, and indicates a unique Magic-Number. If the
Magic-Number is equal to the one sent in the last Configure-Nak,
the possibility of a looped-back link is increased, and a new
Magic-Number MUST be chosen. In either case, a new Configure-
Request SHOULD be sent with the new Magic-Number.
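
The handling of a received Configure-Request described above can be
sketched as follows. This is a minimal illustration, not a complete
LCP automaton: the function names are hypothetical, and Python's
`secrets` module is used as the randomness source in place of the
seed sources suggested in the text. A received Magic-Number of zero
is treated as illegal and Nak'd.

```python
import secrets


def new_magic(exclude=()):
    """Pick a random non-zero 32-bit value that is not in exclude."""
    while True:
        candidate = secrets.randbits(32)
        if candidate != 0 and candidate not in exclude:
            return candidate


def on_configure_request(received_magic, last_sent_magic):
    """Decide how to answer a Magic-Number option in a Configure-Request.

    Returns ("ack", magic) when the numbers differ, so the link is not
    looped-back, or ("nak", suggested_magic) when they are equal (a
    possible loop-back) or when the received value is zero.
    """
    if received_magic == 0 or received_magic == last_sent_magic:
        suggestion = new_magic(exclude=(received_magic, last_sent_magic))
        return ("nak", suggestion)
    return ("ack", received_magic)
```

After sending the Configure-Nak, the caller would wait for normal
processing (a received Configure-Nak or a Restart timer expiry)
before sending a new Configure-Request.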

If the link is indeed looped-back, this sequence (transmit
Configure-Request, receive Configure-Request, transmit
Configure-Nak, receive Configure-Nak) will repeat over and over again. If
the link is not looped-back, this sequence might occur a few
times, but it is extremely unlikely to occur repeatedly. More
likely, the Magic-Numbers chosen at either end will quickly
diverge, terminating the sequence. The following table shows the
probability of collisions assuming that both ends of the link
select Magic-Numbers with a perfectly uniform distribution:

Number of Collisions        Probability
--------------------   ------------------------
         1             1/2**32      = 2.3 E-10
         2             (1/2**32)**2 = 5.4 E-20
         3             (1/2**32)**3 = 1.3 E-29
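
The figures in this table can be checked with a few lines of Python.
The sketch reads the entry for n collisions as (1/2**32) raised to
the n-th power, that is, n independent collisions of uniformly chosen
32-bit values. The explicit parentheses matter in Python, where `**`
associates to the right. The function name is illustrative.

```python
def collision_probability(collisions):
    """Probability of the given number of consecutive Magic-Number collisions."""
    single = 1 / 2**32          # both ends pick the same 32-bit value
    return single ** collisions


for n in (1, 2, 3):
    print(n, f"{collision_probability(n):.1e}")
```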

Good sources of uniqueness or randomness are required for this
divergence to occur. If a good source of uniqueness cannot be
found, it is recommended that this Configuration Option not be
enabled; Configure-Requests with the option SHOULD NOT be
transmitted and any Magic-Number Configuration Options which the
peer sends SHOULD be either acknowledged or rejected. In this
case, looped-back links cannot be reliably detected by the
implementation, although they may still be detectable by the peer.

If an implementation does transmit a Configure-Request with a
Magic-Number Configuration Option, then it MUST NOT respond with a
Configure-Reject when it receives a Configure-Request with a
Magic-Number Configuration Option. That is, if an implementation
desires to use Magic Numbers, then it MUST also allow its peer to
do so. If an implementation does receive a Configure-Reject in
response to a Configure-Request, it can only mean that the link is
not looped-back, and that its peer will not be using Magic-
Numbers. In this case, an implementation SHOULD act as if the
negotiation had been successful (as if it had instead received a
Configure-Ack).

The Magic-Number also may be used to detect looped-back links
during normal operation, as well as during Configuration Option
negotiation. All LCP Echo-Request, Echo-Reply, and Discard-
Request packets have a Magic-Number field. If Magic-Number has
been successfully negotiated, an implementation MUST transmit
these packets with the Magic-Number field set to its negotiated
Magic-Number.

The Magic-Number field of these packets SHOULD be inspected on
reception. All received Magic-Number fields MUST be equal to
either zero or the peer's unique Magic-Number, depending on
whether or not the peer negotiated a Magic-Number.

Reception of a Magic-Number field equal to the negotiated local
Magic-Number indicates a looped-back link. Reception of a Magic-
Number other than the negotiated local Magic-Number, the peer's
negotiated Magic-Number, or zero if the peer didn't negotiate one,
indicates a link which has been (mis)configured for communications
with a different peer.
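
The checks on a received Magic-Number field during normal operation
can be sketched as a small classification function. The function
name and the returned labels are illustrative only. The sketch
assumes that a `peer_magic` of zero stands for a peer that did not
negotiate a Magic-Number.

```python
def classify_magic(received, local_magic, peer_magic):
    """Classify the Magic-Number field of a received LCP packet.

    local_magic is the locally negotiated value (0 if none);
    peer_magic is the peer's negotiated value (0 if none).
    """
    if local_magic != 0 and received == local_magic:
        return "looped-back"       # our own packet came back to us
    if received == peer_magic:
        return "ok"                # the expected peer, or zero if none
    return "misconfigured"         # link attached to a different peer
```

How to recover from the "looped-back" and "misconfigured" results is
left to the implementation, as described in the next paragraph.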

Procedures for recovery from either case are unspecified, and may
vary from implementation to implementation. A somewhat
pessimistic procedure is to assume a LCP Down event. A further
Open event will begin the process of re-establishing the link,
which can't complete until the looped-back condition is
terminated, and Magic-Numbers are successfully negotiated. A more
optimistic procedure (in the case of a looped-back link) is to
begin transmitting LCP Echo-Request packets until an appropriate
Echo-Reply is received, indicating a termination of the looped-
back condition.

A summary of the Magic-Number Configuration Option format is shown
below. The fields are transmitted from left to right.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |          Magic-Number
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      Magic-Number (cont)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

5

Length

6

Magic-Number

The Magic-Number field is four octets, and indicates a number
which is very likely to be unique to one end of the link. A
Magic-Number of zero is illegal and MUST always be Nak'd, if it is
not Rejected outright.

6.5. Protocol-Field-Compression (PFC)

Description

This Configuration Option provides a method to negotiate the
compression of the PPP Protocol field. By default, all
implementations MUST transmit packets with two octet PPP Protocol
fields.

PPP Protocol field numbers are chosen such that some values may be
compressed into a single octet form which is clearly
distinguishable from the two octet form. This Configuration
Option is sent to inform the peer that the implementation can
receive such single octet Protocol fields.

As previously mentioned, the Protocol field uses an extension
mechanism consistent with the ISO 3309 extension mechanism for the
Address field; the Least Significant Bit (LSB) of each octet is
used to indicate extension of the Protocol field. A binary "0" as
the LSB indicates that the Protocol field continues with the
following octet. The presence of a binary "1" as the LSB marks
the last octet of the Protocol field. Notice that any number of
"0" octets may be prepended to the field, and will still indicate
the same value (consider the two binary representations for 3,
00000011 and 00000000 00000011).

When using low speed links, it is desirable to conserve bandwidth
by sending as little redundant data as possible. The Protocol-
Field-Compression Configuration Option allows a trade-off between
implementation simplicity and bandwidth efficiency. If
successfully negotiated, the ISO 3309 extension mechanism may be
used to compress the Protocol field to one octet instead of two.
The large majority of packets are compressible since data
protocols are typically assigned with Protocol field values less
than 256.
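
The extension mechanism described above can be illustrated with a
short Python sketch that writes and reads the Protocol field. The
function names are hypothetical. The validity check assumes the
assignment rule implied by the extension mechanism: the low octet of
a Protocol value has its LSB set to 1, and the high octet has its LSB
set to 0.

```python
def encode_protocol(protocol, pfc_negotiated=False):
    """Return the Protocol field octets for transmission."""
    if not 0 <= protocol <= 0xFFFF:
        raise ValueError("Protocol value must fit in two octets")
    if protocol & 0x0001 == 0 or protocol & 0x0100 != 0:
        raise ValueError("Protocol value violates the extension rule")
    if pfc_negotiated and protocol < 0x0100:
        return bytes([protocol])          # single-octet (compressed) form
    return protocol.to_bytes(2, "big")    # default two-octet form


def decode_protocol(frame):
    """Return (protocol, octets_consumed) from the start of frame."""
    if not frame:
        raise ValueError("empty frame")
    first = frame[0]
    if first & 1:                         # LSB 1 marks the last octet
        return first, 1
    if len(frame) < 2:
        raise ValueError("truncated Protocol field")
    return (first << 8) | frame[1], 2
```

Values of 256 or more, such as the LCP value c021 (hex), are always
sent in the two-octet form by this sketch, whether or not compression
was negotiated. The decoder accepts both forms without distinguishing
between them.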

Compressed Protocol fields MUST NOT be transmitted unless this
Configuration Option has been negotiated. When negotiated, PPP
implementations MUST accept PPP packets with either double-octet
or single-octet Protocol fields, and MUST NOT distinguish between
them.

The Protocol field is never compressed when sending any LCP
packet. This rule guarantees unambiguous recognition of LCP
packets.

When a Protocol field is compressed, the Data Link Layer FCS field
is calculated on the compressed frame, not the original
uncompressed frame.

A summary of the Protocol-Field-Compression Configuration Option
format is shown below. The fields are transmitted from left to
right.

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

7

Length

2

6.6. Address-and-Control-Field-Compression (ACFC)

Description

This Configuration Option provides a method to negotiate the
compression of the Data Link Layer Address and Control fields. By
default, all implementations MUST transmit frames with Address and
Control fields appropriate to the link framing.

Since these fields usually have constant values for point-to-point
links, they are easily compressed. This Configuration Option is
sent to inform the peer that the implementation can receive
compressed Address and Control fields.

If a compressed frame is received when Address-and-Control-Field-
Compression has not been negotiated, the implementation MAY
silently discard the frame.

The Address and Control fields MUST NOT be compressed when sending
any LCP packet. This rule guarantees unambiguous recognition of
LCP packets.

When the Address and Control fields are compressed, the Data Link
Layer FCS field is calculated on the compressed frame, not the
original uncompressed frame.

A summary of the Address-and-Control-Field-Compression Configuration
Option format is shown below. The fields are transmitted from left
to right.

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

8

Length

2

Security Considerations

Security issues are briefly discussed in sections concerning the
Authentication Phase, the Close event, and the Authentication-
Protocol Configuration Option.

References

[1] Perkins, D., "Requirements for an Internet Standard Point-to-
Point Protocol", RFC 1547, Carnegie Mellon University,
December 1993.

[2] Reynolds, J., and Postel, J., "Assigned Numbers", STD 2, RFC
1340, USC/Information Sciences Institute, July 1992.

Acknowledgements

This document is the product of the Point-to-Point Protocol Working
Group of the Internet Engineering Task Force (IETF). Comments should
be submitted to the ietf-ppp@merit.edu mailing list.

Much of the text in this document is taken from the working group
requirements [1]; and RFCs 1171 & 1172, by Drew Perkins while at
Carnegie Mellon University, and by Russ Hobby of the University of
California at Davis.

William Simpson was principally responsible for introducing
consistent terminology and philosophy, and the re-design of the phase
and negotiation state machines.

Many people spent significant time helping to develop the Point-to-
Point Protocol. The complete list of people is too numerous to list,
but the following people deserve special thanks: Rick Adams, Ken
Adelman, Fred Baker, Mike Ballard, Craig Fox, Karl Fox, Phill Gross,
Kory Hamzeh, former WG chair Russ Hobby, David Kaufman, former WG
chair Steve Knowles, Mark Lewis, former WG chair Brian Lloyd, John
LoVerso, Bill Melohn, Mike Patton, former WG chair Drew Perkins, Greg
Satz, John Shriver, Vernon Schryver, and Asher Waldfogel.

Special thanks to Morning Star Technologies for providing computing
resources and network access support for writing this specification.

Chair's Address

The working group can be contacted via the current chair:

Fred Baker
Advanced Computer Communications
315 Bollay Drive
Santa Barbara, California 93117

fbaker@acc.com

Editor's Address

Questions about this memo can also be directed to:

William Allen Simpson
Daydreamer
Computer Systems Consulting Services
1384 Fontaine
Madison Heights, Michigan 48071

Bill.Simpson@um.cc.umich.edu
bsimpson@MorningStar.com


RFC 2406 – IP Encapsulating Security Payload (ESP)

 
Network Working Group                                            S. Kent
Request for Comments: 2406                                      BBN Corp
Obsoletes: 1827                                              R. Atkinson
Category: Standards Track                                  @Home Network
                                                           November 1998

IP Encapsulating Security Payload (ESP)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction
2. Encapsulating Security Payload Packet Format
   2.1 Security Parameters Index
   2.2 Sequence Number
   2.3 Payload Data
   2.4 Padding (for Encryption)
   2.5 Pad Length
   2.6 Next Header
   2.7 Authentication Data
3. Encapsulating Security Protocol Processing
   3.1 ESP Header Location
   3.2 Algorithms
       3.2.1 Encryption Algorithms
       3.2.2 Authentication Algorithms
   3.3 Outbound Packet Processing
       3.3.1 Security Association Lookup
       3.3.2 Packet Encryption
       3.3.3 Sequence Number Generation
       3.3.4 Integrity Check Value Calculation
       3.3.5 Fragmentation
   3.4 Inbound Packet Processing
       3.4.1 Reassembly
       3.4.2 Security Association Lookup
       3.4.3 Sequence Number Verification
       3.4.4 Integrity Check Value Verification
       3.4.5 Packet Decryption
4. Auditing
5. Conformance Requirements
6. Security Considerations
7. Differences from RFC 1827
Acknowledgements
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

The Encapsulating Security Payload (ESP) header is designed to
provide a mix of security services in IPv4 and IPv6. ESP may be
applied alone, in combination with the IP Authentication Header (AH)
[KA97b], or in a nested fashion, e.g., through the use of tunnel mode
(see "Security Architecture for the Internet Protocol" [KA97a],
hereafter referred to as the Security Architecture document).
Security services can be provided between a pair of communicating
hosts, between a pair of communicating security gateways, or between
a security gateway and a host. For more details on how to use ESP
and AH in various network environments, see the Security Architecture
document [KA97a].

The ESP header is inserted after the IP header and before the upper
layer protocol header (transport mode) or before an encapsulated IP
header (tunnel mode). These modes are described in more detail
below.

ESP is used to provide confidentiality, data origin authentication,
connectionless integrity, an anti-replay service (a form of partial
sequence integrity), and limited traffic flow confidentiality. The
set of services provided depends on options selected at the time of
Security Association establishment and on the placement of the
implementation. Confidentiality may be selected independent of all
other services. However, use of confidentiality without
integrity/authentication (either in ESP or separately in AH) may
subject traffic to certain forms of active attacks that could
undermine the confidentiality service (see [Bel96]). Data origin
authentication and connectionless integrity are joint services
(hereafter referred to jointly as "authentication") and are offered as
an option in conjunction with (optional) confidentiality. The anti-
replay service may be selected only if data origin authentication is
selected, and its election is solely at the discretion of the
receiver. (Although the default calls for the sender to increment
the Sequence Number used for anti-replay, the service is effective
only if the receiver checks the Sequence Number.) Traffic flow
confidentiality requires selection of tunnel mode, and is most
effective if implemented at a security gateway, where traffic
aggregation may be able to mask true source-destination patterns.
Note that although both confidentiality and authentication are
optional, at least one of them MUST be selected.

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by ESP and AH, the concept of Security Associations, the ways
in which ESP can be used in conjunction with the Authentication
Header (AH), and the different key management options available for
ESP and AH. (With regard to the last topic, the current key
management options required for both AH and ESP are manual keying and
automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Encapsulating Security Payload Packet Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the ESP header will contain the value 50 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ----
|               Security Parameters Index (SPI)                 | ^Auth.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
|                      Sequence Number                          | |erage
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ----
|                    Payload Data* (variable)                   | |   ^
~                                                               ~ |   |
|                                                               | |Conf.
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
|               |     Padding (0-255 bytes)                     | |erage*
+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |   |
|                               |  Pad Length   | Next Header   | v   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|                 Authentication Data (variable)                |
~                                                               ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* If included in the Payload field, cryptographic
synchronization data, e.g., an Initialization Vector (IV, see
Section 2.3), usually is not encrypted per se, although it
often is referred to as being part of the ciphertext.

The following subsections define the fields in the header format.
"Optional" means that the field is omitted if the option is not
selected, i.e., it is present in neither the packet as transmitted
nor as formatted for computation of an Integrity Check Value (ICV,
see Section 2.7). Whether or not an option is selected is defined as
part of Security Association (SA) establishment. Thus the format of
ESP packets for a given SA is fixed, for the duration of the SA. In
contrast, "mandatory" fields are always present in the ESP packet
format, for all SAs.

2.1 Security Parameters Index

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (ESP), uniquely
identifies the Security Association for this datagram. The set of
SPI values in the range 1 through 255 are reserved by the Internet
Assigned Numbers Authority (IANA) for future use; a reserved SPI
value will not normally be assigned by IANA unless the use of the
assigned SPI value is specified in an RFC. It is ordinarily selected
by the destination system upon establishment of an SA (see the
Security Architecture document for more details). The SPI field is
mandatory.

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.
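
As a minimal sketch of how a receiver might read the two mandatory
leading fields and apply the SPI value ranges described above, the
following Python fragment may help. The function names and the
category labels are illustrative and not part of this specification.

```python
import struct


def parse_esp_header(packet):
    """Return (spi, sequence_number) from the first 8 octets of an ESP packet."""
    if len(packet) < 8:
        raise ValueError("ESP packet shorter than SPI plus Sequence Number")
    spi, sequence_number = struct.unpack("!II", packet[:8])
    return spi, sequence_number


def spi_category(spi):
    """Classify a 32-bit SPI value according to the ranges in Section 2.1."""
    if spi == 0:
        return "local-use"     # MUST NOT be sent on the wire
    if 1 <= spi <= 255:
        return "reserved"      # reserved by IANA for future use
    return "assignable"        # ordinarily chosen by the destination system
```

A receiver that sees an SPI of zero on the wire is looking at a
packet that violates this specification.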

2.2 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.3 for more details
on how the Sequence Number is generated.) If anti-replay is enabled
(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.
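
The sender-side rule can be sketched as a counter that refuses to
cycle. This is an illustration under the assumption that anti-replay
is enabled. The class name is hypothetical, and a real implementation
would trigger establishment of a new SA rather than only raising an
error.

```python
class OutboundSequence:
    """Per-SA sender counter for the anti-replay-enabled case."""

    LIMIT = 2**32   # the Sequence Number field is 32 bits wide

    def __init__(self):
        self.counter = 0            # initialized to 0 when the SA is established

    def next_sequence_number(self):
        """Return the Sequence Number for the next outbound packet."""
        if self.counter + 1 >= self.LIMIT:
            # Sending the 2^32nd packet would make the number cycle.
            raise OverflowError("establish a new SA before sending more packets")
        self.counter += 1
        return self.counter         # the first packet carries 1
```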

2.3 Payload Data

Payload Data is a variable-length field containing data described by
the Next Header field. The Payload Data field is mandatory and is an
integral number of bytes in length. If the algorithm used to encrypt
the payload requires cryptographic synchronization data, e.g., an
Initialization Vector (IV), then this data MAY be carried explicitly
in the Payload field. Any encryption algorithm that requires such
explicit, per-packet synchronization data MUST indicate the length,
any structure for such data, and the location of this data as part of
an RFC specifying how the algorithm is used with ESP. If such
synchronization data is implicit, the algorithm for deriving the data
MUST be part of the RFC.

Note that with regard to ensuring the alignment of the (real)
ciphertext in the presence of an IV:

o For some IV-based modes of operation, the receiver treats
the IV as the start of the ciphertext, feeding it into the
algorithm directly. In these modes, alignment of the start
of the (real) ciphertext is not an issue at the receiver.
o In some cases, the receiver reads the IV in separately from
the ciphertext. In these cases, the algorithm
specification MUST address how alignment of the (real)
ciphertext is to be achieved.

2.4 Padding (for Encryption)

Several factors require or motivate use of the Padding field.

o If an encryption algorithm is employed that requires the
plaintext to be a multiple of some number of bytes, e.g.,
the block size of a block cipher, the Padding field is used
to fill the plaintext (consisting of the Payload Data, Pad
Length and Next Header fields, as well as the Padding) to
the size required by the algorithm.

o Padding also may be required, irrespective of encryption
algorithm requirements, to ensure that the resulting
ciphertext terminates on a 4-byte boundary. Specifically,
the Pad Length and Next Header fields must be right aligned
within a 4-byte word, as illustrated in the ESP packet
format figure above, to ensure that the Authentication Data
field (if present) is aligned on a 4-byte boundary.

o Padding beyond that required for the algorithm or alignment
reasons cited above, may be used to conceal the actual
length of the payload, in support of (partial) traffic flow
confidentiality. However, inclusion of such additional
padding has adverse bandwidth implications and thus its use
should be undertaken with care.

The sender MAY add 0-255 bytes of padding. Inclusion of the Padding
field in an ESP packet is optional, but all implementations MUST
support generation and consumption of padding.

a. For the purpose of ensuring that the bits to be encrypted
are a multiple of the algorithm's blocksize (first bullet
above), the padding computation applies to the Payload
Data exclusive of the IV, the Pad Length, and Next Header
fields.

b. For the purposes of ensuring that the Authentication Data
is aligned on a 4-byte boundary (second bullet above), the
padding computation applies to the Payload Data inclusive
of the IV, the Pad Length, and Next Header fields.

If Padding bytes are needed but the encryption algorithm does not
specify the padding contents, then the following default processing
MUST be used. The Padding bytes are initialized with a series of
(unsigned, 1-byte) integer values. The first padding byte appended
to the plaintext is numbered 1, with subsequent padding bytes making
up a monotonically increasing sequence: 1, 2, 3, ... When this
padding scheme is employed, the receiver SHOULD inspect the Padding
field. (This scheme was selected because of its relative simplicity,
ease of implementation in hardware, and because it offers limited
protection against certain forms of "cut and paste" attacks in the
absence of other integrity measures, if the receiver checks the
padding values upon decryption.)
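
The two padding computations (a) and (b) and the default padding
contents can be sketched together in Python. The function names are
illustrative. `payload_len` is taken to be the length of the Payload
Data exclusive of any IV, and the sketch simply searches for the
smallest Pad Length that satisfies both constraints; it adds no extra
padding for traffic flow confidentiality.

```python
def esp_pad_length(payload_len, block_size=1, iv_len=0):
    """Smallest Pad Length (0-255) meeting the block and 4-byte constraints."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for pad in range(256):
        # Payload Data (without IV), Padding, Pad Length and Next Header.
        encrypted = payload_len + pad + 2
        if encrypted % block_size == 0 and (iv_len + encrypted) % 4 == 0:
            return pad
    raise ValueError("no Pad Length in 0-255 satisfies both constraints")


def default_padding(pad_len):
    """Default Padding contents: the octets 1, 2, 3, ... pad_len."""
    return bytes(range(1, pad_len + 1))


def build_trailer(payload_len, next_header, block_size=1, iv_len=0):
    """Return Padding, Pad Length and Next Header for appending to the payload."""
    pad_len = esp_pad_length(payload_len, block_size, iv_len)
    return default_padding(pad_len) + bytes([pad_len, next_header])
```

For example, a 7-byte payload with no block cipher constraint needs
three padding bytes so that the Pad Length and Next Header fields end
on a 4-byte boundary.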

Any encryption algorithm that requires Padding other than the default
described above, MUST define the Padding contents (e.g., zeros or
random data) and any required receiver processing of these Padding
bytes in an RFC specifying how the algorithm is used with ESP. In
such circumstances, the content of the Padding field will be
determined by the encryption algorithm and mode selected and defined
in the corresponding algorithm RFC. The relevant algorithm RFC MAY
specify that a receiver MUST inspect the Padding field or that a
receiver MUST inform senders of how the receiver will handle the
Padding field.

2.5 Pad Length

The Pad Length field indicates the number of pad bytes immediately
preceding it. The range of valid values is 0-255, where a value of
zero indicates that no Padding bytes are present. The Pad Length
field is mandatory.

2.6 Next Header

The Next Header is an 8-bit field that identifies the type of data
contained in the Payload Data field, e.g., an extension header in
IPv6 or an upper layer protocol identifier. The value of this field
is chosen from the set of IP Protocol Numbers defined in the most
recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA). The Next Header field is mandatory.
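
On the receiving side, the Pad Length and Next Header fields are read
from the end of the decrypted data. The following sketch, with a
hypothetical function name, splits decrypted plaintext into Payload
Data and Next Header and optionally checks the default padding
contents (1, 2, 3, ...). It assumes that any IV has already been
removed from the input.

```python
def split_esp_plaintext(plaintext, check_default_padding=True):
    """Return (payload_data, next_header) from decrypted ESP plaintext."""
    if len(plaintext) < 2:
        raise ValueError("missing Pad Length and Next Header")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    end = len(plaintext) - 2                 # index of the Pad Length field
    if pad_len > end:
        raise ValueError("Pad Length exceeds available data")
    padding = plaintext[end - pad_len:end]   # pad bytes precede Pad Length
    if check_default_padding and padding != bytes(range(1, pad_len + 1)):
        raise ValueError("unexpected Padding contents")
    return plaintext[:end - pad_len], next_header
```

The padding check should be disabled when the negotiated algorithm
defines its own Padding contents.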

2.7 Authentication Data

The Authentication Data is a variable-length field containing an
Integrity Check Value (ICV) computed over the ESP packet minus the
Authentication Data. The length of the field is specified by the
authentication function selected. The Authentication Data field is
optional, and is included only if the authentication service has been
selected for the SA in question. The authentication algorithm
specification MUST specify the length of the ICV and the comparison
rules and processing steps for validation.
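
To make the coverage of the ICV concrete, the sketch below computes
and checks an ICV over everything in the ESP packet except the
Authentication Data itself. The choice of HMAC-SHA-1 truncated to 96
bits is an assumption made for illustration only; this section does
not mandate any particular authentication function, and the function
names are hypothetical.

```python
import hashlib
import hmac

ICV_LEN = 12   # 96 bits, an assumed length fixed by the chosen algorithm


def compute_icv(key, esp_without_auth):
    """ICV over SPI, Sequence Number, Payload Data, Padding, Pad Length, Next Header."""
    return hmac.new(key, esp_without_auth, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(key, esp_packet):
    """Return True if the trailing Authentication Data matches the computed ICV."""
    if len(esp_packet) < 8 + ICV_LEN:
        raise ValueError("packet too short to carry an ICV")
    covered, received_icv = esp_packet[:-ICV_LEN], esp_packet[-ICV_LEN:]
    return hmac.compare_digest(compute_icv(key, covered), received_icv)
```

The comparison uses a constant-time function so that the check does
not reveal how many leading bytes of the ICV matched.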

3. Encapsulating Security Protocol Processing

3.1 ESP Header Location

Like AH, ESP may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, but not the IP header.
(In this mode, note that for "bump-in-the-stack" or "bump-in-the-
wire" implementations, as defined in the Security Architecture
document, inbound and outbound IP fragments may require an IPsec
implementation to perform extra IP reassembly/fragmentation in order
to both conform to this specification and provide transparent IPsec
support. Special care is required to perform such operations within
these implementations when multiple interfaces are in use.)

In transport mode, ESP is inserted after the IP header and before an
upper layer protocol, e.g., TCP, UDP, ICMP, etc. or before any other
IPsec headers that have already been inserted. In the context of
IPv4, this translates to placing ESP after the IP header (and any
options that it contains), but before the upper layer protocol.
(Note that the term "transport" mode should not be misconstrued as
restricting its use to TCP and UDP. For example, an ICMP message MAY
be sent using either "transport" mode or "tunnel" mode.) The
following diagram illustrates ESP transport mode positioning for a
typical IPv4 packet, on a "before and after" basis. (The "ESP
trailer" encompasses any Padding, plus the Pad Length, and Next
Header fields.)

            BEFORE APPLYING ESP
       ----------------------------
 IPv4  |orig IP hdr  |     |      |
       |(any options)| TCP | Data |
       ----------------------------

            AFTER APPLYING ESP
       -------------------------------------------------
 IPv4  |orig IP hdr  | ESP |     |      |   ESP   | ESP|
       |(any options)| Hdr | TCP | Data | Trailer |Auth|
       -------------------------------------------------
                           |<----- encrypted ---->|
                     |<------ authenticated ----->|

In the IPv6 context, ESP is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the ESP header depending on the semantics
desired. However, since ESP protects only fields after the ESP
header, it generally may be desirable to place the destination
options header(s) after the ESP header. The following diagram
illustrates ESP transport mode positioning for a typical IPv6 packet.

         BEFORE APPLYING ESP
     ---------------------------------------
IPv6 |             | ext hdrs |     |      |
     | orig IP hdr |if present| TCP | Data |
     ---------------------------------------

         AFTER APPLYING ESP
     ---------------------------------------------------------
IPv6 | orig |hop-by-hop,dest*,|   |dest|   |    | ESP   | ESP|
     |IP hdr|routing,fragment.|ESP|opt*|TCP|Data|Trailer|Auth|
     ---------------------------------------------------------
                                  |<---- encrypted ---->|
                              |<---- authenticated ---->|

     * = if present, could be before ESP, after ESP, or both

ESP and AH headers can be combined in a variety of modes. The IPsec
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode ESP may be employed in either hosts or security gateways.
When ESP is implemented in a security gateway (to protect subscriber
transit traffic), tunnel mode must be used. In tunnel mode, the
"inner" IP header carries the ultimate source and destination
addresses, while an "outer" IP header may contain distinct IP
addresses, e.g., addresses of security gateways. In tunnel mode, ESP
protects the entire inner IP packet, including the entire inner IP
header. The position of ESP in tunnel mode, relative to the outer IP
header, is the same as for ESP in transport mode. The following
diagram illustrates ESP tunnel mode positioning for typical IPv4 and
IPv6 packets.

     -----------------------------------------------------------
IPv4 | new IP hdr* |     | orig IP hdr*  |   |    | ESP   | ESP|
     |(any options)| ESP | (any options) |TCP|Data|Trailer|Auth|
     -----------------------------------------------------------
                         |<--------- encrypted ---------->|
                   |<----------- authenticated ---------->|

     ------------------------------------------------------------
IPv6 | new* |new ext |   | orig*|orig ext |   |    | ESP   | ESP|
     |IP hdr| hdrs*  |ESP|IP hdr| hdrs *  |TCP|Data|Trailer|Auth|
     ------------------------------------------------------------
                         |<--------- encrypted ----------->|
                     |<---------- authenticated ---------->|

     * = if present, construction of outer IP hdr/extensions
         and modification of inner IP hdr/extensions is
         discussed below.

3.2 Algorithms

The mandatory-to-implement algorithms are described in Section 5,
"Conformance Requirements". Other algorithms MAY be supported. Note
that although both confidentiality and authentication are optional,
at least one of these services MUST be selected; hence both algorithms
MUST NOT be simultaneously NULL.
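
The constraint above can be checked at the time an SA is configured.
The following Python fragment is an illustrative sketch only and is
not part of this specification. The class name EspAlgorithms and the
use of the string "NULL" to denote a NULL algorithm are assumptions
made for the example.

```python
from dataclasses import dataclass

NULL = "NULL"  # assumed marker for "service not selected"


@dataclass(frozen=True)
class EspAlgorithms:
    """Hypothetical holder for the two algorithm choices of one SA."""

    encryption: str
    authentication: str

    def __post_init__(self):
        # Each service is optional on its own, but at least one of
        # them has to be selected for the SA.
        if self.encryption == NULL and self.authentication == NULL:
            raise ValueError(
                "ESP encryption and authentication MUST NOT both be NULL"
            )


# Authentication-only and encryption-only SAs are both acceptable.
auth_only = EspAlgorithms(encryption=NULL, authentication="HMAC-SHA-1")
enc_only = EspAlgorithms(encryption="DES-CBC", authentication=NULL)
```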

3.2.1 Encryption Algorithms

The encryption algorithm employed is specified by the SA. ESP is
designed for use with symmetric encryption algorithms. Because IP
packets may arrive out of order, each packet must carry any data
required to allow the receiver to establish cryptographic
synchronization for decryption. This data may be carried explicitly
in the payload field, e.g., as an IV (as described above), or the
data may be derived from the packet header. Since ESP makes
provision for padding of the plaintext, encryption algorithms
employed with ESP may exhibit either block or stream mode
characteristics. Note that since encryption (confidentiality) is
optional, this algorithm may be "NULL".

3.2.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. Note that
since authentication is optional, this algorithm may be "NULL".

3.3 Outbound Packet Processing

In transport mode, the sender encapsulates the upper layer protocol
information in the ESP header/trailer, and retains the specified IP
header (and any IP extension headers in the IPv6 context). In tunnel
mode, the outer and inner IP header/extensions can be inter-related
in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document. If there is more than one IPsec
header/extension required by security policy, the order of the
application of the security headers MUST be defined by security
policy.

3.3.1 Security Association Lookup

ESP is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for ESP processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Packet Encryption

In this section, we speak in terms of encryption always being applied
because of the formatting implications. This is done with the
understanding that "no confidentiality" is offered by using the NULL
encryption algorithm. Accordingly, the sender:

1. encapsulates (into the ESP Payload field):
- for transport mode -- just the original upper layer
protocol information.
- for tunnel mode -- the entire original IP datagram.
2. adds any necessary padding.
3. encrypts the result (Payload Data, Padding, Pad Length, and
Next Header) using the key, encryption algorithm, algorithm
mode indicated by the SA and cryptographic synchronization
data (if any).
- If explicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is input to the encryption
algorithm per the algorithm specification and placed
in the Payload field.
- If implicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is constructed and input to
the encryption algorithm as per the algorithm
specification.

The exact steps for constructing the outer IP header depend on the
mode (transport or tunnel) and are described in the Security
Architecture document.
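
As an illustration of steps 1 and 2, the following Python sketch
assembles the octets that would be handed to the encryption algorithm.
It is not normative. The function name and its parameters are
hypothetical. The sketch assumes the default padding scheme of Section
2.4 (padding octets 1, 2, 3, ...), aligns the result to both the cipher
block size and a 4-byte boundary, and leaves out the IV and the
encryption step itself.

```python
import math


def build_esp_plaintext(payload: bytes, next_header: int,
                        block_size: int = 8) -> bytes:
    """Return Payload Data + Padding + Pad Length + Next Header."""
    # Align to the cipher block size and to 4 bytes at the same time.
    align = block_size * 4 // math.gcd(block_size, 4)
    # The two trailer octets (Pad Length, Next Header) count toward
    # the aligned length, so include them before computing padding.
    pad_len = (-(len(payload) + 2)) % align
    # Default padding scheme assumed here: 1, 2, 3, ...
    padding = bytes(range(1, pad_len + 1))
    return payload + padding + bytes([pad_len, next_header])


# A 5-octet TCP payload (Next Header 6) with an 8-byte block cipher
# needs one padding octet to reach 8 octets in total.
plaintext = build_esp_plaintext(b"hello", next_header=6, block_size=8)
```

In transport mode the payload argument would be the upper layer
protocol data; in tunnel mode it would be the entire original IP
datagram.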

If authentication is selected, encryption is performed first, before
the authentication, and the encryption does not encompass the
Authentication Data field. This order of processing facilitates
rapid detection and rejection of replayed or bogus packets by the
receiver, prior to decrypting the packet, hence potentially reducing
the impact of denial of service attacks. It also allows for the
possibility of parallel processing of packets at the receiver, i.e.,
decryption can take place in parallel with authentication. Note that
since the Authentication Data is not protected by encryption, a keyed
authentication algorithm must be employed to compute the ICV.

3.3.3 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter and
when it reaches the maximum value, the counter rolls over back to
zero.
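
The sender-side rules above can be summarized in a short Python
sketch. It is illustrative only. The class name SequenceCounter is
hypothetical, and the sketch assumes the 32-bit Sequence Number field
defined earlier in this specification. How a new SA is negotiated
after overflow is outside the sketch.

```python
class SequenceCounter:
    """Hypothetical per-SA sender counter."""

    MAX = 0xFFFFFFFF  # largest value of the 32-bit Sequence Number

    def __init__(self, anti_replay: bool = True):
        self.anti_replay = anti_replay
        self.value = 0  # initialized to 0 when the SA is established

    def next(self) -> int:
        """Return the Sequence Number for the next outbound packet."""
        if self.value == self.MAX:
            if self.anti_replay:
                # Sending would make the counter cycle: an auditable
                # event. The caller is expected to set up a new SA.
                raise OverflowError("Sequence Number would cycle")
            # Anti-replay disabled: the counter rolls over to zero.
            self.value = 0
            return self.value
        self.value += 1
        return self.value


counter = SequenceCounter()
first = counter.next()  # the first packet on an SA carries 1
```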

3.3.4 Integrity Check Value Calculation

If authentication is selected for the SA, the sender computes the ICV
over the ESP packet minus the Authentication Data. Thus the SPI,
Sequence Number, Payload Data, Padding (if present), Pad Length, and
Next Header are all encompassed by the ICV computation. Note that
the last 4 fields will be in ciphertext form, since encryption is
performed prior to authentication.

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the length of this byte string does
not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the ESP packet (after the
Next Header field) prior to ICV computation. The padding octets MUST
have a value of zero. The blocksize (and hence the length of the
padding) is specified by the algorithm specification. This padding
is not transmitted with the packet. Note that MD5 and SHA-1 are
viewed as having a 1-byte blocksize because of their internal padding
conventions.
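
For illustration, the following Python sketch computes an ICV with
HMAC-SHA-1 truncated to 96 bits, as in the HMAC-SHA-1-96 transform
[MG97b]. It is a sketch under stated assumptions, not a definitive
implementation: the function name and argument layout are
hypothetical, the ciphertext argument is assumed to hold the already
encrypted Payload Data, Padding, Pad Length, and Next Header, and no
implicit padding is added because SHA-1 is treated as having a 1-byte
blocksize.

```python
import hashlib
import hmac
import struct


def compute_icv(key: bytes, spi: int, seq: int, ciphertext: bytes,
                icv_len: int = 12) -> bytes:
    """Return the truncated HMAC-SHA-1 over SPI, Sequence Number,
    and the encrypted remainder of the ESP packet."""
    # SPI and Sequence Number are 32-bit fields in network byte order.
    covered = struct.pack("!II", spi, seq) + ciphertext
    full = hmac.new(key, covered, hashlib.sha1).digest()
    return full[:icv_len]  # 12 octets = 96 bits


icv = compute_icv(b"k" * 20, spi=0x1000, seq=1, ciphertext=b"\x00" * 16)
```

The sender appends the returned value as the Authentication Data
field.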

3.3.5 Fragmentation

If necessary, fragmentation is performed after ESP processing within
an IPsec implementation. Thus, transport mode ESP is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which ESP
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to ESP processing at a
receiver. In tunnel mode, ESP is applied to an IP packet, the
payload of which may be a fragmented IP packet. For example, a
security gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (as defined in the Security Architecture document) may
apply tunnel mode ESP to such fragments.

NOTE: For transport mode -- As mentioned at the beginning of Section
3.1, bump-in-the-stack and bump-in-the-wire implementations may have
to first reassemble a packet fragmented by the local IP layer, then
apply IPsec, and then fragment the resulting packet.

NOTE: For IPv6 -- For bump-in-the-stack and bump-in-the-wire
implementations, it will be necessary to walk through all the
extension headers to determine if there is a fragmentation header and
hence that the packet needs reassembling prior to IPsec processing.

3.4 Inbound Packet Processing

3.4.1 Reassembly

If required, reassembly is performed prior to ESP processing. If a
packet offered to ESP for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time received, Source Address, Destination Address, Sequence
Number, and (in IPv6) the Flow ID.

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zero'ing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.
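
The fragment test described above can be sketched in Python for IPv4
as follows. The sketch is illustrative only. The function name is
hypothetical, and the input is assumed to be at least the 20-octet
fixed IPv4 header, in which octets 6 and 7 carry the flags and the
13-bit fragment offset.

```python
import struct

MORE_FRAGMENTS = 0x2000  # MF bit within the 16-bit flags/offset word
OFFSET_MASK = 0x1FFF     # low 13 bits: fragment offset


def looks_like_fragment(ipv4_header: bytes) -> bool:
    """True if OFFSET is non-zero or MORE FRAGMENTS is set."""
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(flags_and_offset & (MORE_FRAGMENTS | OFFSET_MASK))


def header_with(flags_and_offset: int) -> bytes:
    """Build a mostly zeroed 20-octet header for demonstration."""
    return bytes(6) + struct.pack("!H", flags_and_offset) + bytes(12)
```

A packet for which the test returns True would be discarded and
audited rather than passed to ESP processing.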

3.4.2 Security Association Lookup

Upon receipt of a (reassembled) packet containing an ESP Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (ESP), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will
be checked, whether the Authentication Data field should be present,
and it will specify the algorithms and keys to be employed for
decryption and ICV computations (if applicable).

If no valid Security Association exists for this session (for
example, the receiver has no key), the receiver MUST discard the
packet; this is an auditable event. The audit log entry for this
event SHOULD include the SPI value, date/time received, Source
Address, Destination Address, Sequence Number, and (in IPv6) the
cleartext Flow ID.

3.4.3 Sequence Number Verification

All ESP implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
This service MUST NOT be enabled unless the authentication service
also is enabled for the SA, since otherwise the Sequence Number field
has not been integrity protected. (Note that there are no provisions
for managing transmitted Sequence Number values among multiple
senders directing traffic to a single SA (irrespective of whether the
destination address is unicast, broadcast, or multicast). Thus the
anti-replay service SHOULD NOT be used in a multi-sender environment
that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.3), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receive packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first ESP check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.
Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time received, Source Address, Destination Address, the
Sequence Number, and (in IPv6) the Flow ID. The receive window is
updated only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.
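
One possible realization of the sliding receive window, using a bit
mask, is sketched below in Python. It is illustrative only and is not
the code from the Security Architecture document. The class and
method names are hypothetical. The sketch assumes that Sequence Number
0 is never valid, since the first packet sent on an SA carries 1, and
it keeps the check and the update separate so that the window is
advanced only after ICV verification succeeds.

```python
class ReplayWindow:
    """Hypothetical anti-replay window for one inbound SA."""

    def __init__(self, size: int = 64):
        if size < 32:
            raise ValueError("window size must be at least 32")
        self.size = size
        self.top = 0   # "right" edge: highest validated number
        self.mask = 0  # bit i set means (top - i) was received

    def check(self, seq: int) -> bool:
        """True if seq is neither too old nor a duplicate."""
        if seq == 0:
            return False  # assumption: first valid number is 1
        if seq > self.top:
            return True  # to the right of the window
        offset = self.top - seq
        if offset >= self.size:
            return False  # left of the "left" edge
        return not (self.mask >> offset) & 1

    def update(self, seq: int) -> None:
        """Record seq. Call only after the ICV has been verified."""
        if seq > self.top:
            shift = seq - self.top
            limit = (1 << self.size) - 1
            self.mask = ((self.mask << shift) | 1) & limit
            self.top = seq
        else:
            self.mask |= 1 << (self.top - seq)


window = ReplayWindow()
```

A receiver would call check() first, then verify the ICV, and call
update() only if both succeed.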

3.4.4 Integrity Check Value Verification

If authentication has been selected, the receiver computes the ICV
over the ESP packet minus the Authentication Data using the specified
authentication algorithm and verifies that it is the same as the ICV
included in the Authentication Data field of the packet. Details of
the computation are provided below.

If the computed and received ICV's match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The log data SHOULD include the SPI value, date/time
received, Source Address, Destination Address, the Sequence Number,
and (in IPv6) the cleartext Flow ID.

DISCUSSION:

Begin by removing and saving the ICV value (Authentication Data
field). Next check the overall length of the ESP packet minus the
Authentication Data. If implicit padding is required, based on
the blocksize of the authentication algorithm, append zero-filled
bytes to the end of the ESP packet directly after the Next Header
field. Perform the ICV computation and compare the result with
the saved value, using the comparison rules defined by the
algorithm specification. (For example, if a digital signature and
one-way hash are used for the ICV computation, the matching
process is more complex.)
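
The procedure in the DISCUSSION above can be sketched in Python for
the case of a keyed hash. The sketch is illustrative only. It assumes
HMAC-SHA-1 truncated to 96 bits [MG97b], for which no implicit padding
is needed and the comparison rule is a plain equality test. The
function name is hypothetical.

```python
import hashlib
import hmac


def verify_and_strip_icv(key: bytes, esp_packet: bytes,
                         icv_len: int = 12) -> bytes:
    """Return the ESP packet minus Authentication Data, or raise."""
    if len(esp_packet) <= icv_len:
        raise ValueError("packet too short to carry an ICV")
    # Remove and save the received ICV.
    body = esp_packet[:-icv_len]
    received = esp_packet[-icv_len:]
    # Recompute over the ESP packet minus the Authentication Data.
    computed = hmac.new(key, body, hashlib.sha1).digest()[:icv_len]
    # Constant-time comparison avoids leaking how many octets match.
    if not hmac.compare_digest(computed, received):
        raise ValueError("ICV mismatch: discard packet and audit")
    return body


demo_key = b"k" * 20
demo_body = bytes.fromhex("0000100000000001") + b"ciphertext-bytes"
demo_icv = hmac.new(demo_key, demo_body, hashlib.sha1).digest()[:12]
demo_packet = demo_body + demo_icv
```

The use of hmac.compare_digest is a design choice of the sketch, not
a requirement stated in this specification.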

3.4.5 Packet Decryption

As in section 3.3.2, "Packet Encryption", we speak here in terms of
encryption always being applied because of the formatting
implications. This is done with the understanding that "no
confidentiality" is offered by using the NULL encryption algorithm.
Accordingly, the receiver:

1. decrypts the ESP Payload Data, Padding, Pad Length, and Next
Header using the key, encryption algorithm, algorithm mode,
and cryptographic synchronization data (if any), indicated by
the SA.
- If explicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is taken from the Payload
field and input to the decryption algorithm as per the
algorithm specification.
- If implicit cryptographic synchronization data, e.g.,
an IV, is indicated, a local version of the IV is
constructed and input to the decryption algorithm as
per the algorithm specification.
2. processes any padding as specified in the encryption
algorithm specification. If the default padding scheme (see
Section 2.4) has been employed, the receiver SHOULD inspect
the Padding field before removing the padding prior to
passing the decrypted data to the next layer.
3. reconstructs the original IP datagram from:
- for transport mode -- original IP header plus the
original upper layer protocol information in the ESP
Payload field
- for tunnel mode -- tunnel IP header + the entire IP
datagram in the ESP Payload field.

The exact steps for reconstructing the original datagram depend on
the mode (transport or tunnel) and are described in the Security
Architecture document. At a minimum, in an IPv6 context, the
receiver SHOULD ensure that the decrypted data is 8-byte aligned, to
facilitate processing by the protocol identified in the Next Header
field.
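
Step 2 can be illustrated with the following Python sketch, which
removes the trailer from already decrypted data and inspects the
Padding field. It is not normative. The function name is hypothetical,
and the sketch assumes the default padding scheme of Section 2.4
(padding octets 1, 2, 3, ...). Treating a mismatch as an error is a
choice made for the example.

```python
from typing import Tuple


def strip_esp_trailer(plaintext: bytes) -> Tuple[bytes, int]:
    """Return (payload, next_header) from decrypted ESP data."""
    if len(plaintext) < 2:
        raise ValueError("missing Pad Length and Next Header")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    end = len(plaintext) - 2  # index where the trailer octets begin
    if pad_len > end:
        raise ValueError("Pad Length exceeds the decrypted data")
    padding = plaintext[end - pad_len:end]
    # Inspect the Padding field before removing it.
    if padding != bytes(range(1, pad_len + 1)):
        raise ValueError("Padding does not match the default scheme")
    return plaintext[:end - pad_len], next_header


payload, next_header = strip_esp_trailer(b"hello\x01\x01\x06")
```

In transport mode the returned payload is the upper layer protocol
data; in tunnel mode it is the entire inner IP datagram.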

If authentication has been selected, verification and decryption MAY
be performed serially or in parallel. If performed serially, then
ICV verification SHOULD be performed first. If performed in
parallel, verification MUST be completed before the decrypted packet
is passed on for further processing. This order of processing
facilitates rapid detection and rejection of replayed or bogus
packets by the receiver, prior to decrypting the packet, hence
potentially reducing the impact of denial of service attacks. Note:
If the receiver performs decryption in parallel with authentication,
care must be taken to avoid possible race conditions with regard to
packet access and reconstruction of the decrypted packet.

Note that there are several ways in which the decryption can "fail":

a. The selected SA may not be correct -- The SA may be
mis-selected due to tampering with the SPI, destination
address, or IPsec protocol type fields. Such errors, if they
map the packet to another extant SA, will be
indistinguishable from a corrupted packet (case c).
Tampering with the SPI can be detected by use of
authentication. However, an SA mismatch might still occur
due to tampering with the IP Destination Address or the IPsec
protocol type field.

b. The pad length or pad values could be erroneous -- Bad pad
lengths or pad values can be detected irrespective of the use
of authentication.

c. The encrypted ESP packet could be corrupted -- This can be
detected if authentication is selected for the SA.

In case (a) or (c), the erroneous result of the decryption operation
(an invalid IP datagram or transport-layer frame) will not
necessarily be detected by IPsec, and is the responsibility of later
protocol processing.

4. Auditing

Not all systems that implement ESP will implement auditing. However,
if ESP is incorporated into a system that supports auditing, then the
ESP implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for ESP. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional
events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.
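
The minimum audit information named for the events in Section 3.4
could be held in a record such as the one sketched below in Python.
The sketch is illustrative only. The class name, field names, and the
choice of types are assumptions; this specification defines only which
items SHOULD be logged, not how they are stored.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EspAuditRecord:
    """Hypothetical audit log entry for one auditable ESP event."""

    event: str                 # free-form label, e.g. "ICV failure"
    spi: int                   # SPI value from the ESP header
    received_at: datetime      # date/time the packet was received
    source: str                # Source Address
    destination: str           # Destination Address
    sequence_number: int       # Sequence Number from the ESP header
    flow_id: Optional[int] = None  # Flow ID, IPv6 only


record = EspAuditRecord(
    event="ICV failure",
    spi=0x1000,
    received_at=datetime(1998, 11, 1, tzinfo=timezone.utc),
    source="192.0.2.1",
    destination="192.0.2.2",
    sequence_number=42,
)
```

The addresses shown are documentation addresses chosen for the
example.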

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST implement the ESP syntax and processing described
here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant ESP implementation MUST
support the following mandatory-to-implement algorithms:

- DES in CBC mode [MD97]
- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]
- NULL Authentication algorithm
- NULL Encryption algorithm

Since ESP encryption and authentication are optional, support for the
2 "NULL" algorithms is required to maintain consistency with the way
these services are negotiated. NOTE that while authentication and
encryption can each be "NULL", they MUST NOT both be "NULL".

6. Security Considerations

Security is central to the design of this protocol, and thus security
considerations permeate the specification. Additional security-
relevant aspects of using the IPsec protocol are discussed in the
Security Architecture document.

7. Differences from RFC 1827

This document differs from RFC 1827 [ATK95] in several significant
ways. The major difference is that this document attempts to
specify a complete framework and context for ESP, whereas RFC 1827
provided a "shell" that was completed through the definition of
transforms. The combinatorial growth of transforms motivated the
reformulation of the ESP specification as a more complete document,
with options for security services that may be offered in the context
of ESP. Thus, fields previously defined in transform documents are
now part of this base ESP specification. For example, the fields
necessary to support authentication (and anti-replay) are now defined
here, even though the provision of this service is an option. The
fields used to support padding for encryption, and for next protocol
identification, are now defined here as well. Packet processing
consistent with the definition of these fields also is included in
the document.

Acknowledgements

Many of the concepts embodied in this specification were derived from
or influenced by the US Government's SP3 security protocol, ISO/IEC's
NLSP, or from the proposed swIPe security protocol [SDNS89, ISO92,
IB93].

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Phil Karn, Perry Metzger, David
Mihelcic, Hilarie Orman, Norman Shulman, William Simpson and Nina
Yuan.

References

[ATK95] Atkinson, R., "IP Encapsulating Security Payload (ESP)",
RFC 1827, August 1995.

[Bel96] Steven M. Bellovin, "Problem Areas for the IP Security
Protocols", Proceedings of the Sixth Usenix Unix Security
Symposium, July, 1996.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[IB93] John Ioannidis & Matt Blaze, "Architecture and
Implementation of Network-layer Security Under Unix",
Proceedings of the USENIX Security Symposium, Santa Clara,
CA, October 1993.

[ISO92] ISO/IEC JTC1/SC6, Network Layer Security Protocol, ISO-IEC
DIS 11577, International Standards Organisation, Geneva,
Switzerland, 29 November 1992.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Authentication Header", RFC
2402, November 1998.

[MD97] Madson, C., and N. Doraswamy, "The ESP DES-CBC Cipher
Algorithm With Explicit IV", RFC 2405, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

[SDNS89] SDNS Secure Data Network System, Security Protocol 3, SP3,
Document SDN.301, Revision 1.5, 15 May 1989, as published
in NIST Publication NIST-IR-90-4250, February 1990.

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


RFC 2402 – IP Authentication Header


Network Working Group S. Kent
Request for Comments: 2402 BBN Corp
Obsoletes: 1826 R. Atkinson
Category: Standards Track @Home Network
November 1998

IP Authentication Header

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction
2. Authentication Header Format
2.1 Next Header
2.2 Payload Length
2.3 Reserved
2.4 Security Parameters Index (SPI)
2.5 Sequence Number
2.6 Authentication Data
3. Authentication Header Processing
3.1 Authentication Header Location
3.2 Authentication Algorithms
3.3 Outbound Packet Processing
3.3.1 Security Association Lookup
3.3.2 Sequence Number Generation
3.3.3 Integrity Check Value Calculation
3.3.3.1 Handling Mutable Fields
3.3.3.1.1 ICV Computation for IPv4
3.3.3.1.1.1 Base Header Fields
3.3.3.1.1.2 Options
3.3.3.1.2 ICV Computation for IPv6
3.3.3.1.2.1 Base Header Fields
3.3.3.1.2.2 Extension Headers Containing Options
3.3.3.1.2.3 Extension Headers Not Containing Options
3.3.3.2 Padding
3.3.3.2.1 Authentication Data Padding
3.3.3.2.2 Implicit Packet Padding
3.3.4 Fragmentation
3.4 Inbound Packet Processing
3.4.1 Reassembly
3.4.2 Security Association Lookup
3.4.3 Sequence Number Verification
3.4.4 Integrity Check Value Verification
4. Auditing
5. Conformance Requirements
6. Security Considerations
7. Differences from RFC 1826
Acknowledgements
Appendix A -- Mutability of IP Options/Extension Headers
A1. IPv4 Options
A2. IPv6 Extension Headers
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

The IP Authentication Header (AH) is used to provide connectionless
integrity and data origin authentication for IP datagrams (hereafter
referred to as just "authentication"), and to provide protection
against replays. This latter, optional service may be selected by
the receiver when a Security Association is established. (Although
the default calls for the sender to increment the Sequence Number
used for anti-replay, the service is effective only if the receiver
checks the Sequence Number.) AH provides authentication for as much
of the IP header as possible, as well as for upper level protocol
data. However, some IP header fields may change in transit and the
value of these fields, when the packet arrives at the receiver, may
not be predictable by the sender. The values of such fields cannot
be protected by AH. Thus the protection provided to the IP header by
AH is somewhat piecemeal.

AH may be applied alone, in combination with the IP Encapsulating
Security Payload (ESP) [KA97b], or in a nested fashion through the
use of tunnel mode (see "Security Architecture for the Internet
Protocol" [KA97a], hereafter referred to as the Security Architecture
document). Security services can be provided between a pair of
communicating hosts, between a pair of communicating security
gateways, or between a security gateway and a host. ESP may be used
to provide the same security services, and it also provides a
confidentiality (encryption) service. The primary difference between
the authentication provided by ESP and AH is the extent of the
coverage. Specifically, ESP does not protect any IP header fields
unless those fields are encapsulated by ESP (tunnel mode). For more
details on how to use AH and ESP in various network environments, see
the Security Architecture document [KA97a].

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by AH and ESP, the concept of Security Associations, the ways
in which AH can be used in conjunction with ESP, and the different
key management options available for AH and ESP. (With regard to the
last topic, the current key management options required for both AH
and ESP are manual keying and automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Authentication Header Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the AH header will contain the value 51 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header   |  Payload Len  |           RESERVED            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Security Parameters Index (SPI)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Sequence Number Field                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                Authentication Data (variable)                 |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The following subsections define the fields that comprise the AH
format. All the fields described here are mandatory, i.e., they are
always present in the AH format and are included in the Integrity
Check Value (ICV) computation (see Sections 2.6 and 3.3.3).
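
As an illustration only, the fixed portion of the format above can be
packed and parsed as in the following Python sketch. The names
pack_ah and parse_ah are hypothetical and are not defined by this
specification; the Payload Len arithmetic follows Section 2.2.

```python
import struct

# Fixed 12-byte portion: Next Header (8 bits), Payload Len (8 bits),
# RESERVED (16 bits), SPI (32 bits), Sequence Number (32 bits).
AH_FIXED = struct.Struct("!BBHII")

def pack_ah(next_header, spi, seq, auth_data):
    """Build an AH header; auth_data must already include any padding."""
    if len(auth_data) % 4:
        raise ValueError("Authentication Data must be a multiple of 32 bits")
    # Length of AH in 32-bit words, minus 2.
    payload_len = (AH_FIXED.size + len(auth_data)) // 4 - 2
    # RESERVED is always transmitted as zero.
    return AH_FIXED.pack(next_header, payload_len, 0, spi, seq) + auth_data

def parse_ah(buf):
    """Return (next_header, payload_len, reserved, spi, seq, auth_data)."""
    nh, plen, reserved, spi, seq = AH_FIXED.unpack_from(buf)
    total = (plen + 2) * 4          # total AH length in bytes
    return nh, plen, reserved, spi, seq, bytes(buf[AH_FIXED.size:total])
```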

2.1 Next Header

The Next Header is an 8-bit field that identifies the type of the
next payload after the Authentication Header. The value of this
field is chosen from the set of IP Protocol Numbers defined in the
most recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA).

2.2 Payload Length

This 8-bit field specifies the length of AH in 32-bit words (4-byte
units), minus "2". (All IPv6 extension headers, as per RFC 1883,
encode the "Hdr Ext Len" field by first subtracting 1 (64-bit word)
from the header length (measured in 64-bit words). AH is an IPv6
extension header. However, since its length is measured in 32-bit
words, the "Payload Length" is calculated by subtracting 2 (32-bit
words).) In the "standard" case of a 96-bit authentication value
plus the 3 32-bit word fixed portion, this length field will be "4".
A "null" authentication algorithm may be used only for debugging
purposes. Its use would result in a "1" value for this field for
IPv4 or a "2" for IPv6, as there would be no corresponding
Authentication Data field (see Section 3.3.3.2.1 on "Authentication
Data Padding").
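
The values quoted above can be reproduced with a short calculation.
The following Python sketch is illustrative only; the function name
ah_payload_len is hypothetical, and the alignment rule it applies is
the one stated in Section 2.6 (32 bits for IPv4, 64 bits for IPv6).

```python
def ah_payload_len(icv_bytes, ipv6):
    """Payload Len value for an ICV of icv_bytes bytes."""
    align = 8 if ipv6 else 4     # AH must be a multiple of 64 or 32 bits
    total = 12 + icv_bytes       # three fixed 32-bit words plus the ICV
    total += -total % align      # explicit padding, if any is needed
    return total // 4 - 2        # length in 32-bit words, minus 2
```

Under these assumptions a 96-bit ICV (12 bytes) gives 4 for both IP
versions, and a zero-length ICV gives 1 for IPv4 and 2 for IPv6.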

2.3 Reserved

This 16-bit field is reserved for future use. It MUST be set to
"zero." (Note that the value is included in the Authentication Data
calculation, but is otherwise ignored by the recipient.)

2.4 Security Parameters Index (SPI)

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (AH), uniquely
identifies the Security Association for this datagram. SPI values
in the range 1 through 255 are reserved by the Internet Assigned
Numbers Authority (IANA) for future use; a reserved SPI value will
not normally be assigned by IANA unless the use of the assigned SPI
value is specified in an RFC. The SPI is ordinarily selected by the
destination system upon establishment of an SA (see the Security
Architecture document for more details).

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.

2.5 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.2 for more details
on how the Sequence Number is generated.) If anti-replay is enabled
(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.

2.6 Authentication Data

This is a variable-length field that contains the Integrity Check
Value (ICV) for this packet. The field must be an integral multiple
of 32 bits in length. The details of the ICV computation are
described in Section 3.3.3 below. This field may include explicit
padding. This padding is included to ensure that the length of the
AH header is an integral multiple of 32 bits (IPv4) or 64 bits
(IPv6). All implementations MUST support such padding. Details of
how to compute the required padding length are provided below. The
authentication algorithm specification MUST specify the length of the
ICV and the comparison rules and processing steps for validation.

3. Authentication Header Processing

3.1 Authentication Header Location

Like ESP, AH may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, in addition to
selected IP header fields. (In this mode, note that for "bump-in-
the-stack" or "bump-in-the-wire" implementations, as defined in the
Security Architecture document, inbound and outbound IP fragments may
require an IPsec implementation to perform extra IP
reassembly/fragmentation in order to both conform to this
specification and provide transparent IPsec support. Special care is
required to perform such operations within these implementations when
multiple interfaces are in use.)

In transport mode, AH is inserted after the IP header and before an
upper layer protocol (e.g., TCP, UDP, or ICMP) or before any other
IPsec headers that have already been inserted. In the context of
IPv4, this calls for placing AH after the IP header (and any options
that it contains), but before the upper layer protocol. (Note that
the term "transport" mode should not be misconstrued as restricting
its use to TCP and UDP. For example, an ICMP message MAY be sent
using either "transport" mode or "tunnel" mode.) The following
diagram illustrates AH transport mode positioning for a typical IPv4
packet, on a "before and after" basis.

      BEFORE APPLYING AH
      ----------------------------
IPv4  |orig IP hdr  |     |      |
      |(any options)| TCP | Data |
      ----------------------------

      AFTER APPLYING AH
      ---------------------------------
IPv4  |orig IP hdr  |    |     |      |
      |(any options)| AH | TCP | Data |
      ---------------------------------
      |<------- authenticated ------->|
          except for mutable fields

In the IPv6 context, AH is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the AH header depending on the semantics
desired. The following diagram illustrates AH transport mode
positioning for a typical IPv6 packet.

      BEFORE APPLYING AH
      ---------------------------------------
IPv6  |             | ext hdrs |     |      |
      | orig IP hdr |if present| TCP | Data |
      ---------------------------------------

      AFTER APPLYING AH
      ------------------------------------------------------------
IPv6  |             |hop-by-hop, dest*, |    | dest |     |      |
      |orig IP hdr  |routing, fragment. | AH | opt* | TCP | Data |
      ------------------------------------------------------------
      |<---- authenticated except for mutable fields ----------->|

      * = if present, could be before AH, after AH, or both

ESP and AH headers can be combined in a variety of modes. The IPsec
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode AH may be employed in either hosts or security gateways
(or in so-called "bump-in-the-stack" or "bump-in-the-wire"
implementations, as defined in the Security Architecture document).
When AH is implemented in a security gateway (to protect transit
traffic), tunnel mode must be used. In tunnel mode, the "inner" IP
header carries the ultimate source and destination addresses, while
an "outer" IP header may contain distinct IP addresses, e.g.,
addresses of security gateways. In tunnel mode, AH protects the
entire inner IP packet, including the entire inner IP header. The
position of AH in tunnel mode, relative to the outer IP header, is
the same as for AH in transport mode. The following diagram
illustrates AH tunnel mode positioning for typical IPv4 and IPv6
packets.

      ------------------------------------------------
IPv4  | new IP hdr* |    | orig IP hdr*  |    |      |
      |(any options)| AH | (any options) |TCP | Data |
      ------------------------------------------------
      |<- authenticated except for mutable fields -->|
      |           in the new IP hdr                  |

      --------------------------------------------------------------
IPv6  |           | ext hdrs*|    |            | ext hdrs*|   |    |
      |new IP hdr*|if present| AH |orig IP hdr*|if present|TCP|Data|
      --------------------------------------------------------------
      |<-- authenticated except for mutable fields in new IP hdr ->|

      * = construction of outer IP hdr/extensions and modification
          of inner IP hdr/extensions is discussed below.

3.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. The
mandatory-to-implement authentication algorithms are described in
Section 5 "Conformance Requirements". Other algorithms MAY be
supported.

3.3 Outbound Packet Processing

In transport mode, the sender inserts the AH header after the IP
header and before an upper layer protocol header, as described above.
In tunnel mode, the outer and inner IP header/extensions can be
inter-related in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document.

If there is more than one IPsec header/extension required, the order
of the application of the security headers MUST be defined by
security policy. For simplicity of processing, each IPsec header
SHOULD ignore the existence (i.e., not zero the contents or try to
predict the contents) of IPsec headers to be applied later. (While a
native IP or bump-in-the-stack implementation could predict the
contents of later IPsec headers that it applies itself, it won't be
possible for it to predict any IPsec headers added by a bump-in-the-
wire implementation between the host and the network.)

3.3.1 Security Association Lookup

AH is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for AH processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number Field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter, and
when it reaches the maximum value, the counter rolls over back to
zero.
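
The sender-side rules above can be summarized in a few lines of code.
This is a minimal Python sketch, not a normative implementation; the
class and exception names are hypothetical, and how an overflow is
audited or a new SA is negotiated is left out.

```python
class SequenceOverflow(Exception):
    """Sending another packet would make the Sequence Number cycle."""

class SenderCounter:
    MAX = 2**32 - 1

    def __init__(self, anti_replay=True):
        self.anti_replay = anti_replay   # the sender's default assumption
        self.value = 0                   # initialized to 0 at SA setup

    def next(self):
        """Return the Sequence Number to place in the next packet."""
        if self.value == self.MAX:
            if self.anti_replay:
                # Auditable event: the packet MUST NOT be sent on this SA.
                raise SequenceOverflow("a new SA and key are required")
            self.value = 0               # anti-replay disabled: roll over
            return self.value
        self.value += 1                  # first packet on an SA carries 1
        return self.value
```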

3.3.3 Integrity Check Value Calculation

The AH ICV is computed over:
o IP header fields that are either immutable in transit or
that are predictable in value upon arrival at the endpoint
for the AH SA
o the AH header (Next Header, Payload Len, Reserved, SPI,
Sequence Number, and the Authentication Data (which is set
to zero for this computation), and explicit padding bytes
(if any))
o the upper level protocol data, which is assumed to be
immutable in transit

3.3.3.1 Handling Mutable Fields

If a field may be modified during transit, the value of the field is
set to zero for purposes of the ICV computation. If a field is
mutable, but its value at the (IPsec) receiver is predictable, then
that value is inserted into the field for purposes of the ICV
calculation. The Authentication Data field is also set to zero in
preparation for this computation. Note that by replacing each
field's value with zero, rather than omitting the field, alignment is
preserved for the ICV calculation. Also, the zero-fill approach
ensures that the length of the fields that are so handled cannot be
changed during transit, even though their contents are not explicitly
covered by the ICV.

As a new extension header or IPv4 option is created, it will be
defined in its own RFC and SHOULD include (in the Security
Considerations section) directions for how it should be handled when
calculating the AH ICV. If the IP (v4 or v6) implementation
encounters an extension header that it does not recognize, it will
discard the packet and send an ICMP message. IPsec will never see
the packet. If the IPsec implementation encounters an IPv4 option
that it does not recognize, it should zero the whole option, using
the second byte of the option as the length. IPv6 options (in
Destination extension headers or the Hop-by-Hop extension header) contain
a flag indicating mutability, which determines appropriate processing
for such options.

3.3.3.1.1 ICV Computation for IPv4

3.3.3.1.1.1 Base Header Fields

The IPv4 base header fields are classified as follows:

Immutable
   Version
   Internet Header Length
   Total Length
   Identification
   Protocol (This should be the value for AH.)
   Source Address
   Destination Address (without loose or strict source routing)

Mutable but predictable
   Destination Address (with loose or strict source routing)

Mutable (zeroed prior to ICV calculation)
   Type of Service (TOS)
   Flags
   Fragment Offset
   Time to Live (TTL)
   Header Checksum

TOS -- This field is excluded because some routers are known to
change the value of this field, even though the IP
specification does not consider TOS to be a mutable header
field.

Flags -- This field is excluded since an intermediate router might
set the DF bit, even if the source did not select it.

Fragment Offset -- Since AH is applied only to non-fragmented IP
packets, the Offset Field must always be zero, and thus it
is excluded (even though it is predictable).

TTL -- This is changed en-route as a normal course of processing
by routers, and thus its value at the receiver is not
predictable by the sender.

Header Checksum -- This will change if any of these other fields
changes, and thus its value upon reception cannot be
predicted by the sender.
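
The classification above translates into a small amount of byte
manipulation. The following Python sketch is illustrative only; the
function name is hypothetical, it handles only the 20-byte base
header, and it does not cover options or the "mutable but
predictable" Destination Address case with source routing.

```python
def zero_ipv4_mutable(header):
    """Return a copy of an IPv4 base header with mutable fields zeroed."""
    h = bytearray(header)
    h[1] = 0                  # Type of Service
    h[6:8] = b"\x00\x00"      # Flags and Fragment Offset
    h[8] = 0                  # Time to Live
    h[10:12] = b"\x00\x00"    # Header Checksum
    return bytes(h)
```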

3.3.3.1.1.2 Options

For IPv4 (unlike IPv6), there is no mechanism for tagging options as
mutable in transit. Hence the IPv4 options are explicitly listed in
Appendix A and classified as immutable, mutable but predictable, or
mutable. For IPv4, the entire option is viewed as a unit; so even
though the type and length fields within most options are immutable
in transit, if an option is classified as mutable, the entire option
is zeroed for ICV computation purposes.
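
One possible way to apply this rule is sketched below in Python. It
is an assumption-laden illustration: the function name is
hypothetical, the IMMUTABLE set is taken from the option numbers
listed as immutable in Appendix A, and every other option (including
unrecognized ones, per Section 3.3.3.1) is zeroed as a unit using its
second byte as the length.

```python
# Option numbers (low 5 bits of the type octet) listed as IMMUTABLE
# in Appendix A.
IMMUTABLE_OPTS = {0, 1, 2, 5, 6, 20, 21}

def zero_ipv4_options(options):
    """Return a copy of the IPv4 options area prepared for the ICV."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        opt_type = out[i]
        if opt_type == 0:               # End of Options List (one octet)
            break
        if opt_type == 1:               # No Operation (one octet)
            i += 1
            continue
        if i + 1 >= len(out) or out[i + 1] < 2 or i + out[i + 1] > len(out):
            raise ValueError("malformed option")
        length = out[i + 1]             # covers type, length, and data
        if (opt_type & 0x1F) not in IMMUTABLE_OPTS:
            out[i:i + length] = bytes(length)   # zero the entire option
        i += length
    return bytes(out)
```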

3.3.3.1.2 ICV Computation for IPv6

3.3.3.1.2.1 Base Header Fields

The IPv6 base header fields are classified as follows:

Immutable
   Version
   Payload Length
   Next Header (This should be the value for AH.)
   Source Address
   Destination Address (without Routing Extension Header)

Mutable but predictable
   Destination Address (with Routing Extension Header)

Mutable (zeroed prior to ICV calculation)
   Class
   Flow Label
   Hop Limit
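
For the IPv6 base header, the mutable fields occupy everything in the
first 32-bit word except the 4-bit Version, plus the Hop Limit octet.
The Python sketch below assumes the 40-byte base header layout of the
IPv6 specification; the function name is hypothetical, and the
Routing Extension Header case for the Destination Address is not
handled.

```python
def zero_ipv6_mutable(header):
    """Return a copy of an IPv6 base header with mutable fields zeroed."""
    h = bytearray(header)
    h[0] &= 0xF0                  # keep Version, clear the Class bits
    h[1:4] = b"\x00\x00\x00"      # remaining Class bits and Flow Label
    h[7] = 0                      # Hop Limit
    return bytes(h)
```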

3.3.3.1.2.2 Extension Headers Containing Options

IPv6 options in the Hop-by-Hop and Destination Extension Headers
contain a bit that indicates whether the option might change
(unpredictably) during transit. For any option for which contents
may change en-route, the entire "Option Data" field must be treated
as zero-valued octets when computing or verifying the ICV. The
Option Type and Opt Data Len are included in the ICV calculation.
All options for which the bit indicates immutability are included in
the ICV calculation. See the IPv6 specification [DH95] for more
information.
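
A sketch of this per-option processing follows, in Python. It relies
on details taken from the IPv6 specification rather than from this
document: the "may change en route" flag is assumed to be the third-
highest-order bit (0x20) of the Option Type, and Pad1 (type 0) is
assumed to be a single octet with no length field. The function name
is hypothetical.

```python
def zero_mutable_ipv6_options(opts):
    """Zero the Option Data of options flagged as changeable en route.

    opts is the options area of a Hop-by-Hop or Destination Options
    header, i.e. everything after Next Header and Hdr Ext Len.
    """
    out = bytearray(opts)
    i = 0
    while i < len(out):
        opt_type = out[i]
        if opt_type == 0:               # Pad1: one octet, no length
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        data_len = out[i + 1]           # Opt Data Len
        end = i + 2 + data_len
        if end > len(out):
            raise ValueError("option overruns header")
        if opt_type & 0x20:             # contents may change en route
            out[i + 2:end] = bytes(data_len)
        # Option Type and Opt Data Len are always left intact.
        i = end
    return bytes(out)
```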

3.3.3.1.2.3 Extension Headers Not Containing Options

The IPv6 extension headers that do not contain options are explicitly
listed in Appendix A and classified as immutable, mutable but
predictable, or mutable.

3.3.3.2 Padding

3.3.3.2.1 Authentication Data Padding

As mentioned in section 2.6, the Authentication Data field explicitly
includes padding to ensure that the AH header is a multiple of 32
bits (IPv4) or 64 bits (IPv6). If padding is required, its length is
determined by two factors:

- the length of the ICV
- the IP protocol version (v4 or v6)

For example, if the output of the selected algorithm is 96-bits, no
padding is required for either IPv4 or for IPv6. However, if a
different length ICV is generated, due to use of a different
algorithm, then padding may be required depending on the length and
IP protocol version. The content of the padding field is arbitrarily
selected by the sender. (The padding is arbitrary, but need not be
random to achieve security.) These padding bytes are included in the
Authentication Data calculation, counted as part of the Payload
Length, and transmitted at the end of the Authentication Data field
to enable the receiver to perform the ICV calculation.

3.3.3.2.2 Implicit Packet Padding

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the IP packet length (including AH)
does not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the packet, prior to ICV
computation. The padding octets MUST have a value of zero. The
blocksize (and hence the length of the padding) is specified by the
algorithm specification. This padding is not transmitted with the
packet. Note that MD5 and SHA-1 are viewed as having a 1-byte
blocksize because of their internal padding conventions.
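
The implicit padding rule amounts to rounding the input length up to
the next multiple of the blocksize. A minimal Python sketch, with a
hypothetical function name:

```python
def add_implicit_padding(packet, blocksize):
    """Append zero octets so the length is a multiple of blocksize.

    The result is used only as input to the ICV computation; the
    padding is not transmitted.
    """
    return packet + bytes(-len(packet) % blocksize)
```

With a blocksize of 1, as for MD5 and SHA-1, the input is returned
unchanged.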

3.3.4 Fragmentation

If required, IP fragmentation occurs after AH processing within an
IPsec implementation. Thus, transport mode AH is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which AH
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to AH processing at a
receiver. In tunnel mode, AH is applied to an IP packet, the payload
of which may be a fragmented IP packet. For example, a security
gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (see the Security Architecture document for details)
may apply tunnel mode AH to such fragments.

3.4 Inbound Packet Processing

If there is more than one IPsec header/extension present, the
processing for each one ignores (does not zero, does not use) any
IPsec headers applied subsequent to the header being processed.

3.4.1 Reassembly

If required, reassembly is performed prior to AH processing. If a
packet offered to AH for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time, Source Address, Destination Address, and (in IPv6) the
Flow ID.

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zeroing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.
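
For IPv4, the fragment test described in this section can be sketched
as follows in Python. The function and constant names are
hypothetical; the bit positions are those of the IPv4 header, in
which the flags and the 13-bit offset share the 16-bit word at byte
offset 6.

```python
import struct

MORE_FRAGMENTS = 0x2000   # MF bit within the flags/offset word
OFFSET_MASK = 0x1FFF      # 13-bit Fragment Offset

def looks_like_fragment(ipv4_header):
    """True if OFFSET is non-zero or the MORE FRAGMENTS flag is set."""
    (word,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(word & (MORE_FRAGMENTS | OFFSET_MASK))
```

A packet carrying only the DF bit is not treated as a fragment by
this test.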

3.4.2 Security Association Lookup

Upon receipt of a packet containing an IP Authentication Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (AH), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will
be checked, specifies the algorithm(s) employed for ICV computation,
and indicates the key(s) required to validate the ICV.

If no valid Security Association exists for this session (e.g., the
receiver has no key), the receiver MUST discard the packet; this is
an auditable event. The audit log entry for this event SHOULD
include the SPI value, date/time, Source Address, Destination
Address, and (in IPv6) the Flow ID.

3.4.3 Sequence Number Verification

All AH implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
(Note that there are no provisions for managing transmitted Sequence
Number values among multiple senders directing traffic to a single SA
(irrespective of whether the destination address is unicast,
broadcast, or multicast). Thus the anti-replay service SHOULD NOT be
used in a multi-sender environment that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.2), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receiver packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first AH check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.
Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time, Source Address, Destination Address, the Sequence
Number, and (in IPv6) the Flow ID. The receive window is updated
only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.
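
The window behavior described in this section can be sketched with a
bit mask, in the spirit of the approach referenced in the Security
Architecture document. The Python below is an illustration under
stated assumptions, not that document's code: the class and method
names are hypothetical, Sequence Number 0 is rejected on the
assumption that the first valid value is 1, and the check and the
update are separate so that the window changes only after the ICV
has been verified.

```python
class ReplayWindow:
    def __init__(self, size=64):
        self.size = size
        self.right = 0    # highest validated Sequence Number so far
        self.mask = 0     # bit n set => (right - n) was already received

    def check(self, seq):
        """Return True if seq may proceed to ICV verification."""
        if seq == 0:
            return False                        # assumed never valid
        if seq > self.right:
            return True                         # right of the window
        offset = self.right - seq
        if offset >= self.size:
            return False                        # left of the window
        return not (self.mask >> offset) & 1    # inside: reject duplicates

    def update(self, seq):
        """Record seq; call only after ICV verification succeeds."""
        if seq > self.right:
            shift = seq - self.right
            self.mask = ((self.mask << shift) | 1) & ((1 << self.size) - 1)
            self.right = seq
        else:
            self.mask |= 1 << (self.right - seq)
```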

3.4.4 Integrity Check Value Verification

The receiver computes the ICV over the appropriate fields of the
packet, using the specified authentication algorithm, and verifies
that it is the same as the ICV included in the Authentication Data
field of the packet. Details of the computation are provided below.

If the computed and received ICVs match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry SHOULD include the SPI value, date/time
received, Source Address, Destination Address, and (in IPv6) the Flow
ID.

DISCUSSION:

Begin by saving the ICV value and replacing it (but not any
Authentication Data padding) with zero. Zero all other fields
that may have been modified during transit. (See section 3.3.3.1
for a discussion of which fields are zeroed before performing the
ICV calculation.) Check the overall length of the packet, and if
it requires implicit padding based on the requirements of the
authentication algorithm, append zero-filled bytes to the end of
the packet as required. Perform the ICV computation and compare
the result with the saved value, using the comparison rules
defined by the algorithm specification. (For example, if a
digital signature and one-way hash are used for the ICV
computation, the matching process is more complex.)
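
As an illustration of the final compute-and-compare step, the Python
sketch below uses HMAC with SHA-1, one of the mandatory algorithms
listed in Section 5. Several points are assumptions of the sketch:
the function names are hypothetical, the input is taken to be the
packet after all zeroing and implicit padding described above, and
the truncation to 96 bits is chosen to match the "standard" case
mentioned in Section 2.2 (the actual rule is set by the algorithm
specification).

```python
import hashlib
import hmac

ICV_LEN = 12   # 96 bits; assumed here, defined by the algorithm spec

def compute_icv(key, prepared_packet):
    """HMAC-SHA-1 over the prepared packet, truncated to ICV_LEN bytes."""
    return hmac.new(key, prepared_packet, hashlib.sha1).digest()[:ICV_LEN]

def verify_icv(key, prepared_packet, saved_icv):
    """Compare the computed ICV with the value saved from the packet."""
    return hmac.compare_digest(compute_icv(key, prepared_packet), saved_icv)
```

The constant-time comparison is a design choice of this sketch, not a
requirement stated in this document.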

4. Auditing

Not all systems that implement AH will implement auditing. However,
if AH is incorporated into a system that supports auditing, then the
AH implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for AH. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional
events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST fully implement the AH syntax and processing
described here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant AH implementation MUST
support the following mandatory-to-implement algorithms:

- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]

6. Security Considerations

Security is central to the design of this protocol, and these
security considerations permeate the specification. Additional
security-relevant aspects of using the IPsec protocol are discussed
in the Security Architecture document.

7. Differences from RFC 1826

This specification of AH differs from RFC 1826 [ATK95] in several
important respects, but the fundamental features of AH remain intact.
One goal of the revision of RFC 1826 was to provide a complete
framework for AH, with ancillary RFCs required only for algorithm
specification. For example, the anti-replay service is now an
integral, mandatory part of AH, not a feature of a transform defined
in another RFC. Carriage of a sequence number to support this
service is now required at all times. The default algorithms
required for interoperability have been changed to HMAC with MD5 or
SHA-1 (vs. keyed MD5), for security reasons. The list of IPv4 header
fields excluded from the ICV computation has been expanded to include
the OFFSET and FLAGS fields.

Another motivation for revision was to provide additional detail and
clarification of subtle points. This specification provides
rationale for exclusion of selected IPv4 header fields from AH
coverage and provides examples on positioning of AH in both the IPv4
and v6 contexts. Auditing requirements have been clarified in this
version of the specification. Tunnel mode AH was mentioned only in
passing in RFC 1826, but now is a mandatory feature of AH.
Discussion of interactions with key management and with security
labels has been moved to the Security Architecture document.

Acknowledgements

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Francis Dupont, Phil Karn, Frank
Kastenholz, Perry Metzger, David Mihelcic, Hilarie Orman, Norman
Shulman, William Simpson, and Nina Yuan.

Appendix A -- Mutability of IP Options/Extension Headers

A1. IPv4 Options

This table shows how the IPv4 options are classified with regard to
"mutability". Where two references are provided, the second one
supersedes the first. This table is based in part on information
provided in RFC1700, "ASSIGNED NUMBERS", (October 1994).

           Opt.
Copy Class    #  Name                       Reference
---- -----  ---  -------------------------  ---------
IMMUTABLE -- included in ICV calculation
  0    0      0  End of Options List        [RFC791]
  0    0      1  No Operation               [RFC791]
  1    0      2  Security                   [RFC1108(historic but in use)]
  1    0      5  Extended Security          [RFC1108(historic but in use)]
  1    0      6  Commercial Security        [expired I-D, now US MIL STD]
  1    0     20  Router Alert               [RFC2113]
  1    0     21  Sender Directed Multi-     [RFC1770]
                 Destination Delivery
MUTABLE -- zeroed
  1    0      3  Loose Source Route         [RFC791]
  0    2      4  Time Stamp                 [RFC791]
  0    0      7  Record Route               [RFC791]
  1    0      9  Strict Source Route        [RFC791]
  0    2     18  Traceroute                 [RFC1393]

EXPERIMENTAL, SUPERSEDED -- zeroed
  1    0      8  Stream ID                  [RFC791, RFC1122 (Host Req)]
  0    0     11  MTU Probe                  [RFC1063, RFC1191 (PMTU)]
  0    0     12  MTU Reply                  [RFC1063, RFC1191 (PMTU)]
  1    0     17  Extended Internet Proto    [RFC1385, RFC1883 (IPv6)]
  0    0     10  Experimental Measurement   [ZSu]
  1    2     13  Experimental Flow Control  [Finn]
  1    0     14  Experimental Access Ctl    [Estrin]
  0    0     15  ???                        [VerSteeg]
  1    0     16  IMI Traffic Descriptor     [Lee]
  1    0     19  Address Extension          [Ullmann IPv7]

NOTE: Use of the Router Alert option is potentially incompatible with
use of IPsec. Although the option is immutable, its use implies that
each router along a packet's path will "process" the packet and
consequently might change the packet. This would happen on a hop by
hop basis as the packet goes from router to router. Prior to being
processed by the application to which the option contents are
directed, e.g., RSVP/IGMP, the packet should encounter AH processing.

However, AH processing would require that each router along the path
is a member of a multicast-SA defined by the SPI. This might pose
problems for packets that are not strictly source routed, and it
requires multicast support techniques not currently available.

NOTE: Addition or removal of any security labels (BSO, ESO, CIPSO) by
systems along a packet's path conflicts with the classification of
these IP Options as immutable and is incompatible with the use of
IPsec.

NOTE: End of Options List options SHOULD be repeated as necessary to
ensure that the IP header ends on a 4 byte boundary in order to
ensure that there are no unspecified bytes which could be used for a
covert channel.
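
The classification in A1 can be sketched as a walk over the IPv4
option area that replaces every option not listed as immutable with
zero octets before the ICV is computed. This is a minimal sketch under
two assumptions that the table itself does not state: the whole
option (type, length, and data octets) is zeroed, and any option type
missing from the immutable list is treated as mutable. The function
name is hypothetical.

```python
# Option-type octets (copy << 7 | class << 5 | number) for the options
# that the A1 table lists as IMMUTABLE.
IMMUTABLE_OPTION_TYPES = {
    0,    # End of Options List  (copy 0, class 0, number 0)
    1,    # No Operation         (0, 0, 1)
    130,  # Security             (1, 0, 2)
    133,  # Extended Security    (1, 0, 5)
    134,  # Commercial Security  (1, 0, 6)
    148,  # Router Alert         (1, 0, 20)
    149,  # Sender Directed Multi-Destination Delivery (1, 0, 21)
}


def zero_mutable_ipv4_options(options: bytes) -> bytes:
    """Return a copy of the option area with non-immutable options zeroed."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        opt_type = out[i]
        if opt_type == 0:        # End of Options List: single octet, stop walking
            break
        if opt_type == 1:        # No Operation: single octet
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        length = out[i + 1]      # counts the type and length octets as well
        if length < 2 or i + length > len(out):
            raise ValueError("bad option length")
        if opt_type not in IMMUTABLE_OPTION_TYPES:
            # Assumption: the entire option is replaced by zero octets.
            out[i:i + length] = bytes(length)
        i += length
    return bytes(out)
```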

A2. IPv6 Extension Headers

This table shows how the IPv6 Extension Headers are classified with
regard to "mutability".

Option/Extension Name                  Reference
-----------------------------------    ---------
MUTABLE BUT PREDICTABLE -- included in ICV calculation
  Routing (Type 0)                     [RFC1883]

BIT INDICATES IF OPTION IS MUTABLE (CHANGES UNPREDICTABLY DURING TRANSIT)
  Hop by Hop options                   [RFC1883]
  Destination options                  [RFC1883]

NOT APPLICABLE
  Fragmentation                        [RFC1883]

Options -- IPv6 options in the Hop-by-Hop and Destination
Extension Headers contain a bit that indicates whether the
option might change (unpredictably) during transit. For
any option for which contents may change en-route, the
entire "Option Data" field must be treated as zero-valued
octets when computing or verifying the ICV. The Option
Type and Opt Data Len are included in the ICV calculation.
All options for which the bit indicates immutability are
included in the ICV calculation. See the IPv6
specification [DH95] for more information.

Routing (Type 0) -- The IPv6 Routing Header "Type 0" will
rearrange the address fields within the packet during
transit from source to destination. However, the contents
of the packet as it will appear at the receiver are known
to the sender and to all intermediate hops. Hence, the
IPv6 Routing Header "Type 0" is included in the
Authentication Data calculation as mutable but predictable.
The sender must order the field so that it appears as it
will at the receiver, prior to performing the ICV
computation.

Fragmentation -- Fragmentation occurs after outbound IPsec
processing (section 3.3) and reassembly occurs before
inbound IPsec processing (section 3.4). So the
Fragmentation Extension Header, if it exists, is not seen
by IPsec.

Note that on the receive side, the IP implementation could
leave a Fragmentation Extension Header in place when it
does re-assembly. If this happens, then when AH receives
the packet, before doing ICV processing, AH MUST "remove"
(or skip over) this header and change the previous header's
"Next Header" field to be the "Next Header" field in the
Fragmentation Extension Header.

Note that on the send side, the IP implementation could
give the IPsec code a packet with a Fragmentation Extension
Header with Offset of 0 (first fragment) and a More
Fragments Flag of 0 (last fragment). If this happens, then
before doing ICV processing, AH MUST first "remove" (or
skip over) this header and change the previous header's
"Next Header" field to be the "Next Header" field in the
Fragmentation Extension Header.
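
The rule for Hop-by-Hop and Destination options described above can
be sketched as follows. The position of the "may change en-route" bit
(0x20, the third-highest-order bit of the Option Type) is taken from
the IPv6 specification rather than from this appendix, and the
function name is hypothetical. Only the Option Data is zeroed; the
Option Type and Opt Data Len octets are left as they are.

```python
MUTABLE_BIT = 0x20  # third-highest-order bit of the Option Type octet


def zero_mutable_ipv6_options(options: bytes) -> bytes:
    """Zero the Option Data of every option whose type marks it as mutable.

    `options` is the option area of a Hop-by-Hop or Destination Options
    header, without the leading Next Header and Hdr Ext Len octets.
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        opt_type = out[i]
        if opt_type == 0:            # Pad1: a single octet with no length field
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        data_len = out[i + 1]        # length of the Option Data only
        start = i + 2
        end = start + data_len
        if end > len(out):
            raise ValueError("option data runs past the end of the header")
        if opt_type & MUTABLE_BIT:
            out[start:end] = bytes(data_len)
        i = end
    return bytes(out)
```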

References

[ATK95] Atkinson, R., "The IP Authentication Header", RFC 1826,
August 1995.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[DH95] Deering, S., and B. Hinden, "Internet Protocol version 6
(IPv6) Specification", RFC 1883, December 1995.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Encapsulating Security
Payload (ESP)", RFC 2406, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.



RFC 791 – Internet Protocol Version 4 Specification


INTERNET PROTOCOL

DARPA INTERNET PROGRAM

PROTOCOL SPECIFICATION

September 1981

prepared for

Defense Advanced Research Projects Agency
Information Processing Techniques Office
1400 Wilson Boulevard
Arlington, Virginia 22209

by

Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, California 90291

September 1981
Internet Protocol

TABLE OF CONTENTS

PREFACE

1. INTRODUCTION

1.1 Motivation
1.2 Scope
1.3 Interfaces
1.4 Operation

2. OVERVIEW

2.1 Relation to Other Protocols
2.2 Model of Operation
2.3 Function Description
2.4 Gateways

3. SPECIFICATION

3.1 Internet Header Format
3.2 Discussion
3.3 Interfaces

APPENDIX A: Examples & Scenarios
APPENDIX B: Data Transmission Order

GLOSSARY

REFERENCES

PREFACE

This document specifies the DoD Standard Internet Protocol. This
document is based on six earlier editions of the ARPA Internet Protocol
Specification, and the present text draws heavily from them. There have
been many contributors to this work both in terms of concepts and in
terms of text. This edition revises aspects of addressing, error
handling, option codes, and the security, precedence, compartments, and
handling restriction features of the internet protocol.

Jon Postel

Editor

September 1981

RFC: 791
Replaces: RFC 760
IENs 128, 123, 111,
80, 54, 44, 41, 28, 26

INTERNET PROTOCOL

DARPA INTERNET PROGRAM
PROTOCOL SPECIFICATION

1. INTRODUCTION

1.1. Motivation

The Internet Protocol is designed for use in interconnected systems of
packet-switched computer communication networks. Such a system has
been called a "catenet" [1]. The internet protocol provides for
transmitting blocks of data called datagrams from sources to
destinations, where sources and destinations are hosts identified by
fixed length addresses. The internet protocol also provides for
fragmentation and reassembly of long datagrams, if necessary, for
transmission through "small packet" networks.

1.2. Scope

The internet protocol is specifically limited in scope to provide the
functions necessary to deliver a package of bits (an internet
datagram) from a source to a destination over an interconnected system
of networks. There are no mechanisms to augment end-to-end data
reliability, flow control, sequencing, or other services commonly
found in host-to-host protocols. The internet protocol can capitalize
on the services of its supporting networks to provide various types
and qualities of service.

1.3. Interfaces

This protocol is called on by host-to-host protocols in an internet
environment. This protocol calls on local network protocols to carry
the internet datagram to the next gateway or destination host.

For example, a TCP module would call on the internet module to take a
TCP segment (including the TCP header and user data) as the data
portion of an internet datagram. The TCP module would provide the
addresses and other parameters in the internet header to the internet
module as arguments of the call. The internet module would then
create an internet datagram and call on the local network interface to
transmit the internet datagram.

In the ARPANET case, for example, the internet module would call on a
local net module which would add the 1822 leader [2] to the internet
datagram creating an ARPANET message to transmit to the IMP. The
ARPANET address would be derived from the internet address by the
local network interface and would be the address of some host in the
ARPANET; that host might be a gateway to other networks.

1.4. Operation

The internet protocol implements two basic functions: addressing and
fragmentation.

The internet modules use the addresses carried in the internet header
to transmit internet datagrams toward their destinations. The
selection of a path for transmission is called routing.

The internet modules use fields in the internet header to fragment and
reassemble internet datagrams when necessary for transmission through
"small packet" networks.

The model of operation is that an internet module resides in each host
engaged in internet communication and in each gateway that
interconnects networks. These modules share common rules for
interpreting address fields and for fragmenting and assembling
internet datagrams. In addition, these modules (especially in
gateways) have procedures for making routing decisions and other
functions.

The internet protocol treats each internet datagram as an independent
entity unrelated to any other internet datagram. There are no
connections or logical circuits (virtual or otherwise).

The internet protocol uses four key mechanisms in providing its
service: Type of Service, Time to Live, Options, and Header Checksum.

The Type of Service is used to indicate the quality of the service
desired. The type of service is an abstract or generalized set of
parameters which characterize the service choices provided in the
networks that make up the internet. This type of service indication
is to be used by gateways to select the actual transmission parameters
for a particular network, the network to be used for the next hop, or
the next gateway when routing an internet datagram.

The Time to Live is an indication of an upper bound on the lifetime of
an internet datagram. It is set by the sender of the datagram and
reduced at the points along the route where it is processed. If the
time to live reaches zero before the internet datagram reaches its
destination, the internet datagram is destroyed. The time to live can
be thought of as a self destruct time limit.

The Options provide for control functions needed or useful in some
situations but unnecessary for the most common communications. The
options include provisions for timestamps, security, and special
routing.

The Header Checksum provides a verification that the information used
in processing the internet datagram has been transmitted correctly. The
data may contain errors. If the header checksum fails, the internet
datagram is discarded at once by the entity which detects the error.

The internet protocol does not provide a reliable communication
facility. There are no acknowledgments either end-to-end or
hop-by-hop. There is no error control for data, only a header
checksum. There are no retransmissions. There is no flow control.

Errors detected may be reported via the Internet Control Message
Protocol (ICMP) [3] which is implemented in the internet protocol
module.

2. OVERVIEW

2.1. Relation to Other Protocols

The following diagram illustrates the place of the internet protocol
in the protocol hierarchy:

+------+ +-----+ +-----+     +-----+
|Telnet| | FTP | | TFTP| ... | ... |
+------+ +-----+ +-----+     +-----+
      |   |         |           |
     +-----+     +-----+     +-----+
     | TCP |     | UDP | ... | ... |
     +-----+     +-----+     +-----+
        |           |           |
     +--------------------------+----+
     |    Internet Protocol & ICMP   |
     +--------------------------+----+
                    |
       +---------------------------+
       |   Local Network Protocol  |
       +---------------------------+

Protocol Relationships

Figure 1.

Internet protocol interfaces on one side to the higher level
host-to-host protocols and on the other side to the local network
protocol. In this context a "local network" may be a small network in
a building or a large network such as the ARPANET.

2.2. Model of Operation

The model of operation for transmitting a datagram from one
application program to another is illustrated by the following
scenario:

We suppose that this transmission will involve one intermediate
gateway.

The sending application program prepares its data and calls on its
local internet module to send that data as a datagram and passes the
destination address and other parameters as arguments of the call.

The internet module prepares a datagram header and attaches the data
to it. The internet module determines a local network address for
this internet address; in this case it is the address of a gateway.

It sends this datagram and the local network address to the local
network interface.

The local network interface creates a local network header, and
attaches the datagram to it, then sends the result via the local
network.

The datagram arrives at a gateway host wrapped in the local network
header, the local network interface strips off this header, and
turns the datagram over to the internet module. The internet module
determines from the internet address that the datagram is to be
forwarded to another host in a second network. The internet module
determines a local net address for the destination host. It calls
on the local network interface for that network to send the
datagram.

This local network interface creates a local network header and
attaches the datagram sending the result to the destination host.

At this destination host the datagram is stripped of the local net
header by the local network interface and handed to the internet
module.

The internet module determines that the datagram is for an
application program in this host. It passes the data to the
application program in response to a system call, passing the source
address and other parameters as results of the call.

Application                                           Application
Program                                                   Program
      \                                                   /
    Internet Module      Internet Module      Internet Module
              \                 /       \                /
              LNI-1          LNI-1      LNI-2         LNI-2
                 \           /             \          /
                Local Network 1           Local Network 2

Transmission Path

Figure 2

2.3. Function Description

The function or purpose of Internet Protocol is to move datagrams
through an interconnected set of networks. This is done by passing
the datagrams from one internet module to another until the
destination is reached. The internet modules reside in hosts and
gateways in the internet system. The datagrams are routed from one
internet module to another through individual networks based on the
interpretation of an internet address. Thus, one important mechanism
of the internet protocol is the internet address.

In the routing of messages from one internet module to another,
datagrams may need to traverse a network whose maximum packet size is
smaller than the size of the datagram. To overcome this difficulty, a
fragmentation mechanism is provided in the internet protocol.

Addressing

A distinction is made between names, addresses, and routes [4]. A
name indicates what we seek. An address indicates where it is. A
route indicates how to get there. The internet protocol deals
primarily with addresses. It is the task of higher level (i.e.,
host-to-host or application) protocols to make the mapping from
names to addresses. The internet module maps internet addresses to
local net addresses. It is the task of lower level (i.e., local net
or gateways) procedures to make the mapping from local net addresses
to routes.

Addresses are fixed length of four octets (32 bits). An address
begins with a network number, followed by local address (called the
"rest" field). There are three formats or classes of internet
addresses: in class a, the high order bit is zero, the next 7 bits
are the network, and the last 24 bits are the local address; in
class b, the high order two bits are one-zero, the next 14 bits are
the network and the last 16 bits are the local address; in class c,
the high order three bits are one-one-zero, the next 21 bits are the
network and the last 8 bits are the local address.
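
The three address formats above can be illustrated with a short
sketch that splits a 32-bit address into its class, network number,
and local address ("rest" field). The helper names parse_dotted and
classify_address are hypothetical, and the dotted-decimal input
notation is only a convenience for the example.

```python
def parse_dotted(text: str) -> int:
    """Convert 'a.b.c.d' into a 32-bit integer (helper for the example)."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def classify_address(addr: int):
    """Return (class, network number, local address) for a 32-bit address."""
    if addr >> 31 == 0b0:            # class a: 0 | 7-bit network | 24-bit local
        return "a", (addr >> 24) & 0x7F, addr & 0xFFFFFF
    if addr >> 30 == 0b10:           # class b: 10 | 14-bit network | 16-bit local
        return "b", (addr >> 16) & 0x3FFF, addr & 0xFFFF
    if addr >> 29 == 0b110:          # class c: 110 | 21-bit network | 8-bit local
        return "c", (addr >> 8) & 0x1FFFFF, addr & 0xFF
    raise ValueError("high order bits 111: not one of the three classes above")
```

For instance, classify_address(parse_dotted("10.0.0.1")) yields class
a with network 10 and local address 1.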

Care must be taken in mapping internet addresses to local net
addresses; a single physical host must be able to act as if it were
several distinct hosts to the extent of using several distinct
internet addresses. Some hosts will also have several physical
interfaces (multi-homing).

That is, provision must be made for a host to have several physical
interfaces to the network with each having several logical internet
addresses.

Examples of address mappings may be found in "Address Mappings" [5].

Fragmentation

Fragmentation of an internet datagram is necessary when it
originates in a local net that allows a large packet size and must
traverse a local net that limits packets to a smaller size to reach
its destination.

An internet datagram can be marked "don't fragment." Any internet
datagram so marked is not to be internet fragmented under any
circumstances. If an internet datagram marked don't fragment cannot be
delivered to its destination without fragmenting it, it is to be
discarded instead.

Fragmentation, transmission and reassembly across a local network
which is invisible to the internet protocol module is called
intranet fragmentation and may be used [6].

The internet fragmentation and reassembly procedure needs to be able
to break a datagram into an almost arbitrary number of pieces that
can be later reassembled. The receiver of the fragments uses the
identification field to ensure that fragments of different datagrams
are not mixed. The fragment offset field tells the receiver the
position of a fragment in the original datagram. The fragment
offset and length determine the portion of the original datagram
covered by this fragment. The more-fragments flag indicates (by
being reset) the last fragment. These fields provide sufficient
information to reassemble datagrams.

The identification field is used to distinguish the fragments of one
datagram from those of another. The originating protocol module of
an internet datagram sets the identification field to a value that
must be unique for that source-destination pair and protocol for the
time the datagram will be active in the internet system. The
originating protocol module of a complete datagram sets the
more-fragments flag to zero and the fragment offset to zero.

To fragment a long internet datagram, an internet protocol module
(for example, in a gateway), creates two new internet datagrams and
copies the contents of the internet header fields from the long
datagram into both new internet headers. The data of the long
datagram is divided into two portions on an 8 octet (64 bit) boundary
(the second portion might not be an integral multiple of 8 octets,
but the first must be). Call the number of 8 octet blocks in the
first portion NFB (for Number of Fragment Blocks). The first
portion of the data is placed in the first new internet datagram,
and the total length field is set to the length of the first
datagram. The more-fragments flag is set to one. The second
portion of the data is placed in the second new internet datagram,
and the total length field is set to the length of the second
datagram. The more-fragments flag carries the same value as the
long datagram. The fragment offset field of the second new internet
datagram is set to the value of that field in the long datagram plus
NFB.

This procedure can be generalized for an n-way split, rather than
the two-way split described.
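
The two-way split and its n-way generalization can be sketched as
below. The Datagram class is a hypothetical stand-in that carries only
the fields the procedure touches; the total length field, options, and
the don't-fragment flag are not modeled, and max_data (the largest
data portion the next network can carry, in octets) is an assumed
input.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Datagram:
    ident: int
    src: str
    dst: str
    proto: int
    offset: int   # fragment offset, in units of 8 octets
    mf: bool      # more-fragments flag
    data: bytes


def split_in_two(d: Datagram, max_data: int):
    """Two-way split: the first portion is a whole number of 8-octet blocks."""
    nfb = max_data // 8                 # Number of Fragment Blocks
    if nfb == 0:
        raise ValueError("max_data must allow at least one 8-octet block")
    cut = nfb * 8
    first = replace(d, data=d.data[:cut], mf=True)
    # The second fragment keeps the more-fragments value of the long datagram.
    second = replace(d, data=d.data[cut:], offset=d.offset + nfb)
    return first, second


def fragment(d: Datagram, max_data: int):
    """n-way generalization: keep splitting the remainder until it fits."""
    pieces = []
    rest = d
    while len(rest.data) > max_data:
        first, rest = split_in_two(rest, max_data)
        pieces.append(first)
    pieces.append(rest)
    return pieces
```

Because the second fragment inherits the more-fragments flag, the
same routine can be applied to a datagram that is already a fragment.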

To assemble the fragments of an internet datagram, an internet
protocol module (for example at a destination host) combines
internet datagrams that all have the same value for the four fields:
identification, source, destination, and protocol. The combination
is done by placing the data portion of each fragment in the relative
position indicated by the fragment offset in that fragment's
internet header. The first fragment will have the fragment offset
zero, and the last fragment will have the more-fragments flag reset
to zero.
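
The reassembly rule above can be sketched as follows. Fragments are
represented as plain dictionaries with hypothetical keys (ident, src,
dst, proto, offset, mf, data); timers and overlapping fragments with
conflicting contents are not handled.

```python
from collections import defaultdict


def reassemble(fragments):
    """Return {(ident, src, dst, proto): data} for every complete datagram."""
    groups = defaultdict(list)
    for frag in fragments:
        key = (frag["ident"], frag["src"], frag["dst"], frag["proto"])
        groups[key].append(frag)

    complete = {}
    for key, frags in groups.items():
        # The last fragment is the one with the more-fragments flag reset.
        last = next((f for f in frags if not f["mf"]), None)
        if last is None:
            continue                      # final fragment not seen yet
        total = last["offset"] * 8 + len(last["data"])
        buf = bytearray(total)
        seen = bytearray(total)           # 1 marks an octet that has arrived
        for f in frags:
            start = f["offset"] * 8       # offsets are in units of 8 octets
            end = start + len(f["data"])
            if end > total:
                break                     # inconsistent fragment: skip this group
            buf[start:end] = f["data"]
            seen[start:end] = b"\x01" * (end - start)
        else:
            if all(seen):
                complete[key] = bytes(buf)
    return complete
```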

2.4. Gateways

Gateways implement internet protocol to forward datagrams between
networks. Gateways also implement the Gateway to Gateway Protocol
(GGP) [7] to coordinate routing and other internet control
information.

In a gateway the higher level protocols need not be implemented and
the GGP functions are added to the IP module.

  +-------------------------------+
  | Internet Protocol & ICMP & GGP|
  +-------------------------------+
          |                 |
+---------------+   +---------------+
|   Local Net   |   |   Local Net   |
+---------------+   +---------------+

Gateway Protocols

Figure 3.

3. SPECIFICATION

3.1. Internet Header Format

A summary of the contents of the internet header follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram Header

Figure 4.

Note that each tick mark represents one bit position.
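
The layout in Figure 4 can be read off directly with a fixed-format
unpack. The sketch below is illustrative only: the function name
parse_header and the dictionary keys are hypothetical, and the
checksum is returned as found rather than verified.

```python
import struct


def parse_header(packet: bytes) -> dict:
    """Split the first 20 octets of a datagram into the Figure 4 fields."""
    if len(packet) < 20:
        raise ValueError("shorter than the minimum internet header")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F                   # header length in 32 bit words
    if ihl < 5 or len(packet) < ihl * 4:
        raise ValueError("bad IHL")
    return {
        "version": ver_ihl >> 4,
        "ihl": ihl,
        "type_of_service": tos,
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,          # 3 bits: reserved, DF, MF
        "fragment_offset": flags_frag & 0x1FFF,   # 13 bits, units of 8 octets
        "time_to_live": ttl,
        "protocol": proto,
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
        "options": packet[20:ihl * 4],      # empty when IHL is 5
    }
```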

Version: 4 bits

The Version field indicates the format of the internet header. This
document describes version 4.

IHL: 4 bits

Internet Header Length is the length of the internet header in 32
bit words, and thus points to the beginning of the data. Note that
the minimum value for a correct header is 5.

Type of Service: 8 bits

The Type of Service provides an indication of the abstract
parameters of the quality of service desired. These parameters are
to be used to guide the selection of the actual service parameters
when transmitting a datagram through a particular network. Several
networks offer service precedence, which somehow treats high
precedence traffic as more important than other traffic (generally
by accepting only traffic above a certain precedence at time of high
load). The major choice is a three way tradeoff between low-delay,
high-reliability, and high-throughput.

Bits 0-2: Precedence.
Bit 3: 0 = Normal Delay, 1 = Low Delay.
Bit 4: 0 = Normal Throughput, 1 = High Throughput.
Bit 5: 0 = Normal Reliability, 1 = High Reliability.
Bits 6-7: Reserved for Future Use.

   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 |     |     |     |     |     |
|   PRECEDENCE    |  D  |  T  |  R  |  0  |  0  |
|                 |     |     |     |     |     |
+-----+-----+-----+-----+-----+-----+-----+-----+

Precedence

111 - Network Control
110 - Internetwork Control
101 - CRITIC/ECP
100 - Flash Override
011 - Flash
010 - Immediate
001 - Priority
000 - Routine

The use of the Delay, Throughput, and Reliability indications may
increase the cost (in some sense) of the service. In many networks
better performance for one of these parameters is coupled with worse
performance on another. Except for very unusual cases at most two
of these three indications should be set.
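
The bit assignments above can be decoded with a few shifts and masks,
remembering that bit 0 is the most significant bit of the octet. The
function name decode_tos and the dictionary keys are hypothetical.

```python
PRECEDENCE_NAMES = [
    "Routine",               # 000
    "Priority",              # 001
    "Immediate",             # 010
    "Flash",                 # 011
    "Flash Override",        # 100
    "CRITIC/ECP",            # 101
    "Internetwork Control",  # 110
    "Network Control",       # 111
]


def decode_tos(tos: int) -> dict:
    """Break the Type of Service octet into precedence and the D, T, R bits."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("Type of Service is a single octet")
    precedence = tos >> 5                      # bits 0-2
    return {
        "precedence": precedence,
        "precedence_name": PRECEDENCE_NAMES[precedence],
        "low_delay": bool(tos & 0x10),         # bit 3
        "high_throughput": bool(tos & 0x08),   # bit 4
        "high_reliability": bool(tos & 0x04),  # bit 5
        "reserved": tos & 0x03,                # bits 6-7
    }
```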

The type of service is used to specify the treatment of the datagram
during its transmission through the internet system. Example
mappings of the internet type of service to the actual service
provided on networks such as AUTODIN II, ARPANET, SATNET, and PRNET
are given in "Service Mappings" [8].

The Network Control precedence designation is intended to be used
within a network only. The actual use and control of that
designation is up to each network. The Internetwork Control
designation is intended for use by gateway control originators only.
If the actual use of these precedence designations is of concern to
a particular network, it is the responsibility of that network to
control the access to, and use of, those precedence designations.

Total Length: 16 bits

Total Length is the length of the datagram, measured in octets,
including internet header and data. This field allows the length of
a datagram to be up to 65,535 octets. Such long datagrams are
impractical for most hosts and networks. All hosts must be prepared
to accept datagrams of up to 576 octets (whether they arrive whole
or in fragments). It is recommended that hosts only send datagrams
larger than 576 octets if they have assurance that the destination
is prepared to accept the larger datagrams.

The number 576 is selected to allow a reasonable sized data block to
be transmitted in addition to the required header information. For
example, this size allows a data block of 512 octets plus 64 header
octets to fit in a datagram. The maximal internet header is 60
octets, and a typical internet header is 20 octets, allowing a
margin for headers of higher level protocols.

Identification: 16 bits

An identifying value assigned by the sender to aid in assembling the
fragments of a datagram.

Flags: 3 bits

Various Control Flags.

Bit 0: reserved, must be zero
Bit 1: (DF) 0 = May Fragment, 1 = Don't Fragment.
Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.

  0   1   2
+---+---+---+
|   | D | M |
| 0 | F | F |
+---+---+---+

Fragment Offset: 13 bits

This field indicates where in the datagram this fragment belongs.
The fragment offset is measured in units of 8 octets (64 bits). The
first fragment has offset zero.

Time to Live: 8 bits

This field indicates the maximum time the datagram is allowed to
remain in the internet system. If this field contains the value
zero, then the datagram must be destroyed. This field is modified
in internet header processing. The time is measured in units of
seconds, but since every module that processes a datagram must
decrease the TTL by at least one even if it processes the datagram in
less than a second, the TTL must be thought of only as an upper
bound on the time a datagram may exist. The intention is to cause
undeliverable datagrams to be discarded, and to bound the maximum
datagram lifetime.
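
The rule above might be applied by a forwarding module roughly as
follows. This is a minimal sketch: the function name and the
seconds_held parameter (how long the module held the datagram) are
hypothetical, and whole seconds are obtained by simple truncation.

```python
def ttl_after_processing(ttl: int, seconds_held: float = 0.0):
    """Return the TTL to forward with, or None if the datagram must be destroyed."""
    # Decrease by the time held, but by at least one even for sub-second handling.
    decrease = max(1, int(seconds_held))
    remaining = ttl - decrease
    return remaining if remaining > 0 else None
```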

Protocol: 8 bits

This field indicates the next level protocol used in the data
portion of the internet datagram. The values for various protocols
are specified in "Assigned Numbers" [9].

Header Checksum: 16 bits

A checksum on the header only. Since some header fields change
(e.g., time to live), this is recomputed and verified at each point
that the internet header is processed.

The checksum algorithm is:

The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header. For purposes of
computing the checksum, the value of the checksum field is zero.

This is a simple to compute checksum and experimental evidence
indicates it is adequate, but it is provisional and may be replaced
by a CRC procedure, depending on further experience.
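
The checksum algorithm described above can be sketched as follows. This is a minimal illustration, assuming the header is supplied as a byte string whose length is a multiple of two octets; the function name is hypothetical.

```python
def internet_checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 octets")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # big-endian 16-bit word
    while total >> 16:                              # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

To compute the checksum for transmission, the checksum field is set to zero before calling the function. To verify a received header, the function is applied to the header as received; a result of zero indicates that the checksum is consistent.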

Source Address: 32 bits

The source address. See section 3.2.

Destination Address: 32 bits

The destination address. See section 3.2.

Options: variable

The options may appear or not in datagrams. They must be
implemented by all IP modules (hosts and gateways). What is optional
is their transmission in any particular datagram, not their
implementation.

In some environments the security option may be required in all
datagrams.

The option field is variable in length. There may be zero or more
options. There are two cases for the format of an option:

Case 1: A single octet of option-type.

Case 2: An option-type octet, an option-length octet, and the
actual option-data octets.

The option-length octet counts the option-type octet and the
option-length octet as well as the option-data octets.

The option-type octet is viewed as having 3 fields:

1 bit copied flag,
2 bits option class,
5 bits option number.

The copied flag indicates that this option is copied into all
fragments on fragmentation.

0 = not copied
1 = copied

The option classes are:

0 = control
1 = reserved for future use
2 = debugging and measurement
3 = reserved for future use
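
The three fields of the option-type octet can be extracted with shifts and masks. The sketch below is illustrative only; the function name is an assumption.

```python
def decode_option_type(octet: int) -> tuple:
    """Return (copied flag, option class, option number) for an option-type octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet")
    copied = (octet >> 7) & 0b1          # 1 bit copied flag
    option_class = (octet >> 5) & 0b11   # 2 bits option class
    number = octet & 0b11111             # 5 bits option number
    return copied, option_class, number
```

For example, the type value 131 decodes to copied = 1, class = 0, number = 3, and the type value 68 decodes to copied = 0, class = 2, number = 4.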

The following internet options are defined:

CLASS NUMBER LENGTH DESCRIPTION
----- ------ ------ -----------
  0     0      -    End of Option list. This option occupies only
                    1 octet; it has no length octet.
  0     1      -    No Operation. This option occupies only 1
                    octet; it has no length octet.
  0     2     11    Security. Used to carry Security,
                    Compartmentation, User Group (TCC), and
                    Handling Restriction Codes compatible with DOD
                    requirements.
  0     3     var.  Loose Source Routing. Used to route the
                    internet datagram based on information
                    supplied by the source.
  0     9     var.  Strict Source Routing. Used to route the
                    internet datagram based on information
                    supplied by the source.
  0     7     var.  Record Route. Used to trace the route an
                    internet datagram takes.
  0     8      4    Stream ID. Used to carry the stream
                    identifier.
  2     4     var.  Internet Timestamp.
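
The two option formats (a single type octet, or type, length, and data) suggest a simple loop for walking the option field. The following is a sketch under the assumption that the option field has already been extracted from the header as a byte string; the function name and the choice to raise ValueError on malformed input are illustrative and not required by this specification.

```python
def walk_options(options: bytes) -> list:
    """Return a list of (option type, option data) pairs from an option field."""
    found = []
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:              # End of Option List: one octet, stop here
            found.append((0, b""))
            break
        if opt_type == 1:              # No Operation: one octet, keep going
            found.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option: missing length octet")
        length = options[i + 1]        # counts type, length, and data octets
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        found.append((opt_type, options[i + 2:i + length]))
        i += length
    return found
```

Octets following an End of Option List are treated as padding and are not examined.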

Specific Option Definitions

End of Option List

+--------+
|00000000|
+--------+
Type=0

This option indicates the end of the option list. This might
not coincide with the end of the internet header according to
the internet header length. This is used at the end of all
options, not the end of each option, and need only be used if
the end of the options would not otherwise coincide with the end
of the internet header.

May be copied, introduced, or deleted on fragmentation, or for
any other reason.

No Operation

+--------+
|00000001|
+--------+
Type=1

This option may be used between options, for example, to align
the beginning of a subsequent option on a 32 bit boundary.

May be copied, introduced, or deleted on fragmentation, or for
any other reason.

Security

This option provides a way for hosts to send security,
compartmentation, handling restrictions, and TCC (closed user
group) parameters. The format for this option is as follows:

+--------+--------+---//---+---//---+---//---+---//---+
|10000010|00001011|SSS  SSS|CCC  CCC|HHH  HHH|  TCC   |
+--------+--------+---//---+---//---+---//---+---//---+
 Type=130 Length=11

Security (S field): 16 bits

Specifies one of 16 levels of security (eight of which are
reserved for future use).

00000000 00000000 - Unclassified
11110001 00110101 - Confidential
01111000 10011010 - EFTO
10111100 01001101 - MMMM
01011110 00100110 - PROG
10101111 00010011 - Restricted
11010111 10001000 - Secret
01101011 11000101 - Top Secret
00110101 11100010 - (Reserved for future use)
10011010 11110001 - (Reserved for future use)
01001101 01111000 - (Reserved for future use)
00100100 10111101 - (Reserved for future use)
00010011 01011110 - (Reserved for future use)
10001001 10101111 - (Reserved for future use)
11000100 11010110 - (Reserved for future use)
11100010 01101011 - (Reserved for future use)

Compartments (C field): 16 bits

An all zero value is used when the information transmitted is
not compartmented. Other values for the compartments field
may be obtained from the Defense Intelligence Agency.

Handling Restrictions (H field): 16 bits

The values for the control and release markings are
alphanumeric digraphs and are defined in the Defense
Intelligence Agency Manual DIAM 65-19, "Standard Security
Markings".

Transmission Control Code (TCC field): 24 bits

Provides a means to segregate traffic and define controlled
communities of interest among subscribers. The TCC values are
trigraphs, and are available from HQ DCA Code 530.

Must be copied on fragmentation. This option appears at most
once in a datagram.

Loose Source and Record Route

+--------+--------+--------+---------//--------+
|10000011| length | pointer|     route data    |
+--------+--------+--------+---------//--------+
Type=131

The loose source and record route (LSRR) option provides a means
for the source of an internet datagram to supply routing
information to be used by the gateways in forwarding the
datagram to the destination, and to record the route
information.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next source address to be
processed. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A route data is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the source route is empty (and the
recorded route full) and the routing is to be based on the
destination address field.

If the address in destination address field has been reached and
the pointer is not greater than the length, the next address in
the source route replaces the address in the destination address
field, and the recorded route address replaces the source
address just used, and pointer is increased by four.

The recorded route address is the internet module's own internet
address as known in the environment into which this datagram is
being forwarded.

This procedure of replacing the source route with the recorded
route (though it is in the reverse of the order it must be in to
be used as a source route) means the option (and the IP header
as a whole) remains a constant length as the datagram progresses
through the internet.

This option is a loose source route because the gateway or host
IP is allowed to use any route of any number of other
intermediate gateways to reach the next address in the route.

Must be copied on fragmentation. Appears at most once in a
datagram.
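
The source route step described above can be sketched as follows. The example assumes the option is held in a mutable bytearray beginning at the option-type octet, and that the datagram has just reached the address in its destination address field. The function name and the error handling are hypothetical.

```python
def advance_source_route(option: bytearray, own_address: bytes):
    """Advance a source route option in place.

    Returns the next destination address (4 octets), or None when the
    source route is empty and routing is based on the destination field.
    """
    if len(own_address) != 4:
        raise ValueError("an internet address is 4 octets")
    length, pointer = option[1], option[2]
    if pointer < 4:
        raise ValueError("pointer below the smallest legal value of 4")
    if pointer > length:
        return None                         # source route empty, record full
    start = pointer - 1                     # pointer counts from 1 at the type octet
    if start + 4 > length:
        raise ValueError("no room for a full address at the pointer")
    next_address = bytes(option[start:start + 4])
    option[start:start + 4] = own_address   # recorded route replaces the used entry
    option[2] = pointer + 4
    return next_address
```

Because each recorded address overwrites the source address just used, the option length never changes, which is the constant-length property noted above.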

Strict Source and Record Route

+--------+--------+--------+---------//--------+
|10001001| length | pointer|     route data    |
+--------+--------+--------+---------//--------+
Type=137

The strict source and record route (SSRR) option provides a
means for the source of an internet datagram to supply routing
information to be used by the gateways in forwarding the
datagram to the destination, and to record the route
information.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next source address to be
processed. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A route data is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the source route is empty (and the
recorded route full) and the routing is to be based on the
destination address field.

If the address in destination address field has been reached and
the pointer is not greater than the length, the next address in
the source route replaces the address in the destination address
field, and the recorded route address replaces the source
address just used, and pointer is increased by four.

The recorded route address is the internet module's own internet
address as known in the environment into which this datagram is
being forwarded.

This procedure of replacing the source route with the recorded
route (though it is in the reverse of the order it must be in to
be used as a source route) means the option (and the IP header
as a whole) remains a constant length as the datagram progresses
through the internet.

This option is a strict source route because the gateway or host
IP must send the datagram directly to the next address in the
source route through only the directly connected network
indicated in the next address to reach the next gateway or host
specified in the route.

Must be copied on fragmentation. Appears at most once in a
datagram.

Record Route

+--------+--------+--------+---------//--------+
|00000111| length | pointer|     route data    |
+--------+--------+--------+---------//--------+
Type=7

The record route option provides a means to record the route of
an internet datagram.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next area to store a route
address. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A recorded route is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the recorded route data area is full.
The originating host must compose this option with a large
enough route data area to hold all the addresses expected. The
size of the option does not change due to adding addresses. The
initial contents of the route data area must be zero.

When an internet module routes a datagram it checks to see if
the record route option is present. If it is, it inserts its
own internet address as known in the environment into which this
datagram is being forwarded into the recorded route beginning at
the octet indicated by the pointer, and increments the pointer
by four.

If the route data area is already full (the pointer exceeds the
length) the datagram is forwarded without inserting the address
into the recorded route. If there is some room but not enough
room for a full address to be inserted, the original datagram is
considered to be in error and is discarded. In either case an
ICMP parameter problem message may be sent to the source
host [3].

Not copied on fragmentation, goes in first fragment only.
Appears at most once in a datagram.
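
The three cases for the record route option (room available, area full, partial room) might be handled as in the sketch below. The option is assumed to be a bytearray starting at the option-type octet; the function name, the boolean return value, and the use of ValueError to signal a datagram in error are illustrative choices.

```python
def record_route(option: bytearray, own_address: bytes) -> bool:
    """Insert own_address into a record route option in place.

    Returns True if the address was recorded, False if the route data
    area was already full (the datagram is still forwarded). Raises
    ValueError when there is some room but not enough for a full address.
    """
    if len(own_address) != 4:
        raise ValueError("an internet address is 4 octets")
    length, pointer = option[1], option[2]
    if pointer < 4:
        raise ValueError("pointer below the smallest legal value of 4")
    if pointer > length:
        return False                        # full: forward without recording
    start = pointer - 1                     # pointer counts from 1 at the type octet
    if start + 4 > length:
        raise ValueError("partial room: datagram is in error and is discarded")
    option[start:start + 4] = own_address
    option[2] = pointer + 4
    return True
```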

Stream Identifier

+--------+--------+--------+--------+
|10001000|00000100|    Stream ID    |
+--------+--------+--------+--------+
Type=136 Length=4

This option provides a way for the 16-bit SATNET stream
identifier to be carried through networks that do not support
the stream concept.

Must be copied on fragmentation. Appears at most once in a
datagram.

Internet Timestamp

+--------+--------+--------+--------+
|01000100| length | pointer|oflw|flg|
+--------+--------+--------+--------+
|         internet address          |
+--------+--------+--------+--------+
|             timestamp             |
+--------+--------+--------+--------+
|                 .                 |
                  .
                  .
Type = 68

The Option Length is the number of octets in the option counting
the type, length, pointer, and overflow/flag octets (maximum
length 40).

The Pointer is the number of octets from the beginning of this
option to the end of timestamps plus one (i.e., it points to the
octet beginning the space for next timestamp). The smallest
legal value is 5. The timestamp area is full when the pointer
is greater than the length.

The Overflow (oflw) [4 bits] is the number of IP modules that
cannot register timestamps due to lack of space.

The Flag (flg) [4 bits] values are

0 -- time stamps only, stored in consecutive 32-bit words,

1 -- each timestamp is preceded with internet address of the
registering entity,

3 -- the internet address fields are prespecified. An IP
module only registers its timestamp if it matches its own
address with the next specified internet address.

The Timestamp is a right-justified, 32-bit timestamp in
milliseconds since midnight UT. If the time is not available in
milliseconds or cannot be provided with respect to midnight UT
then any time may be inserted as a timestamp provided the high
order bit of the timestamp field is set to one to indicate the
use of a non-standard value.
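
As an illustration of the timestamp encoding, the sketch below computes milliseconds since midnight UT from a timezone-aware time and marks a non-standard value by setting the high order bit. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def timestamp_value(now: datetime) -> int:
    """Milliseconds since midnight UT for a timezone-aware datetime."""
    utc = now.astimezone(timezone.utc)
    seconds = (utc.hour * 60 + utc.minute) * 60 + utc.second
    return seconds * 1000 + utc.microsecond // 1000


def nonstandard_timestamp(value: int) -> int:
    """Mark an arbitrary time value as non-standard (high order bit set)."""
    return 0x80000000 | (value & 0x7FFFFFFF)
```

A standard value never exceeds 86,399,999, so the high order bit of a standard timestamp is always zero and the two kinds of value cannot be confused.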

The originating host must compose this option with a large
enough timestamp data area to hold all the timestamp information
expected. The size of the option does not change due to adding
timestamps. The initial contents of the timestamp data area
must be zero or internet address/zero pairs.

If the timestamp data area is already full (the pointer exceeds
the length) the datagram is forwarded without inserting the
timestamp, but the overflow count is incremented by one.

If there is some room but not enough room for a full timestamp
to be inserted, or the overflow count itself overflows, the
original datagram is considered to be in error and is discarded.
In either case an ICMP parameter problem message may be sent to
the source host [3].

The timestamp option is not copied upon fragmentation. It is
carried in the first fragment. Appears at most once in a
datagram.

Padding: variable

The internet header padding is used to ensure that the internet
header ends on a 32 bit boundary. The padding is zero.

3.2. Discussion

The implementation of a protocol must be robust. Each implementation
must expect to interoperate with others created by different
individuals. While the goal of this specification is to be explicit
about the protocol there is the possibility of differing
interpretations. In general, an implementation must be conservative
in its sending behavior, and liberal in its receiving behavior. That
is, it must be careful to send well-formed datagrams, but must accept
any datagram that it can interpret (e.g., not object to technical
errors where the meaning is still clear).

The basic internet service is datagram oriented and provides for the
fragmentation of datagrams at gateways, with reassembly taking place
at the destination internet protocol module in the destination host.
Of course, fragmentation and reassembly of datagrams within a network
or by private agreement between the gateways of a network is also
allowed since this is transparent to the internet protocols and the
higher-level protocols. This transparent type of fragmentation and
reassembly is termed "network-dependent" (or intranet) fragmentation
and is not discussed further here.

Internet addresses distinguish sources and destinations to the host
level and provide a protocol field as well. It is assumed that each
protocol will provide for whatever multiplexing is necessary within a
host.

Addressing

To provide for flexibility in assigning addresses to networks and
allow for the large number of small to intermediate sized networks
the interpretation of the address field is coded to specify a small
number of networks with a large number of hosts, a moderate number of
networks with a moderate number of hosts, and a large number of
networks with a small number of hosts. In addition there is an
escape code for extended addressing mode.

Address Formats:

High Order Bits   Format                           Class
---------------   -------------------------------  -----
      0            7 bits of net, 24 bits of host    a
      10          14 bits of net, 16 bits of host    b
      110         21 bits of net,  8 bits of host    c
      111         escape to extended addressing mode

A value of zero in the network field means this network. This is
only used in certain ICMP messages. The extended addressing mode
is undefined. Both of these features are reserved for future use.
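
The address formats above can be decoded from the high order bits of the 32-bit address. The following sketch treats the address as an unsigned integer; the function names and the "extended" label for the escape code are assumptions made for illustration.

```python
def address_class(address: int) -> str:
    """Return 'a', 'b', 'c', or 'extended' from the high order bits."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    if address >> 31 == 0b0:
        return "a"
    if address >> 30 == 0b10:
        return "b"
    if address >> 29 == 0b110:
        return "c"
    return "extended"


def split_address(address: int) -> tuple:
    """Return (class, network number, host number) for classes a, b, and c."""
    cls = address_class(address)
    if cls == "a":
        return cls, (address >> 24) & 0x7F, address & 0xFFFFFF      # 7 + 24 bits
    if cls == "b":
        return cls, (address >> 16) & 0x3FFF, address & 0xFFFF      # 14 + 16 bits
    if cls == "c":
        return cls, (address >> 8) & 0x1FFFFF, address & 0xFF       # 21 + 8 bits
    raise ValueError("extended addressing mode is undefined")
```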

The actual values assigned for network addresses are given in
"Assigned Numbers" [9].

The local address, assigned by the local network, must allow for a
single physical host to act as several distinct internet hosts.
That is, there must be a mapping between internet host addresses and
network/host interfaces that allows several internet addresses to
correspond to one interface. It must also be allowed for a host to
have several physical interfaces and to treat the datagrams from
several of them as if they were all addressed to a single host.

Address mappings between internet addresses and addresses for
ARPANET, SATNET, PRNET, and other networks are described in "Address
Mappings" [5].

Fragmentation and Reassembly.

The internet identification field (ID) is used together with the
source and destination address, and the protocol fields, to identify
datagram fragments for reassembly.

The More Fragments flag bit (MF) is set if the datagram is not the
last fragment. The Fragment Offset field identifies the fragment
location, relative to the beginning of the original unfragmented
datagram. Fragments are counted in units of 8 octets. The
fragmentation strategy is designed so that an unfragmented datagram
has all zero fragmentation information (MF = 0, fragment offset =
0). If an internet datagram is fragmented, its data portion must be
broken on 8 octet boundaries.

This format allows 2**13 = 8192 fragments of 8 octets each for a
total of 65,536 octets. Note that this is consistent with the
datagram total length field (of course, the header is counted in the
total length and not in the fragments).

When fragmentation occurs, some options are copied, but others
remain with the first fragment only.

Every internet module must be able to forward a datagram of 68
octets without further fragmentation. This is because an internet
header may be up to 60 octets, and the minimum fragment is 8 octets.

Every internet destination must be able to receive a datagram of 576
octets either in one piece or in fragments to be reassembled.

The fields which may be affected by fragmentation include:

(1) options field
(2) more fragments flag
(3) fragment offset
(4) internet header length field
(5) total length field
(6) header checksum

If the Don't Fragment flag (DF) bit is set, then internet
fragmentation of this datagram is NOT permitted, although it may be
discarded. This can be used to prohibit fragmentation in cases
where the receiving host does not have sufficient resources to
reassemble internet fragments.

One example of use of the Don't Fragment feature is to down line
load a small host. A small host could have a boot strap program
that accepts a datagram, stores it in memory, and then executes it.

The fragmentation and reassembly procedures are most easily
described by examples. The following procedures are example
implementations.

General notation in the following pseudo programs: "=<" means "less
than or equal", "#" means "not equal", "=" means "equal", "<-" means
"is set to". Also, "x to y" includes x and excludes y; for example,
"4 to 7" would include 4, 5, and 6 (but not 7).

An Example Fragmentation Procedure

The maximum sized datagram that can be transmitted through the
next network is called the maximum transmission unit (MTU).

If the total length is less than or equal to the maximum transmission
unit then submit this datagram to the next step in datagram
processing; otherwise cut the datagram into two fragments, the
first fragment being the maximum size, and the second fragment
being the rest of the datagram. The first fragment is submitted
to the next step in datagram processing, while the second fragment
is submitted to this procedure in case it is still too large.

Notation:

FO - Fragment Offset
IHL - Internet Header Length
DF - Don't Fragment flag
MF - More Fragments flag
TL - Total Length
OFO - Old Fragment Offset
OIHL - Old Internet Header Length
OMF - Old More Fragments flag
OTL - Old Total Length
NFB - Number of Fragment Blocks
MTU - Maximum Transmission Unit

Procedure:

IF TL =< MTU
   THEN Submit this datagram to the next step in datagram processing
ELSE IF DF = 1
   THEN discard the datagram
ELSE
   To produce the first fragment:
   (1)  Copy the original internet header;
   (2)  OIHL <- IHL; OTL <- TL; OFO <- FO; OMF <- MF;
   (3)  NFB <- (MTU-IHL*4)/8;
   (4)  Attach the first NFB*8 data octets;
   (5)  Correct the header:
        MF <- 1; TL <- (IHL*4)+(NFB*8);
        Recompute Checksum;
   (6)  Submit this fragment to the next step in
        datagram processing;
   To produce the second fragment:
   (7)  Selectively copy the internet header (some options
        are not copied, see option definitions);
   (8)  Append the remaining data;
   (9)  Correct the header:
        IHL <- (((OIHL*4)-(length of options not copied))+3)/4;
        TL <- OTL - NFB*8 - (OIHL-IHL)*4;
        FO <- OFO + NFB; MF <- OMF; Recompute Checksum;
   (10) Submit this fragment to the fragmentation test; DONE.
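
The example procedure can be sketched in executable form as follows. This is a simplification, not a complete implementation: it assumes that no options are dropped when the header is copied (so IHL is the same in every fragment), it works on the data portion only, and it returns each fragment as a tuple of (fragment offset, more fragments flag, data) rather than building headers or checksums. The function name is hypothetical.

```python
def fragment(data: bytes, ihl: int, mtu: int, fo: int = 0, mf: int = 0, df: int = 0) -> list:
    """Split the data portion of a datagram into (FO, MF, data) fragments.

    ihl is the header length in 32-bit words, fo is in 8-octet units.
    """
    header_len = ihl * 4
    fragments = []
    while header_len + len(data) > mtu:      # TL exceeds the MTU
        if df:
            raise ValueError("datagram exceeds MTU and DF is set: discard")
        nfb = (mtu - header_len) // 8        # NFB <- (MTU-IHL*4)/8
        if nfb < 1:
            raise ValueError("MTU too small to carry one fragment block")
        fragments.append((fo, 1, data[:nfb * 8]))   # first fragment, MF <- 1
        data = data[nfb * 8:]                # remaining data, tested again
        fo += nfb                            # FO <- OFO + NFB
    fragments.append((fo, mf, data))         # last piece keeps the original MF
    return fragments
```

With 4000 data octets, a 20 octet header, and an MTU of 1500, NFB is 185 and the fragments carry 1480, 1480, and 1040 data octets at offsets 0, 185, and 370.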

In the above procedure each fragment (except the last) was made
the maximum allowable size. An alternative might produce less
than the maximum size datagrams. For example, one could implement
a fragmentation procedure that repeatedly divided large datagrams in
half until the resulting fragments were less than the maximum
transmission unit size.

An Example Reassembly Procedure

For each datagram the buffer identifier is computed as the
concatenation of the source, destination, protocol, and
identification fields. If this is a whole datagram (that is both
the fragment offset and the more fragments fields are zero), then
any reassembly resources associated with this buffer identifier
are released and the datagram is forwarded to the next step in
datagram processing.

If no other fragment with this buffer identifier is on hand then
reassembly resources are allocated. The reassembly resources
consist of a data buffer, a header buffer, a fragment block bit
table, a total data length field, and a timer. The data from the
fragment is placed in the data buffer according to its fragment
offset and length, and bits are set in the fragment block bit
table corresponding to the fragment blocks received.

If this is the first fragment (that is the fragment offset is
zero) this header is placed in the header buffer. If this is the
last fragment (that is the more fragments field is zero) the
total data length is computed. If this fragment completes the
datagram (tested by checking the bits set in the fragment block
table), then the datagram is sent to the next step in datagram
processing; otherwise the timer is set to the maximum of the
current timer value and the value of the time to live field from
this fragment; and the reassembly routine gives up control.

If the timer runs out, then all reassembly resources for this
buffer identifier are released. The initial setting of the timer
is a lower bound on the reassembly waiting time. This is because
the waiting time will be increased if the Time to Live in the
arriving fragment is greater than the current timer value but will
not be decreased if it is less. The maximum this timer value
could reach is the maximum time to live (approximately 4.25
minutes). The current recommendation for the initial timer
setting is 15 seconds. This may be changed as experience with
this protocol accumulates. Note that the choice of this parameter
value is related to the buffer capacity available and the data
rate of the transmission medium; that is, data rate times timer
value equals buffer size (e.g., 10Kb/s X 15s = 150Kb).

Notation:

FO - Fragment Offset
IHL - Internet Header Length
MF - More Fragments flag
TTL - Time To Live
NFB - Number of Fragment Blocks
TL - Total Length
TDL - Total Data Length
BUFID - Buffer Identifier
RCVBT - Fragment Received Bit Table
TLB - Timer Lower Bound

Procedure:

(1)  BUFID <- source|destination|protocol|identification;
(2)  IF FO = 0 AND MF = 0
(3)     THEN IF buffer with BUFID is allocated
(4)             THEN flush all reassembly for this BUFID;
(5)          Submit datagram to next step; DONE.
(6)     ELSE IF no buffer with BUFID is allocated
(7)             THEN allocate reassembly resources
                     with BUFID;
                     TIMER <- TLB; TDL <- 0;
(8)          put data from fragment into data buffer with
             BUFID from octet FO*8 to
                                 octet (TL-(IHL*4))+FO*8;
(9)          set RCVBT bits from FO
                                to FO+((TL-(IHL*4)+7)/8);
(10)         IF MF = 0 THEN TDL <- TL-(IHL*4)+(FO*8)
(11)         IF FO = 0 THEN put header in header buffer
(12)         IF TDL # 0
(13)          AND all RCVBT bits from 0
                                     to (TDL+7)/8 are set
(14)            THEN TL <- TDL+(IHL*4)
(15)                 Submit datagram to next step;
(16)                 free all reassembly resources
                     for this BUFID; DONE.
(17)         TIMER <- MAX(TIMER,TTL);
(18)         give up until next fragment or timer expires;
(19) timer expires: flush all reassembly with this BUFID; DONE.

In the case that two or more fragments contain the same data
either identically or through a partial overlap, this procedure
will use the more recently arrived copy in the data buffer and
in the datagram delivered.
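
The data handling in the example reassembly procedure can be sketched as below. The sketch is deliberately reduced: it omits the timer and the header buffer, takes the data portion of each fragment directly, and uses a set of block numbers in place of the fragment block bit table. The class and method names are hypothetical.

```python
class Reassembler:
    """Reassembly buffers keyed by (source, destination, protocol, identification)."""

    def __init__(self):
        self.buffers = {}

    def add(self, bufid, fo, mf, payload):
        """Add one fragment; return the complete data, or None if still waiting."""
        if fo == 0 and mf == 0:
            self.buffers.pop(bufid, None)     # whole datagram: flush any reassembly
            return payload
        buf = self.buffers.setdefault(
            bufid, {"data": bytearray(), "received": set(), "tdl": 0})
        start = fo * 8
        end = start + len(payload)
        if len(buf["data"]) < end:            # grow the data buffer as needed
            buf["data"].extend(bytes(end - len(buf["data"])))
        buf["data"][start:end] = payload      # a later copy overwrites an earlier one
        nblocks = (len(payload) + 7) // 8
        buf["received"].update(range(fo, fo + nblocks))
        if mf == 0:
            buf["tdl"] = end                  # last fragment fixes total data length
        tdl = buf["tdl"]
        if tdl and all(b in buf["received"] for b in range((tdl + 7) // 8)):
            del self.buffers[bufid]           # free reassembly resources
            return bytes(buf["data"][:tdl])
        return None
```

Fragments may arrive in any order; the datagram is delivered only when the last fragment has been seen and every block below the total data length has been received.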

Identification

The choice of the Identifier for a datagram is based on the need to
provide a way to uniquely identify the fragments of a particular
datagram. The protocol module assembling fragments judges fragments
to belong to the same datagram if they have the same source,
destination, protocol, and Identifier. Thus, the sender must choose
the Identifier to be unique for this source, destination pair and
protocol for the time the datagram (or any fragment of it) could be
alive in the internet.

It seems then that a sending protocol module needs to keep a table
of Identifiers, one entry for each destination it has communicated
with in the last maximum packet lifetime for the internet.

However, since the Identifier field allows 65,536 different values,
some host may be able to simply use unique identifiers independent
of destination.
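
One way to picture the table of Identifiers is a counter per source, destination, and protocol, as in the sketch below. The class is hypothetical and makes a simplifying assumption: it does not track datagram lifetimes, so uniqueness holds only if fewer than 65,536 datagrams are sent to a given destination and protocol within the maximum packet lifetime.

```python
from collections import defaultdict


class IdentifierTable:
    """Next Identifier value for each (source, destination, protocol)."""

    def __init__(self):
        self._next = defaultdict(int)

    def allocate(self, src, dst, prot) -> int:
        key = (src, dst, prot)
        ident = self._next[key]
        self._next[key] = (ident + 1) % 65536   # the field is 16 bits wide
        return ident
```

A host that uses a single counter independent of destination, as suggested above, corresponds to calling allocate with a constant key.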

It is appropriate for some higher level protocols to choose the
identifier. For example, TCP protocol modules may retransmit an
identical TCP segment, and the probability for correct reception
would be enhanced if the retransmission carried the same identifier
as the original transmission since fragments of either datagram
could be used to construct a correct TCP segment.

Type of Service

The type of service (TOS) is for internet service quality selection.
The type of service is specified along the abstract parameters
precedence, delay, throughput, and reliability. These abstract
parameters are to be mapped into the actual service parameters of
the particular networks the datagram traverses.

Precedence. An independent measure of the importance of this
datagram.

Delay. Prompt delivery is important for datagrams with this
indication.

Throughput. High data rate is important for datagrams with this
indication.

Reliability. A higher level of effort to ensure delivery is
important for datagrams with this indication.

For example, the ARPANET has a priority bit, and a choice between
"standard" messages (type 0) and "uncontrolled" messages (type 3),
(the choice between single packet and multipacket messages can also
be considered a service parameter). The uncontrolled messages tend
to be less reliably delivered and suffer less delay. Suppose an
internet datagram is to be sent through the ARPANET. Let the
internet type of service be given as:

Precedence: 5
Delay: 0
Throughput: 1
Reliability: 1

In this example, the mapping of these parameters to those available
for the ARPANET would be to set the ARPANET priority bit on since
the Internet precedence is in the upper half of its range, to select
standard messages since the throughput and reliability requirements
are indicated and delay is not. More details are given on service
mappings in "Service Mappings" [8].

Time to Live

The time to live is set by the sender to the maximum time the
datagram is allowed to be in the internet system. If the datagram
is in the internet system longer than the time to live, then the
datagram must be destroyed.

This field must be decreased at each point that the internet header
is processed to reflect the time spent processing the datagram.
Even if no local information is available on the time actually
spent, the field must be decremented by 1. The time is measured in
units of seconds (i.e. the value 1 means one second). Thus, the
maximum time to live is 255 seconds or 4.25 minutes. Since every
module that processes a datagram must decrease the TTL by at least
one even if it processes the datagram in less than a second, the TTL
must be thought of only as an upper bound on the time a datagram may
exist. The intention is to cause undeliverable datagrams to be
discarded, and to bound the maximum datagram lifetime.
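
The time to live rule can be sketched as a single function. The name and the seconds_held parameter (the time the module spent holding the datagram, when known) are assumptions for illustration.

```python
def decrement_ttl(ttl: int, seconds_held: float = 0.0):
    """Return the new TTL, or None when the datagram must be destroyed."""
    if not 0 <= ttl <= 255:
        raise ValueError("TTL is an 8-bit field")
    decrease = max(1, int(seconds_held))   # at least one, even for fast processing
    remaining = ttl - decrease
    return remaining if remaining > 0 else None
```

A datagram that arrives with a TTL of 1 is therefore destroyed rather than forwarded, and a datagram sent with the maximum value of 255 can pass through at most 254 modules that each forward it.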

Some higher level reliable connection protocols are based on
assumptions that old duplicate datagrams will not arrive after a
certain time elapses. The TTL is a way for such protocols to have
an assurance that their assumption is met.

Options

The options are optional in each datagram, but required in
implementations. That is, the presence or absence of an option is
the choice of the sender, but each internet module must be able to
parse every option. There can be several options present in the
option field.

The options might not end on a 32-bit boundary. The internet header
must be filled out with octets of zeros. The first of these would
be interpreted as the end-of-options option, and the remainder as
internet header padding.
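
Filling out the header to a 32-bit boundary can be sketched as follows. The function names are hypothetical; the 40 octet limit follows from the maximal internet header of 60 octets less the 20 octets of fixed fields.

```python
def pad_options(options: bytes) -> bytes:
    """Append zero octets so the option field ends on a 32-bit boundary."""
    if len(options) > 40:
        raise ValueError("options cannot exceed 40 octets")
    padding = (-len(options)) % 4
    return options + bytes(padding)   # first zero reads as end-of-options


def header_length_words(options: bytes) -> int:
    """Internet Header Length in 32-bit words for the given options."""
    return 5 + len(pad_options(options)) // 4
```

For example, a 7 octet record route option is followed by one zero octet, giving an 8 octet option field and an Internet Header Length of 7.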

Every internet module must be able to act on every option. The
Security Option is required if classified, restricted, or
compartmented traffic is to be passed.

Checksum

The internet header checksum is recomputed if the internet header is
changed. For example, a reduction of the time to live, additions or
changes to internet options, or due to fragmentation. This checksum
at the internet level is intended to protect the internet header
fields from transmission errors.

There are some applications where a few data bit errors are
acceptable while retransmission delays are not. If the internet
protocol enforced data correctness such applications could not be
supported.

Errors

Internet protocol errors may be reported via the ICMP messages [3].

3.3. Interfaces

The functional description of user interfaces to the IP is, at best,
fictional, since every operating system will have different
facilities. Consequently, we must warn readers that different IP
implementations may have different user interfaces. However, all IPs
must provide a certain minimum set of services to guarantee that all
IP implementations can support the same protocol hierarchy. This
section specifies the functional interfaces required of all IP
implementations.

Internet protocol interfaces on one side to the local network and on
the other side to either a higher level protocol or an application
program. In the following, the higher level protocol or application
program (or even a gateway program) will be called the "user" since it
is using the internet module. Since internet protocol is a datagram
protocol, there is minimal memory or state maintained between datagram
transmissions, and each call on the internet protocol module by the
user supplies all information necessary for the IP to perform the
service requested.

An Example Upper Level Interface

The following two example calls satisfy the requirements for the user
to internet protocol module communication ("=>" means returns):

SEND (src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt => result)

where:

src = source address
dst = destination address
prot = protocol
TOS = type of service
TTL = time to live
BufPTR = buffer pointer
len = length of buffer
Id = Identifier
DF = Don't Fragment
opt = option data
result = response
  OK = datagram sent ok
  Error = error in arguments or local network error

Note that the precedence is included in the TOS and the
security/compartment is passed as an option.

RECV (BufPTR, prot, => result, src, dst, TOS, len, opt)

where:

BufPTR = buffer pointer
prot = protocol
result = response
  OK = datagram received ok
  Error = error in arguments
len = length of buffer
src = source address
dst = destination address
TOS = type of service
opt = option data

When the user sends a datagram, it executes the SEND call supplying
all the arguments. The internet protocol module, on receiving this
call, checks the arguments and prepares and sends the message. If the
arguments are good and the datagram is accepted by the local network,
the call returns successfully. If either the arguments are bad, or
the datagram is not accepted by the local network, the call returns
unsuccessfully. On unsuccessful returns, a reasonable report must be
made as to the cause of the problem, but the details of such reports
are up to individual implementations.
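
The argument checking performed on a SEND call might look like the
following sketch. It is not a definitive implementation: the name
check_send, the Result type, and the LOCAL_ADDRESSES set are
assumptions made for this example, and the limits are derived from the
header field widths (16-bit Total Length, 4-bit IHL). The source
address test reflects the requirement, stated later in this section,
that the source be one of the host's own addresses.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

# Hypothetical addresses assigned to this host (documentation ranges).
LOCAL_ADDRESSES = {IPv4Address("192.0.2.1"), IPv4Address("198.51.100.7")}

MAX_TOTAL_LENGTH = 65535   # Total Length is a 16-bit field
BASE_HEADER = 20           # header without options, in octets
MAX_OPTIONS = 40           # IHL is 4 bits: 15 words = 60 octets of header


@dataclass
class Result:
    ok: bool
    reason: str = ""       # report on the cause of an unsuccessful return


def check_send(src, dst, prot, tos, ttl, buf, ident, df, opt=b""):
    """Check SEND arguments; a real module would then build and transmit."""
    try:
        src_addr = IPv4Address(src)
        IPv4Address(dst)
    except ValueError as exc:
        return Result(False, f"bad address: {exc}")
    if src_addr not in LOCAL_ADDRESSES:
        return Result(False, "source address does not belong to this host")
    for name, value, limit in (("prot", prot, 255), ("TOS", tos, 255),
                               ("TTL", ttl, 255), ("Id", ident, 65535)):
        if not 0 <= value <= limit:
            return Result(False, f"{name} out of range")
    padded_opt = len(opt) + (-len(opt) % 4)   # options padded to 32 bits
    if padded_opt > MAX_OPTIONS:
        return Result(False, "options too long")
    if BASE_HEADER + padded_opt + len(buf) > MAX_TOTAL_LENGTH:
        return Result(False, "datagram too long")
    return Result(True)


print(check_send("192.0.2.1", "203.0.113.5", 6, 0, 64, b"hello", 111, False))
```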

When a datagram arrives at the internet protocol module from the local
network, either there is a pending RECV call from the user addressed
or there is not. In the first case, the pending call is satisfied by
passing the information from the datagram to the user. In the second
case, the user addressed is notified of a pending datagram. If the
user addressed does not exist, an ICMP error message is returned to
the sender, and the data is discarded.

The notification of a user may be via a pseudo interrupt or similar
mechanism, as appropriate in the particular operating system
environment of the implementation.

A user's RECV call may then either be immediately satisfied by a
pending datagram, or the call may be pending until a datagram arrives.
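
The delivery cases described above can be modeled with a small sketch.
It assumes that the "user addressed" is identified by the protocol
field and that users announce themselves through a register call; the
class and method names are hypothetical, and the returned strings stand
in for the actions a real module would take.

```python
from collections import deque


class IPModule:
    """Toy model of datagram delivery to users, keyed by protocol number."""

    def __init__(self):
        self.pending_recv = {}   # prot -> deque of waiting RECV callbacks
        self.arrived = {}        # prot -> deque of datagrams awaiting a RECV

    def register(self, prot):
        """Declare that a user exists for this protocol value."""
        self.pending_recv.setdefault(prot, deque())
        self.arrived.setdefault(prot, deque())

    def recv(self, prot, callback):
        """RECV: satisfied at once if a datagram is waiting, else left pending."""
        if self.arrived[prot]:
            callback(self.arrived[prot].popleft())
        else:
            self.pending_recv[prot].append(callback)

    def datagram_arrived(self, prot, datagram):
        """Report what the module did with a datagram from the local network."""
        if prot not in self.pending_recv:
            return "icmp-error"              # no such user: report and discard
        if self.pending_recv[prot]:
            self.pending_recv[prot].popleft()(datagram)
            return "delivered"               # a pending RECV is satisfied
        self.arrived[prot].append(datagram)
        return "notified"                    # user is told a datagram is pending


ip = IPModule()
ip.register(6)
received = []
ip.recv(6, received.append)                  # RECV issued before arrival
print(ip.datagram_arrived(6, b"first"))      # satisfies the pending RECV
```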

The source address is included in the send call in case the sending
host has several addresses (multiple physical connections or logical
addresses). The internet module must check to see that the source
address is one of the legal addresses for this host.

An implementation may also allow or require a call to the internet
module to indicate interest in or reserve exclusive use of a class of
datagrams (e.g., all those with a certain value in the protocol
field).

This section functionally characterizes a USER/IP interface. The
notation used is similar to most procedure or function calls in high
level languages, but this usage is not meant to rule out trap type
service calls (e.g., SVCs, UUOs, EMTs), or any other form of
interprocess communication.

APPENDIX A: Examples & Scenarios

Example 1:

This is an example of the minimal data carrying internet datagram:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Identification = 111      |Flg=0|   Fragment Offset = 0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 123  |  Protocol = 1 |        header checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     data      |
+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 5.

Note that each tick mark represents one bit position.

This is an internet datagram in version 4 of internet protocol; the
internet header consists of five 32 bit words, and the total length of
the datagram is 21 octets. This datagram is a complete datagram (not
a fragment).
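
The datagram of Example 1 can be assembled octet by octet, as in the
following sketch. The figure does not give the type of service, the
addresses, or the data octet, so a TOS of zero, the documentation
addresses 192.0.2.1 and 192.0.2.2, and a zero data octet are
assumptions here, as are the function names.

```python
import struct
from ipaddress import IPv4Address

HEADER_FORMAT = "!BBHHHBBH4s4s"   # 20-octet header without options


def checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_datagram(ident, ttl, prot, src, dst, data, tos=0, flags=0, offset=0):
    """Build an option-less datagram (Version 4, IHL 5) around `data`."""
    fields = [
        (4 << 4) | 5,               # Version and IHL share the first octet
        tos,
        20 + len(data),             # Total Length: header plus data
        ident,
        (flags << 13) | offset,     # 3-bit Flags, 13-bit Fragment Offset
        ttl,
        prot,
        0,                          # checksum is zero while computing
        IPv4Address(src).packed,
        IPv4Address(dst).packed,
    ]
    fields[7] = checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields) + data


datagram = build_datagram(ident=111, ttl=123, prot=1,
                          src="192.0.2.1", dst="192.0.2.2", data=b"\x00")
print(len(datagram), datagram.hex())
```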

Example 2:

In this example, we show first a moderate size internet datagram (452
data octets), then two internet fragments that might result from the
fragmentation of this datagram if the maximum sized transmission
allowed were 280 octets.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service|       Total Length = 472      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Identification = 111      |Flg=0|   Fragment Offset = 0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 123  |  Protocol = 6 |        header checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
\                                                               \
\                                                               \
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             data              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 6.

Now the first fragment that results from splitting the datagram after
256 data octets.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service|       Total Length = 276      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Identification = 111      |Flg=1|   Fragment Offset = 0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 119  |  Protocol = 6 |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
\                                                               \
\                                                               \
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Fragment

Figure 7.

And the second fragment.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service|       Total Length = 216      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Identification = 111      |Flg=0|  Fragment Offset = 32   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 119  |  Protocol = 6 |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
\                                                               \
\                                                               \
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             data              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Fragment

Figure 8.
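
The numbers in Figures 7 and 8 follow from simple arithmetic: a
280-octet limit leaves 260 octets for data after the 20-octet header,
and since every fragment but the last must carry a multiple of 8 data
octets, that rounds down to 256. The sketch below reproduces this
calculation. It assumes an unfragmented original datagram without
options; the names plan_fragments and Fragment are hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    total_length: int     # header plus data, in octets
    more_fragments: int   # MF flag: 1 on every fragment except the last
    offset: int           # Fragment Offset, in units of 8 octets


def plan_fragments(data_len, max_size, header_len=20):
    """List the fragments needed to carry data_len octets of data."""
    per_fragment = (max_size - header_len) // 8 * 8   # round down to 8-octet units
    if per_fragment <= 0:
        raise ValueError("maximum size leaves no room for data")
    plan, sent = [], 0
    while sent < data_len:
        size = min(per_fragment, data_len - sent)
        more = 1 if sent + size < data_len else 0
        plan.append(Fragment(header_len + size, more, sent // 8))
        sent += size
    return plan


for fragment in plan_fragments(452, 280):
    print(fragment)
```

For the 452 data octets of Example 2 this yields a first fragment with
Total Length 276, MF set, and offset 0, followed by a last fragment
with Total Length 216, MF clear, and offset 32.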

Example 3:

Here, we show an example of a datagram containing options:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 8 |Type of Service|       Total Length = 576      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Identification = 111      |Flg=0|   Fragment Offset = 0   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Time = 123  |  Protocol = 6 |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         source address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      destination address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt. Code = x | Opt. Len.= 3  | option value  | Opt. Code = x |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt. Len. = 4 |          option value         | Opt. Code = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt. Code = y | Opt. Len. = 3 |  option value | Opt. Code = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
\                                                               \
\                                                               \
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 9.
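
An IHL of 8 gives a 32-octet header, so the options in Figure 9 occupy
12 octets. The following sketch walks such an option field, treating
code 0 as end of option list, code 1 as a single-octet no operation,
and every other option as code, length, and value, where the length
counts the code and length octets. The codes 200 and 201 stand in for
the unspecified x and y of the figure and are purely hypothetical, as
is the function name parse_options.

```python
def parse_options(raw: bytes):
    """Split an option field into (code, value) pairs."""
    options, i = [], 0
    while i < len(raw):
        code = raw[i]
        if code == 0:                      # end of option list
            options.append((0, b""))
            break
        if code == 1:                      # no operation: a single octet
            options.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("option is missing its length octet")
        length = raw[i + 1]                # counts code, length, and value octets
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append((code, raw[i + 2:i + length]))
        i += length
    return options


# Layout of Figure 9 with placeholder codes and values.
field = bytes([200, 3, 0xAA,  200, 4, 0xBB, 0xCC,  1,  201, 3, 0xDD,  0])
for code, value in parse_options(field):
    print(code, value.hex())
```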

APPENDIX B: Data Transmission Order

The order of transmission of the header and data described in this
document is resolved to the octet level. Whenever a diagram shows a
group of octets, the order of transmission of those octets is the normal
order in which they are read in English. For example, in the following
diagram the octets are transmitted in the order they are numbered.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       1       |       2       |       3       |       4       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       5       |       6       |       7       |       8       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       9       |      10       |      11       |      12       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Transmission Order of Bytes

Figure 10.

Whenever an octet represents a numeric quantity the left most bit in the
diagram is the high order or most significant bit. That is, the bit
labeled 0 is the most significant bit. For example, the following
diagram represents the value 170 (decimal).

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1 0 1 0 1 0 1 0|
+-+-+-+-+-+-+-+-+

Significance of Bits

Figure 11.

Similarly, whenever a multi-octet field represents a numeric quantity
the left most bit of the whole field is the most significant bit. When
a multi-octet quantity is transmitted the most significant octet is
transmitted first.
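
Both rules can be checked with a few lines of code. The sketch below
rebuilds the value of Figure 11 from its bits, most significant first,
and packs a 16-bit quantity in network order; the value 472 is simply
borrowed from the Total Length of Example 2.

```python
import struct

# Bit 0 is the most significant bit, so each later bit shifts the
# accumulated value left by one place.
bits = [1, 0, 1, 0, 1, 0, 1, 0]
value = 0
for bit in bits:
    value = (value << 1) | bit

# "!" selects network order: the most significant octet is sent first.
wire = struct.pack("!H", 472)

print(value, wire.hex())
```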

GLOSSARY

1822
BBN Report 1822, "The Specification of the Interconnection of
a Host and an IMP". The specification of interface between a
host and the ARPANET.

ARPANET leader
The control information on an ARPANET message at the host-IMP
interface.

ARPANET message
The unit of transmission between a host and an IMP in the
ARPANET. The maximum size is about 1012 octets (8096 bits).

ARPANET packet
A unit of transmission used internally in the ARPANET between
IMPs. The maximum size is about 126 octets (1008 bits).

Destination
The destination address, an internet header field.

DF
The Don't Fragment bit carried in the flags field.

Flags
An internet header field carrying various control flags.

Fragment Offset
This internet header field indicates where in the internet
datagram a fragment belongs.

GGP
Gateway to Gateway Protocol, the protocol used primarily
between gateways to control routing and other gateway
functions.

header
Control information at the beginning of a message, segment,
datagram, packet or block of data.

ICMP
Internet Control Message Protocol, implemented in the internet
module, the ICMP is used from gateways to hosts and between
hosts to report errors and make routing suggestions.

Identification
An internet header field carrying the identifying value
assigned by the sender to aid in assembling the fragments of a
datagram.

IHL
The internet header field Internet Header Length is the length
of the internet header measured in 32 bit words.

IMP
The Interface Message Processor, the packet switch of the
ARPANET.

Internet Address
A four octet (32 bit) source or destination address consisting
of a Network field and a Local Address field.

internet datagram
The unit of data exchanged between a pair of internet modules
(includes the internet header).

internet fragment
A portion of the data of an internet datagram with an internet
header.

Local Address
The address of a host within a network. The actual mapping of
an internet local address on to the host addresses in a
network is quite general, allowing for many to one mappings.

MF
The More-Fragments Flag carried in the internet header flags
field.

module
An implementation, usually in software, of a protocol or other
procedure.

more-fragments flag
A flag indicating whether or not this internet datagram
contains the end of an internet datagram, carried in the
internet header Flags field.

NFB
The Number of Fragment Blocks in the data portion of an
internet fragment. That is, the length of a portion of data
measured in 8 octet units.

octet
An eight bit byte.

Options
The internet header Options field may contain several options,
and each option may be several octets in length.

Padding
The internet header Padding field is used to ensure that the
data begins on a 32 bit word boundary. The padding is zero.

Protocol
In this document, the next higher level protocol identifier,
an internet header field.

Rest
The local address portion of an Internet Address.

Source
The source address, an internet header field.

TCP
Transmission Control Protocol: A host-to-host protocol for
reliable communication in internet environments.

TCP Segment
The unit of data exchanged between TCP modules (including the
TCP header).

TFTP
Trivial File Transfer Protocol: A simple file transfer
protocol built on UDP.

Time to Live
An internet header field which indicates the upper bound on
how long this internet datagram may exist.

TOS
Type of Service

Total Length
The internet header field Total Length is the length of the
datagram in octets including internet header and data.

TTL
Time to Live

Type of Service
An internet header field which indicates the type (or quality)
of service for this internet datagram.

UDP
User Datagram Protocol: A user level protocol for transaction
oriented applications.

User
The user of the internet protocol. This may be a higher level
protocol module, an application program, or a gateway program.

Version
The Version field indicates the format of the internet header.

REFERENCES

[1] Cerf, V., "The Catenet Model for Internetworking," Information
Processing Techniques Office, Defense Advanced Research Projects
Agency, IEN 48, July 1978.

[2] Bolt Beranek and Newman, "Specification for the Interconnection of
a Host and an IMP," BBN Technical Report 1822, Revised May 1978.

[3] Postel, J., "Internet Control Message Protocol - DARPA Internet
Program Protocol Specification," RFC 792, USC/Information Sciences
Institute, September 1981.

[4] Shoch, J., "Inter-Network Naming, Addressing, and Routing,"
COMPCON, IEEE Computer Society, Fall 1978.

[5] Postel, J., "Address Mappings," RFC 796, USC/Information Sciences
Institute, September 1981.

[6] Shoch, J., "Packet Fragmentation in Inter-Network Protocols,"
Computer Networks, v. 3, n. 1, February 1979.

[7] Strazisar, V., "How to Build a Gateway," IEN 109, Bolt Beranek and
Newman, August 1979.

[8] Postel, J., "Service Mappings," RFC 795, USC/Information Sciences
Institute, September 1981.

[9] Postel, J., "Assigned Numbers," RFC 790, USC/Information Sciences
Institute, September 1981.

