ipv6 presentation

RFC 2401 – Security Architecture for the Internet Protocol

 
Network Working Group                                            S. Kent
Request for Comments: 2401 BBN Corp
Obsoletes: 1825 R. Atkinson
Category: Standards Track @Home Network
November 1998

Security Architecture for the Internet Protocol

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction........................................................3
1.1 Summary of Contents of Document..................................3
1.2 Audience.........................................................3
1.3 Related Documents................................................4
2. Design Objectives...................................................4
2.1 Goals/Objectives/Requirements/Problem Description................4
2.2 Caveats and Assumptions..........................................5
3. System Overview.....................................................5
3.1 What IPsec Does..................................................6
3.2 How IPsec Works..................................................6
3.3 Where IPsec May Be Implemented...................................7
4. Security Associations...............................................8
4.1 Definition and Scope.............................................8
4.2 Security Association Functionality..............................10
4.3 Combining Security Associations.................................11
4.4 Security Association Databases..................................13
4.4.1 The Security Policy Database (SPD).........................14
4.4.2 Selectors..................................................17
4.4.3 Security Association Database (SAD)........................21
4.5 Basic Combinations of Security Associations.....................24
4.6 SA and Key Management...........................................26
4.6.1 Manual Techniques..........................................27
4.6.2 Automated SA and Key Management............................27
4.6.3 Locating a Security Gateway................................28
4.7 Security Associations and Multicast.............................29

5. IP Traffic Processing..............................................30
5.1 Outbound IP Traffic Processing..................................30
5.1.1 Selecting and Using an SA or SA Bundle.....................30
5.1.2 Header Construction for Tunnel Mode........................31
5.1.2.1 IPv4 -- Header Construction for Tunnel Mode...........31
5.1.2.2 IPv6 -- Header Construction for Tunnel Mode...........32
5.2 Processing Inbound IP Traffic...................................33
5.2.1 Selecting and Using an SA or SA Bundle.....................33
5.2.2 Handling of AH and ESP tunnels.............................34
6. ICMP Processing (relevant to IPsec)................................35
6.1 PMTU/DF Processing..............................................36
6.1.1 DF Bit.....................................................36
6.1.2 Path MTU Discovery (PMTU)..................................36
6.1.2.1 Propagation of PMTU...................................36
6.1.2.2 Calculation of PMTU...................................37
6.1.2.3 Granularity of PMTU Processing........................37
6.1.2.4 PMTU Aging............................................38
7. Auditing...........................................................39
8. Use in Systems Supporting Information Flow Security................39
8.1 Relationship Between Security Associations and Data Sensitivity.40
8.2 Sensitivity Consistency Checking................................40
8.3 Additional MLS Attributes for Security Association Databases....41
8.4 Additional Inbound Processing Steps for MLS Networking..........41
8.5 Additional Outbound Processing Steps for MLS Networking.........41
8.6 Additional MLS Processing for Security Gateways.................42
9. Performance Issues.................................................42
10. Conformance Requirements..........................................43
11. Security Considerations...........................................43
12. Differences from RFC 1825.........................................43
Acknowledgements......................................................44
Appendix A -- Glossary................................................45
Appendix B -- Analysis/Discussion of PMTU/DF/Fragmentation Issues.....48
B.1 DF bit..........................................................48
B.2 Fragmentation...................................................48
B.3 Path MTU Discovery..............................................52
B.3.1 Identifying the Originating Host(s)........................53
B.3.2 Calculation of PMTU........................................55
B.3.3 Granularity of Maintaining PMTU Data.......................56
B.3.4 Per Socket Maintenance of PMTU Data........................57
B.3.5 Delivery of PMTU Data to the Transport Layer...............57
B.3.6 Aging of PMTU Data.........................................57
Appendix C -- Sequence Space Window Code Example......................58
Appendix D -- Categorization of ICMP messages.........................60
References............................................................63
Disclaimer............................................................64
Author Information....................................................65
Full Copyright Statement..............................................66

1. Introduction

1.1 Summary of Contents of Document

This memo specifies the base architecture for IPsec compliant
systems. The goal of the architecture is to provide various security
services for traffic at the IP layer, in both the IPv4 and IPv6
environments. This document describes the goals of such systems,
their components and how they fit together with each other and into
the IP environment. It also describes the security services offered
by the IPsec protocols, and how these services can be employed in the
IP environment. This document does not address all aspects of IPsec
architecture. Subsequent documents will address additional
architectural details of a more advanced nature, e.g., use of IPsec
in NAT environments and more complete support for IP multicast. The
following fundamental components of the IPsec security architecture
are discussed in terms of their underlying, required functionality.
Additional RFCs (see Section 1.3 for pointers to other documents)
define the protocols in (a), (c), and (d).

a. Security Protocols -- Authentication Header (AH) and
Encapsulating Security Payload (ESP)
b. Security Associations -- what they are and how they work,
how they are managed, associated processing
c. Key Management -- manual and automatic (The Internet Key
Exchange (IKE))
d. Algorithms for authentication and encryption

This document is not an overall Security Architecture for the
Internet; it addresses security only at the IP layer, provided
through the use of a combination of cryptographic and protocol
security mechanisms.

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

1.2 Audience

The target audience for this document includes implementers of this
IP security technology and others interested in gaining a general
background understanding of this system. In particular, prospective
users of this technology (end users or system administrators) are
part of the target audience. A glossary is provided as an appendix

to help fill in gaps in background/vocabulary. This document assumes
that the reader is familiar with the Internet Protocol, related
networking technology, and general security terms and concepts.

1.3 Related Documents

As mentioned above, other documents provide detailed definitions of
some of the components of IPsec and of their inter-relationship.
They include RFCs on the following topics:

a. "IP Security Document Roadmap" [TDG97] -- a document
providing guidelines for specifications describing encryption
and authentication algorithms used in this system.
b. security protocols -- RFCs describing the Authentication
Header (AH) [KA98a] and Encapsulating Security Payload (ESP)
[KA98b] protocols.
c. algorithms for authentication and encryption -- a separate
RFC for each algorithm.
d. automatic key management -- RFCs on "The Internet Key
Exchange (IKE)" [HC98], "Internet Security Association and
Key Management Protocol (ISAKMP)" [MSST97],"The OAKLEY Key
Determination Protocol" [Orm97], and "The Internet IP
Security Domain of Interpretation for ISAKMP" [Pip98].

2. Design Objectives

2.1 Goals/Objectives/Requirements/Problem Description

IPsec is designed to provide interoperable, high quality,
cryptographically-based security for IPv4 and IPv6. The set of
security services offered includes access control, connectionless
integrity, data origin authentication, protection against replays (a
form of partial sequence integrity), confidentiality (encryption),
and limited traffic flow confidentiality. These services are
provided at the IP layer, offering protection for IP and/or upper
layer protocols.

These objectives are met through the use of two traffic security
protocols, the Authentication Header (AH) and the Encapsulating
Security Payload (ESP), and through the use of cryptographic key
management procedures and protocols. The set of IPsec protocols
employed in any context, and the ways in which they are employed,
will be determined by the security and system requirements of users,
applications, and/or sites/organizations.

When these mechanisms are correctly implemented and deployed, they
ought not to adversely affect users, hosts, and other Internet
components that do not employ these security mechanisms for

protection of their traffic. These mechanisms also are designed to
be algorithm-independent. This modularity permits selection of
different sets of algorithms without affecting the other parts of the
implementation. For example, different user communities may select
different sets of algorithms (creating cliques) if required.

A standard set of default algorithms is specified to facilitate
interoperability in the global Internet. The use of these
algorithms, in conjunction with IPsec traffic protection and key
management protocols, is intended to permit system and application
developers to deploy high quality, Internet layer, cryptographic
security technology.

2.2 Caveats and Assumptions

The suite of IPsec protocols and associated default algorithms are
designed to provide high quality security for Internet traffic.
However, the security offered by use of these protocols ultimately
depends on the quality of the their implementation, which is outside
the scope of this set of standards. Moreover, the security of a
computer system or network is a function of many factors, including
personnel, physical, procedural, compromising emanations, and
computer security practices. Thus IPsec is only one part of an
overall system security architecture.

Finally, the security afforded by the use of IPsec is critically
dependent on many aspects of the operating environment in which the
IPsec implementation executes. For example, defects in OS security,
poor quality of random number sources, sloppy system management
protocols and practices, etc. can all degrade the security provided
by IPsec. As above, none of these environmental attributes are
within the scope of this or other IPsec standards.

3. System Overview

This section provides a high level description of how IPsec works,
the components of the system, and how they fit together to provide
the security services noted above. The goal of this description is
to enable the reader to "picture" the overall process/system, see how
it fits into the IP environment, and to provide context for later
sections of this document, which describe each of the components in
more detail.

An IPsec implementation operates in a host or a security gateway
environment, affording protection to IP traffic. The protection
offered is based on requirements defined by a Security Policy
Database (SPD) established and maintained by a user or system
administrator, or by an application operating within constraints

established by either of the above. In general, packets are selected
for one of three processing modes based on IP and transport layer
header information (Selectors, Section 4.4.2) matched against entries
in the database (SPD). Each packet is either afforded IPsec security
services, discarded, or allowed to bypass IPsec, based on the
applicable database policies identified by the Selectors.

3.1 What IPsec Does

IPsec provides security services at the IP layer by enabling a system
to select required security protocols, determine the algorithm(s) to
use for the service(s), and put in place any cryptographic keys
required to provide the requested services. IPsec can be used to
protect one or more "paths" between a pair of hosts, between a pair
of security gateways, or between a security gateway and a host. (The
term "security gateway" is used throughout the IPsec documents to
refer to an intermediate system that implements IPsec protocols. For
example, a router or a firewall implementing IPsec is a security
gateway.)

The set of security services that IPsec can provide includes access
control, connectionless integrity, data origin authentication,
rejection of replayed packets (a form of partial sequence integrity),
confidentiality (encryption), and limited traffic flow
confidentiality. Because these services are provided at the IP
layer, they can be used by any higher layer protocol, e.g., TCP, UDP,
ICMP, BGP, etc.

The IPsec DOI also supports negotiation of IP compression [SMPT98],
motivated in part by the observation that when encryption is employed
within IPsec, it prevents effective compression by lower protocol
layers.

3.2 How IPsec Works

IPsec uses two protocols to provide traffic security --
Authentication Header (AH) and Encapsulating Security Payload (ESP).
Both protocols are described in more detail in their respective RFCs
[KA98a, KA98b].

o The IP Authentication Header (AH) [KA98a] provides
connectionless integrity, data origin authentication, and an
optional anti-replay service.
o The Encapsulating Security Payload (ESP) protocol [KA98b] may
provide confidentiality (encryption), and limited traffic flow
confidentiality. It also may provide connectionless

integrity, data origin authentication, and an anti-replay
service. (One or the other set of these security services
must be applied whenever ESP is invoked.)
o Both AH and ESP are vehicles for access control, based on the
distribution of cryptographic keys and the management of
traffic flows relative to these security protocols.

These protocols may be applied alone or in combination with each
other to provide a desired set of security services in IPv4 and IPv6.
Each protocol supports two modes of use: transport mode and tunnel
mode. In transport mode the protocols provide protection primarily
for upper layer protocols; in tunnel mode, the protocols are applied
to tunneled IP packets. The differences between the two modes are
discussed in Section 4.

IPsec allows the user (or system administrator) to control the
granularity at which a security service is offered. For example, one
can create a single encrypted tunnel to carry all the traffic between
two security gateways or a separate encrypted tunnel can be created
for each TCP connection between each pair of hosts communicating
across these gateways. IPsec management must incorporate facilities
for specifying:

o which security services to use and in what combinations
o the granularity at which a given security protection should be
applied
o the algorithms used to effect cryptographic-based security

Because these security services use shared secret values
(cryptographic keys), IPsec relies on a separate set of mechanisms
for putting these keys in place. (The keys are used for
authentication/integrity and encryption services.) This document
requires support for both manual and automatic distribution of keys.
It specifies a specific public-key based approach (IKE -- [MSST97,
Orm97, HC98]) for automatic key management, but other automated key
distribution techniques MAY be used. For example, KDC-based systems
such as Kerberos and other public-key systems such as SKIP could be
employed.

3.3 Where IPsec May Be Implemented

There are several ways in which IPsec may be implemented in a host or
in conjunction with a router or firewall (to create a security
gateway). Several common examples are provided below:

a. Integration of IPsec into the native IP implementation. This
requires access to the IP source code and is applicable to
both hosts and security gateways.

b. "Bump-in-the-stack" (BITS) implementations, where IPsec is
implemented "underneath" an existing implementation of an IP
protocol stack, between the native IP and the local network
drivers. Source code access for the IP stack is not required
in this context, making this implementation approach
appropriate for use with legacy systems. This approach, when
it is adopted, is usually employed in hosts.

c. The use of an outboard crypto processor is a common design
feature of network security systems used by the military, and
of some commercial systems as well. It is sometimes referred
to as a "Bump-in-the-wire" (BITW) implementation. Such
implementations may be designed to serve either a host or a
gateway (or both). Usually the BITW device is IP
addressable. When supporting a single host, it may be quite
analogous to a BITS implementation, but in supporting a
router or firewall, it must operate like a security gateway.

4. Security Associations

This section defines Security Association management requirements for
all IPv6 implementations and for those IPv4 implementations that
implement AH, ESP, or both. The concept of a "Security Association"
(SA) is fundamental to IPsec. Both AH and ESP make use of SAs and a
major function of IKE is the establishment and maintenance of
Security Associations. All implementations of AH or ESP MUST support
the concept of a Security Association as described below. The
remainder of this section describes various aspects of Security
Association management, defining required characteristics for SA
policy management, traffic processing, and SA management techniques.

4.1 Definition and Scope

A Security Association (SA) is a simplex "connection" that affords
security services to the traffic carried by it. Security services
are afforded to an SA by the use of AH, or ESP, but not both. If
both AH and ESP protection is applied to a traffic stream, then two
(or more) SAs are created to afford protection to the traffic stream.
To secure typical, bi-directional communication between two hosts, or
between two security gateways, two Security Associations (one in each
direction) are required.

A security association is uniquely identified by a triple consisting
of a Security Parameter Index (SPI), an IP Destination Address, and a
security protocol (AH or ESP) identifier. In principle, the
Destination Address may be a unicast address, an IP broadcast
address, or a multicast group address. However, IPsec SA management
mechanisms currently are defined only for unicast SAs. Hence, in the

discussions that follow, SAs will be described in the context of
point-to-point communication, even though the concept is applicable
in the point-to-multipoint case as well.

As noted above, two types of SAs are defined: transport mode and
tunnel mode. A transport mode SA is a security association between
two hosts. In IPv4, a transport mode security protocol header
appears immediately after the IP header and any options, and before
any higher layer protocols (e.g., TCP or UDP). In IPv6, the security
protocol header appears after the base IP header and extensions, but
may appear before or after destination options, and before higher
layer protocols. In the case of ESP, a transport mode SA provides
security services only for these higher layer protocols, not for the
IP header or any extension headers preceding the ESP header. In the
case of AH, the protection is also extended to selected portions of
the IP header, selected portions of extension headers, and selected
options (contained in the IPv4 header, IPv6 Hop-by-Hop extension
header, or IPv6 Destination extension headers). For more details on
the coverage afforded by AH, see the AH specification [KA98a].

A tunnel mode SA is essentially an SA applied to an IP tunnel.
Whenever either end of a security association is a security gateway,
the SA MUST be tunnel mode. Thus an SA between two security gateways
is always a tunnel mode SA, as is an SA between a host and a security
gateway. Note that for the case where traffic is destined for a
security gateway, e.g., SNMP commands, the security gateway is acting
as a host and transport mode is allowed. But in that case, the
security gateway is not acting as a gateway, i.e., not transiting
traffic. Two hosts MAY establish a tunnel mode SA between
themselves. The requirement for any (transit traffic) SA involving a
security gateway to be a tunnel SA arises due to the need to avoid
potential problems with regard to fragmentation and reassembly of
IPsec packets, and in circumstances where multiple paths (e.g., via
different security gateways) exist to the same destination behind the
security gateways.

For a tunnel mode SA, there is an "outer" IP header that specifies
the IPsec processing destination, plus an "inner" IP header that
specifies the (apparently) ultimate destination for the packet. The
security protocol header appears after the outer IP header, and
before the inner IP header. If AH is employed in tunnel mode,
portions of the outer IP header are afforded protection (as above),
as well as all of the tunneled IP packet (i.e., all of the inner IP
header is protected, as well as higher layer protocols). If ESP is
employed, the protection is afforded only to the tunneled packet, not
to the outer header.

In summary,
a) A host MUST support both transport and tunnel mode.
b) A security gateway is required to support only tunnel
mode. If it supports transport mode, that should be used
only when the security gateway is acting as a host, e.g.,
for network management.

4.2 Security Association Functionality

The set of security services offered by an SA depends on the security
protocol selected, the SA mode, the endpoints of the SA, and on the
election of optional services within the protocol. For example, AH
provides data origin authentication and connectionless integrity for
IP datagrams (hereafter referred to as just "authentication"). The
"precision" of the authentication service is a function of the
granularity of the security association with which AH is employed, as
discussed in Section 4.4.2, "Selectors".

AH also offers an anti-replay (partial sequence integrity) service at
the discretion of the receiver, to help counter denial of service
attacks. AH is an appropriate protocol to employ when
confidentiality is not required (or is not permitted, e.g , due to
government restrictions on use of encryption). AH also provides
authentication for selected portions of the IP header, which may be
necessary in some contexts. For example, if the integrity of an IPv4
option or IPv6 extension header must be protected en route between
sender and receiver, AH can provide this service (except for the
non-predictable but mutable parts of the IP header.)

ESP optionally provides confidentiality for traffic. (The strength
of the confidentiality service depends in part, on the encryption
algorithm employed.) ESP also may optionally provide authentication
(as defined above). If authentication is negotiated for an ESP SA,
the receiver also may elect to enforce an anti-replay service with
the same features as the AH anti-replay service. The scope of the
authentication offered by ESP is narrower than for AH, i.e., the IP
header(s) "outside" the ESP header is(are) not protected. If only
the upper layer protocols need to be authenticated, then ESP
authentication is an appropriate choice and is more space efficient
than use of AH encapsulating ESP. Note that although both
confidentiality and authentication are optional, they cannot both be
omitted. At least one of them MUST be selected.

If confidentiality service is selected, then an ESP (tunnel mode) SA
between two security gateways can offer partial traffic flow
confidentiality. The use of tunnel mode allows the inner IP headers
to be encrypted, concealing the identities of the (ultimate) traffic
source and destination. Moreover, ESP payload padding also can be

invoked to hide the size of the packets, further concealing the
external characteristics of the traffic. Similar traffic flow
confidentiality services may be offered when a mobile user is
assigned a dynamic IP address in a dialup context, and establishes a
(tunnel mode) ESP SA to a corporate firewall (acting as a security
gateway). Note that fine granularity SAs generally are more
vulnerable to traffic analysis than coarse granularity ones which are
carrying traffic from many subscribers.

4.3 Combining Security Associations

The IP datagrams transmitted over an individual SA are afforded
protection by exactly one security protocol, either AH or ESP, but
not both. Sometimes a security policy may call for a combination of
services for a particular traffic flow that is not achievable with a
single SA. In such instances it will be necessary to employ multiple
SAs to implement the required security policy. The term "security
association bundle" or "SA bundle" is applied to a sequence of SAs
through which traffic must be processed to satisfy a security policy.
The order of the sequence is defined by the policy. (Note that the
SAs that comprise a bundle may terminate at different endpoints. For
example, one SA may extend between a mobile host and a security
gateway and a second, nested SA may extend to a host behind the
gateway.)

Security associations may be combined into bundles in two ways:
transport adjacency and iterated tunneling.

o Transport adjacency refers to applying more than one
security protocol to the same IP datagram, without invoking
tunneling. This approach to combining AH and ESP allows
for only one level of combination; further nesting yields
no added benefit (assuming use of adequately strong
algorithms in each protocol) since the processing is
performed at one IPsec instance at the (ultimate)
destination.

Host 1 --- Security ---- Internet -- Security --- Host 2
| | Gwy 1 Gwy 2 | |
| | | |
| -----Security Association 1 (ESP transport)------- |
| |
-------Security Association 2 (AH transport)----------

o Iterated tunneling refers to the application of multiple
layers of security protocols effected through IP tunneling.
This approach allows for multiple levels of nesting, since
each tunnel can originate or terminate at a different IPsec

site along the path. No special treatment is expected for
ISAKMP traffic at intermediate security gateways other than
what can be specified through appropriate SPD entries (See
Case 3 in Section 4.5)

There are 3 basic cases of iterated tunneling -- support is
required only for cases 2 and 3.:

1. both endpoints for the SAs are the same -- The inner and
outer tunnels could each be either AH or ESP, though it
is unlikely that Host 1 would specify both to be the
same, i.e., AH inside of AH or ESP inside of ESP.

Host 1 --- Security ---- Internet -- Security --- Host 2
| | Gwy 1 Gwy 2 | |
| | | |
| -------Security Association 1 (tunnel)---------- | |
| |
---------Security Association 2 (tunnel)--------------

2. one endpoint of the SAs is the same -- The inner and
uter tunnels could each be either AH or ESP.

Host 1 --- Security ---- Internet -- Security --- Host 2
| | Gwy 1 Gwy 2 |
| | | |
| ----Security Association 1 (tunnel)---- |
| |
---------Security Association 2 (tunnel)-------------

3. neither endpoint is the same -- The inner and outer
tunnels could each be either AH or ESP.

Host 1 --- Security ---- Internet -- Security --- Host 2
| Gwy 1 Gwy 2 |
| | | |
| --Security Assoc 1 (tunnel)- |
| |
-----------Security Association 2 (tunnel)-----------

These two approaches also can be combined, e.g., an SA bundle could
be constructed from one tunnel mode SA and one or two transport mode
SAs, applied in sequence. (See Section 4.5 "Basic Combinations of
Security Associations.") Note that nested tunnels can also occur
where neither the source nor the destination endpoints of any of the
tunnels are the same. In that case, there would be no host or
security gateway with a bundle corresponding to the nested tunnels.

For transport mode SAs, only one ordering of security protocols seems
appropriate. AH is applied to both the upper layer protocols and
(parts of) the IP header. Thus if AH is used in a transport mode, in
conjunction with ESP, AH SHOULD appear as the first header after IP,
prior to the appearance of ESP. In that context, AH is applied to
the ciphertext output of ESP. In contrast, for tunnel mode SAs, one
can imagine uses for various orderings of AH and ESP. The required
set of SA bundle types that MUST be supported by a compliant IPsec
implementation is described in Section 4.5.

4.4 Security Association Databases

Many of the details associated with processing IP traffic in an IPsec
implementation are largely a local matter, not subject to
standardization. However, some external aspects of the processing
must be standardized, to ensure interoperability and to provide a
minimum management capability that is essential for productive use of
IPsec. This section describes a general model for processing IP
traffic relative to security associations, in support of these
interoperability and functionality goals. The model described below
is nominal; compliant implementations need not match details of this
model as presented, but the external behavior of such implementations
must be mappable to the externally observable characteristics of this
model.

There are two nominal databases in this model: the Security Policy
Database and the Security Association Database. The former specifies
the policies that determine the disposition of all IP traffic inbound
or outbound from a host, security gateway, or BITS or BITW IPsec
implementation. The latter database contains parameters that are
associated with each (active) security association. This section
also defines the concept of a Selector, a set of IP and upper layer
protocol field values that is used by the Security Policy Database to
map traffic to a policy, i.e., an SA (or SA bundle).

Each interface for which IPsec is enabled requires nominally separate
inbound vs. outbound databases (SAD and SPD), because of the
directionality of many of the fields that are used as selectors.
Typically there is just one such interface, for a host or security
gateway (SG). Note that an SG would always have at least 2
interfaces, but the "internal" one to the corporate net, usually
would not have IPsec enabled and so only one pair of SADs and one
pair of SPDs would be needed. On the other hand, if a host had
multiple interfaces or an SG had multiple external interfaces, it
might be necessary to have separate SAD and SPD pairs for each
interface.

4.4.1 The Security Policy Database (SPD)

Ultimately, a security association is a management construct used to
enforce a security policy in the IPsec environment. Thus an
essential element of SA processing is an underlying Security Policy
Database (SPD) that specifies what services are to be offered to IP
datagrams and in what fashion. The form of the database and its
interface are outside the scope of this specification. However, this
section does specify certain minimum management functionality that
must be provided, to allow a user or system administrator to control
how IPsec is applied to traffic transmitted or received by a host or
transiting a security gateway.

The SPD must be consulted during the processing of all traffic
(INBOUND and OUTBOUND), including non-IPsec traffic. In order to
support this, the SPD requires distinct entries for inbound and
outbound traffic. One can think of this as separate SPDs (inbound
vs. outbound). In addition, a nominally separate SPD must be
provided for each IPsec-enabled interface.

An SPD must discriminate among traffic that is afforded IPsec
protection and traffic that is allowed to bypass IPsec. This applies
to the IPsec protection to be applied by a sender and to the IPsec
protection that must be present at the receiver. For any outbound or
inbound datagram, three processing choices are possible: discard,
bypass IPsec, or apply IPsec. The first choice refers to traffic
that is not allowed to exit the host, traverse the security gateway,
or be delivered to an application at all. The second choice refers
to traffic that is allowed to pass without additional IPsec
protection. The third choice refers to traffic that is afforded
IPsec protection, and for such traffic the SPD must specify the
security services to be provided, protocols to be employed,
algorithms to be used, etc.

For every IPsec implementation, there MUST be an administrative
interface that allows a user or system administrator to manage the
SPD. Specifically, every inbound or outbound packet is subject to
processing by IPsec and the SPD must specify what action will be
taken in each case. Thus the administrative interface must allow the
user (or system administrator) to specify the security processing to
be applied to any packet entering or exiting the system, on a packet
by packet basis. (In a host IPsec implementation making use of a
socket interface, the SPD may not need to be consulted on a per
packet basis, but the effect is still the same.) The management
interface for the SPD MUST allow creation of entries consistent with
the selectors defined in Section 4.4.2, and MUST support (total)
ordering of these entries. It is expected that through the use of
wildcards in various selector fields, and because all packets on a

single UDP or TCP connection will tend to match a single SPD entry,
this requirement will not impose an unreasonably detailed level of
SPD specification. The selectors are analogous to what are found in
a stateless firewall or filtering router and which are currently
manageable this way.

In host systems, applications MAY be allowed to select what security
processing is to be applied to the traffic they generate and consume.
(Means of signalling such requests to the IPsec implementation are
outside the scope of this standard.) However, the system
administrator MUST be able to specify whether or not a user or
application can override (default) system policies. Note that
application specified policies may satisfy system requirements, so
that the system may not need to do additional IPsec processing beyond
that needed to meet an application's requirements. The form of the
management interface is not specified by this document and may differ
for hosts vs. security gateways, and within hosts the interface may
differ for socket-based vs. BITS implementations. However, this
document does specify a standard set of SPD elements that all IPsec
implementations MUST support.

The SPD contains an ordered list of policy entries. Each policy
entry is keyed by one or more selectors that define the set of IP
traffic encompassed by this policy entry. (The required selector
types are defined in Section 4.4.2.) These define the granularity of
policies or SAs. Each entry includes an indication of whether
traffic matching this policy will be bypassed, discarded, or subject
to IPsec processing. If IPsec processing is to be applied, the entry
includes an SA (or SA bundle) specification, listing the IPsec
protocols, modes, and algorithms to be employed, including any
nesting requirements. For example, an entry may call for all
matching traffic to be protected by ESP in transport mode using
3DES-CBC with an explicit IV, nested inside of AH in tunnel mode
using HMAC/SHA-1. For each selector, the policy entry specifies how
to derive the corresponding values for a new Security Association
Database (SAD, see Section 4.4.3) entry from those in the SPD and the
packet (Note that at present, ranges are only supported for IP
addresses; but wildcarding can be expressed for all selectors):

a. use the value in the packet itself -- This will limit use
of the SA to those packets which have this packet's value
for the selector even if the selector for the policy entry
has a range of allowed values or a wildcard for this
selector.
b. use the value associated with the policy entry -- If this
were to be just a single value, then there would be no
difference between (b) and (a). However, if the allowed
values for the selector are a range (for IP addresses) or

wildcard, then in the case of a range,(b) would enable use
of the SA by any packet with a selector value within the
range not just by packets with the selector value of the
packet that triggered the creation of the SA. In the case
of a wildcard, (b) would allow use of the SA by packets
with any value for this selector.

For example, suppose there is an SPD entry where the allowed value
for source address is any of a range of hosts (192.168.2.1 to
192.168.2.10). And suppose that a packet is to be sent that has a
source address of 192.168.2.3. The value to be used for the SA could
be any of the sample values below depending on what the policy entry
for this selector says is the source of the selector value:

source for the example of
value to be new SAD
used in the SA selector value
--------------- ------------
a. packet 192.168.2.3 (one host)
b. SPD entry 192.168.2.1 to 192.168.2.10 (range of hosts)

Note that if the SPD entry had an allowed value of wildcard for the
source address, then the SAD selector value could be wildcard (any
host). Case (a) can be used to prohibit sharing, even among packets
that match the same SPD entry.

As described below in Section 4.4.3, selectors may include "wildcard"
entries and hence the selectors for two entries may overlap. (This
is analogous to the overlap that arises with ACLs or filter entries
in routers or packet filtering firewalls.) Thus, to ensure
consistent, predictable processing, SPD entries MUST be ordered and
the SPD MUST always be searched in the same order, so that the first
matching entry is consistently selected. (This requirement is
necessary as the effect of processing traffic against SPD entries
must be deterministic, but there is no way to canonicalize SPD
entries given the use of wildcards for some selectors.) More detail
on matching of packets against SPD entries is provided in Section 5.

Note that if ESP is specified, either (but not both) authentication
or encryption can be omitted. So it MUST be possible to configure
the SPD value for the authentication or encryption algorithms to be
"NULL". However, at least one of these services MUST be selected,
i.e., it MUST NOT be possible to configure both of them as "NULL".

The SPD can be used to map traffic to specific SAs or SA bundles.
Thus it can function both as the reference database for security
policy and as the map to existing SAs (or SA bundles). (To
accommodate the bypass and discard policies cited above, the SPD also

MUST provide a means of mapping traffic to these functions, even
though they are not, per se, IPsec processing.) The way in which the
SPD operates is different for inbound vs. outbound traffic and it
also may differ for host vs. security gateway, BITS, and BITW
implementations. Sections 5.1 and 5.2 describe the use of the SPD
for outbound and inbound processing, respectively.

Because a security policy may require that more than one SA be
applied to a specified set of traffic, in a specific order, the
policy entry in the SPD must preserve these ordering requirements,
when present. Thus, it must be possible for an IPsec implementation
to determine that an outbound or inbound packet must be processed
thorough a sequence of SAs. Conceptually, for outbound processing,
one might imagine links (to the SAD) from an SPD entry for which
there are active SAs, and each entry would consist of either a single
SA or an ordered list of SAs that comprise an SA bundle. When a
packet is matched against an SPD entry and there is an existing SA or
SA bundle that can be used to carry the traffic, the processing of
the packet is controlled by the SA or SA bundle entry on the list.
For an inbound IPsec packet for which multiple IPsec SAs are to be
applied, the lookup based on destination address, IPsec protocol, and
SPI should identify a single SA.

The SPD is used to control the flow of ALL traffic through an IPsec
system, including security and key management traffic (e.g., ISAKMP)
from/to entities behind a security gateway. This means that ISAKMP
traffic must be explicitly accounted for in the SPD, else it will be
discarded. Note that a security gateway could prohibit traversal of
encrypted packets in various ways, e.g., having a DISCARD entry in
the SPD for ESP packets or providing proxy key exchange. In the
latter case, the traffic would be internally routed to the key
management module in the security gateway.

4.4.2 Selectors

An SA (or SA bundle) may be fine-grained or coarse-grained, depending
on the selectors used to define the set of traffic for the SA. For
example, all traffic between two hosts may be carried via a single
SA, and afforded a uniform set of security services. Alternatively,
traffic between a pair of hosts might be spread over multiple SAs,
depending on the applications being used (as defined by the Next
Protocol and Port fields), with different security services offered
by different SAs. Similarly, all traffic between a pair of security
gateways could be carried on a single SA, or one SA could be assigned
for each communicating host pair. The following selector parameters
MUST be supported for SA management to facilitate control of SA
granularity. Note that in the case of receipt of a packet with an
ESP header, e.g., at an encapsulating security gateway or BITW

implementation, the transport layer protocol, source/destination
ports, and Name (if present) may be "OPAQUE", i.e., inaccessible
because of encryption or fragmentation. Note also that both Source
and Destination addresses should either be IPv4 or IPv6.

- Destination IP Address (IPv4 or IPv6): this may be a single IP
address (unicast, anycast, broadcast (IPv4 only), or multicast
group), a range of addresses (high and low values (inclusive),
address + mask, or a wildcard address. The last three are used
to support more than one destination system sharing the same SA
(e.g., behind a security gateway). Note that this selector is
conceptually different from the "Destination IP Address" field
in the <Destination IP Address, IPsec Protocol, SPI> tuple used
to uniquely identify an SA. When a tunneled packet arrives at
the tunnel endpoint, its SPI/Destination address/Protocol are
used to look up the SA for this packet in the SAD. This
destination address comes from the encapsulating IP header.
Once the packet has been processed according to the tunnel SA
and has come out of the tunnel, its selectors are "looked up" in
the Inbound SPD. The Inbound SPD has a selector called
destination address. This IP destination address is the one in
the inner (encapsulated) IP header. In the case of a
transport'd packet, there will be only one IP header and this
ambiguity does not exist. [REQUIRED for all implementations]

- Source IP Address(es) (IPv4 or IPv6): this may be a single IP
address (unicast, anycast, broadcast (IPv4 only), or multicast
group), range of addresses (high and low values inclusive),
address + mask, or a wildcard address. The last three are used
to support more than one source system sharing the same SA
(e.g., behind a security gateway or in a multihomed host).
[REQUIRED for all implementations]

- Name: There are 2 cases (Note that these name forms are
supported in the IPsec DOI.)
1. User ID
a. a fully qualified user name string (DNS), e.g.,
mozart@foo.bar.com
b. X.500 distinguished name, e.g., C = US, SP = MA,
O = GTE Internetworking, CN = Stephen T. Kent.
2. System name (host, security gateway, etc.)
a. a fully qualified DNS name, e.g., foo.bar.com
b. X.500 distinguished name
c. X.500 general name

NOTE: One of the possible values of this selector is "OPAQUE".

[REQUIRED for the following cases. Note that support for name
forms other than addresses is not required for manually keyed
SAs.
o User ID
- native host implementations
- BITW and BITS implementations acting as HOSTS
with only one user
- security gateway implementations for INBOUND
processing.
o System names -- all implementations]

- Data sensitivity level: (IPSO/CIPSO labels)
[REQUIRED for all systems providing information flow security as
per Section 8, OPTIONAL for all other systems.]

- Transport Layer Protocol: Obtained from the IPv4 "Protocol" or
the IPv6 "Next Header" fields. This may be an individual
protocol number. These packet fields may not contain the
Transport Protocol due to the presence of IP extension headers,
e.g., a Routing Header, AH, ESP, Fragmentation Header,
Destination Options, Hop-by-hop options, etc. Note that the
Transport Protocol may not be available in the case of receipt
of a packet with an ESP header, thus a value of "OPAQUE" SHOULD
be supported.
[REQUIRED for all implementations]

NOTE: To locate the transport protocol, a system has to chain
through the packet headers checking the "Protocol" or "Next
Header" field until it encounters either one it recognizes as a
transport protocol, or until it reaches one that isn't on its
list of extension headers, or until it encounters an ESP header
that renders the transport protocol opaque.

- Source and Destination (e.g., TCP/UDP) Ports: These may be
individual UDP or TCP port values or a wildcard port. (The use
of the Next Protocol field and the Source and/or Destination
Port fields (in conjunction with the Source and/or Destination
Address fields), as an SA selector is sometimes referred to as
"session-oriented keying."). Note that the source and
destination ports may not be available in the case of receipt of
a packet with an ESP header, thus a value of "OPAQUE" SHOULD be
supported.

The following table summarizes the relationship between the
"Next Header" value in the packet and SPD and the derived Port
Selector value for the SPD and SAD.

Next Hdr Transport Layer Derived Port Selector Field
in Packet Protocol in SPD Value in SPD and SAD
-------- --------------- ---------------------------
ESP ESP or ANY ANY (i.e., don't look at it)
-don't care- ANY ANY (i.e., don't look at it)
specific value specific value NOT ANY (i.e., drop packet)
fragment
specific value specific value actual port selector field
not fragment

If the packet has been fragmented, then the port information may
not be available in the current fragment. If so, discard the
fragment. An ICMP PMTU should be sent for the first fragment,
which will have the port information. [MAY be supported]

The IPsec implementation context determines how selectors are used.
For example, a host implementation integrated into the stack may make
use of a socket interface. When a new connection is established the
SPD can be consulted and an SA (or SA bundle) bound to the socket.
Thus traffic sent via that socket need not result in additional
lookups to the SPD/SAD. In contrast, a BITS, BITW, or security
gateway implementation needs to look at each packet and perform an
SPD/SAD lookup based on the selectors. The allowable values for the
selector fields differ between the traffic flow, the security
association, and the security policy.

The following table summarizes the kinds of entries that one needs to
be able to express in the SPD and SAD. It shows how they relate to
the fields in data traffic being subjected to IPsec screening.
(Note: the "wild" or "wildcard" entry for src and dst addresses
includes a mask, range, etc.)

Field Traffic Value SAD Entry SPD Entry
-------- ------------- ---------------- --------------------
src addr single IP addr single,range,wild single,range,wildcard
dst addr single IP addr single,range,wild single,range,wildcard
xpt protocol* xpt protocol single,wildcard single,wildcard
src port* single src port single,wildcard single,wildcard
dst port* single dst port single,wildcard single,wildcard
user id* single user id single,wildcard single,wildcard
sec. labels single value single,wildcard single,wildcard

* The SAD and SPD entries for these fields could be "OPAQUE"
because the traffic value is encrypted.

NOTE: In principle, one could have selectors and/or selector values
in the SPD which cannot be negotiated for an SA or SA bundle.
Examples might include selector values used to select traffic for

discarding or enumerated lists which cause a separate SA to be
created for each item on the list. For now, this is left for future
versions of this document and the list of required selectors and
selector values is the same for the SPD and the SAD. However, it is
acceptable to have an administrative interface that supports use of
selector values which cannot be negotiated provided that it does not
mislead the user into believing it is creating an SA with these
selector values. For example, the interface may allow the user to
specify an enumerated list of values but would result in the creation
of a separate policy and SA for each item on the list. A vendor
might support such an interface to make it easier for its customers
to specify clear and concise policy specifications.

4.4.3 Security Association Database (SAD)

In each IPsec implementation there is a nominal Security Association
Database, in which each entry defines the parameters associated with
one SA. Each SA has an entry in the SAD. For outbound processing,
entries are pointed to by entries in the SPD. Note that if an SPD
entry does not currently point to an SA that is appropriate for the
packet, the implementation creates an appropriate SA (or SA Bundle)
and links the SPD entry to the SAD entry (see Section 5.1.1). For
inbound processing, each entry in the SAD is indexed by a destination
IP address, IPsec protocol type, and SPI. The following parameters
are associated with each entry in the SAD. This description does not
purport to be a MIB, but only a specification of the minimal data
items required to support an SA in an IPsec implementation.

For inbound processing: The following packet fields are used to look
up the SA in the SAD:

o Outer Header's Destination IP address: the IPv4 or IPv6
Destination address.
[REQUIRED for all implementations]
o IPsec Protocol: AH or ESP, used as an index for SA lookup
in this database. Specifies the IPsec protocol to be
applied to the traffic on this SA.
[REQUIRED for all implementations]
o SPI: the 32-bit value used to distinguish among different
SAs terminating at the same destination and using the same
IPsec protocol.
[REQUIRED for all implementations]

For each of the selectors defined in Section 4.4.2, the SA entry in
the SAD MUST contain the value or values which were negotiated at the
time the SA was created. For the sender, these values are used to
decide whether a given SA is appropriate for use with an outbound
packet. This is part of checking to see if there is an existing SA

that can be used. For the receiver, these values are used to check
that the selector values in an inbound packet match those for the SA
(and thus indirectly those for the matching policy). For the
receiver, this is part of verifying that the SA was appropriate for
this packet. (See Section 6 for rules for ICMP messages.) These
fields can have the form of specific values, ranges, wildcards, or
"OPAQUE" as described in section 4.4.2, "Selectors". Note that for
an ESP SA, the encryption algorithm or the authentication algorithm
could be "NULL". However they MUST not both be "NULL".

The following SAD fields are used in doing IPsec processing:

o Sequence Number Counter: a 32-bit value used to generate the
Sequence Number field in AH or ESP headers.
[REQUIRED for all implementations, but used only for outbound
traffic.]
o Sequence Counter Overflow: a flag indicating whether overflow
of the Sequence Number Counter should generate an auditable
event and prevent transmission of additional packets on the
SA.
[REQUIRED for all implementations, but used only for outbound
traffic.]
o Anti-Replay Window: a 32-bit counter and a bit-map (or
equivalent) used to determine whether an inbound AH or ESP
packet is a replay.
[REQUIRED for all implementations but used only for inbound
traffic. NOTE: If anti-replay has been disabled by the
receiver, e.g., in the case of a manually keyed SA, then the
Anti-Replay Window is not used.]
o AH Authentication algorithm, keys, etc.
[REQUIRED for AH implementations]
o ESP Encryption algorithm, keys, IV mode, IV, etc.
[REQUIRED for ESP implementations]
o ESP authentication algorithm, keys, etc. If the
authentication service is not selected, this field will be
null.
[REQUIRED for ESP implementations]
o Lifetime of this Security Association: a time interval after
which an SA must be replaced with a new SA (and new SPI) or
terminated, plus an indication of which of these actions
should occur. This may be expressed as a time or byte count,
or a simultaneous use of both, the first lifetime to expire
taking precedence. A compliant implementation MUST support
both types of lifetimes, and must support a simultaneous use
of both. If time is employed, and if IKE employs X.509
certificates for SA establishment, the SA lifetime must be
constrained by the validity intervals of the certificates,
and the NextIssueDate of the CRLs used in the IKE exchange

for the SA. Both initiator and responder are responsible for
constraining SA lifetime in this fashion.
[REQUIRED for all implementations]

NOTE: The details of how to handle the refreshing of keys
when SAs expire is a local matter. However, one reasonable
approach is:
(a) If byte count is used, then the implementation
SHOULD count the number of bytes to which the IPsec
algorithm is applied. For ESP, this is the encryption
algorithm (including Null encryption) and for AH,
this is the authentication algorithm. This includes
pad bytes, etc. Note that implementations SHOULD be
able to handle having the counters at the ends of an
SA get out of synch, e.g., because of packet loss or
because the implementations at each end of the SA
aren't doing things the same way.
(b) There SHOULD be two kinds of lifetime -- a soft
lifetime which warns the implementation to initiate
action such as setting up a replacement SA and a
hard lifetime when the current SA ends.
(c) If the entire packet does not get delivered during
the SAs lifetime, the packet SHOULD be discarded.

o IPsec protocol mode: tunnel, transport or wildcard.
Indicates which mode of AH or ESP is applied to traffic on
this SA. Note that if this field is "wildcard" at the
sending end of the SA, then the application has to specify
the mode to the IPsec implementation. This use of wildcard
allows the same SA to be used for either tunnel or transport
mode traffic on a per packet basis, e.g., by different
sockets. The receiver does not need to know the mode in
order to properly process the packet's IPsec headers.

[REQUIRED as follows, unless implicitly defined by context:
- host implementations must support all modes
- gateway implementations must support tunnel mode]

NOTE: The use of wildcard for the protocol mode of an inbound
SA may add complexity to the situation in the receiver (host
only). Since the packets on such an SA could be delivered in
either tunnel or transport mode, the security of an incoming
packet could depend in part on which mode had been used to
deliver it. If, as a result, an application cared about the
SA mode of a given packet, then the application would need a
mechanism to obtain this mode information.

o Path MTU: any observed path MTU and aging variables. See
Section 6.1.2.4
[REQUIRED for all implementations but used only for outbound
traffic]

4.5 Basic Combinations of Security Associations

This section describes four examples of combinations of security
associations that MUST be supported by compliant IPsec hosts or
security gateways. Additional combinations of AH and/or ESP in
tunnel and/or transport modes MAY be supported at the discretion of
the implementor. Compliant implementations MUST be capable of
generating these four combinations and on receipt, of processing
them, but SHOULD be able to receive and process any combination. The
diagrams and text below describe the basic cases. The legend for the
diagrams is:

==== = one or more security associations (AH or ESP, transport
or tunnel)
---- = connectivity (or if so labelled, administrative boundary)
Hx = host x
SGx = security gateway x
X* = X supports IPsec

NOTE: The security associations below can be either AH or ESP. The
mode (tunnel vs transport) is determined by the nature of the
endpoints. For host-to-host SAs, the mode can be either transport or
tunnel.

Case 1. The case of providing end-to-end security between 2 hosts
across the Internet (or an Intranet).

====================================
| |
H1* ------ (Inter/Intranet) ------ H2*

Note that either transport or tunnel mode can be selected by the
hosts. So the headers in a packet between H1 and H2 could look
like any of the following:

Transport Tunnel
----------------- ---------------------
1. [IP1][AH][upper] 4. [IP2][AH][IP1][upper]
2. [IP1][ESP][upper] 5. [IP2][ESP][IP1][upper]
3. [IP1][AH][ESP][upper]

Note that there is no requirement to support general nesting,
but in transport mode, both AH and ESP can be applied to the
packet. In this event, the SA establishment procedure MUST
ensure that first ESP, then AH are applied to the packet.

Case 2. This case illustrates simple virtual private networks
support.

===========================
| |
---------------------|---- ---|-----------------------
| | | | | |
| H1 -- (Local --- SG1* |--- (Internet) ---| SG2* --- (Local --- H2 |
| Intranet) | | Intranet) |
-------------------------- ---------------------------
admin. boundary admin. boundary

Only tunnel mode is required here. So the headers in a packet
between SG1 and SG2 could look like either of the following:

Tunnel
---------------------
4. [IP2][AH][IP1][upper]
5. [IP2][ESP][IP1][upper]

Case 3. This case combines cases 1 and 2, adding end-to-end security
between the sending and receiving hosts. It imposes no new
requirements on the hosts or security gateways, other than a
requirement for a security gateway to be configurable to pass
IPsec traffic (including ISAKMP traffic) for hosts behind it.

===============================================================
| |
| ========================= |
| | | |
---|-----------------|---- ---|-------------------|---
| | | | | | | |
| H1* -- (Local --- SG1* |-- (Internet) --| SG2* --- (Local --- H2* |
| Intranet) | | Intranet) |
-------------------------- ---------------------------
admin. boundary admin. boundary

Case 4. This covers the situation where a remote host (H1) uses the
Internet to reach an organization's firewall (SG2) and to then
gain access to some server or other machine (H2). The remote
host could be a mobile host (H1) dialing up to a local PPP/ARA
server (not shown) on the Internet and then crossing the
Internet to the home organization's firewall (SG2), etc. The

details of support for this case, (how H1 locates SG2,
authenticates it, and verifies its authorization to represent
H2) are discussed in Section 4.6.3, "Locating a Security
Gateway".

======================================================
| |
|============================== |
|| | |
|| ---|----------------------|---
|| | | | |
H1* ----- (Internet) ------| SG2* ---- (Local ----- H2* |
^ | Intranet) |
| ------------------------------
could be dialup admin. boundary (optional)
to PPP/ARA server

Only tunnel mode is required between H1 and SG2. So the choices
for the SA between H1 and SG2 would be one of the ones in case
2. The choices for the SA between H1 and H2 would be one of the
ones in case 1.

Note that in this case, the sender MUST apply the transport
header before the tunnel header. Therefore the management
interface to the IPsec implementation MUST support configuration
of the SPD and SAD to ensure this ordering of IPsec header
application.

As noted above, support for additional combinations of AH and ESP is
optional. Use of other, optional combinations may adversely affect
interoperability.

4.6 SA and Key Management

IPsec mandates support for both manual and automated SA and
cryptographic key management. The IPsec protocols, AH and ESP, are
largely independent of the associated SA management techniques,
although the techniques involved do affect some of the security
services offered by the protocols. For example, the optional anti-
replay services available for AH and ESP require automated SA
management. Moreover, the granularity of key distribution employed
with IPsec determines the granularity of authentication provided.
(See also a discussion of this issue in Section 4.7.) In general,
data origin authentication in AH and ESP is limited by the extent to
which secrets used with the authentication algorithm (or with a key
management protocol that creates such secrets) are shared among
multiple possible sources.

The following text describes the minimum requirements for both types
of SA management.

4.6.1 Manual Techniques

The simplest form of management is manual management, in which a
person manually configures each system with keying material and
security association management data relevant to secure communication
with other systems. Manual techniques are practical in small, static
environments but they do not scale well. For example, a company
could create a Virtual Private Network (VPN) using IPsec in security
gateways at several sites. If the number of sites is small, and
since all the sites come under the purview of a single administrative
domain, this is likely to be a feasible context for manual management
techniques. In this case, the security gateway might selectively
protect traffic to and from other sites within the organization using
a manually configured key, while not protecting traffic for other
destinations. It also might be appropriate when only selected
communications need to be secured. A similar argument might apply to
use of IPsec entirely within an organization for a small number of
hosts and/or gateways. Manual management techniques often employ
statically configured, symmetric keys, though other options also
exist.

4.6.2 Automated SA and Key Management

Widespread deployment and use of IPsec requires an Internet-standard,
scalable, automated, SA management protocol. Such support is
required to facilitate use of the anti-replay features of AH and ESP,
and to accommodate on-demand creation of SAs, e.g., for user- and
session-oriented keying. (Note that the notion of "rekeying" an SA
actually implies creation of a new SA with a new SPI, a process that
generally implies use of an automated SA/key management protocol.)

The default automated key management protocol selected for use with
IPsec is IKE [MSST97, Orm97, HC98] under the IPsec domain of
interpretation [Pip98]. Other automated SA management protocols MAY
be employed.

When an automated SA/key management protocol is employed, the output
from this protocol may be used to generate multiple keys, e.g., for a
single ESP SA. This may arise because:

o the encryption algorithm uses multiple keys (e.g., triple DES)
o the authentication algorithm uses multiple keys
o both encryption and authentication algorithms are employed

The Key Management System may provide a separate string of bits for
each key or it may generate one string of bits from which all of them
are extracted. If a single string of bits is provided, care needs to
be taken to ensure that the parts of the system that map the string
of bits to the required keys do so in the same fashion at both ends
of the SA. To ensure that the IPsec implementations at each end of
the SA use the same bits for the same keys, and irrespective of which
part of the system divides the string of bits into individual keys,
the encryption key(s) MUST be taken from the first (left-most, high-
order) bits and the authentication key(s) MUST be taken from the
remaining bits. The number of bits for each key is defined in the
relevant algorithm specification RFC. In the case of multiple
encryption keys or multiple authentication keys, the specification
for the algorithm must specify the order in which they are to be
selected from a single string of bits provided to the algorithm.

4.6.3 Locating a Security Gateway

This section discusses issues relating to how a host learns about the
existence of relevant security gateways and once a host has contacted
these security gateways, how it knows that these are the correct
security gateways. The details of where the required information is
stored is a local matter.

Consider a situation in which a remote host (H1) is using the
Internet to gain access to a server or other machine (H2) and there
is a security gateway (SG2), e.g., a firewall, through which H1's
traffic must pass. An example of this situation would be a mobile
host (Road Warrior) crossing the Internet to the home organization's
firewall (SG2). (See Case 4 in the section 4.5 Basic Combinations of
Security Associations.) This situation raises several issues:

1. How does H1 know/learn about the existence of the security
gateway SG2?
2. How does it authenticate SG2, and once it has authenticated
SG2, how does it confirm that SG2 has been authorized to
represent H2?
3. How does SG2 authenticate H1 and verify that H1 is authorized
to contact H2?
4. How does H1 know/learn about backup gateways which provide
alternate paths to H2?

To address these problems, a host or security gateway MUST have an
administrative interface that allows the user/administrator to
configure the address of a security gateway for any sets of
destination addresses that require its use. This includes the ability
to configure:

o the requisite information for locating and authenticating the
security gateway and verifying its authorization to represent
the destination host.
o the requisite information for locating and authenticating any
backup gateways and verifying their authorization to represent
the destination host.

It is assumed that the SPD is also configured with policy information
that covers any other IPsec requirements for the path to the security
gateway and the destination host.

This document does not address the issue of how to automate the
discovery/verification of security gateways.

4.7 Security Associations and Multicast

The receiver-orientation of the Security Association implies that, in
the case of unicast traffic, the destination system will normally
select the SPI value. By having the destination select the SPI
value, there is no potential for manually configured Security
Associations to conflict with automatically configured (e.g., via a
key management protocol) Security Associations or for Security
Associations from multiple sources to conflict with each other. For
multicast traffic, there are multiple destination systems per
multicast group. So some system or person will need to coordinate
among all multicast groups to select an SPI or SPIs on behalf of each
multicast group and then communicate the group's IPsec information to
all of the legitimate members of that multicast group via mechanisms
not defined here.

Multiple senders to a multicast group SHOULD use a single Security
Association (and hence Security Parameter Index) for all traffic to
that group when a symmetric key encryption or authentication
algorithm is employed. In such circumstances, the receiver knows only
that the message came from a system possessing the key for that
multicast group. In such circumstances, a receiver generally will
not be able to authenticate which system sent the multicast traffic.
Specifications for other, more general multicast cases are deferred
to later IPsec documents.

At the time this specification was published, automated protocols for
multicast key distribution were not considered adequately mature for
standardization. For multicast groups having relatively few members,
manual key distribution or multiple use of existing unicast key
distribution algorithms such as modified Diffie-Hellman appears
feasible. For very large groups, new scalable techniques will be
needed. An example of current work in this area is the Group Key
Management Protocol (GKMP) [HM97].

5. IP Traffic Processing

As mentioned in Section 4.4.1 "The Security Policy Database (SPD)",
the SPD must be consulted during the processing of all traffic
(INBOUND and OUTBOUND), including non-IPsec traffic. If no policy is
found in the SPD that matches the packet (for either inbound or
outbound traffic), the packet MUST be discarded.

NOTE: All of the cryptographic algorithms used in IPsec expect their
input in canonical network byte order (see Appendix in RFC 791) and
generate their output in canonical network byte order. IP packets
are also transmitted in network byte order.

5.1 Outbound IP Traffic Processing

5.1.1 Selecting and Using an SA or SA Bundle

In a security gateway or BITW implementation (and in many BITS
implementations), each outbound packet is compared against the SPD to
determine what processing is required for the packet. If the packet
is to be discarded, this is an auditable event. If the traffic is
allowed to bypass IPsec processing, the packet continues through
"normal" processing for the environment in which the IPsec processing
is taking place. If IPsec processing is required, the packet is
either mapped to an existing SA (or SA bundle), or a new SA (or SA
bundle) is created for the packet. Since a packet's selectors might
match multiple policies or multiple extant SAs and since the SPD is
ordered, but the SAD is not, IPsec MUST:

1. Match the packet's selector fields against the outbound
policies in the SPD to locate the first appropriate
policy, which will point to zero or more SA bundles in the
SAD.

2. Match the packet's selector fields against those in the SA
bundles found in (1) to locate the first SA bundle that
matches. If no SAs were found or none match, create an
appropriate SA bundle and link the SPD entry to the SAD
entry. If no key management entity is found, drop the
packet.

3. Use the SA bundle found/created in (2) to do the required
IPsec processing, e.g., authenticate and encrypt.

In a host IPsec implementation based on sockets, the SPD will be
consulted whenever a new socket is created, to determine what, if
any, IPsec processing will be applied to the traffic that will flow
on that socket.

NOTE: A compliant implementation MUST not allow instantiation of an
ESP SA that employs both a NULL encryption and a NULL authentication
algorithm. An attempt to negotiate such an SA is an auditable event.

5.1.2 Header Construction for Tunnel Mode

This section describes the handling of the inner and outer IP
headers, extension headers, and options for AH and ESP tunnels. This
includes how to construct the encapsulating (outer) IP header, how to
handle fields in the inner IP header, and what other actions should
be taken. The general idea is modeled after the one used in RFC
2003, "IP Encapsulation with IP":

o The outer IP header Source Address and Destination Address
identify the "endpoints" of the tunnel (the encapsulator and
decapsulator). The inner IP header Source Address and
Destination Addresses identify the original sender and
recipient of the datagram, (from the perspective of this
tunnel), respectively. (see footnote 3 after the table in
5.1.2.1 for more details on the encapsulating source IP
address.)
o The inner IP header is not changed except to decrement the TTL
as noted below, and remains unchanged during its delivery to
the tunnel exit point.
o No change to IP options or extension headers in the inner
header occurs during delivery of the encapsulated datagram
through the tunnel.
o If need be, other protocol headers such as the IP
Authentication header may be inserted between the outer IP
header and the inner IP header.

The tables in the following sub-sections show the handling for the
different header/option fields (constructed = the value in the outer
field is constructed independently of the value in the inner).

5.1.2.1 IPv4 -- Header Construction for Tunnel Mode

<-- How Outer Hdr Relates to Inner Hdr -->
Outer Hdr at Inner Hdr at
IPv4 Encapsulator Decapsulator
Header fields: -------------------- ------------
version 4 (1) no change
header length constructed no change
TOS copied from inner hdr (5) no change
total length constructed no change
ID constructed no change
flags (DF,MF) constructed, DF (4) no change
fragmt offset constructed no change

TTL constructed (2) decrement (2)
protocol AH, ESP, routing hdr no change
checksum constructed constructed (2)
src address constructed (3) no change
dest address constructed (3) no change
Options never copied no change

1. The IP version in the encapsulating header can be different
from the value in the inner header.

2. The TTL in the inner header is decremented by the
encapsulator prior to forwarding and by the decapsulator if
it forwards the packet. (The checksum changes when the TTL
changes.)

Note: The decrementing of the TTL is one of the usual actions
that takes place when forwarding a packet. Packets
originating from the same node as the encapsulator do not
have their TTL's decremented, as the sending node is
originating the packet rather than forwarding it.

3. src and dest addresses depend on the SA, which is used to
determine the dest address which in turn determines which src
address (net interface) is used to forward the packet.

NOTE: In principle, the encapsulating IP source address can
be any of the encapsulator's interface addresses or even an
address different from any of the encapsulator's IP
addresses, (e.g., if it's acting as a NAT box) so long as the
address is reachable through the encapsulator from the
environment into which the packet is sent. This does not
cause a problem because IPsec does not currently have any
INBOUND processing requirement that involves the Source
Address of the encapsulating IP header. So while the
receiving tunnel endpoint looks at the Destination Address in
the encapsulating IP header, it only looks at the Source
Address in the inner (encapsulated) IP header.

4. configuration determines whether to copy from the inner
header (IPv4 only), clear or set the DF.

5. If Inner Hdr is IPv4 (Protocol = 4), copy the TOS. If Inner
Hdr is IPv6 (Protocol = 41), map the Class to TOS.

5.1.2.2 IPv6 -- Header Construction for Tunnel Mode

See previous section 5.1.2 for notes 1-5 indicated by (footnote
number).

<-- How Outer Hdr Relates Inner Hdr --->
Outer Hdr at Inner Hdr at
IPv6 Encapsulator Decapsulator
Header fields: -------------------- ------------
version 6 (1) no change
class copied or configured (6) no change
flow id copied or configured no change
len constructed no change
next header AH,ESP,routing hdr no change
hop limit constructed (2) decrement (2)
src address constructed (3) no change
dest address constructed (3) no change
Extension headers never copied no change

6. If Inner Hdr is IPv6 (Next Header = 41), copy the Class. If
Inner Hdr is IPv4 (Next Header = 4), map the TOS to Class.

5.2 Processing Inbound IP Traffic

Prior to performing AH or ESP processing, any IP fragments are
reassembled. Each inbound IP datagram to which IPsec processing will
be applied is identified by the appearance of the AH or ESP values in
the IP Next Protocol field (or of AH or ESP as an extension header in
the IPv6 context).

Note: Appendix C contains sample code for a bitmask check for a 32
packet window that can be used for implementing anti-replay service.

5.2.1 Selecting and Using an SA or SA Bundle

Mapping the IP datagram to the appropriate SA is simplified because
of the presence of the SPI in the AH or ESP header. Note that the
selector checks are made on the inner headers not the outer (tunnel)
headers. The steps followed are:

1. Use the packet's destination address (outer IP header),
IPsec protocol, and SPI to look up the SA in the SAD. If
the SA lookup fails, drop the packet and log/report the
error.

2. Use the SA found in (1) to do the IPsec processing, e.g.,
authenticate and decrypt. This step includes matching the
packet's (Inner Header if tunneled) selectors to the
selectors in the SA. Local policy determines the
specificity of the SA selectors (single value, list,
range, wildcard). In general, a packet's source address
MUST match the SA selector value. However, an ICMP packet
received on a tunnel mode SA may have a source address

other than that bound to the SA and thus such packets
should be permitted as exceptions to this check. For an
ICMP packet, the selectors from the enclosed problem
packet (the source and destination addresses and ports
should be swapped) should be checked against the selectors
for the SA. Note that some or all of these selectors may
be inaccessible because of limitations on how many bits of
the problem packet the ICMP packet is allowed to carry or
due to encryption. See Section 6.

Do (1) and (2) for every IPsec header until a Transport
Protocol Header or an IP header that is NOT for this
system is encountered. Keep track of what SAs have been
used and their order of application.

3. Find an incoming policy in the SPD that matches the
packet. This could be done, for example, by use of
backpointers from the SAs to the SPD or by matching the
packet's selectors (Inner Header if tunneled) against
those of the policy entries in the SPD.

4. Check whether the required IPsec processing has been
applied, i.e., verify that the SA's found in (1) and (2)
match the kind and order of SAs required by the policy
found in (3).

NOTE: The correct "matching" policy will not necessarily
be the first inbound policy found. If the check in (4)
fails, steps (3) and (4) are repeated until all policy
entries have been checked or until the check succeeds.

At the end of these steps, pass the resulting packet to the Transport
Layer or forward the packet. Note that any IPsec headers processed
in these steps may have been removed, but that this information,
i.e., what SAs were used and the order of their application, may be
needed for subsequent IPsec or firewall processing.

Note that in the case of a security gateway, if forwarding causes a
packet to exit via an IPsec-enabled interface, then additional IPsec
processing may be applied.

5.2.2 Handling of AH and ESP tunnels

The handling of the inner and outer IP headers, extension headers,
and options for AH and ESP tunnels should be performed as described
in the tables in Section 5.1.

6. ICMP Processing (relevant to IPsec)

The focus of this section is on the handling of ICMP error messages.
Other ICMP traffic, e.g., Echo/Reply, should be treated like other
traffic and can be protected on an end-to-end basis using SAs in the
usual fashion.

An ICMP error message protected by AH or ESP and generated by a
router SHOULD be processed and forwarded in a tunnel mode SA. Local
policy determines whether or not it is subjected to source address
checks by the router at the destination end of the tunnel. Note that
if the router at the originating end of the tunnel is forwarding an
ICMP error message from another router, the source address check
would fail. An ICMP message protected by AH or ESP and generated by
a router MUST NOT be forwarded on a transport mode SA (unless the SA
has been established to the router acting as a host, e.g., a Telnet
connection used to manage a router). An ICMP message generated by a
host SHOULD be checked against the source IP address selectors bound
to the SA in which the message arrives. Note that even if the source
of an ICMP error message is authenticated, the returned IP header
could be invalid. Accordingly, the selector values in the IP header
SHOULD also be checked to be sure that they are consistent with the
selectors for the SA over which the ICMP message was received.

The table in Appendix D characterize ICMP messages as being either
host generated, router generated, both, unknown/unassigned. ICMP
messages falling into the last two categories should be handled as
determined by the receiver's policy.

An ICMP message not protected by AH or ESP is unauthenticated and its
processing and/or forwarding may result in denial of service. This
suggests that, in general, it would be desirable to ignore such
messages. However, it is expected that many routers (vs. security
gateways) will not implement IPsec for transit traffic and thus
strict adherence to this rule would cause many ICMP messages to be
discarded. The result is that some critical IP functions would be
lost, e.g., redirection and PMTU processing. Thus it MUST be
possible to configure an IPsec implementation to accept or reject
(router) ICMP traffic as per local security policy.

The remainder of this section addresses how PMTU processing MUST be
performed at hosts and security gateways. It addresses processing of
both authenticated and unauthenticated ICMP PMTU messages. However,
as noted above, unauthenticated ICMP messages MAY be discarded based
on local policy.

6.1 PMTU/DF Processing

6.1.1 DF Bit

In cases where a system (host or gateway) adds an encapsulating
header (ESP tunnel or AH tunnel), it MUST support the option of
copying the DF bit from the original packet to the encapsulating
header (and processing ICMP PMTU messages). This means that it MUST
be possible to configure the system's treatment of the DF bit (set,
clear, copy from encapsulated header) for each interface. (See
Appendix B for rationale.)

6.1.2 Path MTU Discovery (PMTU)

This section discusses IPsec handling for Path MTU Discovery
messages. ICMP PMTU is used here to refer to an ICMP message for:

IPv4 (RFC 792):
- Type = 3 (Destination Unreachable)
- Code = 4 (Fragmentation needed and DF set)
- Next-Hop MTU in the low-order 16 bits of the second
word of the ICMP header (labelled "unused" in RFC
792), with high-order 16 bits set to zero

IPv6 (RFC 1885):
- Type = 2 (Packet Too Big)
- Code = 0 (Fragmentation needed)
- Next-Hop MTU in the 32 bit MTU field of the ICMP6
message

6.1.2.1 Propagation of PMTU

The amount of information returned with the ICMP PMTU message (IPv4
or IPv6) is limited and this affects what selectors are available for
use in further propagating the PMTU information. (See Appendix B for
more detailed discussion of this topic.)

o PMTU message with 64 bits of IPsec header -- If the ICMP PMTU
message contains only 64 bits of the IPsec header (minimum for
IPv4), then a security gateway MUST support the following options
on a per SPI/SA basis:

a. if the originating host can be determined (or the possible
sources narrowed down to a manageable number), send the PM
information to all the possible originating hosts.
b. if the originating host cannot be determined, store the PMTU
with the SA and wait until the next packet(s) arrive from the
originating host for the relevant security association. If

the packet(s) are bigger than the PMTU, drop the packet(s),
and compose ICMP PMTU message(s) with the new packet(s) and
the updated PMTU, and send the ICMP message(s) about the
problem to the originating host. Retain the PMTU information
for any message that might arrive subsequently (see Section
6.1.2.4, "PMTU Aging").

o PMTU message with >64 bits of IPsec header -- If the ICMP message
contains more information from the original packet then there may
be enough non-opaque information to immediately determine to which
host to propagate the ICMP/PMTU message and to provide that system
with the 5 fields (source address, destination address, source
port, destination port, transport protocol) needed to determine
where to store/update the PMTU. Under such circumstances, a
security gateway MUST generate an ICMP PMTU message immediately
upon receipt of an ICMP PMTU from further down the path.

o Distributing the PMTU to the Transport Layer -- The host mechanism
for getting the updated PMTU to the transport layer is unchanged,
as specified in RFC 1191 (Path MTU Discovery).

6.1.2.2 Calculation of PMTU

The calculation of PMTU from an ICMP PMTU MUST take into account the
addition of any IPsec header -- AH transport, ESP transport, AH/ESP
transport, ESP tunnel, AH tunnel. (See Appendix B for discussion of
implementation issues.)

Note: In some situations the addition of IPsec headers could result
in an effective PMTU (as seen by the host or application) that is
unacceptably small. To avoid this problem, the implementation may
establish a threshold below which it will not report a reduced PMTU.
In such cases, the implementation would apply IPsec and then fragment
the resulting packet according to the PMTU. This would result in a
more efficient use of the available bandwidth.

6.1.2.3 Granularity of PMTU Processing

In hosts, the granularity with which ICMP PMTU processing can be done
differs depending on the implementation situation. Looking at a
host, there are 3 situations that are of interest with respect to
PMTU issues (See Appendix B for additional details on this topic.):

a. Integration of IPsec into the native IP implementation
b. Bump-in-the-stack implementations, where IPsec is implemented
"underneath" an existing implementation of a TCP/IP protocol
stack, between the native IP and the local network drivers

c. No IPsec implementation -- This case is included because it
is relevant in cases where a security gateway is sending PMTU
information back to a host.

Only in case (a) can the PMTU data be maintained at the same
granularity as communication associations. In (b) and (c), the IP
layer will only be able to maintain PMTU data at the granularity of
source and destination IP addresses (and optionally TOS), as
described in RFC 1191. This is an important difference, because more
than one communication association may map to the same source and
destination IP addresses, and each communication association may have
a different amount of IPsec header overhead (e.g., due to use of
different transforms or different algorithms).

Implementation of the calculation of PMTU and support for PMTUs at
the granularity of individual communication associations is a local
matter. However, a socket-based implementation of IPsec in a host
SHOULD maintain the information on a per socket basis. Bump in the
stack systems MUST pass an ICMP PMTU to the host IP implementation,
after adjusting it for any IPsec header overhead added by these
systems. The calculation of the overhead SHOULD be determined by
analysis of the SPI and any other selector information present in a
returned ICMP PMTU message.

6.1.2.4 PMTU Aging

In all systems (host or gateway) implementing IPsec and maintaining
PMTU information, the PMTU associated with a security association
(transport or tunnel) MUST be "aged" and some mechanism put in place
for updating the PMTU in a timely manner, especially for discovering
if the PMTU is smaller than it needs to be. A given PMTU has to
remain in place long enough for a packet to get from the source end
of the security association to the system at the other end of the
security association and propagate back an ICMP error message if the
current PMTU is too big. Note that if there are nested tunnels,
multiple packets and round trip times might be required to get an
ICMP message back to an encapsulator or originating host.

Systems SHOULD use the approach described in the Path MTU Discovery
document (RFC 1191, Section 6.3), which suggests periodically
resetting the PMTU to the first-hop data-link MTU and then letting
the normal PMTU Discovery processes update the PMTU as necessary.
The period SHOULD be configurable.

7. Auditing

Not all systems that implement IPsec will implement auditing. For
the most part, the granularity of auditing is a local matter.
However, several auditable events are identified in the AH and ESP
specifications and for each of these events a minimum set of
information that SHOULD be included in an audit log is defined.
Additional information also MAY be included in the audit log for each
of these events, and additional events, not explicitly called out in
this specification, also MAY result in audit log entries. There is
no requirement for the receiver to transmit any message to the
purported transmitter in response to the detection of an auditable
event, because of the potential to induce denial of service via such
action.

8. Use in Systems Supporting Information Flow Security

Information of various sensitivity levels may be carried over a
single network. Information labels (e.g., Unclassified, Company
Proprietary, Secret) [DoD85, DoD87] are often employed to distinguish
such information. The use of labels facilitates segregation of
information, in support of information flow security models, e.g.,
the Bell-LaPadula model [BL73]. Such models, and corresponding
supporting technology, are designed to prevent the unauthorized flow
of sensitive information, even in the face of Trojan Horse attacks.
Conventional, discretionary access control (DAC) mechanisms, e.g.,
based on access control lists, generally are not sufficient to
support such policies, and thus facilities such as the SPD do not
suffice in such environments.

In the military context, technology that supports such models is
often referred to as multi-level security (MLS). Computers and
networks often are designated "multi-level secure" if they support
the separation of labelled data in conjunction with information flow
security policies. Although such technology is more broadly
applicable than just military applications, this document uses the
acronym "MLS" to designate the technology, consistent with much
extant literature.

IPsec mechanisms can easily support MLS networking. MLS networking
requires the use of strong Mandatory Access Controls (MAC), which
unprivileged users or unprivileged processes are incapable of
controlling or violating. This section pertains only to the use of
these IP security mechanisms in MLS (information flow security
policy) environments. Nothing in this section applies to systems not
claiming to provide MLS.

As used in this section, "sensitivity information" might include
implementation-defined hierarchic levels, categories, and/or
releasability information.

AH can be used to provide strong authentication in support of
mandatory access control decisions in MLS environments. If explicit
IP sensitivity information (e.g., IPSO [Ken91]) is used and
confidentiality is not considered necessary within the particular
operational environment, AH can be used to authenticate the binding
between sensitivity labels in the IP header and the IP payload
(including user data). This is a significant improvement over
labeled IPv4 networks where the sensitivity information is trusted
even though there is no authentication or cryptographic binding of
the information to the IP header and user data. IPv4 networks might
or might not use explicit labelling. IPv6 will normally use implicit
sensitivity information that is part of the IPsec Security
Association but not transmitted with each packet instead of using
explicit sensitivity information. All explicit IP sensitivity
information MUST be authenticated using either ESP, AH, or both.

Encryption is useful and can be desirable even when all of the hosts
are within a protected environment, for example, behind a firewall or
disjoint from any external connectivity. ESP can be used, in
conjunction with appropriate key management and encryption
algorithms, in support of both DAC and MAC. (The choice of
encryption and authentication algorithms, and the assurance level of
an IPsec implementation will determine the environments in which an
implementation may be deemed sufficient to satisfy MLS requirements.)
Key management can make use of sensitivity information to provide
MAC. IPsec implementations on systems claiming to provide MLS SHOULD
be capable of using IPsec to provide MAC for IP-based communications.

8.1 Relationship Between Security Associations and Data Sensitivity

Both the Encapsulating Security Payload and the Authentication Header
can be combined with appropriate Security Association policies to
provide multi-level secure networking. In this case each SA (or SA
bundle) is normally used for only a single instance of sensitivity
information. For example, "PROPRIETARY - Internet Engineering" must
be associated with a different SA (or SA bundle) from "PROPRIETARY -
Finance".

8.2 Sensitivity Consistency Checking

An MLS implementation (both host and router) MAY associate
sensitivity information, or a range of sensitivity information with
an interface, or a configured IP address with its associated prefix
(the latter is sometimes referred to as a logical interface, or an

interface alias). If such properties exist, an implementation SHOULD
compare the sensitivity information associated with the packet
against the sensitivity information associated with the interface or
address/prefix from which the packet arrived, or through which the
packet will depart. This check will either verify that the
sensitivities match, or that the packet's sensitivity falls within
the range of the interface or address/prefix.

The checking SHOULD be done on both inbound and outbound processing.

8.3 Additional MLS Attributes for Security Association Databases

Section 4.4 discussed two Security Association databases (the
Security Policy Database (SPD) and the Security Association Database
(SAD)) and the associated policy selectors and SA attributes. MLS
networking introduces an additional selector/attribute:

- Sensitivity information.

The Sensitivity information aids in selecting the appropriate
algorithms and key strength, so that the traffic gets a level of
protection appropriate to its importance or sensitivity as described
in section 8.1. The exact syntax of the sensitivity information is
implementation defined.

8.4 Additional Inbound Processing Steps for MLS Networking

After an inbound packet has passed through IPsec processing, an MLS
implementation SHOULD first check the packet's sensitivity (as
defined by the SA (or SA bundle) used for the packet) with the
interface or address/prefix as described in section 8.2 before
delivering the datagram to an upper-layer protocol or forwarding it.

The MLS system MUST retain the binding between the data received in
an IPsec protected packet and the sensitivity information in the SA
or SAs used for processing, so appropriate policy decisions can be
made when delivering the datagram to an application or forwarding
engine. The means for maintaining this binding are implementation
specific.

8.5 Additional Outbound Processing Steps for MLS Networking

An MLS implementation of IPsec MUST perform two additional checks
besides the normal steps detailed in section 5.1.1. When consulting
the SPD or the SAD to find an outbound security association, the MLS
implementation MUST use the sensitivity of the data to select an

appropriate outbound SA or SA bundle. The second check comes before
forwarding the packet out to its destination, and is the sensitivity
consistency checking described in section 8.2.

8.6 Additional MLS Processing for Security Gateways

An MLS security gateway MUST follow the previously mentioned inbound
and outbound processing rules as well as perform some additional
processing specific to the intermediate protection of packets in an
MLS environment.

A security gateway MAY act as an outbound proxy, creating SAs for MLS
systems that originate packets forwarded by the gateway. These MLS
systems may explicitly label the packets to be forwarded, or the
whole originating network may have sensitivity characteristics
associated with it. The security gateway MUST create and use
appropriate SAs for AH, ESP, or both, to protect such traffic it
forwards.

Similarly such a gateway SHOULD accept and process inbound AH and/or
ESP packets and forward appropriately, using explicit packet
labeling, or relying on the sensitivity characteristics of the
destination network.

9. Performance Issues

The use of IPsec imposes computational performance costs on the hosts
or security gateways that implement these protocols. These costs are
associated with the memory needed for IPsec code and data structures,
and the computation of integrity check values, encryption and
decryption, and added per-packet handling. The per-packet
computational costs will be manifested by increased latency and,
possibly, reduced throughout. Use of SA/key management protocols,
especially ones that employ public key cryptography, also adds
computational performance costs to use of IPsec. These per-
association computational costs will be manifested in terms of
increased latency in association establishment. For many hosts, it
is anticipated that software-based cryptography will not appreciably
reduce throughput, but hardware may be required for security gateways
(since they represent aggregation points), and for some hosts.

The use of IPsec also imposes bandwidth utilization costs on
transmission, switching, and routing components of the Internet
infrastructure, components not implementing IPsec. This is due to
the increase in the packet size resulting from the addition of AH
and/or ESP headers, AH and ESP tunneling (which adds a second IP
header), and the increased packet traffic associated with key
management protocols. It is anticipated that, in most instances,

this increased bandwidth demand will not noticeably affect the
Internet infrastructure. However, in some instances, the effects may
be significant, e.g., transmission of ESP encrypted traffic over a
dialup link that otherwise would have compressed the traffic.

Note: The initial SA establishment overhead will be felt in the first
packet. This delay could impact the transport layer and application.
For example, it could cause TCP to retransmit the SYN before the
ISAKMP exchange is done. The effect of the delay would be different
on UDP than TCP because TCP shouldn't transmit anything other than
the SYN until the connection is set up whereas UDP will go ahead and
transmit data beyond the first packet.

Note: As discussed earlier, compression can still be employed at
layers above IP. There is an IETF working group (IP Payload
Compression Protocol (ippcp)) working on "protocol specifications
that make it possible to perform lossless compression on individual
payloads before the payload is processed by a protocol that encrypts
it. These specifications will allow for compression operations to be
performed prior to the encryption of a payload by IPsec protocols."

10. Conformance Requirements

All IPv4 systems that claim to implement IPsec MUST comply with all
requirements of the Security Architecture document. All IPv6 systems
MUST comply with all requirements of the Security Architecture
document.

11. Security Considerations

The focus of this document is security; hence security considerations
permeate this specification.

12. Differences from RFC 1825

This architecture document differs substantially from RFC 1825 in
detail and in organization, but the fundamental notions are
unchanged. This document provides considerable additional detail in
terms of compliance specifications. It introduces the SPD and SAD,
and the notion of SA selectors. It is aligned with the new versions
of AH and ESP, which also differ from their predecessors. Specific
requirements for supported combinations of AH and ESP are newly
added, as are details of PMTU management.

Acknowledgements

Many of the concepts embodied in this specification were derived from
or influenced by the US Government's SP3 security protocol, ISO/IEC's
NLSP, the proposed swIPe security protocol [SDNS, ISO, IB93, IBK93],
and the work done for SNMP Security and SNMPv2 Security.

For over 3 years (although it sometimes seems *much* longer), this
document has evolved through multiple versions and iterations.
During this time, many people have contributed significant ideas and
energy to the process and the documents themselves. The authors
would like to thank Karen Seo for providing extensive help in the
review, editing, background research, and coordination for this
version of the specification. The authors would also like to thank
the members of the IPsec and IPng working groups, with special
mention of the efforts of (in alphabetic order): Steve Bellovin,
Steve Deering, James Hughes, Phil Karn, Frank Kastenholz, Perry
Metzger, David Mihelcic, Hilarie Orman, Norman Shulman, William
Simpson, Harry Varnis, and Nina Yuan.

Appendix A -- Glossary

This section provides definitions for several key terms that are
employed in this document. Other documents provide additional
definitions and background information relevant to this technology,
e.g., [VK83, HA94]. Included in this glossary are generic security
service and security mechanism terms, plus IPsec-specific terms.

Access Control
Access control is a security service that prevents unauthorized
use of a resource, including the prevention of use of a resource
in an unauthorized manner. In the IPsec context, the resource
to which access is being controlled is often:
o for a host, computing cycles or data
o for a security gateway, a network behind the gateway
or
bandwidth on that network.

Anti-replay
[See "Integrity" below]

Authentication
This term is used informally to refer to the combination of two
nominally distinct security services, data origin authentication
and connectionless integrity. See the definitions below for
each of these services.

Availability
Availability, when viewed as a security service, addresses the
security concerns engendered by attacks against networks that
deny or degrade service. For example, in the IPsec context, the
use of anti-replay mechanisms in AH and ESP support
availability.

Confidentiality
Confidentiality is the security service that protects data from
unauthorized disclosure. The primary confidentiality concern in
most instances is unauthorized disclosure of application level
data, but disclosure of the external characteristics of
communication also can be a concern in some circumstances.
Traffic flow confidentiality is the service that addresses this
latter concern by concealing source and destination addresses,
message length, or frequency of communication. In the IPsec
context, using ESP in tunnel mode, especially at a security
gateway, can provide some level of traffic flow confidentiality.
(See also traffic analysis, below.)

Encryption
Encryption is a security mechanism used to transform data from
an intelligible form (plaintext) into an unintelligible form
(ciphertext), to provide confidentiality. The inverse
transformation process is designated "decryption". Oftimes the
term "encryption" is used to generically refer to both
processes.

Data Origin Authentication
Data origin authentication is a security service that verifies
the identity of the claimed source of data. This service is
usually bundled with connectionless integrity service.

Integrity
Integrity is a security service that ensures that modifications
to data are detectable. Integrity comes in various flavors to
match application requirements. IPsec supports two forms of
integrity: connectionless and a form of partial sequence
integrity. Connectionless integrity is a service that detects
modification of an individual IP datagram, without regard to the
ordering of the datagram in a stream of traffic. The form of
partial sequence integrity offered in IPsec is referred to as
anti-replay integrity, and it detects arrival of duplicate IP
datagrams (within a constrained window). This is in contrast to
connection-oriented integrity, which imposes more stringent
sequencing requirements on traffic, e.g., to be able to detect
lost or re-ordered messages. Although authentication and
integrity services often are cited separately, in practice they
are intimately connected and almost always offered in tandem.

Security Association (SA)
A simplex (uni-directional) logical connection, created for
security purposes. All traffic traversing an SA is provided the
same security processing. In IPsec, an SA is an internet layer
abstraction implemented through the use of AH or ESP.

Security Gateway
A security gateway is an intermediate system that acts as the
communications interface between two networks. The set of hosts
(and networks) on the external side of the security gateway is
viewed as untrusted (or less trusted), while the networks and
hosts and on the internal side are viewed as trusted (or more
trusted). The internal subnets and hosts served by a security
gateway are presumed to be trusted by virtue of sharing a
common, local, security administration. (See "Trusted
Subnetwork" below.) In the IPsec context, a security gateway is
a point at which AH and/or ESP is implemented in order to serve

a set of internal hosts, providing security services for these
hosts when they communicate with external hosts also employing
IPsec (either directly or via another security gateway).

SPI
Acronym for "Security Parameters Index". The combination of a
destination address, a security protocol, and an SPI uniquely
identifies a security association (SA, see above). The SPI is
carried in AH and ESP protocols to enable the receiving system
to select the SA under which a received packet will be
processed. An SPI has only local significance, as defined by
the creator of the SA (usually the receiver of the packet
carrying the SPI); thus an SPI is generally viewed as an opaque
bit string. However, the creator of an SA may choose to
interpret the bits in an SPI to facilitate local processing.

Traffic Analysis
The analysis of network traffic flow for the purpose of deducing
information that is useful to an adversary. Examples of such
information are frequency of transmission, the identities of the
conversing parties, sizes of packets, flow identifiers, etc.
[Sch94]

Trusted Subnetwork
A subnetwork containing hosts and routers that trust each other
not to engage in active or passive attacks. There also is an
assumption that the underlying communications channel (e.g., a
LAN or CAN) isn't being attacked by other means.

Appendix B -- Analysis/Discussion of PMTU/DF/Fragmentation Issues

B.1 DF bit

In cases where a system (host or gateway) adds an encapsulating
header (e.g., ESP tunnel), should/must the DF bit in the original
packet be copied to the encapsulating header?

Fragmenting seems correct for some situations, e.g., it might be
appropriate to fragment packets over a network with a very small MTU,
e.g., a packet radio network, or a cellular phone hop to mobile node,
rather than propagate back a very small PMTU for use over the rest of
the path. In other situations, it might be appropriate to set the DF
bit in order to get feedback from later routers about PMTU
constraints which require fragmentation. The existence of both of
these situations argues for enabling a system to decide whether or
not to fragment over a particular network "link", i.e., for requiring
an implementation to be able to copy the DF bit (and to process ICMP
PMTU messages), but making it an option to be selected on a per
interface basis. In other words, an administrator should be able to
configure the router's treatment of the DF bit (set, clear, copy from
encapsulated header) for each interface.

Note: If a bump-in-the-stack implementation of IPsec attempts to
apply different IPsec algorithms based on source/destination ports,
it will be difficult to apply Path MTU adjustments.

B.2 Fragmentation

If required, IP fragmentation occurs after IPsec processing within an
IPsec implementation. Thus, transport mode AH or ESP is applied only
to whole IP datagrams (not to IP fragments). An IP packet to which
AH or ESP has been applied may itself be fragmented by routers en
route, and such fragments MUST be reassembled prior to IPsec
processing at a receiver. In tunnel mode, AH or ESP is applied to an
IP packet, the payload of which may be a fragmented IP packet. For
example, a security gateway, "bump-in-the-stack" (BITS), or "bump-
in-the-wire" (BITW) IPsec implementation may apply tunnel mode AH to
such fragments. Note that BITS or BITW implementations are examples
of where a host IPsec implementation might receive fragments to which
tunnel mode is to be applied. However, if transport mode is to be
applied, then these implementations MUST reassemble the fragments
prior to applying IPsec.

NOTE: IPsec always has to figure out what the encapsulating IP header
fields are. This is independent of where you insert IPsec and is
intrinsic to the definition of IPsec. Therefore any IPsec
implementation that is not integrated into an IP implementation must
include code to construct the necessary IP headers (e.g., IP2):

o AH-tunnel --> IP2-AH-IP1-Transport-Data
o ESP-tunnel --> IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer

*********************************************************************

Overall, the fragmentation/reassembly approach described above works
for all cases examined.

AH Xport AH Tunnel ESP Xport ESP Tunnel
Implementation approach IPv4 IPv6 IPv4 IPv6 IPv4 IPv6 IPv4 IPv6
----------------------- ---- ---- ---- ---- ---- ---- ---- ----
Hosts (integr w/ IP stack) Y Y Y Y Y Y Y Y
Hosts (betw/ IP and drivers) Y Y Y Y Y Y Y Y
S. Gwy (integr w/ IP stack) Y Y Y Y
Outboard crypto processor *

* If the crypto processor system has its own IP address, then it
is covered by the security gateway case. This box receives
the packet from the host and performs IPsec processing. It
has to be able to handle the same AH, ESP, and related
IPv4/IPv6 tunnel processing that a security gateway would have
to handle. If it doesn't have it's own address, then it is
similar to the bump-in-the stack implementation between IP and
the network drivers.

The following analysis assumes that:

1. There is only one IPsec module in a given system's stack.
There isn't an IPsec module A (adding ESP/encryption and
thus) hiding the transport protocol, SRC port, and DEST port
from IPsec module B.
2. There are several places where IPsec could be implemented (as
shown in the table above).
a. Hosts with integration of IPsec into the native IP
implementation. Implementer has access to the source
for the stack.
b. Hosts with bump-in-the-stack implementations, where
IPsec is implemented between IP and the local network
drivers. Source access for stack is not available;
but there are well-defined interfaces that allows the
IPsec code to be incorporated into the system.

c. Security gateways and outboard crypto processors with
integration of IPsec into the stack.
3. Not all of the above approaches are feasible in all hosts.
But it was assumed that for each approach, there are some
hosts for whom the approach is feasible.

For each of the above 3 categories, there are IPv4 and IPv6, AH
transport and tunnel modes, and ESP transport and tunnel modes -- for
a total of 24 cases (3 x 2 x 4).

Some header fields and interface fields are listed here for ease of
reference -- they're not in the header order, but instead listed to
allow comparison between the columns. (* = not covered by AH
authentication. ESP authentication doesn't cover any headers that
precede it.)

IP/Transport Interface
IPv4 IPv6 (RFC 1122 -- Sec 3.4)
---- ---- ----------------------
Version = 4 Version = 6
Header Len
*TOS Class,Flow Lbl TOS
Packet Len Payload Len Len
ID ID (optional)
*Flags DF
*Offset
*TTL *Hop Limit TTL
Protocol Next Header
*Checksum
Src Address Src Address Src Address
Dst Address Dst Address Dst Address
Options? Options? Opt

? = AH covers Option-Type and Option-Length, but
might not cover Option-Data.

The results for each of the 20 cases is shown below ("works" = will
work if system fragments after outbound IPsec processing, reassembles
before inbound IPsec processing). Notes indicate implementation
issues.

a. Hosts (integrated into IP stack)
o AH-transport --> (IP1-AH-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
- IPv4 -- works
- IPv6 -- works

o ESP-transport --> (IP1-ESP_hdr-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
o ESP-tunnel --> (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works

b. Hosts (Bump-in-the-stack) -- put IPsec between IP layer and
network drivers. In this case, the IPsec module would have to do
something like one of the following for fragmentation and
reassembly.
- do the fragmentation/reassembly work itself and
send/receive the packet directly to/from the network
layer. In AH or ESP transport mode, this is fine. In AH
or ESP tunnel mode where the tunnel end is at the ultimate
destination, this is fine. But in AH or ESP tunnel modes
where the tunnel end is different from the ultimate
destination and where the source host is multi-homed, this
approach could result in sub-optimal routing because the
IPsec module may be unable to obtain the information
needed (LAN interface and next-hop gateway) to direct the
packet to the appropriate network interface. This is not
a problem if the interface and next-hop gateway are the
same for the ultimate destination and for the tunnel end.
But if they are different, then IPsec would need to know
the LAN interface and the next-hop gateway for the tunnel
end. (Note: The tunnel end (security gateway) is highly
likely to be on the regular path to the ultimate
destination. But there could also be more than one path
to the destination, e.g., the host could be at an
organization with 2 firewalls. And the path being used
could involve the less commonly chosen firewall.) OR
- pass the IPsec'd packet back to the IP layer where an
extra IP header would end up being pre-pended and the
IPsec module would have to check and let IPsec'd fragments
go by.
OR
- pass the packet contents to the IP layer in a form such
that the IP layer recreates an appropriate IP header

At the network layer, the IPsec module will have access to the
following selectors from the packet -- SRC address, DST address,
Next Protocol, and if there's a transport layer header --> SRC
port and DST port. One cannot assume IPsec has access to the
Name. It is assumed that the available selector information is
sufficient to figure out the relevant Security Policy entry and
Security Association(s).

o AH-transport --> (IP1-AH-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o ESP-transport --> (IP1-ESP_hdr-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
o ESP-tunnel --> (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works

c. Security gateways -- integrate IPsec into the IP stack

NOTE: The IPsec module will have access to the following
selectors from the packet -- SRC address, DST address, Next
Protocol, and if there's a transport layer header --> SRC port
and DST port. It won't have access to the User ID (only Hosts
have access to User ID information.) Unlike some Bump-in-the-
stack implementations, security gateways may be able to look up
the Source Address in the DNS to provide a System Name, e.g., in
situations involving use of dynamically assigned IP addresses in
conjunction with dynamically updated DNS entries. It also won't
have access to the transport layer information if there is an ESP
header, or if it's not the first fragment of a fragmented
message. It is assumed that the available selector information
is sufficient to figure out the relevant Security Policy entry
and Security Association(s).

o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o ESP-tunnel --> (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works

**********************************************************************

B.3 Path MTU Discovery

As mentioned earlier, "ICMP PMTU" refers to an ICMP message used for
Path MTU Discovery.

The legend for the diagrams below in B.3.1 and B.3.3 (but not B.3.2)
is:

==== = security association (AH or ESP, transport or tunnel)

---- = connectivity (or if so labelled, administrative boundary)
.... = ICMP message (hereafter referred to as ICMP PMTU) for

IPv4:
- Type = 3 (Destination Unreachable)
- Code = 4 (Fragmentation needed and DF set)
- Next-Hop MTU in the low-order 16 bits of the second
word of the ICMP header (labelled unused in RFC 792),
with high-order 16 bits set to zero

IPv6 (RFC 1885):
- Type = 2 (Packet Too Big)
- Code = 0 (Fragmentation needed and DF set)
- Next-Hop MTU in the 32 bit MTU field of the ICMP6

Hx = host x
Rx = router x
SGx = security gateway x
X* = X supports IPsec

B.3.1 Identifying the Originating Host(s)

The amount of information returned with the ICMP message is limited
and this affects what selectors are available to identify security
associations, originating hosts, etc. for use in further propagating
the PMTU information.

In brief... An ICMP message must contain the following information
from the "offending" packet:
- IPv4 (RFC 792) -- IP header plus a minimum of 64 bits

Accordingly, in the IPv4 context, an ICMP PMTU may identify only the
first (outermost) security association. This is because the ICMP
PMTU may contain only 64 bits of the "offending" packet beyond the IP
header, which would capture only the first SPI from AH or ESP. In
the IPv6 context, an ICMP PMTU will probably provide all the SPIs and
the selectors in the IP header, but maybe not the SRC/DST ports (in
the transport header) or the encapsulated (TCP, UDP, etc.) protocol.
Moreover, if ESP is used, the transport ports and protocol selectors
may be encrypted.

Looking at the diagram below of a security gateway tunnel (as
mentioned elsewhere, security gateways do not use transport mode)...

H1 =================== H3
\ | | /
H0 -- SG1* ---- R1 ---- SG2* ---- R2 -- H5
/ ^ | \
H2 |........| H4

Suppose that the security policy for SG1 is to use a single SA to SG2
for all the traffic between hosts H0, H1, and H2 and hosts H3, H4,
and H5. And suppose H0 sends a data packet to H5 which causes R1 to
send an ICMP PMTU message to SG1. If the PMTU message has only the
SPI, SG1 will be able to look up the SA and find the list of possible
hosts (H0, H1, H2, wildcard); but SG1 will have no way to figure out
that H0 sent the traffic that triggered the ICMP PMTU message.

original after IPsec ICMP
packet processing packet
-------- ----------- ------
IP-3 header (S = R1, D = SG1)
ICMP header (includes PMTU)
IP-2 header IP-2 header (S = SG1, D = SG2)
ESP header minimum of 64 bits of ESP hdr (*)
IP-1 header IP-1 header
TCP header TCP header
TCP data TCP data
ESP trailer

(*) The 64 bits will include enough of the ESP (or AH) header to
include the SPI.
- ESP -- SPI (32 bits), Seq number (32 bits)
- AH -- Next header (8 bits), Payload Len (8 bits),
Reserved (16 bits), SPI (32 bits)

This limitation on the amount of information returned with an ICMP
message creates a problem in identifying the originating hosts for
the packet (so as to know where to further propagate the ICMP PMTU
information). If the ICMP message contains only 64 bits of the IPsec
header (minimum for IPv4), then the IPsec selectors (e.g., Source and
Destination addresses, Next Protocol, Source and Destination ports,
etc.) will have been lost. But the ICMP error message will still
provide SG1 with the SPI, the PMTU information and the source and
destination gateways for the relevant security association.

The destination security gateway and SPI uniquely define a security
association which in turn defines a set of possible originating
hosts. At this point, SG1 could:

a. send the PMTU information to all the possible originating hosts.
This would not work well if the host list is a wild card or if
many/most of the hosts weren't sending to SG1; but it might work
if the SPI/destination/etc mapped to just one or a small number of
hosts.
b. store the PMTU with the SPI/etc and wait until the next packet(s)
arrive from the originating host(s) for the relevant security
association. If it/they are bigger than the PMTU, drop the
packet(s), and compose ICMP PMTU message(s) with the new packet(s)
and the updated PMTU, and send the originating host(s) the ICMP
message(s) about the problem. This involves a delay in notifying
the originating host(s), but avoids the problems of (a).

Since only the latter approach is feasible in all instances, a
security gateway MUST provide such support, as an option. However,
if the ICMP message contains more information from the original
packet, then there may be enough information to immediately determine
to which host to propagate the ICMP/PMTU message and to provide that
system with the 5 fields (source address, destination address, source
port, destination port, and transport protocol) needed to determine
where to store/update the PMTU. Under such circumstances, a security
gateway MUST generate an ICMP PMTU message immediately upon receipt
of an ICMP PMTU from further down the path. NOTE: The Next Protocol
field may not be contained in the ICMP message and the use of ESP
encryption may hide the selector fields that have been encrypted.

B.3.2 Calculation of PMTU

The calculation of PMTU from an ICMP PMTU has to take into account
the addition of any IPsec header by H1 -- AH and/or ESP transport, or
ESP or AH tunnel. Within a single host, multiple applications may
share an SPI and nesting of security associations may occur. (See
Section 4.5 Basic Combinations of Security Associations for
description of the combinations that MUST be supported). The diagram
below illustrates an example of security associations between a pair
of hosts (as viewed from the perspective of one of the hosts.) (ESPx
or AHx = transport mode)

Socket 1 -------------------------|
|
Socket 2 (ESPx/SPI-A) ---------- AHx (SPI-B) -- Internet

In order to figure out the PMTU for each socket that maps to SPI-B,
it will be necessary to have backpointers from SPI-B to each of the 2
paths that lead to it -- Socket 1 and Socket 2/SPI-A.

B.3.3 Granularity of Maintaining PMTU Data

In hosts, the granularity with which PMTU ICMP processing can be done
differs depending on the implementation situation. Looking at a
host, there are three situations that are of interest with respect to
PMTU issues:

a. Integration of IPsec into the native IP implementation
b. Bump-in-the-stack implementations, where IPsec is implemented
"underneath" an existing implementation of a TCP/IP protocol
stack, between the native IP and the local network drivers
c. No IPsec implementation -- This case is included because it is
relevant in cases where a security gateway is sending PMTU
information back to a host.

Only in case (a) can the PMTU data be maintained at the same
granularity as communication associations. In the other cases, the
IP layer will maintain PMTU data at the granularity of Source and
Destination IP addresses (and optionally TOS/Class), as described in
RFC 1191. This is an important difference, because more than one
communication association may map to the same source and destination
IP addresses, and each communication association may have a different
amount of IPsec header overhead (e.g., due to use of different
transforms or different algorithms). The examples below illustrate
this.

In cases (a) and (b)... Suppose you have the following situation.
H1 is sending to H2 and the packet to be sent from R1 to R2 exceeds
the PMTU of the network hop between them.

==================================
| |
H1* --- R1 ----- R2 ---- R3 ---- H2*
^ |
|.......|

If R1 is configured to not fragment subscriber traffic, then R1 sends
an ICMP PMTU message with the appropriate PMTU to H1. H1's
processing would vary with the nature of the implementation. In case
(a) (native IP), the security services are bound to sockets or the
equivalent. Here the IP/IPsec implementation in H1 can store/update
the PMTU for the associated socket. In case (b), the IP layer in H1
can store/update the PMTU but only at the granularity of Source and
Destination addresses and possibly TOS/Class, as noted above. So the
result may be sub-optimal, since the PMTU for a given
SRC/DST/TOS/Class will be the subtraction of the largest amount of
IPsec header used for any communication association between a given
source and destination.

In case (c), there has to be a security gateway to have any IPsec
processing. So suppose you have the following situation. H1 is
sending to H2 and the packet to be sent from SG1 to R exceeds the
PMTU of the network hop between them.

================
| |
H1 ---- SG1* --- R --- SG2* ---- H2
^ |
|.......|

As described above for case (b), the IP layer in H1 can store/update
the PMTU but only at the granularity of Source and Destination
addresses, and possibly TOS/Class. So the result may be sub-optimal,
since the PMTU for a given SRC/DST/TOS/Class will be the subtraction
of the largest amount of IPsec header used for any communication
association between a given source and destination.

B.3.4 Per Socket Maintenance of PMTU Data

Implementation of the calculation of PMTU (Section B.3.2) and support
for PMTUs at the granularity of individual "communication
associations" (Section B.3.3) is a local matter. However, a socket-
based implementation of IPsec in a host SHOULD maintain the
information on a per socket basis. Bump in the stack systems MUST
pass an ICMP PMTU to the host IP implementation, after adjusting it
for any IPsec header overhead added by these systems. The
determination of the overhead SHOULD be determined by analysis of the
SPI and any other selector information present in a returned ICMP
PMTU message.

B.3.5 Delivery of PMTU Data to the Transport Layer

The host mechanism for getting the updated PMTU to the transport
layer is unchanged, as specified in RFC 1191 (Path MTU Discovery).

B.3.6 Aging of PMTU Data

This topic is covered in Section 6.1.2.4.

Appendix C -- Sequence Space Window Code Example

This appendix contains a routine that implements a bitmask check for
a 32 packet window. It was provided by James Hughes
(jim_hughes@stortek.com) and Harry Varnis (hgv@anubis.network.com)
and is intended as an implementation example. Note that this code
both checks for a replay and updates the window. Thus the algorithm,
as shown, should only be called AFTER the packet has been
authenticated. Implementers might wish to consider splitting the
code to do the check for replays before computing the ICV. If the
packet is not a replay, the code would then compute the ICV, (discard
any bad packets), and if the packet is OK, update the window.

#include <stdio.h>
#include <stdlib.h>
typedef unsigned long u_long;

enum {
ReplayWindowSize = 32
};

u_long bitmap = 0; /* session state - must be 32 bits */
u_long lastSeq = 0; /* session state */

/* Returns 0 if packet disallowed, 1 if packet permitted */
int ChkReplayWindow(u_long seq);

int ChkReplayWindow(u_long seq) {
u_long diff;

if (seq == 0) return 0; /* first == 0 or wrapped */
if (seq > lastSeq) { /* new larger sequence number */
diff = seq - lastSeq;
if (diff < ReplayWindowSize) { /* In window */
bitmap <<= diff;
bitmap |= 1; /* set bit for this packet */
} else bitmap = 1; /* This packet has a "way larger" */
lastSeq = seq;
return 1; /* larger is good */
}
diff = lastSeq - seq;
if (diff >= ReplayWindowSize) return 0; /* too old or wrapped */
if (bitmap & ((u_long)1 << diff)) return 0; /* already seen */
bitmap |= ((u_long)1 << diff); /* mark as seen */
return 1; /* out of order but good */
}

char string_buffer[512];

#define STRING_BUFFER_SIZE sizeof(string_buffer)

int main() {
int result;
u_long last, current, bits;

printf("Input initial state (bits in hex, last msgnum):\n");
if (!fgets(string_buffer, STRING_BUFFER_SIZE, stdin)) exit(0);
sscanf(string_buffer, "%lx %lu", &bits, &last);
if (last != 0)
bits |= 1;
bitmap = bits;
lastSeq = last;
printf("bits:%08lx last:%lu\n", bitmap, lastSeq);
printf("Input value to test (current):\n");

while (1) {
if (!fgets(string_buffer, STRING_BUFFER_SIZE, stdin)) break;
sscanf(string_buffer, "%lu", ¤t);
result = ChkReplayWindow(current);
printf("%-3s", result ? "OK" : "BAD");
printf(" bits:%08lx last:%lu\n", bitmap, lastSeq);
}
return 0;
}

Appendix D -- Categorization of ICMP messages

The tables below characterize ICMP messages as being either host
generated, router generated, both, unassigned/unknown. The first set
are IPv4. The second set are IPv6.

IPv4

Type Name/Codes Reference
========================================================================
HOST GENERATED:
3 Destination Unreachable
2 Protocol Unreachable [RFC792]
3 Port Unreachable [RFC792]
8 Source Host Isolated [RFC792]
14 Host Precedence Violation [RFC1812]
10 Router Selection [RFC1256]

Type Name/Codes Reference
========================================================================
ROUTER GENERATED:
3 Destination Unreachable
0 Net Unreachable [RFC792]
4 Fragmentation Needed, Don't Fragment was Set [RFC792]
5 Source Route Failed [RFC792]
6 Destination Network Unknown [RFC792]
7 Destination Host Unknown [RFC792]
9 Comm. w/Dest. Net. is Administratively Prohibited [RFC792]
11 Destination Network Unreachable for Type of Service[RFC792]
5 Redirect
0 Redirect Datagram for the Network (or subnet) [RFC792]
2 Redirect Datagram for the Type of Service & Network[RFC792]
9 Router Advertisement [RFC1256]
18 Address Mask Reply [RFC950]

IPv4
Type Name/Codes Reference
========================================================================
BOTH ROUTER AND HOST GENERATED:
0 Echo Reply [RFC792]
3 Destination Unreachable
1 Host Unreachable [RFC792]
10 Comm. w/Dest. Host is Administratively Prohibited [RFC792]
12 Destination Host Unreachable for Type of Service [RFC792]
13 Communication Administratively Prohibited [RFC1812]
15 Precedence cutoff in effect [RFC1812]
4 Source Quench [RFC792]
5 Redirect
1 Redirect Datagram for the Host [RFC792]
3 Redirect Datagram for the Type of Service and Host [RFC792]
6 Alternate Host Address [JBP]
8 Echo [RFC792]
11 Time Exceeded [RFC792]
12 Parameter Problem [RFC792,RFC1108]
13 Timestamp [RFC792]
14 Timestamp Reply [RFC792]
15 Information Request [RFC792]
16 Information Reply [RFC792]
17 Address Mask Request [RFC950]
30 Traceroute [RFC1393]
31 Datagram Conversion Error [RFC1475]
32 Mobile Host Redirect [Johnson]
39 SKIP [Markson]
40 Photuris [Simpson]

Type Name/Codes Reference
========================================================================
UNASSIGNED TYPE OR UNKNOWN GENERATOR:
1 Unassigned [JBP]
2 Unassigned [JBP]
7 Unassigned [JBP]
19 Reserved (for Security) [Solo]
20-29 Reserved (for Robustness Experiment) [ZSu]
33 IPv6 Where-Are-You [Simpson]
34 IPv6 I-Am-Here [Simpson]
35 Mobile Registration Request [Simpson]
36 Mobile Registration Reply [Simpson]
37 Domain Name Request [Simpson]
38 Domain Name Reply [Simpson]
41-255 Reserved [JBP]

IPv6

Type Name/Codes Reference
========================================================================
HOST GENERATED:
1 Destination Unreachable [RFC 1885]
4 Port Unreachable

Type Name/Codes Reference
========================================================================
ROUTER GENERATED:
1 Destination Unreachable [RFC1885]
0 No Route to Destination
1 Comm. w/Destination is Administratively Prohibited
2 Not a Neighbor
3 Address Unreachable
2 Packet Too Big [RFC1885]
0
3 Time Exceeded [RFC1885]
0 Hop Limit Exceeded in Transit
1 Fragment reassembly time exceeded

Type Name/Codes Reference
========================================================================
BOTH ROUTER AND HOST GENERATED:
4 Parameter Problem [RFC1885]
0 Erroneous Header Field Encountered
1 Unrecognized Next Header Type Encountered
2 Unrecognized IPv6 Option Encountered

References

[BL73] Bell, D.E. & LaPadula, L.J., "Secure Computer Systems:
Mathematical Foundations and Model", Technical Report M74-
244, The MITRE Corporation, Bedford, MA, May 1973.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[DoD85] US National Computer Security Center, "Department of
Defense Trusted Computer System Evaluation Criteria", DoD
5200.28-STD, US Department of Defense, Ft. Meade, MD.,
December 1985.

[DoD87] US National Computer Security Center, "Trusted Network
Interpretation of the Trusted Computer System Evaluation
Criteria", NCSC-TG-005, Version 1, US Department of
Defense, Ft. Meade, MD., 31 July 1987.

[HA94] Haller, N., and R. Atkinson, "On Internet Authentication",
RFC 1704, October 1994.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[HM97] Harney, H., and C. Muckenhirn, "Group Key Management
Protocol (GKMP) Architecture", RFC 2094, July 1997.

[ISO] ISO/IEC JTC1/SC6, Network Layer Security Protocol, ISO-IEC
DIS 11577, International Standards Organisation, Geneva,
Switzerland, 29 November 1992.

[IB93] John Ioannidis and Matt Blaze, "Architecture and
Implementation of Network-layer Security Under Unix",
Proceedings of USENIX Security Symposium, Santa Clara, CA,
October 1993.

[IBK93] John Ioannidis, Matt Blaze, & Phil Karn, "swIPe: Network-
Layer Security for IP", presentation at the Spring 1993
IETF Meeting, Columbus, Ohio

[KA98a] Kent, S., and R. Atkinson, "IP Authentication Header", RFC
2402, November 1998.

[KA98b] Kent, S., and R. Atkinson, "IP Encapsulating Security
Payload (ESP)", RFC 2406, November 1998.

[Ken91] Kent, S., "US DoD Security Options for the Internet
Protocol", RFC 1108, November 1991.

[MSST97] Maughan, D., Schertler, M., Schneider, M., and J. Turner,
"Internet Security Association and Key Management Protocol
(ISAKMP)", RFC 2408, November 1998.

[Orm97] Orman, H., "The OAKLEY Key Determination Protocol", RFC
2412, November 1998.

[Pip98] Piper, D., "The Internet IP Security Domain of
Interpretation for ISAKMP", RFC 2407, November 1998.

[Sch94] Bruce Schneier, Applied Cryptography, Section 8.6, John
Wiley & Sons, New York, NY, 1994.

[SDNS] SDNS Secure Data Network System, Security Protocol 3, SP3,
Document SDN.301, Revision 1.5, 15 May 1989, published in
NIST Publication NIST-IR-90-4250, February 1990.

[SMPT98] Shacham, A., Monsour, R., Pereira, R., and M. Thomas, "IP
Payload Compression Protocol (IPComp)", RFC 2393, August
1998.

[TDG97] Thayer, R., Doraswamy, N., and R. Glenn, "IP Security
Document Roadmap", RFC 2411, November 1998.

[VK83] V.L. Voydock & S.T. Kent, "Security Mechanisms in High-
level Networks", ACM Computing Surveys, Vol. 15, No. 2,
June 1983.

Disclaimer

The views and specification expressed in this document are those of
the authors and are not necessarily those of their employers. The
authors and their employers specifically disclaim responsibility for
any problems arising from correct or incorrect implementation or use
of this design.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


RFC 1661 – The Point-to-Point Protocol (PPP)

 
Network Working Group                                 W. Simpson, Editor
Request for Comments: 1661 Daydreamer
STD: 51 July 1994
Obsoletes: 1548
Category: Standards Track

The Point-to-Point Protocol (PPP)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Abstract

The Point-to-Point Protocol (PPP) provides a standard method for
transporting multi-protocol datagrams over point-to-point links. PPP
is comprised of three main components:

1. A method for encapsulating multi-protocol datagrams.

2. A Link Control Protocol (LCP) for establishing, configuring,
and testing the data-link connection.

3. A family of Network Control Protocols (NCPs) for establishing
and configuring different network-layer protocols.

This document defines the PPP organization and methodology, and the
PPP encapsulation, together with an extensible option negotiation
mechanism which is able to negotiate a rich assortment of
configuration parameters and provides additional management
functions. The PPP Link Control Protocol (LCP) is described in terms
of this mechanism.

Table of Contents

1. Introduction .......................................... 1
1.1 Specification of Requirements ................... 2
1.2 Terminology ..................................... 3

2. PPP Encapsulation ..................................... 4

3. PPP Link Operation .................................... 6
3.1 Overview ........................................ 6
3.2 Phase Diagram ................................... 6
3.3 Link Dead (physical-layer not ready) ............ 7
3.4 Link Establishment Phase ........................ 7
3.5 Authentication Phase ............................ 8
3.6 Network-Layer Protocol Phase .................... 8
3.7 Link Termination Phase .......................... 9

4. The Option Negotiation Automaton ...................... 11
4.1 State Transition Table .......................... 12
4.2 States .......................................... 14
4.3 Events .......................................... 16
4.4 Actions ......................................... 21
4.5 Loop Avoidance .................................. 23
4.6 Counters and Timers ............................. 24

5. LCP Packet Formats .................................... 26
5.1 Configure-Request ............................... 28
5.2 Configure-Ack ................................... 29
5.3 Configure-Nak ................................... 30
5.4 Configure-Reject ................................ 31
5.5 Terminate-Request and Terminate-Ack ............. 33
5.6 Code-Reject ..................................... 34
5.7 Protocol-Reject ................................. 35
5.8 Echo-Request and Echo-Reply ..................... 36
5.9 Discard-Request ................................. 37

6. LCP Configuration Options ............................. 39
6.1 Maximum-Receive-Unit (MRU) ...................... 41
6.2 Authentication-Protocol ......................... 42
6.3 Quality-Protocol ................................ 43
6.4 Magic-Number .................................... 45
6.5 Protocol-Field-Compression (PFC) ................ 48
6.6 Address-and-Control-Field-Compression (ACFC)

SECURITY CONSIDERATIONS ...................................... 51
REFERENCES ................................................... 51
ACKNOWLEDGEMENTS ............................................. 51
CHAIR'S ADDRESS .............................................. 52
EDITOR'S ADDRESS ............................................. 52

1. Introduction

The Point-to-Point Protocol is designed for simple links which
transport packets between two peers. These links provide full-duplex
simultaneous bi-directional operation, and are assumed to deliver
packets in order. It is intended that PPP provide a common solution
for easy connection of a wide variety of hosts, bridges and routers
[1].

Encapsulation

The PPP encapsulation provides for multiplexing of different
network-layer protocols simultaneously over the same link. The
PPP encapsulation has been carefully designed to retain
compatibility with most commonly used supporting hardware.

Only 8 additional octets are necessary to form the encapsulation
when used within the default HDLC-like framing. In environments
where bandwidth is at a premium, the encapsulation and framing may
be shortened to 2 or 4 octets.

To support high speed implementations, the default encapsulation
uses only simple fields, only one of which needs to be examined
for demultiplexing. The default header and information fields
fall on 32-bit boundaries, and the trailer may be padded to an
arbitrary boundary.

Link Control Protocol

In order to be sufficiently versatile to be portable to a wide
variety of environments, PPP provides a Link Control Protocol
(LCP). The LCP is used to automatically agree upon the
encapsulation format options, handle varying limits on sizes of
packets, detect a looped-back link and other common
misconfiguration errors, and terminate the link. Other optional
facilities provided are authentication of the identity of its peer
on the link, and determination when a link is functioning properly
and when it is failing.

Network Control Protocols

Point-to-Point links tend to exacerbate many problems with the
current family of network protocols. For instance, assignment and
management of IP addresses, which is a problem even in LAN
environments, is especially difficult over circuit-switched
point-to-point links (such as dial-up modem servers). These
problems are handled by a family of Network Control Protocols
(NCPs), which each manage the specific needs required by their

respective network-layer protocols. These NCPs are defined in
companion documents.

Configuration

It is intended that PPP links be easy to configure. By design,
the standard defaults handle all common configurations. The
implementor can specify improvements to the default configuration,
which are automatically communicated to the peer without operator
intervention. Finally, the operator may explicitly configure
options for the link which enable the link to operate in
environments where it would otherwise be impossible.

This self-configuration is implemented through an extensible
option negotiation mechanism, wherein each end of the link
describes to the other its capabilities and requirements.
Although the option negotiation mechanism described in this
document is specified in terms of the Link Control Protocol (LCP),
the same facilities are designed to be used by other control
protocols, especially the family of NCPs.

1.1. Specification of Requirements

In this document, several words are used to signify the requirements
of the specification. These words are often capitalized.

MUST This word, or the adjective "required", means that the
definition is an absolute requirement of the specification.

MUST NOT This phrase means that the definition is an absolute
prohibition of the specification.

SHOULD This word, or the adjective "recommended", means that there
may exist valid reasons in particular circumstances to
ignore this item, but the full implications must be
understood and carefully weighed before choosing a
different course.

MAY This word, or the adjective "optional", means that this
item is one of an allowed set of alternatives. An
implementation which does not include this option MUST be
prepared to interoperate with another implementation which
does include the option.

1.2. Terminology

This document frequently uses the following terms:

datagram The unit of transmission in the network layer (such as IP).
A datagram may be encapsulated in one or more packets
passed to the data link layer.

frame The unit of transmission at the data link layer. A frame
may include a header and/or a trailer, along with some
number of units of data.

packet The basic unit of encapsulation, which is passed across the
interface between the network layer and the data link
layer. A packet is usually mapped to a frame; the
exceptions are when data link layer fragmentation is being
performed, or when multiple packets are incorporated into a
single frame.

peer The other end of the point-to-point link.

silently discard
The implementation discards the packet without further
processing. The implementation SHOULD provide the
capability of logging the error, including the contents of
the silently discarded packet, and SHOULD record the event
in a statistics counter.

2. PPP Encapsulation

The PPP encapsulation is used to disambiguate multiprotocol
datagrams. This encapsulation requires framing to indicate the
beginning and end of the encapsulation. Methods of providing framing
are specified in companion documents.

A summary of the PPP encapsulation is shown below. The fields are
transmitted from left to right.

+----------+-------------+---------+
| Protocol | Information | Padding |
| 8/16 bits| * | * |
+----------+-------------+---------+

Protocol Field

The Protocol field is one or two octets, and its value identifies
the datagram encapsulated in the Information field of the packet.
The field is transmitted and received most significant octet
first.

The structure of this field is consistent with the ISO 3309
extension mechanism for address fields. All Protocols MUST be
odd; the least significant bit of the least significant octet MUST
equal "1". Also, all Protocols MUST be assigned such that the
least significant bit of the most significant octet equals "0".
Frames received which don't comply with these rules MUST be
treated as having an unrecognized Protocol.

Protocol field values in the "0***" to "3***" range identify the
network-layer protocol of specific packets, and values in the
"8***" to "b***" range identify packets belonging to the
associated Network Control Protocols (NCPs), if any.

Protocol field values in the "4***" to "7***" range are used for
protocols with low volume traffic which have no associated NCP.
Protocol field values in the "c***" to "f***" range identify
packets as link-layer Control Protocols (such as LCP).

Up-to-date values of the Protocol field are specified in the most
recent "Assigned Numbers" RFC [2]. This specification reserves
the following values:

Value (in hex) Protocol Name

0001 Padding Protocol
0003 to 001f reserved (transparency inefficient)
007d reserved (Control Escape)
00cf reserved (PPP NLPID)
00ff reserved (compression inefficient)

8001 to 801f unused
807d unused
80cf unused
80ff unused

c021 Link Control Protocol
c023 Password Authentication Protocol
c025 Link Quality Report
c223 Challenge Handshake Authentication Protocol

Developers of new protocols MUST obtain a number from the Internet
Assigned Numbers Authority (IANA), at IANA@isi.edu.

Information Field

The Information field is zero or more octets. The Information
field contains the datagram for the protocol specified in the
Protocol field.

The maximum length for the Information field, including Padding,
but not including the Protocol field, is termed the Maximum
Receive Unit (MRU), which defaults to 1500 octets. By
negotiation, consenting PPP implementations may use other values
for the MRU.

Padding

On transmission, the Information field MAY be padded with an
arbitrary number of octets up to the MRU. It is the
responsibility of each protocol to distinguish padding octets from
real information.

3. PPP Link Operation

3.1. Overview

In order to establish communications over a point-to-point link, each
end of the PPP link MUST first send LCP packets to configure and test
the data link. After the link has been established, the peer MAY be
authenticated.

Then, PPP MUST send NCP packets to choose and configure one or more
network-layer protocols. Once each of the chosen network-layer
protocols has been configured, datagrams from each network-layer
protocol can be sent over the link.

The link will remain configured for communications until explicit LCP
or NCP packets close the link down, or until some external event
occurs (an inactivity timer expires or network administrator
intervention).

3.2. Phase Diagram

In the process of configuring, maintaining and terminating the
point-to-point link, the PPP link goes through several distinct
phases which are specified in the following simplified state diagram:

+------+ +-----------+ +--------------+
| | UP | | OPENED | | SUCCESS/NONE
| Dead |------->| Establish |---------->| Authenticate |--+
| | | | | | |
+------+ +-----------+ +--------------+ |
^ | | |
| FAIL | FAIL | |
+<--------------+ +----------+ |
| | |
| +-----------+ | +---------+ |
| DOWN | | | CLOSING | | |
+------------| Terminate |<---+<----------| Network |<-+
| | | |
+-----------+ +---------+

Not all transitions are specified in this diagram. The following
semantics MUST be followed.

3.3. Link Dead (physical-layer not ready)

The link necessarily begins and ends with this phase. When an
external event (such as carrier detection or network administrator
configuration) indicates that the physical-layer is ready to be used,
PPP will proceed to the Link Establishment phase.

During this phase, the LCP automaton (described later) will be in the
Initial or Starting states. The transition to the Link Establishment
phase will signal an Up event to the LCP automaton.

Implementation Note:

Typically, a link will return to this phase automatically after
the disconnection of a modem. In the case of a hard-wired link,
this phase may be extremely short -- merely long enough to detect
the presence of the device.

3.4. Link Establishment Phase

The Link Control Protocol (LCP) is used to establish the connection
through an exchange of Configure packets. This exchange is complete,
and the LCP Opened state entered, once a Configure-Ack packet
(described later) has been both sent and received.

All Configuration Options are assumed to be at default values unless
altered by the configuration exchange. See the chapter on LCP
Configuration Options for further discussion.

It is important to note that only Configuration Options which are
independent of particular network-layer protocols are configured by
LCP. Configuration of individual network-layer protocols is handled
by separate Network Control Protocols (NCPs) during the Network-Layer
Protocol phase.

Any non-LCP packets received during this phase MUST be silently
discarded.

The receipt of the LCP Configure-Request causes a return to the Link
Establishment phase from the Network-Layer Protocol phase or
Authentication phase.

3.5. Authentication Phase

On some links it may be desirable to require a peer to authenticate
itself before allowing network-layer protocol packets to be
exchanged.

By default, authentication is not mandatory. If an implementation
desires that the peer authenticate with some specific authentication
protocol, then it MUST request the use of that authentication
protocol during Link Establishment phase.

Authentication SHOULD take place as soon as possible after link
establishment. However, link quality determination MAY occur
concurrently. An implementation MUST NOT allow the exchange of link
quality determination packets to delay authentication indefinitely.

Advancement from the Authentication phase to the Network-Layer
Protocol phase MUST NOT occur until authentication has completed. If
authentication fails, the authenticator SHOULD proceed instead to the
Link Termination phase.

Only Link Control Protocol, authentication protocol, and link quality
monitoring packets are allowed during this phase. All other packets
received during this phase MUST be silently discarded.

Implementation Notes:

An implementation SHOULD NOT fail authentication simply due to
timeout or lack of response. The authentication SHOULD allow some
method of retransmission, and proceed to the Link Termination
phase only after a number of authentication attempts has been
exceeded.

The implementation responsible for commencing Link Termination
phase is the implementation which has refused authentication to
its peer.

3.6. Network-Layer Protocol Phase

Once PPP has finished the previous phases, each network-layer
protocol (such as IP, IPX, or AppleTalk) MUST be separately
configured by the appropriate Network Control Protocol (NCP).

Each NCP MAY be Opened and Closed at any time.

Implementation Note:

Because an implementation may initially use a significant amount
of time for link quality determination, implementations SHOULD
avoid fixed timeouts when waiting for their peers to configure a
NCP.

After a NCP has reached the Opened state, PPP will carry the
corresponding network-layer protocol packets. Any supported
network-layer protocol packets received when the corresponding NCP is
not in the Opened state MUST be silently discarded.

Implementation Note:

While LCP is in the Opened state, any protocol packet which is
unsupported by the implementation MUST be returned in a Protocol-
Reject (described later). Only protocols which are supported are
silently discarded.

During this phase, link traffic consists of any possible combination
of LCP, NCP, and network-layer protocol packets.

3.7. Link Termination Phase

PPP can terminate the link at any time. This might happen because of
the loss of carrier, authentication failure, link quality failure,
the expiration of an idle-period timer, or the administrative closing
of the link.

LCP is used to close the link through an exchange of Terminate
packets. When the link is closing, PPP informs the network-layer
protocols so that they may take appropriate action.

After the exchange of Terminate packets, the implementation SHOULD
signal the physical-layer to disconnect in order to enforce the
termination of the link, particularly in the case of an
authentication failure. The sender of the Terminate-Request SHOULD
disconnect after receiving a Terminate-Ack, or after the Restart
counter expires. The receiver of a Terminate-Request SHOULD wait for
the peer to disconnect, and MUST NOT disconnect until at least one
Restart time has passed after sending a Terminate-Ack. PPP SHOULD
proceed to the Link Dead phase.

Any non-LCP packets received during this phase MUST be silently
discarded.

Implementation Note:

The closing of the link by LCP is sufficient. There is no need
for each NCP to send a flurry of Terminate packets. Conversely,
the fact that one NCP has Closed is not sufficient reason to cause
the termination of the PPP link, even if that NCP was the only NCP
currently in the Opened state.

4. The Option Negotiation Automaton

The finite-state automaton is defined by events, actions and state
transitions. Events include reception of external commands such as
Open and Close, expiration of the Restart timer, and reception of
packets from a peer. Actions include the starting of the Restart
timer and transmission of packets to the peer.

Some types of packets -- Configure-Naks and Configure-Rejects, or
Code-Rejects and Protocol-Rejects, or Echo-Requests, Echo-Replies and
Discard-Requests -- are not differentiated in the automaton
descriptions. As will be described later, these packets do indeed
serve different functions. However, they always cause the same
transitions.

Events Actions

Up = lower layer is Up tlu = This-Layer-Up
Down = lower layer is Down tld = This-Layer-Down
Open = administrative Open tls = This-Layer-Started
Close= administrative Close tlf = This-Layer-Finished

TO+ = Timeout with counter > 0 irc = Initialize-Restart-Count
TO- = Timeout with counter expired zrc = Zero-Restart-Count

RCR+ = Receive-Configure-Request (Good) scr = Send-Configure-Request
RCR- = Receive-Configure-Request (Bad)
RCA = Receive-Configure-Ack sca = Send-Configure-Ack
RCN = Receive-Configure-Nak/Rej scn = Send-Configure-Nak/Rej

RTR = Receive-Terminate-Request str = Send-Terminate-Request
RTA = Receive-Terminate-Ack sta = Send-Terminate-Ack

RUC = Receive-Unknown-Code scj = Send-Code-Reject
RXJ+ = Receive-Code-Reject (permitted)
or Receive-Protocol-Reject
RXJ- = Receive-Code-Reject (catastrophic)
or Receive-Protocol-Reject
RXR = Receive-Echo-Request ser = Send-Echo-Reply
or Receive-Echo-Reply
or Receive-Discard-Request

4.1. State Transition Table

The complete state transition table follows. States are indicated
horizontally, and events are read vertically. State transitions and
actions are represented in the form action/new-state. Multiple
actions are separated by commas, and may continue on succeeding lines
as space requires; multiple actions may be implemented in any
convenient order. The state may be followed by a letter, which
indicates an explanatory footnote. The dash ('-') indicates an
illegal transition.

| State
| 0 1 2 3 4 5
Events| Initial Starting Closed Stopped Closing Stopping
------+-----------------------------------------------------------
Up | 2 irc,scr/6 - - - -
Down | - - 0 tls/1 0 1
Open | tls/1 1 irc,scr/6 3r 5r 5r
Close| 0 tlf/0 2 2 4 4
|
TO+ | - - - - str/4 str/5
TO- | - - - - tlf/2 tlf/3
|
RCR+ | - - sta/2 irc,scr,sca/8 4 5
RCR- | - - sta/2 irc,scr,scn/6 4 5
RCA | - - sta/2 sta/3 4 5
RCN | - - sta/2 sta/3 4 5
|
RTR | - - sta/2 sta/3 sta/4 sta/5
RTA | - - 2 3 tlf/2 tlf/3
|
RUC | - - scj/2 scj/3 scj/4 scj/5
RXJ+ | - - 2 3 4 5
RXJ- | - - tlf/2 tlf/3 tlf/2 tlf/3
|
RXR | - - 2 3 4 5

| State
| 6 7 8 9
Events| Req-Sent Ack-Rcvd Ack-Sent Opened
------+-----------------------------------------
Up | - - - -
Down | 1 1 1 tld/1
Open | 6 7 8 9r
Close|irc,str/4 irc,str/4 irc,str/4 tld,irc,str/4
|
TO+ | scr/6 scr/6 scr/8 -
TO- | tlf/3p tlf/3p tlf/3p -
|
RCR+ | sca/8 sca,tlu/9 sca/8 tld,scr,sca/8
RCR- | scn/6 scn/7 scn/6 tld,scr,scn/6
RCA | irc/7 scr/6x irc,tlu/9 tld,scr/6x
RCN |irc,scr/6 scr/6x irc,scr/8 tld,scr/6x
|
RTR | sta/6 sta/6 sta/6 tld,zrc,sta/5
RTA | 6 6 8 tld,scr/6
|
RUC | scj/6 scj/7 scj/8 scj/9
RXJ+ | 6 6 8 9
RXJ- | tlf/3 tlf/3 tlf/3 tld,irc,str/5
|
RXR | 6 7 8 ser/9

The states in which the Restart timer is running are identifiable by
the presence of TO events. Only the Send-Configure-Request, Send-
Terminate-Request and Zero-Restart-Count actions start or re-start
the Restart timer. The Restart timer is stopped when transitioning
from any state where the timer is running to a state where the timer
is not running.

The events and actions are defined according to a message passing
architecture, rather than a signalling architecture. If an action is
desired to control specific signals (such as DTR), additional actions
are likely to be required.

[p] Passive option; see Stopped state discussion.

[r] Restart option; see Open event discussion.

[x] Crossed connection; see RCA event discussion.

4.2. States

Following is a more detailed description of each automaton state.

Initial

In the Initial state, the lower layer is unavailable (Down), and
no Open has occurred. The Restart timer is not running in the
Initial state.

Starting

The Starting state is the Open counterpart to the Initial state.
An administrative Open has been initiated, but the lower layer is
still unavailable (Down). The Restart timer is not running in the
Starting state.

When the lower layer becomes available (Up), a Configure-Request
is sent.

Closed

In the Closed state, the link is available (Up), but no Open has
occurred. The Restart timer is not running in the Closed state.

Upon reception of Configure-Request packets, a Terminate-Ack is
sent. Terminate-Acks are silently discarded to avoid creating a
loop.

Stopped

The Stopped state is the Open counterpart to the Closed state. It
is entered when the automaton is waiting for a Down event after
the This-Layer-Finished action, or after sending a Terminate-Ack.
The Restart timer is not running in the Stopped state.

Upon reception of Configure-Request packets, an appropriate
response is sent. Upon reception of other packets, a Terminate-
Ack is sent. Terminate-Acks are silently discarded to avoid
creating a loop.

Rationale:

The Stopped state is a junction state for link termination,
link configuration failure, and other automaton failure modes.
These potentially separate states have been combined.

There is a race condition between the Down event response (from

the This-Layer-Finished action) and the Receive-Configure-
Request event. When a Configure-Request arrives before the
Down event, the Down event will supercede by returning the
automaton to the Starting state. This prevents attack by
repetition.

Implementation Option:

After the peer fails to respond to Configure-Requests, an
implementation MAY wait passively for the peer to send
Configure-Requests. In this case, the This-Layer-Finished
action is not used for the TO- event in states Req-Sent, Ack-
Rcvd and Ack-Sent.

This option is useful for dedicated circuits, or circuits which
have no status signals available, but SHOULD NOT be used for
switched circuits.

Closing

In the Closing state, an attempt is made to terminate the
connection. A Terminate-Request has been sent and the Restart
timer is running, but a Terminate-Ack has not yet been received.

Upon reception of a Terminate-Ack, the Closed state is entered.
Upon the expiration of the Restart timer, a new Terminate-Request
is transmitted, and the Restart timer is restarted. After the
Restart timer has expired Max-Terminate times, the Closed state is
entered.

Stopping

The Stopping state is the Open counterpart to the Closing state.
A Terminate-Request has been sent and the Restart timer is
running, but a Terminate-Ack has not yet been received.

Rationale:

The Stopping state provides a well defined opportunity to
terminate a link before allowing new traffic. After the link
has terminated, a new configuration may occur via the Stopped
or Starting states.

Request-Sent

In the Request-Sent state an attempt is made to configure the
connection. A Configure-Request has been sent and the Restart
timer is running, but a Configure-Ack has not yet been received

nor has one been sent.

Ack-Received

In the Ack-Received state, a Configure-Request has been sent and a
Configure-Ack has been received. The Restart timer is still
running, since a Configure-Ack has not yet been sent.

Ack-Sent

In the Ack-Sent state, a Configure-Request and a Configure-Ack
have both been sent, but a Configure-Ack has not yet been
received. The Restart timer is running, since a Configure-Ack has
not yet been received.

Opened

In the Opened state, a Configure-Ack has been both sent and
received. The Restart timer is not running.

When entering the Opened state, the implementation SHOULD signal
the upper layers that it is now Up. Conversely, when leaving the
Opened state, the implementation SHOULD signal the upper layers
that it is now Down.

4.3. Events

Transitions and actions in the automaton are caused by events.

Up

This event occurs when a lower layer indicates that it is ready to
carry packets.

Typically, this event is used by a modem handling or calling
process, or by some other coupling of the PPP link to the physical
media, to signal LCP that the link is entering Link Establishment
phase.

It also can be used by LCP to signal each NCP that the link is
entering Network-Layer Protocol phase. That is, the This-Layer-Up
action from LCP triggers the Up event in the NCP.

Down

This event occurs when a lower layer indicates that it is no

longer ready to carry packets.

Typically, this event is used by a modem handling or calling
process, or by some other coupling of the PPP link to the physical
media, to signal LCP that the link is entering Link Dead phase.

It also can be used by LCP to signal each NCP that the link is
leaving Network-Layer Protocol phase. That is, the This-Layer-
Down action from LCP triggers the Down event in the NCP.

Open

This event indicates that the link is administratively available
for traffic; that is, the network administrator (human or program)
has indicated that the link is allowed to be Opened. When this
event occurs, and the link is not in the Opened state, the
automaton attempts to send configuration packets to the peer.

If the automaton is not able to begin configuration (the lower
layer is Down, or a previous Close event has not completed), the
establishment of the link is automatically delayed.

When a Terminate-Request is received, or other events occur which
cause the link to become unavailable, the automaton will progress
to a state where the link is ready to re-open. No additional
administrative intervention is necessary.

Implementation Option:

Experience has shown that users will execute an additional Open
command when they want to renegotiate the link. This might
indicate that new values are to be negotiated.

Since this is not the meaning of the Open event, it is
suggested that when an Open user command is executed in the
Opened, Closing, Stopping, or Stopped states, the
implementation issue a Down event, immediately followed by an
Up event. Care must be taken that an intervening Down event
cannot occur from another source.

The Down followed by an Up will cause an orderly renegotiation
of the link, by progressing through the Starting to the
Request-Sent state. This will cause the renegotiation of the
link, without any harmful side effects.

Close

This event indicates that the link is not available for traffic;

that is, the network administrator (human or program) has
indicated that the link is not allowed to be Opened. When this
event occurs, and the link is not in the Closed state, the
automaton attempts to terminate the connection. Futher attempts
to re-configure the link are denied until a new Open event occurs.

Implementation Note:

When authentication fails, the link SHOULD be terminated, to
prevent attack by repetition and denial of service to other
users. Since the link is administratively available (by
definition), this can be accomplished by simulating a Close
event to the LCP, immediately followed by an Open event. Care
must be taken that an intervening Close event cannot occur from
another source.

The Close followed by an Open will cause an orderly termination
of the link, by progressing through the Closing to the Stopping
state, and the This-Layer-Finished action can disconnect the
link. The automaton waits in the Stopped or Starting states
for the next connection attempt.

Timeout (TO+,TO-)

This event indicates the expiration of the Restart timer. The
Restart timer is used to time responses to Configure-Request and
Terminate-Request packets.

The TO+ event indicates that the Restart counter continues to be
greater than zero, which triggers the corresponding Configure-
Request or Terminate-Request packet to be retransmitted.

The TO- event indicates that the Restart counter is not greater
than zero, and no more packets need to be retransmitted.

Receive-Configure-Request (RCR+,RCR-)

This event occurs when a Configure-Request packet is received from
the peer. The Configure-Request packet indicates the desire to
open a connection and may specify Configuration Options. The
Configure-Request packet is more fully described in a later
section.

The RCR+ event indicates that the Configure-Request was
acceptable, and triggers the transmission of a corresponding
Configure-Ack.

The RCR- event indicates that the Configure-Request was

unacceptable, and triggers the transmission of a corresponding
Configure-Nak or Configure-Reject.

Implementation Note:

These events may occur on a connection which is already in the
Opened state. The implementation MUST be prepared to
immediately renegotiate the Configuration Options.

Receive-Configure-Ack (RCA)

This event occurs when a valid Configure-Ack packet is received
from the peer. The Configure-Ack packet is a positive response to
a Configure-Request packet. An out of sequence or otherwise
invalid packet is silently discarded.

Implementation Note:

Since the correct packet has already been received before
reaching the Ack-Rcvd or Opened states, it is extremely
unlikely that another such packet will arrive. As specified,
all invalid Ack/Nak/Rej packets are silently discarded, and do
not affect the transitions of the automaton.

However, it is not impossible that a correctly formed packet
will arrive through a coincidentally-timed cross-connection.
It is more likely to be the result of an implementation error.
At the very least, this occurance SHOULD be logged.

Receive-Configure-Nak/Rej (RCN)

This event occurs when a valid Configure-Nak or Configure-Reject
packet is received from the peer. The Configure-Nak and
Configure-Reject packets are negative responses to a Configure-
Request packet. An out of sequence or otherwise invalid packet is
silently discarded.

Implementation Note:

Although the Configure-Nak and Configure-Reject cause the same
state transition in the automaton, these packets have
significantly different effects on the Configuration Options
sent in the resulting Configure-Request packet.

Receive-Terminate-Request (RTR)

This event occurs when a Terminate-Request packet is received.
The Terminate-Request packet indicates the desire of the peer to

close the connection.

Implementation Note:

This event is not identical to the Close event (see above), and
does not override the Open commands of the local network
administrator. The implementation MUST be prepared to receive
a new Configure-Request without network administrator
intervention.

Receive-Terminate-Ack (RTA)

This event occurs when a Terminate-Ack packet is received from the
peer. The Terminate-Ack packet is usually a response to a
Terminate-Request packet. The Terminate-Ack packet may also
indicate that the peer is in Closed or Stopped states, and serves
to re-synchronize the link configuration.

Receive-Unknown-Code (RUC)

This event occurs when an un-interpretable packet is received from
the peer. A Code-Reject packet is sent in response.

Receive-Code-Reject, Receive-Protocol-Reject (RXJ+,RXJ-)

This event occurs when a Code-Reject or a Protocol-Reject packet
is received from the peer.

The RXJ+ event arises when the rejected value is acceptable, such
as a Code-Reject of an extended code, or a Protocol-Reject of a
NCP. These are within the scope of normal operation. The
implementation MUST stop sending the offending packet type.

The RXJ- event arises when the rejected value is catastrophic,
such as a Code-Reject of Configure-Request, or a Protocol-Reject
of LCP! This event communicates an unrecoverable error that
terminates the connection.

Receive-Echo-Request, Receive-Echo-Reply, Receive-Discard-Request
(RXR)

This event occurs when an Echo-Request, Echo-Reply or Discard-
Request packet is received from the peer. The Echo-Reply packet
is a response to an Echo-Request packet. There is no reply to an
Echo-Reply or Discard-Request packet.

4.4. Actions

Actions in the automaton are caused by events and typically indicate
the transmission of packets and/or the starting or stopping of the
Restart timer.

Illegal-Event (-)

This indicates an event that cannot occur in a properly
implemented automaton. The implementation has an internal error,
which should be reported and logged. No transition is taken, and
the implementation SHOULD NOT reset or freeze.

This-Layer-Up (tlu)

This action indicates to the upper layers that the automaton is
entering the Opened state.

Typically, this action is used by the LCP to signal the Up event
to a NCP, Authentication Protocol, or Link Quality Protocol, or
MAY be used by a NCP to indicate that the link is available for
its network layer traffic.

This-Layer-Down (tld)

This action indicates to the upper layers that the automaton is
leaving the Opened state.

Typically, this action is used by the LCP to signal the Down event
to a NCP, Authentication Protocol, or Link Quality Protocol, or
MAY be used by a NCP to indicate that the link is no longer
available for its network layer traffic.

This-Layer-Started (tls)

This action indicates to the lower layers that the automaton is
entering the Starting state, and the lower layer is needed for the
link. The lower layer SHOULD respond with an Up event when the
lower layer is available.

This results of this action are highly implementation dependent.

This-Layer-Finished (tlf)

This action indicates to the lower layers that the automaton is
entering the Initial, Closed or Stopped states, and the lower
layer is no longer needed for the link. The lower layer SHOULD
respond with a Down event when the lower layer has terminated.

Typically, this action MAY be used by the LCP to advance to the
Link Dead phase, or MAY be used by a NCP to indicate to the LCP
that the link may terminate when there are no other NCPs open.

This results of this action are highly implementation dependent.

Initialize-Restart-Count (irc)

This action sets the Restart counter to the appropriate value
(Max-Terminate or Max-Configure). The counter is decremented for
each transmission, including the first.

Implementation Note:

In addition to setting the Restart counter, the implementation
MUST set the timeout period to the initial value when Restart
timer backoff is used.

Zero-Restart-Count (zrc)

This action sets the Restart counter to zero.

Implementation Note:

This action enables the FSA to pause before proceeding to the
desired final state, allowing traffic to be processed by the
peer. In addition to zeroing the Restart counter, the
implementation MUST set the timeout period to an appropriate
value.

Send-Configure-Request (scr)

A Configure-Request packet is transmitted. This indicates the
desire to open a connection with a specified set of Configuration
Options. The Restart timer is started when the Configure-Request
packet is transmitted, to guard against packet loss. The Restart
counter is decremented each time a Configure-Request is sent.

Send-Configure-Ack (sca)

A Configure-Ack packet is transmitted. This acknowledges the
reception of a Configure-Request packet with an acceptable set of
Configuration Options.

Send-Configure-Nak (scn)

A Configure-Nak or Configure-Reject packet is transmitted, as
appropriate. This negative response reports the reception of a

Configure-Request packet with an unacceptable set of Configuration
Options.

Configure-Nak packets are used to refuse a Configuration Option
value, and to suggest a new, acceptable value. Configure-Reject
packets are used to refuse all negotiation about a Configuration
Option, typically because it is not recognized or implemented.
The use of Configure-Nak versus Configure-Reject is more fully
described in the chapter on LCP Packet Formats.

Send-Terminate-Request (str)

A Terminate-Request packet is transmitted. This indicates the
desire to close a connection. The Restart timer is started when
the Terminate-Request packet is transmitted, to guard against
packet loss. The Restart counter is decremented each time a
Terminate-Request is sent.

Send-Terminate-Ack (sta)

A Terminate-Ack packet is transmitted. This acknowledges the
reception of a Terminate-Request packet or otherwise serves to
synchronize the automatons.

Send-Code-Reject (scj)

A Code-Reject packet is transmitted. This indicates the reception
of an unknown type of packet.

Send-Echo-Reply (ser)

An Echo-Reply packet is transmitted. This acknowledges the
reception of an Echo-Request packet.

4.5. Loop Avoidance

The protocol makes a reasonable attempt at avoiding Configuration
Option negotiation loops. However, the protocol does NOT guarantee
that loops will not happen. As with any negotiation, it is possible
to configure two PPP implementations with conflicting policies that
will never converge. It is also possible to configure policies which
do converge, but which take significant time to do so. Implementors
should keep this in mind and SHOULD implement loop detection
mechanisms or higher level timeouts.

4.6. Counters and Timers

Restart Timer

There is one special timer used by the automaton. The Restart
timer is used to time transmissions of Configure-Request and
Terminate-Request packets. Expiration of the Restart timer causes
a Timeout event, and retransmission of the corresponding
Configure-Request or Terminate-Request packet. The Restart timer
MUST be configurable, but SHOULD default to three (3) seconds.

Implementation Note:

The Restart timer SHOULD be based on the speed of the link.
The default value is designed for low speed (2,400 to 9,600
bps), high switching latency links (typical telephone lines).
Higher speed links, or links with low switching latency, SHOULD
have correspondingly faster retransmission times.

Instead of a constant value, the Restart timer MAY begin at an
initial small value and increase to the configured final value.
Each successive value less than the final value SHOULD be at
least twice the previous value. The initial value SHOULD be
large enough to account for the size of the packets, twice the
round trip time for transmission at the link speed, and at
least an additional 100 milliseconds to allow the peer to
process the packets before responding. Some circuits add
another 200 milliseconds of satellite delay. Round trip times
for modems operating at 14,400 bps have been measured in the
range of 160 to more than 600 milliseconds.

Max-Terminate

There is one required restart counter for Terminate-Requests.
Max-Terminate indicates the number of Terminate-Request packets
sent without receiving a Terminate-Ack before assuming that the
peer is unable to respond. Max-Terminate MUST be configurable,
but SHOULD default to two (2) transmissions.

Max-Configure

A similar counter is recommended for Configure-Requests. Max-
Configure indicates the number of Configure-Request packets sent
without receiving a valid Configure-Ack, Configure-Nak or
Configure-Reject before assuming that the peer is unable to
respond. Max-Configure MUST be configurable, but SHOULD default
to ten (10) transmissions.

Max-Failure

A related counter is recommended for Configure-Nak. Max-Failure
indicates the number of Configure-Nak packets sent without sending
a Configure-Ack before assuming that configuration is not
converging. Any further Configure-Nak packets for peer requested
options are converted to Configure-Reject packets, and locally
desired options are no longer appended. Max-Failure MUST be
configurable, but SHOULD default to five (5) transmissions.

5. LCP Packet Formats

There are three classes of LCP packets:

1. Link Configuration packets used to establish and configure a
link (Configure-Request, Configure-Ack, Configure-Nak and
Configure-Reject).

2. Link Termination packets used to terminate a link (Terminate-
Request and Terminate-Ack).

3. Link Maintenance packets used to manage and debug a link
(Code-Reject, Protocol-Reject, Echo-Request, Echo-Reply, and
Discard-Request).

In the interest of simplicity, there is no version field in the LCP
packet. A correctly functioning LCP implementation will always
respond to unknown Protocols and Codes with an easily recognizable
LCP packet, thus providing a deterministic fallback mechanism for
implementations of other versions.

Regardless of which Configuration Options are enabled, all LCP Link
Configuration, Link Termination, and Code-Reject packets (codes 1
through 7) are always sent as if no Configuration Options were
negotiated. In particular, each Configuration Option specifies a
default value. This ensures that such LCP packets are always
recognizable, even when one end of the link mistakenly believes the
link to be open.

Exactly one LCP packet is encapsulated in the PPP Information field,
where the PPP Protocol field indicates type hex c021 (Link Control
Protocol).

A summary of the Link Control Protocol packet format is shown below.
The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+

Code

The Code field is one octet, and identifies the kind of LCP

packet. When a packet is received with an unknown Code field, a
Code-Reject packet is transmitted.

Up-to-date values of the LCP Code field are specified in the most
recent "Assigned Numbers" RFC [2]. This document concerns the
following values:

1 Configure-Request
2 Configure-Ack
3 Configure-Nak
4 Configure-Reject
5 Terminate-Request
6 Terminate-Ack
7 Code-Reject
8 Protocol-Reject
9 Echo-Request
10 Echo-Reply
11 Discard-Request

Identifier

The Identifier field is one octet, and aids in matching requests
and replies. When a packet is received with an invalid Identifier
field, the packet is silently discarded without affecting the
automaton.

Length

The Length field is two octets, and indicates the length of the
LCP packet, including the Code, Identifier, Length and Data
fields. The Length MUST NOT exceed the MRU of the link.

Octets outside the range of the Length field are treated as
padding and are ignored on reception. When a packet is received
with an invalid Length field, the packet is silently discarded
without affecting the automaton.

Data

The Data field is zero or more octets, as indicated by the Length
field. The format of the Data field is determined by the Code
field.

5.1. Configure-Request

Description

An implementation wishing to open a connection MUST transmit a
Configure-Request. The Options field is filled with any desired
changes to the link defaults. Configuration Options SHOULD NOT be
included with default values.

Upon reception of a Configure-Request, an appropriate reply MUST
be transmitted.

A summary of the Configure-Request packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

1 for Configure-Request.

Identifier

The Identifier field MUST be changed whenever the contents of the
Options field changes, and whenever a valid reply has been
received for a previous request. For retransmissions, the
Identifier MAY remain unchanged.

Options

The options field is variable in length, and contains the list of
zero or more Configuration Options that the sender desires to
negotiate. All Configuration Options are always negotiated
simultaneously. The format of Configuration Options is further
described in a later chapter.

5.2. Configure-Ack

Description

If every Configuration Option received in a Configure-Request is
recognizable and all values are acceptable, then the
implementation MUST transmit a Configure-Ack. The acknowledged
Configuration Options MUST NOT be reordered or modified in any
way.

On reception of a Configure-Ack, the Identifier field MUST match
that of the last transmitted Configure-Request. Additionally, the
Configuration Options in a Configure-Ack MUST exactly match those
of the last transmitted Configure-Request. Invalid packets are
silently discarded.

A summary of the Configure-Ack packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

2 for Configure-Ack.

Identifier

The Identifier field is a copy of the Identifier field of the
Configure-Request which caused this Configure-Ack.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender is
acknowledging. All Configuration Options are always acknowledged
simultaneously.

5.3. Configure-Nak

Description

If every instance of the received Configuration Options is
recognizable, but some values are not acceptable, then the
implementation MUST transmit a Configure-Nak. The Options field
is filled with only the unacceptable Configuration Options from
the Configure-Request. All acceptable Configuration Options are
filtered out of the Configure-Nak, but otherwise the Configuration
Options from the Configure-Request MUST NOT be reordered.

Options which have no value fields (boolean options) MUST use the
Configure-Reject reply instead.

Each Configuration Option which is allowed only a single instance
MUST be modified to a value acceptable to the Configure-Nak
sender. The default value MAY be used, when this differs from the
requested value.

When a particular type of Configuration Option can be listed more
than once with different values, the Configure-Nak MUST include a
list of all values for that option which are acceptable to the
Configure-Nak sender. This includes acceptable values that were
present in the Configure-Request.

Finally, an implementation may be configured to request the
negotiation of a specific Configuration Option. If that option is
not listed, then that option MAY be appended to the list of Nak'd
Configuration Options, in order to prompt the peer to include that
option in its next Configure-Request packet. Any value fields for
the option MUST indicate values acceptable to the Configure-Nak
sender.

On reception of a Configure-Nak, the Identifier field MUST match
that of the last transmitted Configure-Request. Invalid packets
are silently discarded.

Reception of a valid Configure-Nak indicates that when a new
Configure-Request is sent, the Configuration Options MAY be
modified as specified in the Configure-Nak. When multiple
instances of a Configuration Option are present, the peer SHOULD
select a single value to include in its next Configure-Request
packet.

Some Configuration Options have a variable length. Since the
Nak'd Option has been modified by the peer, the implementation
MUST be able to handle an Option length which is different from

the original Configure-Request.

A summary of the Configure-Nak packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

3 for Configure-Nak.

Identifier

The Identifier field is a copy of the Identifier field of the
Configure-Request which caused this Configure-Nak.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender is Nak'ing.
All Configuration Options are always Nak'd simultaneously.

5.4. Configure-Reject

Description

If some Configuration Options received in a Configure-Request are
not recognizable or are not acceptable for negotiation (as
configured by a network administrator), then the implementation
MUST transmit a Configure-Reject. The Options field is filled
with only the unacceptable Configuration Options from the
Configure-Request. All recognizable and negotiable Configuration
Options are filtered out of the Configure-Reject, but otherwise
the Configuration Options MUST NOT be reordered or modified in any
way.

On reception of a Configure-Reject, the Identifier field MUST
match that of the last transmitted Configure-Request.
Additionally, the Configuration Options in a Configure-Reject MUST

be a proper subset of those in the last transmitted Configure-
Request. Invalid packets are silently discarded.

Reception of a valid Configure-Reject indicates that when a new
Configure-Request is sent, it MUST NOT include any of the
Configuration Options listed in the Configure-Reject.

A summary of the Configure-Reject packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options ...
+-+-+-+-+

Code

4 for Configure-Reject.

Identifier

The Identifier field is a copy of the Identifier field of the
Configure-Request which caused this Configure-Reject.

Options

The Options field is variable in length, and contains the list of
zero or more Configuration Options that the sender is rejecting.
All Configuration Options are always rejected simultaneously.

5.5. Terminate-Request and Terminate-Ack

Description

LCP includes Terminate-Request and Terminate-Ack Codes in order to
provide a mechanism for closing a connection.

An implementation wishing to close a connection SHOULD transmit a
Terminate-Request. Terminate-Request packets SHOULD continue to
be sent until Terminate-Ack is received, the lower layer indicates
that it has gone down, or a sufficiently large number have been
transmitted such that the peer is down with reasonable certainty.

Upon reception of a Terminate-Request, a Terminate-Ack MUST be
transmitted.

Reception of an unelicited Terminate-Ack indicates that the peer
is in the Closed or Stopped states, or is otherwise in need of
re-negotiation.

A summary of the Terminate-Request and Terminate-Ack packet formats
is shown below. The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+

Code

5 for Terminate-Request;

6 for Terminate-Ack.

Identifier

On transmission, the Identifier field MUST be changed whenever the
content of the Data field changes, and whenever a valid reply has
been received for a previous request. For retransmissions, the
Identifier MAY remain unchanged.

On reception, the Identifier field of the Terminate-Request is
copied into the Identifier field of the Terminate-Ack packet.

Data

The Data field is zero or more octets, and contains uninterpreted
data for use by the sender. The data may consist of any binary
value. The end of the field is indicated by the Length.

5.6. Code-Reject

Description

Reception of a LCP packet with an unknown Code indicates that the
peer is operating with a different version. This MUST be reported
back to the sender of the unknown Code by transmitting a Code-
Reject.

Upon reception of the Code-Reject of a code which is fundamental
to this version of the protocol, the implementation SHOULD report
the problem and drop the connection, since it is unlikely that the
situation can be rectified automatically.

A summary of the Code-Reject packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Rejected-Packet ...
+-+-+-+-+-+-+-+-+

Code

7 for Code-Reject.

Identifier

The Identifier field MUST be changed for each Code-Reject sent.

Rejected-Packet

The Rejected-Packet field contains a copy of the LCP packet which
is being rejected. It begins with the Information field, and does
not include any Data Link Layer headers nor an FCS. The
Rejected-Packet MUST be truncated to comply with the peer's

established MRU.

5.7. Protocol-Reject

Description

Reception of a PPP packet with an unknown Protocol field indicates
that the peer is attempting to use a protocol which is
unsupported. This usually occurs when the peer attempts to
configure a new protocol. If the LCP automaton is in the Opened
state, then this MUST be reported back to the peer by transmitting
a Protocol-Reject.

Upon reception of a Protocol-Reject, the implementation MUST stop
sending packets of the indicated protocol at the earliest
opportunity.

Protocol-Reject packets can only be sent in the LCP Opened state.
Protocol-Reject packets received in any state other than the LCP
Opened state SHOULD be silently discarded.

A summary of the Protocol-Reject packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Rejected-Protocol | Rejected-Information ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Code

8 for Protocol-Reject.

Identifier

The Identifier field MUST be changed for each Protocol-Reject
sent.

Rejected-Protocol

The Rejected-Protocol field is two octets, and contains the PPP
Protocol field of the packet which is being rejected.

Rejected-Information

The Rejected-Information field contains a copy of the packet which
is being rejected. It begins with the Information field, and does
not include any Data Link Layer headers nor an FCS. The
Rejected-Information MUST be truncated to comply with the peer's
established MRU.

5.8. Echo-Request and Echo-Reply

Description

LCP includes Echo-Request and Echo-Reply Codes in order to provide
a Data Link Layer loopback mechanism for use in exercising both
directions of the link. This is useful as an aid in debugging,
link quality determination, performance testing, and for numerous
other functions.

Upon reception of an Echo-Request in the LCP Opened state, an
Echo-Reply MUST be transmitted.

Echo-Request and Echo-Reply packets MUST only be sent in the LCP
Opened state. Echo-Request and Echo-Reply packets received in any
state other than the LCP Opened state SHOULD be silently
discarded.

A summary of the Echo-Request and Echo-Reply packet formats is shown
below. The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Magic-Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+

Code

9 for Echo-Request;

10 for Echo-Reply.

Identifier

On transmission, the Identifier field MUST be changed whenever the
content of the Data field changes, and whenever a valid reply has
been received for a previous request. For retransmissions, the
Identifier MAY remain unchanged.

On reception, the Identifier field of the Echo-Request is copied
into the Identifier field of the Echo-Reply packet.

Magic-Number

The Magic-Number field is four octets, and aids in detecting links
which are in the looped-back condition. Until the Magic-Number
Configuration Option has been successfully negotiated, the Magic-
Number MUST be transmitted as zero. See the Magic-Number
Configuration Option for further explanation.

Data

The Data field is zero or more octets, and contains uninterpreted
data for use by the sender. The data may consist of any binary
value. The end of the field is indicated by the Length.

5.9. Discard-Request

Description

LCP includes a Discard-Request Code in order to provide a Data
Link Layer sink mechanism for use in exercising the local to
remote direction of the link. This is useful as an aid in
debugging, performance testing, and for numerous other functions.

Discard-Request packets MUST only be sent in the LCP Opened state.
On reception, the receiver MUST silently discard any Discard-
Request that it receives.

A summary of the Discard-Request packet format is shown below. The
fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Code | Identifier | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Magic-Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+

Code

11 for Discard-Request.

Identifier

The Identifier field MUST be changed for each Discard-Request
sent.

Magic-Number

The Magic-Number field is four octets, and aids in detecting links
which are in the looped-back condition. Until the Magic-Number
Configuration Option has been successfully negotiated, the Magic-
Number MUST be transmitted as zero. See the Magic-Number
Configuration Option for further explanation.

Data

The Data field is zero or more octets, and contains uninterpreted
data for use by the sender. The data may consist of any binary
value. The end of the field is indicated by the Length.

6. LCP Configuration Options

LCP Configuration Options allow negotiation of modifications to the
default characteristics of a point-to-point link. If a Configuration
Option is not included in a Configure-Request packet, the default
value for that Configuration Option is assumed.

Some Configuration Options MAY be listed more than once. The effect
of this is Configuration Option specific, and is specified by each
such Configuration Option description. (None of the Configuration
Options in this specification can be listed more than once.)

The end of the list of Configuration Options is indicated by the
Length field of the LCP packet.

Unless otherwise specified, all Configuration Options apply in a
half-duplex fashion; typically, in the receive direction of the link
from the point of view of the Configure-Request sender.

Design Philosophy

The options indicate additional capabilities or requirements of
the implementation that is requesting the option. An
implementation which does not understand any option SHOULD
interoperate with one which implements every option.

A default is specified for each option which allows the link to
correctly function without negotiation of the option, although
perhaps with less than optimal performance.

Except where explicitly specified, acknowledgement of an option
does not require the peer to take any additional action other than
the default.

It is not necessary to send the default values for the options in
a Configure-Request.

A summary of the Configuration Option format is shown below. The
fields are transmitted from left to right.

0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Data ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

The Type field is one octet, and indicates the type of
Configuration Option. Up-to-date values of the LCP Option Type
field are specified in the most recent "Assigned Numbers" RFC [2].
This document concerns the following values:

0 RESERVED
1 Maximum-Receive-Unit
3 Authentication-Protocol
4 Quality-Protocol
5 Magic-Number
7 Protocol-Field-Compression
8 Address-and-Control-Field-Compression

Length

The Length field is one octet, and indicates the length of this
Configuration Option including the Type, Length and Data fields.

If a negotiable Configuration Option is received in a Configure-
Request, but with an invalid or unrecognized Length, a Configure-
Nak SHOULD be transmitted which includes the desired Configuration
Option with an appropriate Length and Data.

Data

The Data field is zero or more octets, and contains information
specific to the Configuration Option. The format and length of
the Data field is determined by the Type and Length fields.

When the Data field is indicated by the Length to extend beyond
the end of the Information field, the entire packet is silently
discarded without affecting the automaton.

6.1. Maximum-Receive-Unit (MRU)

Description

This Configuration Option may be sent to inform the peer that the
implementation can receive larger packets, or to request that the
peer send smaller packets.

The default value is 1500 octets. If smaller packets are
requested, an implementation MUST still be able to receive the
full 1500 octet information field in case link synchronization is
lost.

Implementation Note:

This option is used to indicate an implementation capability.
The peer is not required to maximize the use of the capacity.
For example, when a MRU is indicated which is 2048 octets, the
peer is not required to send any packet with 2048 octets. The
peer need not Configure-Nak to indicate that it will only send
smaller packets, since the implementation will always require
support for at least 1500 octets.

A summary of the Maximum-Receive-Unit Configuration Option format is
shown below. The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Maximum-Receive-Unit |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

1

Length

4

Maximum-Receive-Unit

The Maximum-Receive-Unit field is two octets, and specifies the
maximum number of octets in the Information and Padding fields.
It does not include the framing, Protocol field, FCS, nor any
transparency bits or bytes.

6.2. Authentication-Protocol

Description

On some links it may be desirable to require a peer to
authenticate itself before allowing network-layer protocol packets
to be exchanged.

This Configuration Option provides a method to negotiate the use
of a specific protocol for authentication. By default,
authentication is not required.

An implementation MUST NOT include multiple Authentication-
Protocol Configuration Options in its Configure-Request packets.
Instead, it SHOULD attempt to configure the most desirable
protocol first. If that protocol is Configure-Nak'd, then the
implementation SHOULD attempt the next most desirable protocol in
the next Configure-Request.

The implementation sending the Configure-Request is indicating
that it expects authentication from its peer. If an
implementation sends a Configure-Ack, then it is agreeing to
authenticate with the specified protocol. An implementation
receiving a Configure-Ack SHOULD expect the peer to authenticate
with the acknowledged protocol.

There is no requirement that authentication be full-duplex or that
the same protocol be used in both directions. It is perfectly
acceptable for different protocols to be used in each direction.
This will, of course, depend on the specific protocols negotiated.

A summary of the Authentication-Protocol Configuration Option format
is shown below. The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Authentication-Protocol |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+

Type

3

Length

>= 4

Authentication-Protocol

The Authentication-Protocol field is two octets, and indicates the
authentication protocol desired. Values for this field are always
the same as the PPP Protocol field values for that same
authentication protocol.

Up-to-date values of the Authentication-Protocol field are
specified in the most recent "Assigned Numbers" RFC [2]. Current
values are assigned as follows:

Value (in hex) Protocol

c023 Password Authentication Protocol
c223 Challenge Handshake Authentication Protocol

Data

The Data field is zero or more octets, and contains additional
data as determined by the particular protocol.

6.3. Quality-Protocol

Description

On some links it may be desirable to determine when, and how
often, the link is dropping data. This process is called link
quality monitoring.

This Configuration Option provides a method to negotiate the use
of a specific protocol for link quality monitoring. By default,
link quality monitoring is disabled.

The implementation sending the Configure-Request is indicating
that it expects to receive monitoring information from its peer.
If an implementation sends a Configure-Ack, then it is agreeing to
send the specified protocol. An implementation receiving a
Configure-Ack SHOULD expect the peer to send the acknowledged
protocol.

There is no requirement that quality monitoring be full-duplex or

that the same protocol be used in both directions. It is
perfectly acceptable for different protocols to be used in each
direction. This will, of course, depend on the specific protocols
negotiated.

A summary of the Quality-Protocol Configuration Option format is
shown below. The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Quality-Protocol |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+

Type

4

Length

>= 4

Quality-Protocol

The Quality-Protocol field is two octets, and indicates the link
quality monitoring protocol desired. Values for this field are
always the same as the PPP Protocol field values for that same
monitoring protocol.

Up-to-date values of the Quality-Protocol field are specified in
the most recent "Assigned Numbers" RFC [2]. Current values are
assigned as follows:

Value (in hex) Protocol

c025 Link Quality Report

Data

The Data field is zero or more octets, and contains additional
data as determined by the particular protocol.

6.4. Magic-Number

Description

This Configuration Option provides a method to detect looped-back
links and other Data Link Layer anomalies. This Configuration
Option MAY be required by some other Configuration Options such as
the Quality-Protocol Configuration Option. By default, the
Magic-Number is not negotiated, and zero is inserted where a
Magic-Number might otherwise be used.

Before this Configuration Option is requested, an implementation
MUST choose its Magic-Number. It is recommended that the Magic-
Number be chosen in the most random manner possible in order to
guarantee with very high probability that an implementation will
arrive at a unique number. A good way to choose a unique random
number is to start with a unique seed. Suggested sources of
uniqueness include machine serial numbers, other network hardware
addresses, time-of-day clocks, etc. Particularly good random
number seeds are precise measurements of the inter-arrival time of
physical events such as packet reception on other connected
networks, server response time, or the typing rate of a human
user. It is also suggested that as many sources as possible be
used simultaneously.

When a Configure-Request is received with a Magic-Number
Configuration Option, the received Magic-Number is compared with
the Magic-Number of the last Configure-Request sent to the peer.
If the two Magic-Numbers are different, then the link is not
looped-back, and the Magic-Number SHOULD be acknowledged. If the
two Magic-Numbers are equal, then it is possible, but not certain,
that the link is looped-back and that this Configure-Request is
actually the one last sent. To determine this, a Configure-Nak
MUST be sent specifying a different Magic-Number value. A new
Configure-Request SHOULD NOT be sent to the peer until normal
processing would cause it to be sent (that is, until a Configure-
Nak is received or the Restart timer runs out).

Reception of a Configure-Nak with a Magic-Number different from
that of the last Configure-Nak sent to the peer proves that a link
is not looped-back, and indicates a unique Magic-Number. If the
Magic-Number is equal to the one sent in the last Configure-Nak,
the possibility of a looped-back link is increased, and a new
Magic-Number MUST be chosen. In either case, a new Configure-
Request SHOULD be sent with the new Magic-Number.

If the link is indeed looped-back, this sequence (transmit
Configure-Request, receive Configure-Request, transmit Configure-

Nak, receive Configure-Nak) will repeat over and over again. If
the link is not looped-back, this sequence might occur a few
times, but it is extremely unlikely to occur repeatedly. More
likely, the Magic-Numbers chosen at either end will quickly
diverge, terminating the sequence. The following table shows the
probability of collisions assuming that both ends of the link
select Magic-Numbers with a perfectly uniform distribution:

Number of Collisions Probability
-------------------- ---------------------
1 1/2**32 = 2.3 E-10
2 1/2**32**2 = 5.4 E-20
3 1/2**32**3 = 1.3 E-29

Good sources of uniqueness or randomness are required for this
divergence to occur. If a good source of uniqueness cannot be
found, it is recommended that this Configuration Option not be
enabled; Configure-Requests with the option SHOULD NOT be
transmitted and any Magic-Number Configuration Options which the
peer sends SHOULD be either acknowledged or rejected. In this
case, looped-back links cannot be reliably detected by the
implementation, although they may still be detectable by the peer.

If an implementation does transmit a Configure-Request with a
Magic-Number Configuration Option, then it MUST NOT respond with a
Configure-Reject when it receives a Configure-Request with a
Magic-Number Configuration Option. That is, if an implementation
desires to use Magic Numbers, then it MUST also allow its peer to
do so. If an implementation does receive a Configure-Reject in
response to a Configure-Request, it can only mean that the link is
not looped-back, and that its peer will not be using Magic-
Numbers. In this case, an implementation SHOULD act as if the
negotiation had been successful (as if it had instead received a
Configure-Ack).

The Magic-Number also may be used to detect looped-back links
during normal operation, as well as during Configuration Option
negotiation. All LCP Echo-Request, Echo-Reply, and Discard-
Request packets have a Magic-Number field. If Magic-Number has
been successfully negotiated, an implementation MUST transmit
these packets with the Magic-Number field set to its negotiated
Magic-Number.

The Magic-Number field of these packets SHOULD be inspected on
reception. All received Magic-Number fields MUST be equal to
either zero or the peer's unique Magic-Number, depending on
whether or not the peer negotiated a Magic-Number.

Reception of a Magic-Number field equal to the negotiated local
Magic-Number indicates a looped-back link. Reception of a Magic-
Number other than the negotiated local Magic-Number, the peer's
negotiated Magic-Number, or zero if the peer didn't negotiate one,
indicates a link which has been (mis)configured for communications
with a different peer.

Procedures for recovery from either case are unspecified, and may
vary from implementation to implementation. A somewhat
pessimistic procedure is to assume a LCP Down event. A further
Open event will begin the process of re-establishing the link,
which can't complete until the looped-back condition is
terminated, and Magic-Numbers are successfully negotiated. A more
optimistic procedure (in the case of a looped-back link) is to
begin transmitting LCP Echo-Request packets until an appropriate
Echo-Reply is received, indicating a termination of the looped-
back condition.

A summary of the Magic-Number Configuration Option format is shown
below. The fields are transmitted from left to right.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Magic-Number
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Magic-Number (cont) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

5

Length

6

Magic-Number

The Magic-Number field is four octets, and indicates a number
which is very likely to be unique to one end of the link. A
Magic-Number of zero is illegal and MUST always be Nak'd, if it is
not Rejected outright.

6.5. Protocol-Field-Compression (PFC)

Description

This Configuration Option provides a method to negotiate the
compression of the PPP Protocol field. By default, all
implementations MUST transmit packets with two octet PPP Protocol
fields.

PPP Protocol field numbers are chosen such that some values may be
compressed into a single octet form which is clearly
distinguishable from the two octet form. This Configuration
Option is sent to inform the peer that the implementation can
receive such single octet Protocol fields.

As previously mentioned, the Protocol field uses an extension
mechanism consistent with the ISO 3309 extension mechanism for the
Address field; the Least Significant Bit (LSB) of each octet is
used to indicate extension of the Protocol field. A binary "0" as
the LSB indicates that the Protocol field continues with the
following octet. The presence of a binary "1" as the LSB marks
the last octet of the Protocol field. Notice that any number of
"0" octets may be prepended to the field, and will still indicate
the same value (consider the two binary representations for 3,
00000011 and 00000000 00000011).

When using low speed links, it is desirable to conserve bandwidth
by sending as little redundant data as possible. The Protocol-
Field-Compression Configuration Option allows a trade-off between
implementation simplicity and bandwidth efficiency. If
successfully negotiated, the ISO 3309 extension mechanism may be
used to compress the Protocol field to one octet instead of two.
The large majority of packets are compressible since data
protocols are typically assigned with Protocol field values less
than 256.

Compressed Protocol fields MUST NOT be transmitted unless this
Configuration Option has been negotiated. When negotiated, PPP
implementations MUST accept PPP packets with either double-octet
or single-octet Protocol fields, and MUST NOT distinguish between
them.

The Protocol field is never compressed when sending any LCP
packet. This rule guarantees unambiguous recognition of LCP
packets.

When a Protocol field is compressed, the Data Link Layer FCS field
is calculated on the compressed frame, not the original

uncompressed frame.

A summary of the Protocol-Field-Compression Configuration Option
format is shown below. The fields are transmitted from left to
right.

0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

7

Length

2

6.6. Address-and-Control-Field-Compression (ACFC)

Description

This Configuration Option provides a method to negotiate the
compression of the Data Link Layer Address and Control fields. By
default, all implementations MUST transmit frames with Address and
Control fields appropriate to the link framing.

Since these fields usually have constant values for point-to-point
links, they are easily compressed. This Configuration Option is
sent to inform the peer that the implementation can receive
compressed Address and Control fields.

If a compressed frame is received when Address-and-Control-Field-
Compression has not been negotiated, the implementation MAY
silently discard the frame.

The Address and Control fields MUST NOT be compressed when sending
any LCP packet. This rule guarantees unambiguous recognition of
LCP packets.

When the Address and Control fields are compressed, the Data Link
Layer FCS field is calculated on the compressed frame, not the
original uncompressed frame.

A summary of the Address-and-Control-Field-Compression configuration
option format is shown below. The fields are transmitted from left
to right.

0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type

8

Length

2

Security Considerations

Security issues are briefly discussed in sections concerning the
Authentication Phase, the Close event, and the Authentication-
Protocol Configuration Option.

References

[1] Perkins, D., "Requirements for an Internet Standard Point-to-
Point Protocol", RFC 1547, Carnegie Mellon University,
December 1993.

[2] Reynolds, J., and Postel, J., "Assigned Numbers", STD 2, RFC
1340, USC/Information Sciences Institute, July 1992.

Acknowledgements

This document is the product of the Point-to-Point Protocol Working
Group of the Internet Engineering Task Force (IETF). Comments should
be submitted to the ietf-ppp@merit.edu mailing list.

Much of the text in this document is taken from the working group
requirements [1]; and RFCs 1171 & 1172, by Drew Perkins while at
Carnegie Mellon University, and by Russ Hobby of the University of
California at Davis.

William Simpson was principally responsible for introducing
consistent terminology and philosophy, and the re-design of the phase
and negotiation state machines.

Many people spent significant time helping to develop the Point-to-
Point Protocol. The complete list of people is too numerous to list,
but the following people deserve special thanks: Rick Adams, Ken
Adelman, Fred Baker, Mike Ballard, Craig Fox, Karl Fox, Phill Gross,
Kory Hamzeh, former WG chair Russ Hobby, David Kaufman, former WG
chair Steve Knowles, Mark Lewis, former WG chair Brian Lloyd, John
LoVerso, Bill Melohn, Mike Patton, former WG chair Drew Perkins, Greg
Satz, John Shriver, Vernon Schryver, and Asher Waldfogel.

Special thanks to Morning Star Technologies for providing computing
resources and network access support for writing this specification.

Chair's Address

The working group can be contacted via the current chair:

Fred Baker
Advanced Computer Communications
315 Bollay Drive
Santa Barbara, California 93117

fbaker@acc.com

Editor's Address

Questions about this memo can also be directed to:

William Allen Simpson
Daydreamer
Computer Systems Consulting Services
1384 Fontaine
Madison Heights, Michigan 48071

Bill.Simpson@um.cc.umich.edu
bsimpson@MorningStar.com

Simpson [Page 52]

RFC 2406 – IP Encapsulating Security Payload (ESP)

 
Network Working Group                                            S. Kent
Request for Comments: 2406 BBN Corp
Obsoletes: 1827 R. Atkinson
Category: Standards Track @Home Network
November 1998

IP Encapsulating Security Payload (ESP)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction..................................................2
2. Encapsulating Security Payload Packet Format..................3
2.1 Security Parameters Index................................4
2.2 Sequence Number .........................................4
2.3 Payload Data.............................................5
2.4 Padding (for Encryption).................................5
2.5 Pad Length...............................................7
2.6 Next Header..............................................7
2.7 Authentication Data......................................7
3. Encapsulating Security Protocol Processing....................7
3.1 ESP Header Location......................................7
3.2 Algorithms..............................................10
3.2.1 Encryption Algorithms..............................10
3.2.2 Authentication Algorithms..........................10
3.3 Outbound Packet Processing..............................10
3.3.1 Security Association Lookup........................11
3.3.2 Packet Encryption..................................11
3.3.3 Sequence Number Generation.........................12
3.3.4 Integrity Check Value Calculation..................12
3.3.5 Fragmentation......................................13
3.4 Inbound Packet Processing...............................13
3.4.1 Reassembly.........................................13
3.4.2 Security Association Lookup........................13
3.4.3 Sequence Number Verification.......................14
3.4.4 Integrity Check Value Verification.................15

3.4.5 Packet Decryption..................................16
4. Auditing.....................................................17
5. Conformance Requirements.....................................18
6. Security Considerations......................................18
7. Differences from RFC 1827....................................18
Acknowledgements................................................19
References......................................................19
Disclaimer......................................................20
Author Information..............................................21
Full Copyright Statement........................................22

1. Introduction

The Encapsulating Security Payload (ESP) header is designed to
provide a mix of security services in IPv4 and IPv6. ESP may be
applied alone, in combination with the IP Authentication Header (AH)
[KA97b], or in a nested fashion, e.g., through the use of tunnel mode
(see "Security Architecture for the Internet Protocol" [KA97a],
hereafter referred to as the Security Architecture document).
Security services can be provided between a pair of communicating
hosts, between a pair of communicating security gateways, or between
a security gateway and a host. For more details on how to use ESP
and AH in various network environments, see the Security Architecture
document [KA97a].

The ESP header is inserted after the IP header and before the upper
layer protocol header (transport mode) or before an encapsulated IP
header (tunnel mode). These modes are described in more detail
below.

ESP is used to provide confidentiality, data origin authentication,
connectionless integrity, an anti-replay service (a form of partial
sequence integrity), and limited traffic flow confidentiality. The
set of services provided depends on options selected at the time of
Security Association establishment and on the placement of the
implementation. Confidentiality may be selected independent of all
other services. However, use of confidentiality without
integrity/authentication (either in ESP or separately in AH) may
subject traffic to certain forms of active attacks that could
undermine the confidentiality service (see [Bel96]). Data origin
authentication and connectionless integrity are joint services
(hereafter referred to jointly as "authentication) and are offered as
an option in conjunction with (optional) confidentiality. The anti-
replay service may be selected only if data origin authentication is
selected, and its election is solely at the discretion of the
receiver. (Although the default calls for the sender to increment
the Sequence Number used for anti-replay, the service is effective
only if the receiver checks the Sequence Number.) Traffic flow

confidentiality requires selection of tunnel mode, and is most
effective if implemented at a security gateway, where traffic
aggregation may be able to mask true source-destination patterns.
Note that although both confidentiality and authentication are
optional, at least one of them MUST be selected.

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by ESP and AH, the concept of Security Associations, the ways
in which ESP can be used in conjunction with the Authentication
Header (AH), and the different key management options available for
ESP and AH. (With regard to the last topic, the current key
management options required for both AH and ESP are manual keying and
automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Encapsulating Security Payload Packet Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the ESP header will contain the value 50 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ----
| Security Parameters Index (SPI) | ^Auth.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
| Sequence Number | |erage
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ----
| Payload Data* (variable) | | ^
~ ~ | |
| | |Conf.
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
| | Padding (0-255 bytes) | |erage*
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| | Pad Length | Next Header | v v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
| Authentication Data (variable) |
~ ~
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* If included in the Payload field, cryptographic
synchronization data, e.g., an Initialization Vector (IV, see

Section 2.3), usually is not encrypted per se, although it
often is referred to as being part of the ciphertext.

The following subsections define the fields in the header format.
"Optional" means that the field is omitted if the option is not
selected, i.e., it is present in neither the packet as transmitted
nor as formatted for computation of an Integrity Check Value (ICV,
see Section 2.7). Whether or not an option is selected is defined as
part of Security Association (SA) establishment. Thus the format of
ESP packets for a given SA is fixed, for the duration of the SA. In
contrast, "mandatory" fields are always present in the ESP packet
format, for all SAs.

2.1 Security Parameters Index

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (ESP), uniquely
identifies the Security Association for this datagram. The set of
SPI values in the range 1 through 255 are reserved by the Internet
Assigned Numbers Authority (IANA) for future use; a reserved SPI
value will not normally be assigned by IANA unless the use of the
assigned SPI value is specified in an RFC. It is ordinarily selected
by the destination system upon establishment of an SA (see the
Security Architecture document for more details). The SPI field is
mandatory.

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.

2.2 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.3 for more details
on how the Sequence Number is generated.) If anti-replay is enabled

(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.

2.3 Payload Data

Payload Data is a variable-length field containing data described by
the Next Header field. The Payload Data field is mandatory and is an
integral number of bytes in length. If the algorithm used to encrypt
the payload requires cryptographic synchronization data, e.g., an
Initialization Vector (IV), then this data MAY be carried explicitly
in the Payload field. Any encryption algorithm that requires such
explicit, per-packet synchronization data MUST indicate the length,
any structure for such data, and the location of this data as part of
an RFC specifying how the algorithm is used with ESP. If such
synchronization data is implicit, the algorithm for deriving the data
MUST be part of the RFC.

Note that with regard to ensuring the alignment of the (real)
ciphertext in the presence of an IV:

o For some IV-based modes of operation, the receiver treats
the IV as the start of the ciphertext, feeding it into the
algorithm directly. In these modes, alignment of the start
of the (real) ciphertext is not an issue at the receiver.
o In some cases, the receiver reads the IV in separately from
the ciphertext. In these cases, the algorithm
specification MUST address how alignment of the (real)
ciphertext is to be achieved.

2.4 Padding (for Encryption)

Several factors require or motivate use of the Padding field.

o If an encryption algorithm is employed that requires the
plaintext to be a multiple of some number of bytes, e.g.,
the block size of a block cipher, the Padding field is used
to fill the plaintext (consisting of the Payload Data, Pad
Length and Next Header fields, as well as the Padding) to
the size required by the algorithm.

o Padding also may be required, irrespective of encryption
algorithm requirements, to ensure that the resulting
ciphertext terminates on a 4-byte boundary. Specifically,

the Pad Length and Next Header fields must be right aligned
within a 4-byte word, as illustrated in the ESP packet
format figure above, to ensure that the Authentication Data
field (if present) is aligned on a 4-byte boundary.

o Padding beyond that required for the algorithm or alignment
reasons cited above, may be used to conceal the actual
length of the payload, in support of (partial) traffic flow
confidentiality. However, inclusion of such additional
padding has adverse bandwidth implications and thus its use
should be undertaken with care.

The sender MAY add 0-255 bytes of padding. Inclusion of the Padding
field in an ESP packet is optional, but all implementations MUST
support generation and consumption of padding.

a. For the purpose of ensuring that the bits to be encrypted
are a multiple of the algorithm's blocksize (first bullet
above), the padding computation applies to the Payload
Data exclusive of the IV, the Pad Length, and Next Header
fields.

b. For the purposes of ensuring that the Authentication Data
is aligned on a 4-byte boundary (second bullet above), the
padding computation applies to the Payload Data inclusive
of the IV, the Pad Length, and Next Header fields.

If Padding bytes are needed but the encryption algorithm does not
specify the padding contents, then the following default processing
MUST be used. The Padding bytes are initialized with a series of
(unsigned, 1-byte) integer values. The first padding byte appended
to the plaintext is numbered 1, with subsequent padding bytes making
up a monotonically increasing sequence: 1, 2, 3, ... When this
padding scheme is employed, the receiver SHOULD inspect the Padding
field. (This scheme was selected because of its relative simplicity,
ease of implementation in hardware, and because it offers limited
protection against certain forms of "cut and paste" attacks in the
absence of other integrity measures, if the receiver checks the
padding values upon decryption.)

Any encryption algorithm that requires Padding other than the default
described above, MUST define the Padding contents (e.g., zeros or
random data) and any required receiver processing of these Padding
bytes in an RFC specifying how the algorithm is used with ESP. In
such circumstances, the content of the Padding field will be
determined by the encryption algorithm and mode selected and defined
in the corresponding algorithm RFC. The relevant algorithm RFC MAY
specify that a receiver MUST inspect the Padding field or that a

receiver MUST inform senders of how the receiver will handle the
Padding field.

2.5 Pad Length

The Pad Length field indicates the number of pad bytes immediately
preceding it. The range of valid values is 0-255, where a value of
zero indicates that no Padding bytes are present. The Pad Length
field is mandatory.

2.6 Next Header

The Next Header is an 8-bit field that identifies the type of data
contained in the Payload Data field, e.g., an extension header in
IPv6 or an upper layer protocol identifier. The value of this field
is chosen from the set of IP Protocol Numbers defined in the most
recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA). The Next Header field is mandatory.

2.7 Authentication Data

The Authentication Data is a variable-length field containing an
Integrity Check Value (ICV) computed over the ESP packet minus the
Authentication Data. The length of the field is specified by the
authentication function selected. The Authentication Data field is
optional, and is included only if the authentication service has been
selected for the SA in question. The authentication algorithm
specification MUST specify the length of the ICV and the comparison
rules and processing steps for validation.

3. Encapsulating Security Protocol Processing

3.1 ESP Header Location

Like AH, ESP may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, but not the IP header.
(In this mode, note that for "bump-in-the-stack" or "bump-in-the-
wire" implementations, as defined in the Security Architecture
document, inbound and outbound IP fragments may require an IPsec
implementation to perform extra IP reassembly/fragmentation in order
to both conform to this specification and provide transparent IPsec
support. Special care is required to perform such operations within
these implementations when multiple interfaces are in use.)

In transport mode, ESP is inserted after the IP header and before an
upper layer protocol, e.g., TCP, UDP, ICMP, etc. or before any other
IPsec headers that have already been inserted. In the context of

IPv4, this translates to placing ESP after the IP header (and any
options that it contains), but before the upper layer protocol.
(Note that the term "transport" mode should not be misconstrued as
restricting its use to TCP and UDP. For example, an ICMP message MAY
be sent using either "transport" mode or "tunnel" mode.) The
following diagram illustrates ESP transport mode positioning for a
typical IPv4 packet, on a "before and after" basis. (The "ESP
trailer" encompasses any Padding, plus the Pad Length, and Next
Header fields.)

BEFORE APPLYING ESP
----------------------------
IPv4 |orig IP hdr | | |
|(any options)| TCP | Data |
----------------------------

AFTER APPLYING ESP
-------------------------------------------------
IPv4 |orig IP hdr | ESP | | | ESP | ESP|
|(any options)| Hdr | TCP | Data | Trailer |Auth|
-------------------------------------------------
|<----- encrypted ---->|
|<------ authenticated ----->|

In the IPv6 context, ESP is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the ESP header depending on the semantics
desired. However, since ESP protects only fields after the ESP
header, it generally may be desirable to place the destination
options header(s) after the ESP header. The following diagram
illustrates ESP transport mode positioning for a typical IPv6 packet.

BEFORE APPLYING ESP
---------------------------------------
IPv6 | | ext hdrs | | |
| orig IP hdr |if present| TCP | Data |
---------------------------------------

AFTER APPLYING ESP
---------------------------------------------------------
IPv6 | orig |hop-by-hop,dest*,| |dest| | | ESP | ESP|
|IP hdr|routing,fragment.|ESP|opt*|TCP|Data|Trailer|Auth|
---------------------------------------------------------
|<---- encrypted ---->|
|<---- authenticated ---->|

* = if present, could be before ESP, after ESP, or both

ESP and AH headers can be combined in a variety of modes. The IPsec
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode ESP may be employed in either hosts or security gateways.
When ESP is implemented in a security gateway (to protect subscriber
transit traffic), tunnel mode must be used. In tunnel mode, the
"inner" IP header carries the ultimate source and destination
addresses, while an "outer" IP header may contain distinct IP
addresses, e.g., addresses of security gateways. In tunnel mode, ESP
protects the entire inner IP packet, including the entire inner IP
header. The position of ESP in tunnel mode, relative to the outer IP
header, is the same as for ESP in transport mode. The following
diagram illustrates ESP tunnel mode positioning for typical IPv4 and
IPv6 packets.

-----------------------------------------------------------
IPv4 | new IP hdr* | | orig IP hdr* | | | ESP | ESP|
|(any options)| ESP | (any options) |TCP|Data|Trailer|Auth|
-----------------------------------------------------------
|<--------- encrypted ---------->|
|<----------- authenticated ---------->|

------------------------------------------------------------
IPv6 | new* |new ext | | orig*|orig ext | | | ESP | ESP|
|IP hdr| hdrs* |ESP|IP hdr| hdrs * |TCP|Data|Trailer|Auth|
------------------------------------------------------------
|<--------- encrypted ----------->|
|<---------- authenticated ---------->|

* = if present, construction of outer IP hdr/extensions
and modification of inner IP hdr/extensions is
discussed below.

3.2 Algorithms

The mandatory-to-implement algorithms are described in Section 5,
"Conformance Requirements". Other algorithms MAY be supported. Note
that although both confidentiality and authentication are optional,
at least one of these services MUST be selected hence both algorithms
MUST NOT be simultaneously NULL.

3.2.1 Encryption Algorithms

The encryption algorithm employed is specified by the SA. ESP is
designed for use with symmetric encryption algorithms. Because IP
packets may arrive out of order, each packet must carry any data
required to allow the receiver to establish cryptographic
synchronization for decryption. This data may be carried explicitly
in the payload field, e.g., as an IV (as described above), or the
data may be derived from the packet header. Since ESP makes
provision for padding of the plaintext, encryption algorithms
employed with ESP may exhibit either block or stream mode
characteristics. Note that since encryption (confidentiality) is
optional, this algorithm may be "NULL".

3.2.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. Note that
since authentication is optional, this algorithm may be "NULL".

3.3 Outbound Packet Processing

In transport mode, the sender encapsulates the upper layer protocol
information in the ESP header/trailer, and retains the specified IP
header (and any IP extension headers in the IPv6 context). In tunnel
mode, the outer and inner IP header/extensions can be inter-related
in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document. If there is more than one IPsec
header/extension required by security policy, the order of the
application of the security headers MUST be defined by security
policy.

3.3.1 Security Association Lookup

ESP is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for ESP processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Packet Encryption

In this section, we speak in terms of encryption always being applied
because of the formatting implications. This is done with the
understanding that "no confidentiality" is offered by using the NULL
encryption algorithm. Accordingly, the sender:

1. encapsulates (into the ESP Payload field):
- for transport mode -- just the original upper layer
protocol information.
- for tunnel mode -- the entire original IP datagram.
2. adds any necessary padding.
3. encrypts the result (Payload Data, Padding, Pad Length, and
Next Header) using the key, encryption algorithm, algorithm
mode indicated by the SA and cryptographic synchronization
data (if any).
- If explicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is input to the encryption
algorithm per the algorithm specification and placed
in the Payload field.
- If implicit cryptographic synchronication data, e.g.,
an IV, is indicated, it is constructed and input to
the encryption algorithm as per the algorithm
specification.

The exact steps for constructing the outer IP header depend on the
mode (transport or tunnel) and are described in the Security
Architecture document.

If authentication is selected, encryption is performed first, before
the authentication, and the encryption does not encompass the
Authentication Data field. This order of processing facilitates
rapid detection and rejection of replayed or bogus packets by the
receiver, prior to decrypting the packet, hence potentially reducing
the impact of denial of service attacks. It also allows for the
possibility of parallel processing of packets at the receiver, i.e.,
decryption can take place in parallel with authentication. Note that
since the Authentication Data is not protected by encryption, a keyed
authentication algorithm must be employed to compute the ICV.

3.3.3 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter and
when it reaches the maximum value, the counter rolls over back to
zero.

3.3.4 Integrity Check Value Calculation

If authentication is selected for the SA, the sender computes the ICV
over the ESP packet minus the Authentication Data. Thus the SPI,
Sequence Number, Payload Data, Padding (if present), Pad Length, and
Next Header are all encompassed by the ICV computation. Note that
the last 4 fields will be in ciphertext form, since encryption is
performed prior to authentication.

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the length of this byte string does
not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the ESP packet, (after the
Next Header field) prior to ICV computation. The padding octets MUST
have a value of zero. The blocksize (and hence the length of the
padding) is specified by the algorithm specification. This padding
is not transmitted with the packet. Note that MD5 and SHA-1 are
viewed as having a 1-byte blocksize because of their internal padding
conventions.

3.3.5 Fragmentation

If necessary, fragmentation is performed after ESP processing within
an IPsec implementation. Thus, transport mode ESP is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which ESP
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to ESP processing at a
receiver. In tunnel mode, ESP is applied to an IP packet, the
payload of which may be a fragmented IP packet. For example, a
security gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (as defined in the Security Architecture document) may
apply tunnel mode ESP to such fragments.

NOTE: For transport mode -- As mentioned at the beginning of Section
3.1, bump-in-the-stack and bump-in-the-wire implementations may have
to first reassemble a packet fragmented by the local IP layer, then
apply IPsec, and then fragment the resulting packet.

NOTE: For IPv6 -- For bump-in-the-stack and bump-in-the-wire
implementations, it will be necessary to walk through all the
extension headers to determine if there is a fragmentation header and
hence that the packet needs reassembling prior to IPsec processing.

3.4 Inbound Packet Processing

3.4.1 Reassembly

If required, reassembly is performed prior to ESP processing. If a
packet offered to ESP for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time received, Source Address, Destination Address, Sequence
Number, and (in IPv6) the Flow ID.

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zero'ing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.

3.4.2 Security Association Lookup

Upon receipt of a (reassembled) packet containing an ESP Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (ESP), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will

be checked, whether the Authentication Data field should be present,
and it will specify the algorithms and keys to be employed for
decryption and ICV computations (if applicable).

If no valid Security Association exists for this session (for
example, the receiver has no key), the receiver MUST discard the
packet; this is an auditable event. The audit log entry for this
event SHOULD include the SPI value, date/time received, Source
Address, Destination Address, Sequence Number, and (in IPv6) the
cleartext Flow ID.

3.4.3 Sequence Number Verification

All ESP implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
This service MUST NOT be enabled unless the authentication service
also is enabled for the SA, since otherwise the Sequence Number field
has not been integrity protected. (Note that there are no provisions
for managing transmitted Sequence Number values among multiple
senders directing traffic to a single SA (irrespective of whether the
destination address is unicast, broadcast, or multicast). Thus the
anti-replay service SHOULD NOT be used in a multi-sender environment
that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.3), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receive packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first ESP check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.

Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time received, Source Address, Destination Address, the
Sequence Number, and (in IPv6) the Flow ID. The receive window is
updated only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.

3.4.4 Integrity Check Value Verification

If authentication has been selected, the receiver computes the ICV
over the ESP packet minus the Authentication Data using the specified
authentication algorithm and verifies that it is the same as the ICV
included in the Authentication Data field of the packet. Details of
the computation are provided below.

If the computed and received ICV's match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The log data SHOULD include the SPI value, date/time
received, Source Address, Destination Address, the Sequence Number,
and (in IPv6) the cleartext Flow ID.

DISCUSSION:

Begin by removing and saving the ICV value (Authentication Data
field). Next check the overall length of the ESP packet minus the
Authentication Data. If implicit padding is required, based on

the blocksize of the authentication algorithm, append zero-filled
bytes to the end of the ESP packet directly after the Next Header
field. Perform the ICV computation and compare the result with
the saved value, using the comparison rules defined by the
algorithm specification. (For example, if a digital signature and
one-way hash are used for the ICV computation, the matching
process is more complex.)

3.4.5 Packet Decryption

As in section 3.3.2, "Packet Encryption", we speak here in terms of
encryption always being applied because of the formatting
implications. This is done with the understanding that "no
confidentiality" is offered by using the NULL encryption algorithm.
Accordingly, the receiver:

1. decrypts the ESP Payload Data, Padding, Pad Length, and Next
Header using the key, encryption algorithm, algorithm mode,
and cryptographic synchronization data (if any), indicated by
the SA.
- If explicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is taken from the Payload
field and input to the decryption algorithm as per the
algorithm specification.
- If implicit cryptographic synchronization data, e.g.,
an IV, is indicated, a local version of the IV is
constructed and input to the decryption algorithm as
per the algorithm specification.
2. processes any padding as specified in the encryption
algorithm specification. If the default padding scheme (see
Section 2.4) has been employed, the receiver SHOULD inspect
the Padding field before removing the padding prior to
passing the decrypted data to the next layer.
3. reconstructs the original IP datagram from:
- for transport mode -- original IP header plus the
original upper layer protocol information in the ESP
Payload field
- for tunnel mode -- tunnel IP header + the entire IP
datagram in the ESP Payload field.

The exact steps for reconstructing the original datagram depend on
the mode (transport or tunnel) and are described in the Security
Architecture document. At a minimum, in an IPv6 context, the
receiver SHOULD ensure that the decrypted data is 8-byte aligned, to
facilitate processing by the protocol identified in the Next Header
field.

If authentication has been selected, verification and decryption MAY
be performed serially or in parallel. If performed serially, then
ICV verification SHOULD be performed first. If performed in
parallel, verification MUST be completed before the decrypted packet
is passed on for further processing. This order of processing
facilitates rapid detection and rejection of replayed or bogus
packets by the receiver, prior to decrypting the packet, hence
potentially reducing the impact of denial of service attacks. Note:

If the receiver performs decryption in parallel with authentication,
care must be taken to avoid possible race conditions with regard to
packet access and reconstruction of the decrypted packet.

Note that there are several ways in which the decryption can "fail":

a. The selected SA may not be correct -- The SA may be
mis-selected due to tampering with the SPI, destination
address, or IPsec protocol type fields. Such errors, if they
map the packet to another extant SA, will be
indistinguishable from a corrupted packet, (case c).
Tampering with the SPI can be detected by use of
authentication. However, an SA mismatch might still occur
due to tampering with the IP Destination Address or the IPsec
protocol type field.

b. The pad length or pad values could be erroneous -- Bad pad
lengths or pad values can be detected irrespective of the use
of authentication.

c. The encrypted ESP packet could be corrupted -- This can be
detected if authentication is selected for the SA.,

In case (a) or (c), the erroneous result of the decryption operation
(an invalid IP datagram or transport-layer frame) will not
necessarily be detected by IPsec, and is the responsibility of later
protocol processing.

4. Auditing

Not all systems that implement ESP will implement auditing. However,
if ESP is incorporated into a system that supports auditing, then the
ESP implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for ESP. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional

events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST implement the ESP syntax and processing described
here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant ESP implementation MUST
support the following mandatory-to-implement algorithms:

- DES in CBC mode [MD97]
- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]
- NULL Authentication algorithm
- NULL Encryption algorithm

Since ESP encryption and authentication are optional, support for the
2 "NULL" algorithms is required to maintain consistency with the way
these services are negotiated. NOTE that while authentication and
encryption can each be "NULL", they MUST NOT both be "NULL".

6. Security Considerations

Security is central to the design of this protocol, and thus security
considerations permeate the specification. Additional security-
relevant aspects of using the IPsec protocol are discussed in the
Security Architecture document.

7. Differences from RFC 1827

This document differs from RFC 1827 [ATK95] in several significant
ways. The major difference is that, this document attempts to
specify a complete framework and context for ESP, whereas RFC 1827
provided a "shell" that was completed through the definition of
transforms. The combinatorial growth of transforms motivated the
reformulation of the ESP specification as a more complete document,
with options for security services that may be offered in the context
of ESP. Thus, fields previously defined in transform documents are

now part of this base ESP specification. For example, the fields
necessary to support authentication (and anti-replay) are now defined
here, even though the provision of this service is an option. The
fields used to support padding for encryption, and for next protocol
identification, are now defined here as well. Packet processing
consistent with the definition of these fields also is included in
the document.

Acknowledgements

Many of the concepts embodied in this specification were derived from
or influenced by the US Government's SP3 security protocol, ISO/IEC's
NLSP, or from the proposed swIPe security protocol. [SDNS89, ISO92,
IB93].

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Phil Karn, Perry Metzger, David
Mihelcic, Hilarie Orman, Norman Shulman, William Simpson and Nina
Yuan.

References

[ATK95] Atkinson, R., "IP Encapsulating Security Payload (ESP)",
RFC 1827, August 1995.

[Bel96] Steven M. Bellovin, "Problem Areas for the IP Security
Protocols", Proceedings of the Sixth Usenix Unix Security
Symposium, July, 1996.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[IB93] John Ioannidis & Matt Blaze, "Architecture and
Implementation of Network-layer Security Under Unix",
Proceedings of the USENIX Security Symposium, Santa Clara,
CA, October 1993.

[ISO92] ISO/IEC JTC1/SC6, Network Layer Security Protocol, ISO-IEC
DIS 11577, International Standards Organisation, Geneva,
Switzerland, 29 November 1992.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Authentication Header", RFC
2402, November 1998.

[MD97] Madson, C., and N. Doraswamy, "The ESP DES-CBC Cipher
Algorithm With Explicit IV", RFC 2405, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

[SDNS89] SDNS Secure Data Network System, Security Protocol 3, SP3,
Document SDN.301, Revision 1.5, 15 May 1989, as published
in NIST Publication NIST-IR-90-4250, February 1990.

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


RFC 791 – Internet Protocol Version 4 Specification


INTERNET PROTOCOL

DARPA INTERNET PROGRAM

PROTOCOL SPECIFICATION

September 1981

prepared for

Defense Advanced Research Projects Agency
Information Processing Techniques Office
1400 Wilson Boulevard
Arlington, Virginia 22209

by

Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, California 90291

September 1981
Internet Protocol

TABLE OF CONTENTS

PREFACE ........................................................ iii

1. INTRODUCTION ..................................................... 1

1.1 Motivation .................................................... 1
1.2 Scope ......................................................... 1
1.3 Interfaces .................................................... 1
1.4 Operation ..................................................... 2

2. OVERVIEW ......................................................... 5

2.1 Relation to Other Protocols ................................... 9
2.2 Model of Operation ............................................ 5
2.3 Function Description .......................................... 7
2.4 Gateways ...................................................... 9

3. SPECIFICATION ................................................... 11

3.1 Internet Header Format ....................................... 11
3.2 Discussion ................................................... 23
3.3 Interfaces ................................................... 31

APPENDIX A: Examples & Scenarios ................................... 34
APPENDIX B: Data Transmission Order ................................ 39

GLOSSARY ............................................................ 41

REFERENCES .......................................................... 45

[Page i]

September 1981
Internet Protocol

[Page ii]

September 1981
Internet Protocol

PREFACE

This document specifies the DoD Standard Internet Protocol. This
document is based on six earlier editions of the ARPA Internet Protocol
Specification, and the present text draws heavily from them. There have
been many contributors to this work both in terms of concepts and in
terms of text. This edition revises aspects of addressing, error
handling, option codes, and the security, precedence, compartments, and
handling restriction features of the internet protocol.

Jon Postel

Editor

September 1981

RFC: 791
Replaces: RFC 760
IENs 128, 123, 111,
80, 54, 44, 41, 28, 26

INTERNET PROTOCOL

DARPA INTERNET PROGRAM
PROTOCOL SPECIFICATION

1. INTRODUCTION

1.1. Motivation

The Internet Protocol is designed for use in interconnected systems of
packet-switched computer communication networks. Such a system has
been called a "catenet" [1]. The internet protocol provides for
transmitting blocks of data called datagrams from sources to
destinations, where sources and destinations are hosts identified by
fixed length addresses. The internet protocol also provides for
fragmentation and reassembly of long datagrams, if necessary, for
transmission through "small packet" networks.

1.2. Scope

The internet protocol is specifically limited in scope to provide the
functions necessary to deliver a package of bits (an internet
datagram) from a source to a destination over an interconnected system
of networks. There are no mechanisms to augment end-to-end data
reliability, flow control, sequencing, or other services commonly
found in host-to-host protocols. The internet protocol can capitalize
on the services of its supporting networks to provide various types
and qualities of service.

1.3. Interfaces

This protocol is called on by host-to-host protocols in an internet
environment. This protocol calls on local network protocols to carry
the internet datagram to the next gateway or destination host.

For example, a TCP module would call on the internet module to take a
TCP segment (including the TCP header and user data) as the data
portion of an internet datagram. The TCP module would provide the
addresses and other parameters in the internet header to the internet
module as arguments of the call. The internet module would then
create an internet datagram and call on the local network interface to
transmit the internet datagram.

In the ARPANET case, for example, the internet module would call on a

[Page 1]

September 1981
Internet Protocol
Introduction

local net module which would add the 1822 leader [2] to the internet
datagram creating an ARPANET message to transmit to the IMP. The
ARPANET address would be derived from the internet address by the
local network interface and would be the address of some host in the
ARPANET, that host might be a gateway to other networks.

1.4. Operation

The internet protocol implements two basic functions: addressing and
fragmentation.

The internet modules use the addresses carried in the internet header
to transmit internet datagrams toward their destinations. The
selection of a path for transmission is called routing.

The internet modules use fields in the internet header to fragment and
reassemble internet datagrams when necessary for transmission through
"small packet" networks.

The model of operation is that an internet module resides in each host
engaged in internet communication and in each gateway that
interconnects networks. These modules share common rules for
interpreting address fields and for fragmenting and assembling
internet datagrams. In addition, these modules (especially in
gateways) have procedures for making routing decisions and other
functions.

The internet protocol treats each internet datagram as an independent
entity unrelated to any other internet datagram. There are no
connections or logical circuits (virtual or otherwise).

The internet protocol uses four key mechanisms in providing its
service: Type of Service, Time to Live, Options, and Header Checksum.

The Type of Service is used to indicate the quality of the service
desired. The type of service is an abstract or generalized set of
parameters which characterize the service choices provided in the
networks that make up the internet. This type of service indication
is to be used by gateways to select the actual transmission parameters
for a particular network, the network to be used for the next hop, or
the next gateway when routing an internet datagram.

The Time to Live is an indication of an upper bound on the lifetime of
an internet datagram. It is set by the sender of the datagram and
reduced at the points along the route where it is processed. If the
time to live reaches zero before the internet datagram reaches its
destination, the internet datagram is destroyed. The time to live can
be thought of as a self destruct time limit.

[Page 2]

September 1981
Internet Protocol
Introduction

The Options provide for control functions needed or useful in some
situations but unnecessary for the most common communications. The
options include provisions for timestamps, security, and special
routing.

The Header Checksum provides a verification that the information used
in processing internet datagram has been transmitted correctly. The
data may contain errors. If the header checksum fails, the internet
datagram is discarded at once by the entity which detects the error.

The internet protocol does not provide a reliable communication
facility. There are no acknowledgments either end-to-end or
hop-by-hop. There is no error control for data, only a header
checksum. There are no retransmissions. There is no flow control.

Errors detected may be reported via the Internet Control Message
Protocol (ICMP) [3] which is implemented in the internet protocol
module.

[Page 3]

September 1981
Internet Protocol

[Page 4]

September 1981
Internet Protocol

2. OVERVIEW

2.1. Relation to Other Protocols

The following diagram illustrates the place of the internet protocol
in the protocol hierarchy:

+------+ +-----+ +-----+ +-----+
|Telnet| | FTP | | TFTP| ... | ... |
+------+ +-----+ +-----+ +-----+
| | | |
+-----+ +-----+ +-----+
| TCP | | UDP | ... | ... |
+-----+ +-----+ +-----+
| | |
+--------------------------+----+
| Internet Protocol & ICMP |
+--------------------------+----+
|
+---------------------------+
| Local Network Protocol |
+---------------------------+

Protocol Relationships

Figure 1.

Internet protocol interfaces on one side to the higher level
host-to-host protocols and on the other side to the local network
protocol. In this context a "local network" may be a small network in
a building or a large network such as the ARPANET.

2.2. Model of Operation

The model of operation for transmitting a datagram from one
application program to another is illustrated by the following
scenario:

We suppose that this transmission will involve one intermediate
gateway.

The sending application program prepares its data and calls on its
local internet module to send that data as a datagram and passes the
destination address and other parameters as arguments of the call.

The internet module prepares a datagram header and attaches the data
to it. The internet module determines a local network address for
this internet address, in this case it is the address of a gateway.

[Page 5]

September 1981
Internet Protocol
Overview

It sends this datagram and the local network address to the local
network interface.

The local network interface creates a local network header, and
attaches the datagram to it, then sends the result via the local
network.

The datagram arrives at a gateway host wrapped in the local network
header, the local network interface strips off this header, and
turns the datagram over to the internet module. The internet module
determines from the internet address that the datagram is to be
forwarded to another host in a second network. The internet module
determines a local net address for the destination host. It calls
on the local network interface for that network to send the
datagram.

This local network interface creates a local network header and
attaches the datagram sending the result to the destination host.

At this destination host the datagram is stripped of the local net
header by the local network interface and handed to the internet
module.

The internet module determines that the datagram is for an
application program in this host. It passes the data to the
application program in response to a system call, passing the source
address and other parameters as results of the call.

Application Application
Program Program
\ /
Internet Module Internet Module Internet Module
\ / \ /
LNI-1 LNI-1 LNI-2 LNI-2
\ / \ /
Local Network 1 Local Network 2

Transmission Path

Figure 2

[Page 6]

September 1981
Internet Protocol
Overview

2.3. Function Description

The function or purpose of Internet Protocol is to move datagrams
through an interconnected set of networks. This is done by passing
the datagrams from one internet module to another until the
destination is reached. The internet modules reside in hosts and
gateways in the internet system. The datagrams are routed from one
internet module to another through individual networks based on the
interpretation of an internet address. Thus, one important mechanism
of the internet protocol is the internet address.

In the routing of messages from one internet module to another,
datagrams may need to traverse a network whose maximum packet size is
smaller than the size of the datagram. To overcome this difficulty, a
fragmentation mechanism is provided in the internet protocol.

Addressing

A distinction is made between names, addresses, and routes [4]. A
name indicates what we seek. An address indicates where it is. A
route indicates how to get there. The internet protocol deals
primarily with addresses. It is the task of higher level (i.e.,
host-to-host or application) protocols to make the mapping from
names to addresses. The internet module maps internet addresses to
local net addresses. It is the task of lower level (i.e., local net
or gateways) procedures to make the mapping from local net addresses
to routes.

Addresses are fixed length of four octets (32 bits). An address
begins with a network number, followed by local address (called the
"rest" field). There are three formats or classes of internet
addresses: in class a, the high order bit is zero, the next 7 bits
are the network, and the last 24 bits are the local address; in
class b, the high order two bits are one-zero, the next 14 bits are
the network and the last 16 bits are the local address; in class c,
the high order three bits are one-one-zero, the next 21 bits are the
network and the last 8 bits are the local address.

Care must be taken in mapping internet addresses to local net
addresses; a single physical host must be able to act as if it were
several distinct hosts to the extent of using several distinct
internet addresses. Some hosts will also have several physical
interfaces (multi-homing).

That is, provision must be made for a host to have several physical
interfaces to the network with each having several logical internet
addresses.

[Page 7]

September 1981
Internet Protocol
Overview

Examples of address mappings may be found in "Address Mappings" [5].

Fragmentation

Fragmentation of an internet datagram is necessary when it
originates in a local net that allows a large packet size and must
traverse a local net that limits packets to a smaller size to reach
its destination.

An internet datagram can be marked "don't fragment." Any internet
datagram so marked is not to be internet fragmented under any
circumstances. If internet datagram marked don't fragment cannot be
delivered to its destination without fragmenting it, it is to be
discarded instead.

Fragmentation, transmission and reassembly across a local network
which is invisible to the internet protocol module is called
intranet fragmentation and may be used [6].

The internet fragmentation and reassembly procedure needs to be able
to break a datagram into an almost arbitrary number of pieces that
can be later reassembled. The receiver of the fragments uses the
identification field to ensure that fragments of different datagrams
are not mixed. The fragment offset field tells the receiver the
position of a fragment in the original datagram. The fragment
offset and length determine the portion of the original datagram
covered by this fragment. The more-fragments flag indicates (by
being reset) the last fragment. These fields provide sufficient
information to reassemble datagrams.

The identification field is used to distinguish the fragments of one
datagram from those of another. The originating protocol module of
an internet datagram sets the identification field to a value that
must be unique for that source-destination pair and protocol for the
time the datagram will be active in the internet system. The
originating protocol module of a complete datagram sets the
more-fragments flag to zero and the fragment offset to zero.

To fragment a long internet datagram, an internet protocol module
(for example, in a gateway), creates two new internet datagrams and
copies the contents of the internet header fields from the long
datagram into both new internet headers. The data of the long
datagram is divided into two portions on a 8 octet (64 bit) boundary
(the second portion might not be an integral multiple of 8 octets,
but the first must be). Call the number of 8 octet blocks in the
first portion NFB (for Number of Fragment Blocks). The first
portion of the data is placed in the first new internet datagram,
and the total length field is set to the length of the first

[Page 8]

September 1981
Internet Protocol
Overview

datagram. The more-fragments flag is set to one. The second
portion of the data is placed in the second new internet datagram,
and the total length field is set to the length of the second
datagram. The more-fragments flag carries the same value as the
long datagram. The fragment offset field of the second new internet
datagram is set to the value of that field in the long datagram plus
NFB.

This procedure can be generalized for an n-way split, rather than
the two-way split described.

To assemble the fragments of an internet datagram, an internet
protocol module (for example at a destination host) combines
internet datagrams that all have the same value for the four fields:
identification, source, destination, and protocol. The combination
is done by placing the data portion of each fragment in the relative
position indicated by the fragment offset in that fragment's
internet header. The first fragment will have the fragment offset
zero, and the last fragment will have the more-fragments flag reset
to zero.

2.4. Gateways

Gateways implement internet protocol to forward datagrams between
networks. Gateways also implement the Gateway to Gateway Protocol
(GGP) [7] to coordinate routing and other internet control
information.

In a gateway the higher level protocols need not be implemented and
the GGP functions are added to the IP module.

+-------------------------------+
| Internet Protocol & ICMP & GGP|
+-------------------------------+
| |
+---------------+ +---------------+
| Local Net | | Local Net |
+---------------+ +---------------+

Gateway Protocols

Figure 3.

[Page 9]

September 1981
Internet Protocol

[Page 10]

September 1981
Internet Protocol

3. SPECIFICATION

3.1. Internet Header Format

A summary of the contents of the internet header follows:

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram Header

Figure 4.

Note that each tick mark represents one bit position.

Version: 4 bits

The Version field indicates the format of the internet header. This
document describes version 4.

IHL: 4 bits

Internet Header Length is the length of the internet header in 32
bit words, and thus points to the beginning of the data. Note that
the minimum value for a correct header is 5.

[Page 11]

September 1981
Internet Protocol
Specification

Type of Service: 8 bits

The Type of Service provides an indication of the abstract
parameters of the quality of service desired. These parameters are
to be used to guide the selection of the actual service parameters
when transmitting a datagram through a particular network. Several
networks offer service precedence, which somehow treats high
precedence traffic as more important than other traffic (generally
by accepting only traffic above a certain precedence at time of high
load). The major choice is a three way tradeoff between low-delay,
high-reliability, and high-throughput.

Bits 0-2: Precedence.
Bit 3: 0 = Normal Delay, 1 = Low Delay.
Bits 4: 0 = Normal Throughput, 1 = High Throughput.
Bits 5: 0 = Normal Relibility, 1 = High Relibility.
Bit 6-7: Reserved for Future Use.

0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | |
| PRECEDENCE | D | T | R | 0 | 0 |
| | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+

Precedence

111 - Network Control
110 - Internetwork Control
101 - CRITIC/ECP
100 - Flash Override
011 - Flash
010 - Immediate
001 - Priority
000 - Routine

The use of the Delay, Throughput, and Reliability indications may
increase the cost (in some sense) of the service. In many networks
better performance for one of these parameters is coupled with worse
performance on another. Except for very unusual cases at most two
of these three indications should be set.

The type of service is used to specify the treatment of the datagram
during its transmission through the internet system. Example
mappings of the internet type of service to the actual service
provided on networks such as AUTODIN II, ARPANET, SATNET, and PRNET
is given in "Service Mappings" [8].

[Page 12]

September 1981
Internet Protocol
Specification

The Network Control precedence designation is intended to be used
within a network only. The actual use and control of that
designation is up to each network. The Internetwork Control
designation is intended for use by gateway control originators only.
If the actual use of these precedence designations is of concern to
a particular network, it is the responsibility of that network to
control the access to, and use of, those precedence designations.

Total Length: 16 bits

Total Length is the length of the datagram, measured in octets,
including internet header and data. This field allows the length of
a datagram to be up to 65,535 octets. Such long datagrams are
impractical for most hosts and networks. All hosts must be prepared
to accept datagrams of up to 576 octets (whether they arrive whole
or in fragments). It is recommended that hosts only send datagrams
larger than 576 octets if they have assurance that the destination
is prepared to accept the larger datagrams.

The number 576 is selected to allow a reasonable sized data block to
be transmitted in addition to the required header information. For
example, this size allows a data block of 512 octets plus 64 header
octets to fit in a datagram. The maximal internet header is 60
octets, and a typical internet header is 20 octets, allowing a
margin for headers of higher level protocols.

Identification: 16 bits

An identifying value assigned by the sender to aid in assembling the
fragments of a datagram.

Flags: 3 bits

Various Control Flags.

Bit 0: reserved, must be zero
Bit 1: (DF) 0 = May Fragment, 1 = Don't Fragment.
Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.

0 1 2
+---+---+---+
| | D | M |
| 0 | F | F |
+---+---+---+

Fragment Offset: 13 bits

This field indicates where in the datagram this fragment belongs.

[Page 13]

September 1981
Internet Protocol
Specification

The fragment offset is measured in units of 8 octets (64 bits). The
first fragment has offset zero.

Time to Live: 8 bits

This field indicates the maximum time the datagram is allowed to
remain in the internet system. If this field contains the value
zero, then the datagram must be destroyed. This field is modified
in internet header processing. The time is measured in units of
seconds, but since every module that processes a datagram must
decrease the TTL by at least one even if it process the datagram in
less than a second, the TTL must be thought of only as an upper
bound on the time a datagram may exist. The intention is to cause
undeliverable datagrams to be discarded, and to bound the maximum
datagram lifetime.

Protocol: 8 bits

This field indicates the next level protocol used in the data
portion of the internet datagram. The values for various protocols
are specified in "Assigned Numbers" [9].

Header Checksum: 16 bits

A checksum on the header only. Since some header fields change
(e.g., time to live), this is recomputed and verified at each point
that the internet header is processed.

The checksum algorithm is:

The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header. For purposes of
computing the checksum, the value of the checksum field is zero.

This is a simple to compute checksum and experimental evidence
indicates it is adequate, but it is provisional and may be replaced
by a CRC procedure, depending on further experience.

Source Address: 32 bits

The source address. See section 3.2.

Destination Address: 32 bits

The destination address. See section 3.2.

[Page 14]

September 1981
Internet Protocol
Specification

Options: variable

The options may appear or not in datagrams. They must be
implemented by all IP modules (host and gateways). What is optional
is their transmission in any particular datagram, not their
implementation.

In some environments the security option may be required in all
datagrams.

The option field is variable in length. There may be zero or more
options. There are two cases for the format of an option:

Case 1: A single octet of option-type.

Case 2: An option-type octet, an option-length octet, and the
actual option-data octets.

The option-length octet counts the option-type octet and the
option-length octet as well as the option-data octets.

The option-type octet is viewed as having 3 fields:

1 bit copied flag,
2 bits option class,
5 bits option number.

The copied flag indicates that this option is copied into all
fragments on fragmentation.

0 = not copied
1 = copied

The option classes are:

0 = control
1 = reserved for future use
2 = debugging and measurement
3 = reserved for future use

[Page 15]

September 1981
Internet Protocol
Specification

The following internet options are defined:

CLASS NUMBER LENGTH DESCRIPTION
----- ------ ------ -----------
0 0 - End of Option list. This option occupies only
1 octet; it has no length octet.
0 1 - No Operation. This option occupies only 1
octet; it has no length octet.
0 2 11 Security. Used to carry Security,
Compartmentation, User Group (TCC), and
Handling Restriction Codes compatible with DOD
requirements.
0 3 var. Loose Source Routing. Used to route the
internet datagram based on information
supplied by the source.
0 9 var. Strict Source Routing. Used to route the
internet datagram based on information
supplied by the source.
0 7 var. Record Route. Used to trace the route an
internet datagram takes.
0 8 4 Stream ID. Used to carry the stream
identifier.
2 4 var. Internet Timestamp.

Specific Option Definitions

End of Option List

+--------+
|00000000|
+--------+
Type=0

This option indicates the end of the option list. This might
not coincide with the end of the internet header according to
the internet header length. This is used at the end of all
options, not the end of each option, and need only be used if
the end of the options would not otherwise coincide with the end
of the internet header.

May be copied, introduced, or deleted on fragmentation, or for
any other reason.

[Page 16]

September 1981
Internet Protocol
Specification

No Operation

+--------+
|00000001|
+--------+
Type=1

This option may be used between options, for example, to align
the beginning of a subsequent option on a 32 bit boundary.

May be copied, introduced, or deleted on fragmentation, or for
any other reason.

Security

This option provides a way for hosts to send security,
compartmentation, handling restrictions, and TCC (closed user
group) parameters. The format for this option is as follows:

+--------+--------+---//---+---//---+---//---+---//---+
|10000010|00001011|SSS SSS|CCC CCC|HHH HHH| TCC |
+--------+--------+---//---+---//---+---//---+---//---+
Type=130 Length=11

Security (S field): 16 bits

Specifies one of 16 levels of security (eight of which are
reserved for future use).

00000000 00000000 - Unclassified
11110001 00110101 - Confidential
01111000 10011010 - EFTO
10111100 01001101 - MMMM
01011110 00100110 - PROG
10101111 00010011 - Restricted
11010111 10001000 - Secret
01101011 11000101 - Top Secret
00110101 11100010 - (Reserved for future use)
10011010 11110001 - (Reserved for future use)
01001101 01111000 - (Reserved for future use)
00100100 10111101 - (Reserved for future use)
00010011 01011110 - (Reserved for future use)
10001001 10101111 - (Reserved for future use)
11000100 11010110 - (Reserved for future use)
11100010 01101011 - (Reserved for future use)

[Page 17]

September 1981
Internet Protocol
Specification

Compartments (C field): 16 bits

An all zero value is used when the information transmitted is
not compartmented. Other values for the compartments field
may be obtained from the Defense Intelligence Agency.

Handling Restrictions (H field): 16 bits

The values for the control and release markings are
alphanumeric digraphs and are defined in the Defense
Intelligence Agency Manual DIAM 65-19, "Standard Security
Markings".

Transmission Control Code (TCC field): 24 bits

Provides a means to segregate traffic and define controlled
communities of interest among subscribers. The TCC values are
trigraphs, and are available from HQ DCA Code 530.

Must be copied on fragmentation. This option appears at most
once in a datagram.

Loose Source and Record Route

+--------+--------+--------+---------//--------+
|10000011| length | pointer| route data |
+--------+--------+--------+---------//--------+
Type=131

The loose source and record route (LSRR) option provides a means
for the source of an internet datagram to supply routing
information to be used by the gateways in forwarding the
datagram to the destination, and to record the route
information.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next source address to be
processed. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A route data is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the source route is empty (and the
recorded route full) and the routing is to be based on the
destination address field.

[Page 18]

September 1981
Internet Protocol
Specification

If the address in destination address field has been reached and
the pointer is not greater than the length, the next address in
the source route replaces the address in the destination address
field, and the recorded route address replaces the source
address just used, and pointer is increased by four.

The recorded route address is the internet module's own internet
address as known in the environment into which this datagram is
being forwarded.

This procedure of replacing the source route with the recorded
route (though it is in the reverse of the order it must be in to
be used as a source route) means the option (and the IP header
as a whole) remains a constant length as the datagram progresses
through the internet.

This option is a loose source route because the gateway or host
IP is allowed to use any route of any number of other
intermediate gateways to reach the next address in the route.

Must be copied on fragmentation. Appears at most once in a
datagram.

Strict Source and Record Route

+--------+--------+--------+---------//--------+
|10001001| length | pointer| route data |
+--------+--------+--------+---------//--------+
Type=137

The strict source and record route (SSRR) option provides a
means for the source of an internet datagram to supply routing
information to be used by the gateways in forwarding the
datagram to the destination, and to record the route
information.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next source address to be
processed. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A route data is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the source route is empty (and the

[Page 19]

September 1981
Internet Protocol
Specification

recorded route full) and the routing is to be based on the
destination address field.

If the address in destination address field has been reached and
the pointer is not greater than the length, the next address in
the source route replaces the address in the destination address
field, and the recorded route address replaces the source
address just used, and pointer is increased by four.

The recorded route address is the internet module's own internet
address as known in the environment into which this datagram is
being forwarded.

This procedure of replacing the source route with the recorded
route (though it is in the reverse of the order it must be in to
be used as a source route) means the option (and the IP header
as a whole) remains a constant length as the datagram progresses
through the internet.

This option is a strict source route because the gateway or host
IP must send the datagram directly to the next address in the
source route through only the directly connected network
indicated in the next address to reach the next gateway or host
specified in the route.

Must be copied on fragmentation. Appears at most once in a
datagram.

Record Route

+--------+--------+--------+---------//--------+
|00000111| length | pointer| route data |
+--------+--------+--------+---------//--------+
Type=7

The record route option provides a means to record the route of
an internet datagram.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next area to store a route
address. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A recorded route is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is

[Page 20]

September 1981
Internet Protocol
Specification

greater than the length, the recorded route data area is full.
The originating host must compose this option with a large
enough route data area to hold all the address expected. The
size of the option does not change due to adding addresses. The
intitial contents of the route data area must be zero.

When an internet module routes a datagram it checks to see if
the record route option is present. If it is, it inserts its
own internet address as known in the environment into which this
datagram is being forwarded into the recorded route begining at
the octet indicated by the pointer, and increments the pointer
by four.

If the route data area is already full (the pointer exceeds the
length) the datagram is forwarded without inserting the address
into the recorded route. If there is some room but not enough
room for a full address to be inserted, the original datagram is
considered to be in error and is discarded. In either case an
ICMP parameter problem message may be sent to the source
host [3].

Not copied on fragmentation, goes in first fragment only.
Appears at most once in a datagram.

Stream Identifier

+--------+--------+--------+--------+
|10001000|00000010| Stream ID |
+--------+--------+--------+--------+
Type=136 Length=4

This option provides a way for the 16-bit SATNET stream
identifier to be carried through networks that do not support
the stream concept.

Must be copied on fragmentation. Appears at most once in a
datagram.

[Page 21]

September 1981
Internet Protocol
Specification

Internet Timestamp

+--------+--------+--------+--------+
|01000100| length | pointer|oflw|flg|
+--------+--------+--------+--------+
| internet address |
+--------+--------+--------+--------+
| timestamp |
+--------+--------+--------+--------+
| . |
.
.
Type = 68

The Option Length is the number of octets in the option counting
the type, length, pointer, and overflow/flag octets (maximum
length 40).

The Pointer is the number of octets from the beginning of this
option to the end of timestamps plus one (i.e., it points to the
octet beginning the space for next timestamp). The smallest
legal value is 5. The timestamp area is full when the pointer
is greater than the length.

The Overflow (oflw) [4 bits] is the number of IP modules that
cannot register timestamps due to lack of space.

The Flag (flg) [4 bits] values are

0 -- time stamps only, stored in consecutive 32-bit words,

1 -- each timestamp is preceded with internet address of the
registering entity,

3 -- the internet address fields are prespecified. An IP
module only registers its timestamp if it matches its own
address with the next specified internet address.

The Timestamp is a right-justified, 32-bit timestamp in
milliseconds since midnight UT. If the time is not available in
milliseconds or cannot be provided with respect to midnight UT
then any time may be inserted as a timestamp provided the high
order bit of the timestamp field is set to one to indicate the
use of a non-standard value.

The originating host must compose this option with a large
enough timestamp data area to hold all the timestamp information
expected. The size of the option does not change due to adding

[Page 22]

September 1981
Internet Protocol
Specification

timestamps. The intitial contents of the timestamp data area
must be zero or internet address/zero pairs.

If the timestamp data area is already full (the pointer exceeds
the length) the datagram is forwarded without inserting the
timestamp, but the overflow count is incremented by one.

If there is some room but not enough room for a full timestamp
to be inserted, or the overflow count itself overflows, the
original datagram is considered to be in error and is discarded.
In either case an ICMP parameter problem message may be sent to
the source host [3].

The timestamp option is not copied upon fragmentation. It is
carried in the first fragment. Appears at most once in a
datagram.

Padding: variable

The internet header padding is used to ensure that the internet
header ends on a 32 bit boundary. The padding is zero.

3.2. Discussion

The implementation of a protocol must be robust. Each implementation
must expect to interoperate with others created by different
individuals. While the goal of this specification is to be explicit
about the protocol there is the possibility of differing
interpretations. In general, an implementation must be conservative
in its sending behavior, and liberal in its receiving behavior. That
is, it must be careful to send well-formed datagrams, but must accept
any datagram that it can interpret (e.g., not object to technical
errors where the meaning is still clear).

The basic internet service is datagram oriented and provides for the
fragmentation of datagrams at gateways, with reassembly taking place
at the destination internet protocol module in the destination host.
Of course, fragmentation and reassembly of datagrams within a network
or by private agreement between the gateways of a network is also
allowed since this is transparent to the internet protocols and the
higher-level protocols. This transparent type of fragmentation and
reassembly is termed "network-dependent" (or intranet) fragmentation
and is not discussed further here.

Internet addresses distinguish sources and destinations to the host
level and provide a protocol field as well. It is assumed that each
protocol will provide for whatever multiplexing is necessary within a
host.

[Page 23]

September 1981
Internet Protocol
Specification

Addressing

To provide for flexibility in assigning address to networks and
allow for the large number of small to intermediate sized networks
the interpretation of the address field is coded to specify a small
number of networks with a large number of host, a moderate number of
networks with a moderate number of hosts, and a large number of
networks with a small number of hosts. In addition there is an
escape code for extended addressing mode.

Address Formats:

High Order Bits Format Class
--------------- ------------------------------- -----
0 7 bits of net, 24 bits of host a
10 14 bits of net, 16 bits of host b
110 21 bits of net, 8 bits of host c
111 escape to extended addressing mode

A value of zero in the network field means this network. This is
only used in certain ICMP messages. The extended addressing mode
is undefined. Both of these features are reserved for future use.

The actual values assigned for network addresses is given in
"Assigned Numbers" [9].

The local address, assigned by the local network, must allow for a
single physical host to act as several distinct internet hosts.
That is, there must be a mapping between internet host addresses and
network/host interfaces that allows several internet addresses to
correspond to one interface. It must also be allowed for a host to
have several physical interfaces and to treat the datagrams from
several of them as if they were all addressed to a single host.

Address mappings between internet addresses and addresses for
ARPANET, SATNET, PRNET, and other networks are described in "Address
Mappings" [5].

Fragmentation and Reassembly.

The internet identification field (ID) is used together with the
source and destination address, and the protocol fields, to identify
datagram fragments for reassembly.

The More Fragments flag bit (MF) is set if the datagram is not the
last fragment. The Fragment Offset field identifies the fragment
location, relative to the beginning of the original unfragmented
datagram. Fragments are counted in units of 8 octets. The

[Page 24]

September 1981
Internet Protocol
Specification

fragmentation strategy is designed so than an unfragmented datagram
has all zero fragmentation information (MF = 0, fragment offset =
0). If an internet datagram is fragmented, its data portion must be
broken on 8 octet boundaries.

This format allows 2**13 = 8192 fragments of 8 octets each for a
total of 65,536 octets. Note that this is consistent with the the
datagram total length field (of course, the header is counted in the
total length and not in the fragments).

When fragmentation occurs, some options are copied, but others
remain with the first fragment only.

Every internet module must be able to forward a datagram of 68
octets without further fragmentation. This is because an internet
header may be up to 60 octets, and the minimum fragment is 8 octets.

Every internet destination must be able to receive a datagram of 576
octets either in one piece or in fragments to be reassembled.

The fields which may be affected by fragmentation include:

(1) options field
(2) more fragments flag
(3) fragment offset
(4) internet header length field
(5) total length field
(6) header checksum

If the Don't Fragment flag (DF) bit is set, then internet
fragmentation of this datagram is NOT permitted, although it may be
discarded. This can be used to prohibit fragmentation in cases
where the receiving host does not have sufficient resources to
reassemble internet fragments.

One example of use of the Don't Fragment feature is to down line
load a small host. A small host could have a boot strap program
that accepts a datagram stores it in memory and then executes it.

The fragmentation and reassembly procedures are most easily
described by examples. The following procedures are example
implementations.

General notation in the following pseudo programs: "=<" means "less
than or equal", "#" means "not equal", "=" means "equal", "<-" means
"is set to". Also, "x to y" includes x and excludes y; for example,
"4 to 7" would include 4, 5, and 6 (but not 7).

[Page 25]

September 1981
Internet Protocol
Specification

An Example Fragmentation Procedure

The maximum sized datagram that can be transmitted through the
next network is called the maximum transmission unit (MTU).

If the total length is less than or equal the maximum transmission
unit then submit this datagram to the next step in datagram
processing; otherwise cut the datagram into two fragments, the
first fragment being the maximum size, and the second fragment
being the rest of the datagram. The first fragment is submitted
to the next step in datagram processing, while the second fragment
is submitted to this procedure in case it is still too large.

Notation:

FO - Fragment Offset
IHL - Internet Header Length
DF - Don't Fragment flag
MF - More Fragments flag
TL - Total Length
OFO - Old Fragment Offset
OIHL - Old Internet Header Length
OMF - Old More Fragments flag
OTL - Old Total Length
NFB - Number of Fragment Blocks
MTU - Maximum Transmission Unit

Procedure:

IF TL =< MTU THEN Submit this datagram to the next step
in datagram processing ELSE IF DF = 1 THEN discard the
datagram ELSE
To produce the first fragment:
(1) Copy the original internet header;
(2) OIHL <- IHL; OTL <- TL; OFO <- FO; OMF <- MF;
(3) NFB <- (MTU-IHL*4)/8;
(4) Attach the first NFB*8 data octets;
(5) Correct the header:
MF <- 1; TL <- (IHL*4)+(NFB*8);
Recompute Checksum;
(6) Submit this fragment to the next step in
datagram processing;
To produce the second fragment:
(7) Selectively copy the internet header (some options
are not copied, see option definitions);
(8) Append the remaining data;
(9) Correct the header:
IHL <- (((OIHL*4)-(length of options not copied))+3)/4;

[Page 26]

September 1981
Internet Protocol
Specification

TL <- OTL - NFB*8 - (OIHL-IHL)*4);
FO <- OFO + NFB; MF <- OMF; Recompute Checksum;
(10) Submit this fragment to the fragmentation test; DONE.

In the above procedure each fragment (except the last) was made
the maximum allowable size. An alternative might produce less
than the maximum size datagrams. For example, one could implement
a fragmentation procedure that repeatly divided large datagrams in
half until the resulting fragments were less than the maximum
transmission unit size.

An Example Reassembly Procedure

For each datagram the buffer identifier is computed as the
concatenation of the source, destination, protocol, and
identification fields. If this is a whole datagram (that is both
the fragment offset and the more fragments fields are zero), then
any reassembly resources associated with this buffer identifier
are released and the datagram is forwarded to the next step in
datagram processing.

If no other fragment with this buffer identifier is on hand then
reassembly resources are allocated. The reassembly resources
consist of a data buffer, a header buffer, a fragment block bit
table, a total data length field, and a timer. The data from the
fragment is placed in the data buffer according to its fragment
offset and length, and bits are set in the fragment block bit
table corresponding to the fragment blocks received.

If this is the first fragment (that is the fragment offset is
zero) this header is placed in the header buffer. If this is the
last fragment ( that is the more fragments field is zero) the
total data length is computed. If this fragment completes the
datagram (tested by checking the bits set in the fragment block
table), then the datagram is sent to the next step in datagram
processing; otherwise the timer is set to the maximum of the
current timer value and the value of the time to live field from
this fragment; and the reassembly routine gives up control.

If the timer runs out, the all reassembly resources for this
buffer identifier are released. The initial setting of the timer
is a lower bound on the reassembly waiting time. This is because
the waiting time will be increased if the Time to Live in the
arriving fragment is greater than the current timer value but will
not be decreased if it is less. The maximum this timer value
could reach is the maximum time to live (approximately 4.25
minutes). The current recommendation for the initial timer
setting is 15 seconds. This may be changed as experience with

[Page 27]

September 1981
Internet Protocol
Specification

this protocol accumulates. Note that the choice of this parameter
value is related to the buffer capacity available and the data
rate of the transmission medium; that is, data rate times timer
value equals buffer size (e.g., 10Kb/s X 15s = 150Kb).

Notation:

FO - Fragment Offset
IHL - Internet Header Length
MF - More Fragments flag
TTL - Time To Live
NFB - Number of Fragment Blocks
TL - Total Length
TDL - Total Data Length
BUFID - Buffer Identifier
RCVBT - Fragment Received Bit Table
TLB - Timer Lower Bound

Procedure:

(1) BUFID <- source|destination|protocol|identification;
(2) IF FO = 0 AND MF = 0
(3) THEN IF buffer with BUFID is allocated
(4) THEN flush all reassembly for this BUFID;
(5) Submit datagram to next step; DONE.
(6) ELSE IF no buffer with BUFID is allocated
(7) THEN allocate reassembly resources
with BUFID;
TIMER <- TLB; TDL <- 0;
(8) put data from fragment into data buffer with
BUFID from octet FO*8 to
octet (TL-(IHL*4))+FO*8;
(9) set RCVBT bits from FO
to FO+((TL-(IHL*4)+7)/8);
(10) IF MF = 0 THEN TDL <- TL-(IHL*4)+(FO*8)
(11) IF FO = 0 THEN put header in header buffer
(12) IF TDL # 0
(13) AND all RCVBT bits from 0
to (TDL+7)/8 are set
(14) THEN TL <- TDL+(IHL*4)
(15) Submit datagram to next step;
(16) free all reassembly resources
for this BUFID; DONE.
(17) TIMER <- MAX(TIMER,TTL);
(18) give up until next fragment or timer expires;
(19) timer expires: flush all reassembly with this BUFID; DONE.

In the case that two or more fragments contain the same data

[Page 28]

September 1981
Internet Protocol
Specification

either identically or through a partial overlap, this procedure
will use the more recently arrived copy in the data buffer and
datagram delivered.

Identification

The choice of the Identifier for a datagram is based on the need to
provide a way to uniquely identify the fragments of a particular
datagram. The protocol module assembling fragments judges fragments
to belong to the same datagram if they have the same source,
destination, protocol, and Identifier. Thus, the sender must choose
the Identifier to be unique for this source, destination pair and
protocol for the time the datagram (or any fragment of it) could be
alive in the internet.

It seems then that a sending protocol module needs to keep a table
of Identifiers, one entry for each destination it has communicated
with in the last maximum packet lifetime for the internet.

However, since the Identifier field allows 65,536 different values,
some host may be able to simply use unique identifiers independent
of destination.

It is appropriate for some higher level protocols to choose the
identifier. For example, TCP protocol modules may retransmit an
identical TCP segment, and the probability for correct reception
would be enhanced if the retransmission carried the same identifier
as the original transmission since fragments of either datagram
could be used to construct a correct TCP segment.

Type of Service

The type of service (TOS) is for internet service quality selection.
The type of service is specified along the abstract parameters
precedence, delay, throughput, and reliability. These abstract
parameters are to be mapped into the actual service parameters of
the particular networks the datagram traverses.

Precedence. An independent measure of the importance of this
datagram.

Delay. Prompt delivery is important for datagrams with this
indication.

Throughput. High data rate is important for datagrams with this
indication.

[Page 29]

September 1981
Internet Protocol
Specification

Reliability. A higher level of effort to ensure delivery is
important for datagrams with this indication.

For example, the ARPANET has a priority bit, and a choice between
"standard" messages (type 0) and "uncontrolled" messages (type 3),
(the choice between single packet and multipacket messages can also
be considered a service parameter). The uncontrolled messages tend
to be less reliably delivered and suffer less delay. Suppose an
internet datagram is to be sent through the ARPANET. Let the
internet type of service be given as:

Precedence: 5
Delay: 0
Throughput: 1
Reliability: 1

In this example, the mapping of these parameters to those available
for the ARPANET would be to set the ARPANET priority bit on since
the Internet precedence is in the upper half of its range, to select
standard messages since the throughput and reliability requirements
are indicated and delay is not. More details are given on service
mappings in "Service Mappings" [8].

Time to Live

The time to live is set by the sender to the maximum time the
datagram is allowed to be in the internet system. If the datagram
is in the internet system longer than the time to live, then the
datagram must be destroyed.

This field must be decreased at each point that the internet header
is processed to reflect the time spent processing the datagram.
Even if no local information is available on the time actually
spent, the field must be decremented by 1. The time is measured in
units of seconds (i.e. the value 1 means one second). Thus, the
maximum time to live is 255 seconds or 4.25 minutes. Since every
module that processes a datagram must decrease the TTL by at least
one even if it process the datagram in less than a second, the TTL
must be thought of only as an upper bound on the time a datagram may
exist. The intention is to cause undeliverable datagrams to be
discarded, and to bound the maximum datagram lifetime.

Some higher level reliable connection protocols are based on
assumptions that old duplicate datagrams will not arrive after a
certain time elapses. The TTL is a way for such protocols to have
an assurance that their assumption is met.

[Page 30]

September 1981
Internet Protocol
Specification

Options

The options are optional in each datagram, but required in
implementations. That is, the presence or absence of an option is
the choice of the sender, but each internet module must be able to
parse every option. There can be several options present in the
option field.

The options might not end on a 32-bit boundary. The internet header
must be filled out with octets of zeros. The first of these would
be interpreted as the end-of-options option, and the remainder as
internet header padding.

Every internet module must be able to act on every option. The
Security Option is required if classified, restricted, or
compartmented traffic is to be passed.

Checksum

The internet header checksum is recomputed if the internet header is
changed. For example, a reduction of the time to live, additions or
changes to internet options, or due to fragmentation. This checksum
at the internet level is intended to protect the internet header
fields from transmission errors.

There are some applications where a few data bit errors are
acceptable while retransmission delays are not. If the internet
protocol enforced data correctness such applications could not be
supported.

Errors

Internet protocol errors may be reported via the ICMP messages [3].

3.3. Interfaces

The functional description of user interfaces to the IP is, at best,
fictional, since every operating system will have different
facilities. Consequently, we must warn readers that different IP
implementations may have different user interfaces. However, all IPs
must provide a certain minimum set of services to guarantee that all
IP implementations can support the same protocol hierarchy. This
section specifies the functional interfaces required of all IP
implementations.

Internet protocol interfaces on one side to the local network and on
the other side to either a higher level protocol or an application
program. In the following, the higher level protocol or application

[Page 31]

September 1981
Internet Protocol
Specification

program (or even a gateway program) will be called the "user" since it
is using the internet module. Since internet protocol is a datagram
protocol, there is minimal memory or state maintained between datagram
transmissions, and each call on the internet protocol module by the
user supplies all information necessary for the IP to perform the
service requested.

An Example Upper Level Interface

The following two example calls satisfy the requirements for the user
to internet protocol module communication ("=>" means returns):

SEND (src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt => result)

where:

src = source address
dst = destination address
prot = protocol
TOS = type of service
TTL = time to live
BufPTR = buffer pointer
len = length of buffer
Id = Identifier
DF = Don't Fragment
opt = option data
result = response
OK = datagram sent ok
Error = error in arguments or local network error

Note that the precedence is included in the TOS and the
security/compartment is passed as an option.

RECV (BufPTR, prot, => result, src, dst, TOS, len, opt)

where:

BufPTR = buffer pointer
prot = protocol
result = response
OK = datagram received ok
Error = error in arguments
len = length of buffer
src = source address
dst = destination address
TOS = type of service
opt = option data

[Page 32]

September 1981
Internet Protocol
Specification

When the user sends a datagram, it executes the SEND call supplying
all the arguments. The internet protocol module, on receiving this
call, checks the arguments and prepares and sends the message. If the
arguments are good and the datagram is accepted by the local network,
the call returns successfully. If either the arguments are bad, or
the datagram is not accepted by the local network, the call returns
unsuccessfully. On unsuccessful returns, a reasonable report must be
made as to the cause of the problem, but the details of such reports
are up to individual implementations.

When a datagram arrives at the internet protocol module from the local
network, either there is a pending RECV call from the user addressed
or there is not. In the first case, the pending call is satisfied by
passing the information from the datagram to the user. In the second
case, the user addressed is notified of a pending datagram. If the
user addressed does not exist, an ICMP error message is returned to
the sender, and the data is discarded.

The notification of a user may be via a pseudo interrupt or similar
mechanism, as appropriate in the particular operating system
environment of the implementation.

A user's RECV call may then either be immediately satisfied by a
pending datagram, or the call may be pending until a datagram arrives.

The source address is included in the send call in case the sending
host has several addresses (multiple physical connections or logical
addresses). The internet module must check to see that the source
address is one of the legal address for this host.

An implementation may also allow or require a call to the internet
module to indicate interest in or reserve exclusive use of a class of
datagrams (e.g., all those with a certain value in the protocol
field).

This section functionally characterizes a USER/IP interface. The
notation used is similar to most procedure of function calls in high
level languages, but this usage is not meant to rule out trap type
service calls (e.g., SVCs, UUOs, EMTs), or any other form of
interprocess communication.

[Page 33]

September 1981
Internet Protocol

APPENDIX A: Examples & Scenarios

Example 1:

This is an example of the minimal data carrying internet datagram:

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service| Total Length = 21 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=0| Fragment Offset = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 123 | Protocol = 1 | header checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 5.

Note that each tick mark represents one bit position.

This is a internet datagram in version 4 of internet protocol; the
internet header consists of five 32 bit words, and the total length of
the datagram is 21 octets. This datagram is a complete datagram (not
a fragment).

[Page 34]

September 1981
Internet Protocol

Example 2:

In this example, we show first a moderate size internet datagram (452
data octets), then two internet fragments that might result from the
fragmentation of this datagram if the maximum sized transmission
allowed were 280 octets.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service| Total Length = 472 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=0| Fragment Offset = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 123 | Protocol = 6 | header checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
\ \
\ \
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 6.

[Page 35]

September 1981
Internet Protocol

Now the first fragment that results from splitting the datagram after
256 data octets.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service| Total Length = 276 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=1| Fragment Offset = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 119 | Protocol = 6 | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
\ \
\ \
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Fragment

Figure 7.

[Page 36]

September 1981
Internet Protocol

And the second fragment.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 5 |Type of Service| Total Length = 216 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=0| Fragment Offset = 32 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 119 | Protocol = 6 | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
\ \
\ \
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Fragment

Figure 8.

[Page 37]

September 1981
Internet Protocol

Example 3:

Here, we show an example of a datagram containing options:

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver= 4 |IHL= 8 |Type of Service| Total Length = 576 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification = 111 |Flg=0| Fragment Offset = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time = 123 | Protocol = 6 | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| source address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| destination address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt. Code = x | Opt. Len.= 3 | option value | Opt. Code = x |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt. Len. = 4 | option value | Opt. Code = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Opt. Code = y | Opt. Len. = 3 | option value | Opt. Code = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
\ \
\ \
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 9.

[Page 38]

September 1981
Internet Protocol

APPENDIX B: Data Transmission Order

The order of transmission of the header and data described in this
document is resolved to the octet level. Whenever a diagram shows a
group of octets, the order of transmission of those octets is the normal
order in which they are read in English. For example, in the following
diagram the octets are transmitted in the order they are numbered.

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1 | 2 | 3 | 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 5 | 6 | 7 | 8 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 9 | 10 | 11 | 12 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Transmission Order of Bytes

Figure 10.

Whenever an octet represents a numeric quantity the left most bit in the
diagram is the high order or most significant bit. That is, the bit
labeled 0 is the most significant bit. For example, the following
diagram represents the value 170 (decimal).

0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1 0 1 0 1 0 1 0|
+-+-+-+-+-+-+-+-+

Significance of Bits

Figure 11.

Similarly, whenever a multi-octet field represents a numeric quantity
the left most bit of the whole field is the most significant bit. When
a multi-octet quantity is transmitted the most significant octet is
transmitted first.

[Page 39]

September 1981
Internet Protocol

[Page 40]

September 1981
Internet Protocol

GLOSSARY

1822
BBN Report 1822, "The Specification of the Interconnection of
a Host and an IMP". The specification of interface between a
host and the ARPANET.

ARPANET leader
The control information on an ARPANET message at the host-IMP
interface.

ARPANET message
The unit of transmission between a host and an IMP in the
ARPANET. The maximum size is about 1012 octets (8096 bits).

ARPANET packet
A unit of transmission used internally in the ARPANET between
IMPs. The maximum size is about 126 octets (1008 bits).

Destination
The destination address, an internet header field.

DF
The Don't Fragment bit carried in the flags field.

Flags
An internet header field carrying various control flags.

Fragment Offset
This internet header field indicates where in the internet
datagram a fragment belongs.

GGP
Gateway to Gateway Protocol, the protocol used primarily
between gateways to control routing and other gateway
functions.

header
Control information at the beginning of a message, segment,
datagram, packet or block of data.

ICMP
Internet Control Message Protocol, implemented in the internet
module, the ICMP is used from gateways to hosts and between
hosts to report errors and make routing suggestions.

[Page 41]

September 1981
Internet Protocol
Glossary

Identification
An internet header field carrying the identifying value
assigned by the sender to aid in assembling the fragments of a
datagram.

IHL
The internet header field Internet Header Length is the length
of the internet header measured in 32 bit words.

IMP
The Interface Message Processor, the packet switch of the
ARPANET.

Internet Address
A four octet (32 bit) source or destination address consisting
of a Network field and a Local Address field.

internet datagram
The unit of data exchanged between a pair of internet modules
(includes the internet header).

internet fragment
A portion of the data of an internet datagram with an internet
header.

Local Address
The address of a host within a network. The actual mapping of
an internet local address on to the host addresses in a
network is quite general, allowing for many to one mappings.

MF
The More-Fragments Flag carried in the internet header flags
field.

module
An implementation, usually in software, of a protocol or other
procedure.

more-fragments flag
A flag indicating whether or not this internet datagram
contains the end of an internet datagram, carried in the
internet header Flags field.

NFB
The Number of Fragment Blocks in a the data portion of an
internet fragment. That is, the length of a portion of data
measured in 8 octet units.

[Page 42]

September 1981
Internet Protocol
Glossary

octet
An eight bit byte.

Options
The internet header Options field may contain several options,
and each option may be several octets in length.

Padding
The internet header Padding field is used to ensure that the
data begins on 32 bit word boundary. The padding is zero.

Protocol
In this document, the next higher level protocol identifier,
an internet header field.

Rest
The local address portion of an Internet Address.

Source
The source address, an internet header field.

TCP
Transmission Control Protocol: A host-to-host protocol for
reliable communication in internet environments.

TCP Segment
The unit of data exchanged between TCP modules (including the
TCP header).

TFTP
Trivial File Transfer Protocol: A simple file transfer
protocol built on UDP.

Time to Live
An internet header field which indicates the upper bound on
how long this internet datagram may exist.

TOS
Type of Service

Total Length
The internet header field Total Length is the length of the
datagram in octets including internet header and data.

TTL
Time to Live

[Page 43]

September 1981
Internet Protocol
Glossary

Type of Service
An internet header field which indicates the type (or quality)
of service for this internet datagram.

UDP
User Datagram Protocol: A user level protocol for transaction
oriented applications.

User
The user of the internet protocol. This may be a higher level
protocol module, an application program, or a gateway program.

Version
The Version field indicates the format of the internet header.

[Page 44]

September 1981
Internet Protocol

REFERENCES

[1] Cerf, V., "The Catenet Model for Internetworking," Information
Processing Techniques Office, Defense Advanced Research Projects
Agency, IEN 48, July 1978.

[2] Bolt Beranek and Newman, "Specification for the Interconnection of
a Host and an IMP," BBN Technical Report 1822, Revised May 1978.

[3] Postel, J., "Internet Control Message Protocol - DARPA Internet
Program Protocol Specification," RFC 792, USC/Information Sciences
Institute, September 1981.

[4] Shoch, J., "Inter-Network Naming, Addressing, and Routing,"
COMPCON, IEEE Computer Society, Fall 1978.

[5] Postel, J., "Address Mappings," RFC 796, USC/Information Sciences
Institute, September 1981.

[6] Shoch, J., "Packet Fragmentation in Inter-Network Protocols,"
Computer Networks, v. 3, n. 1, February 1979.

[7] Strazisar, V., "How to Build a Gateway", IEN 109, Bolt Beranek and
Newman, August 1979.

[8] Postel, J., "Service Mappings," RFC 795, USC/Information Sciences
Institute, September 1981.

[9] Postel, J., "Assigned Numbers," RFC 790, USC/Information Sciences
Institute, September 1981.


RFC 1819 – Internet Stream Protocol Version 2 (ST2) Protocol Specification

 

Network Working Group                                  ST2 Working Group
Request for Comments: 1819           L. Delgrossi and L. Berger, Editors
Obsoletes: 1190, IEN 119                                     August 1995
Category: Experimental
                Internet Stream Protocol Version 2 (ST2)
                 Protocol Specification - Version ST2+
Status of this Memo
   This memo defines an Experimental Protocol for the Internet
   community.  This memo does not specify an Internet standard of any
   kind.  Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.
IESG NOTE
   This document is a revision of RFC1190. The charter of this effort
   was clarifying, simplifying and removing errors from RFC1190 to
   ensure interoperability of implementations.
   NOTE WELL: Neither the version of the protocol described in this
   document nor the previous version is an Internet Standard or under
   consideration for that status.
   Since the publication of the original version of the protocol, there
   have been significant developments in the state of the art.  Readers
   should note that standards and technology addressing alternative
   approaches to the resource reservation problem are currently under
   development within the IETF.
Abstract
   This memo contains a revised specification of the Internet STream
   Protocol Version 2 (ST2). ST2 is an experimental resource reservation
   protocol intended to provide end-to-end real-time guarantees over an
   internet. It allows applications to build multi-destination simplex
   data streams with a desired quality of service. The revised version
   of ST2 specified in this memo is called ST2+.
   This specification is a product of the STream Protocol Working Group
   of the Internet Engineering Task Force.
Table of Contents
     1  Introduction                                                   6
             1.1  What is ST2?                                         6
             1.2  ST2 and IP                                           8
             1.3  Protocol History                                     8
             1.3.1  RFC1190 ST and ST2+ Major Differences              9
             1.4  Supporting Modules for ST2                          10
             1.4.1  Data Transfer Protocol                            11
             1.4.2  Setup Protocol                                    11
             1.4.3  Flow Specification                                11
             1.4.4  Routing Function                                  12
             1.4.5  Local Resource Manager                            12
             1.5  ST2 Basic Concepts                                  15
             1.5.1  Streams                                           16
             1.5.2  Data Transmission                                 16
             1.5.3  Flow Specification                                17
             1.6  Outline of This Document                            19
     2  ST2 User Service Description                                  19
             2.1  Stream Operations and Primitive Functions           19
             2.2  State Diagrams                                      21
             2.3  State Transition Tables                             25
     3  The ST2 Data Transfer Protocol                                26
             3.1  Data Transfer with ST                               26
             3.2  ST Protocol Functions                               27
             3.2.1  Stream Identification                             27
             3.2.2  Packet Discarding based on Data Priority          27
     4  SCMP Functional Description                                   28
             4.1  Types of Streams                                    29
             4.1.1  Stream Building                                   30
             4.1.2  Knowledge of Receivers                            30
             4.2  Control PDUs                                        31
             4.3  SCMP Reliability                                    32
             4.4  Stream Options                                      33
             4.4.1  No Recovery                                       33
             4.4.2  Join Authorization Level                          34
             4.4.3  Record Route                                      34
             4.4.4  User Data                                         35
             4.5  Stream Setup                                        35
             4.5.1  Information from the Application                  35
             4.5.2  Initial Setup at the Origin                       35
             4.5.2.1  Invoking the Routing Function                   36
             4.5.2.2  Reserving Resources                             36
             4.5.3  Sending CONNECT Messages                          37
             4.5.3.1  Empty Target List                               37
             4.5.4  CONNECT Processing by an Intermediate ST agent    37
             4.5.5  CONNECT Processing at the Targets                 38
             4.5.6  ACCEPT Processing by an Intermediate ST agent     38
             4.5.7  ACCEPT Processing by the Origin                   39
             4.5.8  REFUSE Processing by the Intermediate ST agent    39
             4.5.9  REFUSE Processing by the Origin                   39
             4.5.10  Other Functions during Stream Setup              40
             4.6  Modifying an Existing Stream                        40
             4.6.1  The Origin Adding New Targets                     41
             4.6.2  The Origin Removing a Target                      41
             4.6.3  A Target Joining a Stream                         42
             4.6.3.1  Intermediate Agent (Router) as Origin           43
             4.6.4  A Target Deleting Itself                          43
             4.6.5  Changing a Stream's FlowSpec                      44
             4.7  Stream Tear Down                                    45
     5  Exceptional Cases                                             45
             5.1  Long ST Messages                                    45
             5.1.1  Handling of Long Data Packets                     45
             5.1.2  Handling of Long Control Packets                  46
             5.2  Timeout Failures                                    47
             5.2.1  Failure due to ACCEPT Acknowledgment Timeout      47
             5.2.2  Failure due to CHANGE Acknowledgment Timeout      47
             5.2.3  Failure due to CHANGE Response Timeout            48
             5.2.4  Failure due to CONNECT Acknowledgment Timeout     48
             5.2.5  Failure due to CONNECT Response Timeout           48
             5.2.6  Failure due to DISCONNECT Acknowledgment Timeout  48
             5.2.7  Failure due to JOIN Acknowledgment Timeout        48
             5.2.8  Failure due to JOIN Response Timeout              49
             5.2.9  Failure due to JOIN-REJECT Acknowledgment Timeout 49
             5.2.10  Failure due to NOTIFY Acknowledgment Timeout     49
             5.2.11  Failure due to REFUSE Acknowledgment Timeout     49
             5.2.12  Failure due to STATUS Response Timeout           49
             5.3  Setup Failures due to Routing Failures              50
             5.3.1  Path Convergence                                  50
             5.3.2  Other Cases                                       51
             5.4  Problems due to Routing Inconsistency               52
             5.5  Problems in Reserving Resources                     53
             5.5.1  Mismatched FlowSpecs                              53
             5.5.2  Unknown FlowSpec Version                          53
             5.5.3  LRM Unable to Process FlowSpec                    53
             5.5.4  Insufficient Resources                            53
             5.6  Problems Caused by CHANGE Messages                  54
             5.7  Unknown Targets in DISCONNECT and CHANGE            55
     6  Failure Detection and Recovery                                55
             6.1  Failure Detection                                   55
             6.1.1  Network Failures                                  56
             6.1.2  Detecting ST Agents Failures                      56
             6.2  Failure Recovery                                    58
             6.2.1  Problems in Stream Recovery                       60
             6.3  Stream Preemption                                   62
     7  A Group of Streams                                            63
             7.1  Basic Group Relationships                           63
             7.1.1  Bandwidth Sharing                                 63
             7.1.2  Fate Sharing                                      64
             7.1.3  Route Sharing                                     65
             7.1.4  Subnet Resources Sharing                          65
             7.2  Relationships Orthogonality                         65
     8  Ancillary Functions                                           66
             8.1  Stream ID Generation                                66
             8.2  Group Name Generator                                66
             8.3  Checksum Computation                                67
             8.4  Neighbor ST Agent Identification and
                     Information Collection                           67
             8.5  Round Trip Time Estimation                          68
             8.6  Network MTU Discovery                               68
             8.7  IP Encapsulation of ST                              69
             8.8  IP Multicasting                                     70
     9  The ST2+ Flow Specification                                   71
             9.1  FlowSpec Version #0 - (Null FlowSpec)               72
             9.2  FlowSpec Version #7 - ST2+ FlowSpec                 72
             9.2.1  QoS Classes                                       73
             9.2.2  Precedence                                        74
             9.2.3  Maximum Data Size                                 74
             9.2.4  Message Rate                                      74
             9.2.5  Delay and Delay Jitter                            74
             9.2.6  ST2+ FlowSpec Format                              75
     10  ST2 Protocol Data Units Specification                        77
             10.1  Data PDU                                           77
             10.1.1  ST Data Packets                                  78
             10.2  Control PDUs                                       78
             10.3  Common SCMP Elements                               80
             10.3.1  FlowSpec                                         80
             10.3.2  Group                                            81
             10.3.3  MulticastAddress                                 82
             10.3.4  Origin                                           82
             10.3.5  RecordRoute                                      83
             10.3.6  Target and TargetList                            84
             10.3.7  UserData                                         85
             10.3.8  Handling of Undefined Parameters                 86
             10.4  ST Control Message PDUs                            86
             10.4.1  ACCEPT                                           86
             10.4.2  ACK                                              88
             10.4.3  CHANGE                                           89
             10.4.4  CONNECT                                          89
             10.4.5  DISCONNECT                                       92
             10.4.6  ERROR                                            93
             10.4.7  HELLO                                            94
             10.4.8  JOIN                                             95
             10.4.9  JOIN-REJECT                                      96
             10.4.10  NOTIFY                                          97
             10.4.11  REFUSE                                          98
             10.4.12  STATUS                                         100
             10.4.13  STATUS-RESPONSE                                100
             10.5  Suggested Protocol Constants                      101
             10.5.1  SCMP Messages                                   102
             10.5.2  SCMP Parameters                                 102
             10.5.3  ReasonCode                                      102
             10.5.4  Timeouts and Other Constants                    104
             10.6  Data Notations                                    105
     11  References                                                  106
     12  Security Considerations                                     108
     13  Acknowledgments and Authors' Addresses                      108
1.  Introduction
1.1  What is ST2?
   The Internet Stream Protocol, Version 2 (ST2) is an experimental
   connection-oriented internetworking protocol that operates at the
   same layer as connectionless IP. It has been developed to support the
   efficient delivery of data streams to single or multiple destinations
   in applications that require guaranteed quality of service. ST2 is
   part of the IP protocol family and serves as an adjunct to, not a
   replacement for, IP. The main application areas of the protocol are
   the real-time transport of multimedia data, e.g., digital audio and
   video packet streams, and distributed simulation/gaming, across
   internets.
   ST2 can be used to reserve bandwidth for real-time streams across
   network routes. This reservation, together with appropriate network
   access and packet scheduling mechanisms in all nodes running the
   protocol, guarantees a well-defined Quality of Service (QoS) to ST2
   applications. It ensures that real-time packets are delivered within
   their deadlines, that is, at the time where they need to be
   presented.  This facilitates a smooth delivery of data that is
   essential for time- critical applications, but can typically not be
   provided by best- effort IP communication.
                      DATA PATH                         CONTROL PATH
                      =========                         ============
       Upper     +------------------+                     +---------+
       Layer     | Application data |                     | Control |
                 +------------------+                     +---------+
                          |                                    |
                          |                                    V
                          |                     +-------------------+
       SCMP               |                     |   SCMP  |         |
                          |                     +-------------------+
                          |                             |
                          V                             V
            +-----------------------+      +------------------------+
       ST   | ST |                  |      | ST |         |         |
            +-----------------------+      +------------------------+
            D-bit=1                       D-bit=0
                   Figure 1: ST2 Data and Control Path
   Just like IP, ST2 actually consists of two protocols: ST for the data
   transport and SCMP, the Stream Control Message Protocol, for all
   control functions. ST is simple and contains only a single PDU format
   that is designed for fast and efficient data forwarding in order to
   achieve low communication delays. SCMP, however, is more complex than
   IP's ICMP. As with ICMP and IP, SCMP packets are transferred within
   ST packets as shown in Figure 1.
    +--------------------+
    | Conference Control |
    +--------------------+
   +-------+ +-------+ |
   | Video | | Voice | | +-----+ +------+ +-----+     +-----+ Application
   | Appl  | | Appl  | | | SNMP| |Telnet| | FTP | ... |     |    Layer
   +-------+ +-------+ | +-----+ +------+ +-----+     +-----+
       |        |      |     |        |     |            |
       V        V      |     |        |     |            |   ------------
    +-----+  +-----+   |     |        |     |            |
    | PVP |  | NVP |   |     |        |     |            |
    +-----+  +-----+   +     |        |     |            |
     |   \      | \     \    |        |     |            |
     |    +-----|--+-----+   |        |     |            |
     |     Appl.|control  V  V        V     V            V
     | ST  data |         +-----+    +-------+        +-----+
     | & control|         | UDP |    |  TCP  |    ... | RTP | Transport
     |          |         +-----+    +-------+        +-----+   Layer
     |         /|          / | \       / / |          / /|
     |\       / |  +------+--|--\-----+-/--|--- ... -+ / |
     | \     /  |  |         |   \     /   |          /  |
     |  \   /   |  |         |    \   +----|--- ... -+   |   -----------
     |   \ /    |  |         |     \ /     |             |
     |    V     |  |         |      V      |             |
     | +------+ |  |         |   +------+  |   +------+  |
     | | SCMP | |  |         |   | ICMP |  |   | IGMP |  |    Internet
     | +------+ |  |         |   +------+  |   +------+  |     Layer
     |    |     |  |         |      |      |      |      |
     V    V     V  V         V      V      V      V      V
   +-----------------+  +-----------------------------------+
   | STream protocol |->|      Internet     Protocol        |
   +-----------------+  +-----------------------------------+
                  | \   / |
                  |  \ /  |
                  |   X   |                                  ------------
                  |  / \  |
                  | /   \ |
                  VV     VV
   +----------------+   +----------------+
   | (Sub-) Network |...| (Sub-) Network |                  (Sub-)Network
   |    Protocol    |   |    Protocol    |                     Layer
   +----------------+   +----------------+
                   Figure 2.  Protocol Relationships
1.2  ST2 and IP
   ST2 is designed to coexist with IP on each node. A typical
   distributed multimedia application would use both protocols: IP for
   the transfer of traditional data and control information, and ST2 for
   the transfer of real-time data. Whereas IP typically will be accessed
   from TCP or UDP, ST2 will be accessed via new end-to-end real-time
   protocols. The position of ST2 with respect to the other protocols of
   the Internet family is represented in Figure 2.
   Both ST2 and IP apply the same addressing schemes to identify
   different hosts. ST2 and IP packets differ in the first four bits,
   which contain the internetwork protocol version number: number 5 is
   reserved for ST2 (IP itself has version number 4). As a network layer
   protocol, like IP, ST2 operates independently of its underlying
   subnets. Existing implementations use ARP for address resolution, and
   use the same Layer 2 SAPs as IP.
   As a special function, ST2 messages can be encapsulated in IP
   packets.  This is represented in Figure 2 as a link between ST2 and
   IP. This link allows ST2 messages to pass through routers which do
   not run ST2.  Resource management is typically not available for
   these IP route segments. IP encapsulation is, therefore, suggested
   only for portions of the network which do not constitute a system
   bottleneck.
   In Figure 2, the RTP protocol is shown as an example of transport
   layer on top of ST2. Others include the Packet Video Protocol (PVP)
   [Cole81], the Network Voice Protocol (NVP) [Cohe81], and others such
   as the Heidelberg Transport Protocol (HeiTP) [DHHS92].
1.3  Protocol History
   The first version of ST was published in the late 1970's and was used
   throughout the 1980's for experimental transmission of voice, video,
   and distributed simulation. The experience gained in these
   applications led to the development of the revised protocol version
   ST2. The revision extends the original protocol to make it more
   complete and more applicable to emerging multimedia environments. The
   specification of this protocol version is contained in Internet RFC
   1190 which was published in October 1990 [RFC1190].
   With more and more developments of commercial distributed multimedia
   applications underway and with a growing dissatisfaction at the
   transmission quality for audio and video over IP in the MBONE,
   interest in ST2 has grown over the last years. Companies have
   products available incorporating the protocol. The BERKOM MMTS
   project of the German PTT [DeAl92] uses ST2 as its core protocol for
   the provision of multimedia teleservices such as conferencing and
   mailing. In addition, implementations of ST2 for Digital Equipment,
   IBM, NeXT, Macintosh, PC, Silicon Graphics, and Sun platforms are
   available.
   In 1993, the IETF started a new working group on ST2 as part of
   ongoing efforts to develop protocols that address resource
   reservation issues.  The group's mission was to clean up the existing
   protocol specification to ensure better interoperability between the
   existing and emerging implementations. It was also the goal to
   produce an updated experimental protocol specification that reflected
   the experiences gained with the existing ST2 implementations and
   applications. Which led to the specification of the ST2+ protocol
   contained in this document.
1.3.1  RFC1190 ST and ST2+ Major Differences
   The protocol changes from RFC1190 were motivated by protocol
   simplification and clarification, and codification of extensions in
   existing implementations. This section provides a list of major
   differences, and is probably of interest only to those who have
   knowledge of RFC1190. The major differences between the versions are:
o   Elimination of "Hop IDentifiers" or HIDs. HIDs added much complexity
    to the protocol and was found to be a major impediment to
    interoperability. HIDs have been replaced by globally unique
    identifiers called "Stream IDentifiers" or SIDs.
o   Elimination of a number of stream options. A number of options were
    found to not be used by any implementation, or were thought to add
    more complexity than value. These options were removed. Removed
    options include: point-to-point, full-duplex, reverse charge, and
    source route.
o   Elimination of the concept of "subset" implementations. RFC1190
    permitted subset implementations, to allow for easy implementation
    and experimentation. This led to interoperability problems. Agents
    implementing the protocol specified in this document, MUST implement
    the full protocol. A number of the protocol functions are best-
    effort. It is expected that some implementations will make more
    effort than others in satisfying particular protocol requests.
o   Clarification of the capability of targets to request to join a
    steam. RFC1190 can be interpreted to support target requests, but
    most implementors did not understand this and did not add support
    for this capability. The lack of this capability was found to be a
    significant limitation in the ability to scale the number of
    participants in a single ST stream. This clarification is based on
    work done by IBM Heidelberg.
o   Separation of functions between ST and supporting modules. An effort
    was made to improve the separation of functions provided by ST and
    those provided by other modules. This is reflected in reorganization
    of some text and some PDU formats. ST was also made FlowSpec
    independent, although it does define a FlowSpec for testing and
    interoperability purposes.
o   General reorganization and re-write of the specification. This
    document has been organized with the goal of improved readability
    and clarity. Some sections have been added, and an effort was made
    to improve the introduction of concepts.
1.4  Supporting Modules for ST2
   ST2 is one piece of a larger mosaic. This section presents the
   overall communication architecture and clarifies the role of ST2 with
   respect to its supporting modules.
   ST2 proposes a two-step communication model. In the first step, the
   real-time channels for the subsequent data transfer are built. This
   is called stream setup. It includes selecting the routes to the
   destinations and reserving the correspondent resources. In the second
   step, the data is transmitted over the previously established
   streams.  This is called data transfer. While stream setup does not
   have to be completed in real-time, data transfer has stringent real-
   time requirements. The architecture used to describe the ST2
   communication model includes:
o   a data transfer protocol for the transmission of real-time data
    over the established streams,
o   a setup protocol to establish real-time streams based on the flow
    specification,
o   a flow specification to express user real-time requirements,
o   a routing function to select routes in the Internet,
o   a local resource manager to appropriately handle resources involved
    in the communication.
   This document defines a data protocol (ST), a setup protocol (SCMP),
   and a flow specification (ST2+ FlowSpec). It does not define a
   routing function and a local resource manager. However, ST2 assumes
   their existence.
   Alternative architectures are possible, see [RFC1633] for an example
   alternative architecture that could be used when implementing ST2.
1.4.1  Data Transfer Protocol
   The data transfer protocol defines the format of the data packets
   belonging to the stream. Data packets are delivered to the targets
   along the stream paths previously established by the setup protocol.
   Data packets are delivered with the quality of service associated
   with the stream.
   Data packets contain a globally unique stream identifier that
   indicates which stream they belong to. The stream identifier is also
   known by the setup protocol, which uses it during stream
   establishment. The data transfer protocol for ST2, known simply as
   ST, is completely defined by this document.
1.4.2  Setup Protocol
   The setup protocol is responsible for establishing, maintaining, and
   releasing real-time streams. It relies on the routing function to
   select the paths from the source to the destinations. At each
   host/router on these paths, it presents the flow specification
   associated with the stream to the local resource manager. This causes
   the resource managers to reserve appropriate resources for the
   stream.  The setup protocol for ST2 is called Stream Control Message
   Protocol, or SCMP, and is completely defined by this document.
1.4.3  Flow Specification
   The flow specification is a data structure including the ST2
   applications' QoS requirements. At each host/router, it is used by
   the local resource manager to appropriately handle resources so that
   such requirements are met. Distributing the flow specification to all
   resource managers along the communication paths is the task of the
   setup protocol. However, the contents of the flow specification are
   transparent to the setup protocol, which simply carries the flow
   specification. Any operations on the flow specification, including
   updating internal fields and comparing flow specifications are
   performed by the resource managers.
   This document defines a specific flow specification format that
   allows for interoperability among ST2 implementations. This flow
   specification is intended to support a flow with a single
   transmission rate for all destinations in the stream. Implementations
   may support more than one flow specification format and the means are
   provided to add new formats as they are defined in the future.
   However, the flow specification format has to be consistent
   throughout the stream, i.e., it is not possible to use different flow
   specification formats for different parts of the same stream.
1.4.4  Routing Function
   The routing function is an external unicast route generation
   capability. It provides the setup protocol with the path to reach
   each of the desired destinations. The routing function is called on a
   hop-by-hop basis and provides next-hop information. Once a route is
   selected by the routing function, it persists for the whole stream
   lifetime. The routing function may try to optimize based on the
   number of targets, the requested resources, or use of local network
   multicast or bandwidth capabilities. Alternatively, the routing
   function may even be based on simple connectivity information.
   The setup protocol is not necessarily aware of the criteria used by
   the routing function to select routes. It works with any routing
   function algorithm. The algorithm adopted is a local matter at each
   host/router and different hosts/routers may use different algorithms.
   The interface between setup protocol and routing function is also a
   local matter and therefore it is not specified by this document.
   This version of ST does not support source routing. It does support
   route recording. It does include provisions that allow identification
   of ST capable neighbors. Identification of remote ST hosts/routers is
   not specifically addressed.
1.4.5  Local Resource Manager
   At each host/router traversed by a stream, the Local Resource Manager
   (LRM) is responsible for handling local resources. The LRM knows
   which resources are on the system and what capacity they can provide.
   Resources include:
o   CPUs on end systems and routers to execute the application and
    protocol software,
o   main memory space for this software (as in all real-time systems,
    code should be pinned in main memory, as swapping it out would have
    detrimental effects on system performance),
o   buffer space to store the data, e.g., communication packets, passing
    through the nodes,
o   network adapters, and
o   transmission networks between the nodes. Networks may be as simple
    as point-to-point links or as complex as switched networks such as
    Frame Relay and ATM networks.
   During stream setup and modification, the LRM is presented by the
   setup protocol with the flow specification associated to the stream.
   For each resource it handles, the LRM is expected to perform the
   following functions:
o   Stream Admission Control: it checks whether, given the flow
    specification, there are sufficient resources left to handle the new
    data stream. If the available resources are insufficient, the new
    data stream must be rejected.
o   QoS Computation: it calculates the best possible performance the
    resource can provide for the new data stream under the current
    traffic conditions, e.g., throughput and delay values are computed.
o   Resource Reservation: it reserves the resource capacities required
    to meet the desired QoS.
   During data transfer, the LRM is responsible for:
o   QoS Enforcement: it enforces the QoS requirements by appropriate
    scheduling of resource access. For example, data packets from an
    application with a short guaranteed delay must be served prior to
    data from an application with a less strict delay bound.
   The LRM may also provide the following additional functions:
o   Data Regulation: to smooth a stream's data traffic, e.g., as with the
    leaky bucket algorithm.
o   Policing: to prevent applications exceed their negotiated QoS, e.g.,
    to send data at a higher rate than indicated in the flow
    specification.
o   Stream Preemption: to free up resources for other streams with
    higher priority or importance.
   The strategies adopted by the LRMs to handle resources are resource-
   dependent and may vary at every host/router. However, it is necessary
   that all LRMs have the same understanding of the flow specification.
   The interface between setup protocol and LRM is a local matter at
   every host and therefore it is not specified by this document. An
   example of LRM is the Heidelberg Resource Administration Technique
   (HeiRAT) [VoHN93].
   It is also assumed that the LRM provides functions to compare flow
   specifications, i.e., to decide whether a flow specification requires
   a greater, equal, or smaller amount of resource capacities to be
   reserved.
1.5  ST2 Basic Concepts
   The following sections present at an introductory level some of the
   fundamental ST2 concepts including streams, data transfer, and flow
   specification.
            Hosts Connections...                :      ...and Streams
            ====================                :      ==============
        data       Origin                       :          Origin
       packets +-----------+                    :          +----+
          +----|Application|                    :          |    |
          |    |-----------|                    :          +----+
          +--->| ST Agent  |                    :           |  |
               +-----------+                    :           |  |
                     |                          :           |  |
                     V                          :           |  |
              +-------------+                   :           |  |
              |             |                   :           |  |
+-------------|  Network A  |                   :   +-------+  +--+
|             |             |                   :   |             |
|             +-------------+                   :   |     Target 2|
|                    |     Target 2             :   |     & Router|
|     Target 1       |    and Router            :   |             |
|  +-----------+     |  +-----------+           :   V             V
|  |Application|<-+  |  |Application|<-+        : +----+        +----+
|  |-----------|  |  |  |-----------|  |        : |    |        |    |
+->| ST Agent  |--+  +->| ST Agent  |--+        : +----+        +----+
   +-----------+        +-----------+           :Target 1         |  |
                              |                 :                 |  |
                              V                 :                 |  |
                    +-------------+             :                 |  |
                    |             |             :                 |  |
      +-------------|  Network B  |             :           +-----+  |
      |             |             |             :           |        |
      |             +-------------+             :           |        |
      |    Target 3        |    Target 4        :           |        |
      |  +-----------+     |  +-----------+     :           V        V
      |  |Application|<-+  |  |Application|<-+  :         +----+ +----+
      |  |-----------|  |  |  |-----------|  |  :         |    | |    |
      +->| ST Agent  |--+  +->| ST Agent  |--+  :         +----+ +----+
         +-----------+        +-----------+     :      Target 3 Target 4
                                                :
                         Figure 3: The Stream Concept
1.5.1  Streams
   Streams form the core concepts of ST2. They are established between a
   sending origin and one or more receiving targets in the form of a
   routing tree. Streams are uni-directional from the origin to the
   targets. Nodes in the tree represent so-called ST agents, entities
   executing the ST2 protocol; links in the tree are called hops. Any
   node in the middle of the tree is called an intermediate agent, or
   router. An agent may have any combination of origin, target, or
   intermediate capabilities.
   Figure 3 illustrates a stream from an origin to four targets, where
   the ST agent on Target 2 also functions as an intermediate agent. Let
   us use this Target 2/Router node to explain some basic ST2
   terminology: the direction of the stream from this node to Target 3
   and 4 is called downstream, the direction towards the Origin node
   upstream. ST agents that are one hop away from a given node are
   called previous-hops in the upstream, and next-hops in the downstream
   direction.
   Streams are maintained using SCMP messages. Typical SCMP messages are
   CONNECT and ACCEPT to build a stream, DISCONNECT and REFUSE to close
   a stream, CHANGE to modify the quality of service associated with a
   stream, and JOIN to request to be added to a stream.
   Each ST agent maintains state information describing the streams
   flowing through it. It can actively gather and distribute such
   information. It can recognize failed neighbor ST agents through the
   use of periodic HELLO message exchanges. It can ask other ST agents
   about a particular stream via a STATUS message. These ST agents then
   send back a STATUS-RESPONSE message. NOTIFY messages can be used to
   inform other ST agents of significant events.
   ST2 offers a wealth of functionalities for stream management. Streams
   can be grouped together to minimize allocated resources or to process
   them in the same way in case of failures. During audio conferences,
   for example, only a limited set of participants may talk at once.
   Using the group mechanism, resources for only a portion of the audio
   streams of the group need to be reserved. Using the same concept, an
   entire group of related audio and video streams can be dropped if one
   of them is preempted.
1.5.2  Data Transmission
   Data transfer in ST2 is simplex in the downstream direction. Data
   transport through streams is very simple. ST2 puts only a small
   header in front of the user data. The header contains a protocol
   identification that distinguishes ST2 from IP packets, an ST2 version
   number, a priority field (specifying a relative importance of streams
   in cases of conflict), a length counter, a stream identification, and
   a checksum. These elements form a 12-byte header.
   Efficiency is also achieved by avoiding fragmentation and reassembly
   on all agents. Stream establishment yields a maximum message size for
   data packets on a stream. This maximum message size is communicated
   to the upper layers, so that they provide data packets of suitable
   size to ST2.
   Communication with multiple next-hops can be made even more efficient
   using MAC Layer multicast when it is available. If a subnet supports
   multicast, a single multicast packet is sufficient to reach all
   next-hops connected to this subnet. This leads to a significant
   reduction of the bandwidth requirements of a stream. If multicast is
   not provided, separate packets need to be sent to each next-hop.
   As ST2 relies on reservation, it does not contain error correction
   mechanisms features for data exchange such as those found in TCP. It
   is assumed that real-time data, such as digital audio and video,
   require partially correct delivery only. In many cases, retransmitted
   packets would arrive too late to meet their real-time delivery
   requirements. Also, depending on the data encoding and the particular
   application, a small number of errors in stream data are acceptable.
   In any case, reliability can be provided by layers on top of ST2 when
   needed.
1.5.3  Flow Specification
   As part of establishing a connection, SCMP handles the negotiation of
   quality-of-service parameters for a stream. In ST2 terminology, these
   parameters form a flow specification (FlowSpec) which is associated
   with the stream. Different versions of FlowSpecs exist, see
   [RFC1190], [DHHS92] and [RFC1363], and can be distinguished by a
   version number.  Typically, they contain parameters such as average
   and maximum throughput, end-to-end delay, and delay variance of a
   stream. SCMP itself only provides the mechanism for relaying the
   quality-of-service parameters.
   Three kinds of entities participate in the quality-of-service
   negotiation: application entities on the origin and target sites as
   the service users, ST agents, and local resource managers (LRM). The
   origin application supplies the initial FlowSpec requesting a
   particular service quality. Each ST agent which obtains the FlowSpec
   as part of a connection establishment message, it presents the local
   resource manager with it. ST2 does not determine how resource
   managers make reservations and how resources are scheduled according
   to these reservations; ST2, however, assumes these mechanisms as its
   basis.
   An example of the FlowSpec negotiation procedure is illustrated in
   Figure 4. Depending on the success of its local reservations, the LRM
   updates the FlowSpec fields and returns the FlowSpec to the ST agent,
   which passes it downstream as part of the connection message.
   Eventually, the FlowSpec is communicated to the application at the
   target which may base its accept/reject decision for establishing the
   connection on it and may finally also modify the FlowSpec. If a
   target accepts the connection, the (possibly modified) FlowSpec is
   propagated back to the origin which can then calculate an overall
   service quality for all targets. The application entity at the origin
   may later request a CHANGE to adjust reservations.
                 Origin                 Router               Target 1
                +------+      1a       +------+      1b      +------+
                |      |-------------->|      |------------->|      |
                +------+               +------+              +------+
                 ^  | ^                                          |
                 |  | |                    2                     |
                 |  | +------------------------------------------+
                 +  +
 +-------------+  \  \             +-------------+       +-------------+
 |Max Delay: 12|   \  \            |Max Delay: 12|       |Max Delay: 12|
 |-------------|    \  \           |-------------|       |-------------|
 |Min Delay:  2|     \  \          |Min Delay:  5|       |Min Delay:  9|
 |-------------|      \  \         |-------------|       |-------------|
 |Max Size:4096|       +  +        |Max Size:2048|       |Max Size:2048|
 +-------------+       |  |        +-------------+       +-------------+
    FlowSpec           |  | 1
                       |  +---------------+
                       |                  |
                       |                  V
                     2 |               +------+
                       +---------------|      |
                                       +------+
                                       Target 2
                                   +-------------+
                                   |Max Delay: 12|
                                   |-------------|
                                   |Min Delay:  4|
                                   |-------------|
                                   |Max Size:4096|
                                   +-------------+
        Figure 4:  Quality-of-Service Negotiation with FlowSpecs
1.6  Outline of This Document
   This document contains the specification of the ST2+ version of the
   ST2 protocol. In the rest of the document, whenever the terms "ST" or
   "ST2" are used, they refer to the ST2+ version of ST2.
   The document is organized as follows:
o   Section 2 describes the ST2 user service from an application point
    of view.
o   Section 3 illustrates the ST2 data transfer protocol, ST.
o   Section 4 through Section 8 specify the ST2 setup protocol, SCMP.
o   the ST2 flow specification is presented in Section 9.
o   the formats of protocol elements and PDUs are defined in Section 10.
2.  ST2 User Service Description
   This section describes the ST user service from the high-level point
   of view of an application. It defines the ST stream operations and
   primitive functions. It specifies which operations on streams can be
   invoked by the applications built on top of ST and when the ST
   primitive functions can be legally executed. Note that the presented
   ST primitives do not specify an API. They are used here with the only
   purpose of illustrating the service model for ST.
2.1  Stream Operations and Primitive Functions
   An ST application at the origin may create, expand, reduce, change,
   send data to, and delete a stream. When a stream is expanded, new
   targets are added to the stream; when a stream is reduced, some of
   the current targets are dropped from it. When a stream is changed,
   the associated quality of service is modified.
   An ST application at the target may join, receive data from, and
   leave a stream. This translates into the following stream operations:
o   OPEN: create new stream [origin], CLOSE: delete stream [origin],
o   ADD: expand stream, i.e., add new targets to it [origin],
o   DROP: reduce stream, i.e., drop targets from it [origin],
o   JOIN: join a stream [target], LEAVE: leave a stream [target],
o   DATA: send data through stream [origin],
o   CHG: change a stream's QoS [origin],
   Each stream operation may require the execution of several primitive
   functions to be completed. For instance, to open a new stream, a
   request is first issued by the sender and an indication is generated
   at one or more receivers; then, the receivers may each accept or
   refuse the request and the correspondent indications are generated at
   the sender. A single receiver case is shown in Figure 5 below.
                Sender             Network             Receiver
                  |                   |                   |
     OPEN.req     |                   |                   |
                  |-----------------> |                   |
                  |                   |-----------------> |
                  |                   |                   | OPEN.ind
                  |                   |                   | OPEN.accept
                  |                   |<----------------- |
                  |<----------------- |                   |
  OPEN.accept-ind |                   |                   |
                  |                   |                   |
           Figure 5: Primitives for the OPEN Stream Operation
   Table 1 defines the ST service primitive functions associated to each
   stream operation. The column labelled "O/T" indicates whether the
   primitive is executed at the origin or at the target.
           +===================================================+
           |Primitive      | Descriptive                   |O/T|
           |===================================================|
           |OPEN.req       | open a stream                 | O |
           |OPEN.ind       | connection request indication | T |
           |OPEN.accept    | accept stream                 | T |
           |OPEN.refuse    | refuse stream                 | T |
           |OPEN.accept-ind| connection accept indication  | O |
           |OPEN.refuse-ind| connection refuse indication  | O |
           |ADD.req        | add targets to stream         | O |
           |ADD.ind        | add request indication        | T |
           |ADD.accept     | accept stream                 | T |
           |ADD.refuse     | refuse stream                 | T |
           |ADD.accept-ind | add accept indication         | O |
           |ADD.refuse-ind | add refuse indication         | O |
           |JOIN.req       | join a stream                 | T |
           |JOIN.ind       | join request indication       | O |
           |JOIN.reject    | reject a join                 | O |
           |JOIN.reject-ind| join reject indication        | T |
           |DATA.req       | send data                     | O |
           |DATA.ind       | receive data indication       | T |
           |CHG.req        | change stream QoS             | O |
           |CHG.ind        | change request indication     | T |
           |CHG.accept     | accept change                 | T |
           |CHG.refuse     | refuse change                 | T |
           |CHG.accept-ind | change accept indication      | O |
           |CHG.refuse-ind | change refuse indication      | O |
           |DROP.req       | drop targets                  | O |
           |DROP.ind       | disconnect indication         | T |
           |LEAVE.req      | leave stream                  | T |
           |LEAVE.ind      | leave stream indication       | O |
           |CLOSE.req      | close stream                  | O |
           |CLOSE.ind      | close stream indication       | T |
           +---------------------------------------------------+
                              Table 1: ST Primitives
2.2  State Diagrams
   It is not sufficient to define the set of ST stream operations. It is
   also necessary to specify when the operations can be legally
   executed.  For this reason, a set of states is now introduced and the
   transitions from one state to the others are specified. States are
   defined with respect to a single stream. The previously defined
   stream operations can be legally executed only from an appropriate
   state.
   An ST agent may, with respect to an ST stream, be in one of the
   following states:
o   IDLE: the stream has not been created yet.
o   PENDING: the stream is in the process of being established.
o   ACTIVE: the stream is established and active.
o   ADDING: the stream is established. A stream expansion is underway.
o   CHGING: the stream is established. A stream change is underway.
   Previous experience with ST has lead to limits on stream operations
   that can be executed simultaneously. These restrictions are:
   1.  A single ADD or CHG operation can be processed at one time. If
       an ADD or CHG is already underway, further requests are queued
       by the ST agent and handled only after the previous operation
       has been completed. This also applies to two subsequent
       requests of the same kind, e.g., two ADD or two CHG operations.
       The second operation is not executed until the first one has
       been completed.
   2.  Deleting a stream, leaving a stream, or dropping targets from a
       stream is possible only after stream establishment has been
       completed. A stream is considered to be established when all
       the next-hops of the origin have either accepted or refused the
       stream.  Note that stream refuse is automatically forced after
       timeout if no reply comes from a next-hop.
   3.  An ST agent forwards data only along already established paths
       to the targets, see also Section 3.1. A path is considered to
       be established when the next-hop on the path has explicitly
       accepted the stream. This implies that the target and all other
       intermediate ST agents are ready to handle the incoming data
       packets. In no cases an ST agent will forward data to a
       next-hop ST agent that has not explicitly accepted the stream.
       To be sure that all targets receive the data, an application
       should send the data only after all paths have been
       established, i.e., the stream is established.
   4.  It is allowed to send data from the CHGING and ADDING states.
       While sending data from the CHGING state, the quality of
       service to the targets affected by the change should be assumed
       to be the more restrictive quality of service. When sending
       data from the ADDING state, the targets that receive the data
       include at least all the targets that were already part of the
       stream at the time the ADD operation was invoked.
   The rules introduced above require ST agents to queue incoming
   requests when the current state does not allow to process them
   immediately. In order to preserve the semantics, ST agents have to
   maintain the order of the requests, i.e., implement FIFO queuing.
   Exceptionally, the CLOSE request at the origin and the LEAVE request
   at the target may be immediately processed: in these cases, the queue
   is deleted and it is possible that requests in the queue are not
   processed.
   The following state diagrams define the ST service. Separate diagrams
   are presented for the origin and the targets.
   The symbol (a/r)* indicates that all targets in the target list have
   explicitly accepted or refused the stream, or refuse has been forced
   after timeout. If the target list is empty, i.e., it contains no
   targets, the (a/r)* condition is immediately satisfied, so the empty
   stream is created and state ESTBL is entered.
   The separate OPEN and ADD primitives at the target are for conceptual
   purposes only. The target is actually unable to distinguish between
   an OPEN and an ADD. This is reflected in Figure 7 and Table 3 through
   the notation OPEN/ADD.
                        +------------+
                        |            |<-------------------+
            +---------->|    IDLE    |-------------+      |
            |           |            |    OPEN.req |      |
            |           +------------+             |      |
 CLOSE.req  |      CLOSE.req ^   ^ CLOSE.req       V      | CLOSE.req
            |                |   |            +---------+ |
            |                |   |            | PENDING |-|-+ JOIN.reject
            |                |   -------------|         |<|-+
            |    JOIN.reject |                +---------+ |
            |    DROP.req +----------+             |      |
            |       +-----|          |             |      |
            |       |     |  ESTDL   | OPEN.(a/r)* |      |
            |       +---->|          |<------------+      |
            |             +----------+                    |
            |              |  ^  |  ^                     |
            |              |  |  |  |                     |
       +----------+ CHG.req|  |  |  | Add.(a/r)*    +----------+
       |          |<-------+  |  |  +-------------- |          |
       |  CHGING  |           |  |                  |  ADDING  |
       |          |-----------+  +----------------->|          |
       +----------+ CHG.(a/r)*         JOIN.ind     +----------+
           |   ^                         ADD.req        |   ^
           |   |                                        |   |
           +---+                                        +---+
           DROP.req                                    DROP.req
           JOIN.reject                                 JOIN.reject
                  Figure 6: ST Service at the Origin
                 +--------+
                 |        |-----------------------+
                 |  IDLE  |                       |
                 |        |<---+                  | OPEN/ADD.ind
                 +--------+    | CLOSE.ind        | JOIN.req
                     ^         | OPEN/ADD.refuse  |
                     |         | JOIN.refect-ind  |
         CLOSE.ind   |         |                  V
         DROP.ind    |         |             +---------+
         LEAVE.req   |         +-------------|         |
                     |                       | PENDING |
                 +-------+                   |         |
                 |       |                   +---------+
                 | ESTBL |    OPEN/ADD.accept     |
                 |       |<-----------------------+
                 +-------+
                     Figure 7: ST Service at the Target
2.3  State Transition Tables
   Table 2 and Table 3 define which primitives can be processed from
   which states and the possible state transitions.
+======================================================================+
|Primitive      |IDLE|    PENDING    |  ESTBL |    CHGING  |    ADDING |
|======================================================================|
|OPEN.req       | ok | -             | -      | -          | -         |
|OPEN.accept-ind| -  |if(a,r)*->ESTBL| -      | -          | -         |
|OPEN.refuse-ind| -  |if(a,r)*->ESTBL| -      | -          | -         |
|ADD.req        | -  | queued        |->ADDING| queued     | queued    |
|ADD.accept-ind | -  | -             | -      | -          |if(a,r)*   |
|               | -  | -             | -      | -          |->ESTBL    |
|ADD.refuse-ind | -  | -             | -      | -          |if(a,r)*   |
|               | -  | -             | -      | -          |->ESTBL    |
|JOIN.ind       | -  | queued        |->ADDING| queued     |queued     |
|JOIN.reject    | -  | OK            | ok     | ok         | ok        |
|DATA.req       | -  | -             | ok     | ok         | ok        |
|CHG.req        | -  | queued        |->CHGING| queued     |queued     |
|CHG.accept-ind | -  | -             | -      |if(a,r)*    | -         |
|               | -  | -             | -      |->ESTBL     | -         |
|CHG.refuse.ind | -  | -             | -      |if(a,r)*    | -         |
|               | -  | -             | -      |->ESTBL     | -         |
|DROP.req       | -  | -             | ok     | ok         | ok        |
|LEAVE.ind      | -  | OK            | ok     | ok         | ok        |
|CLOSE.req      | -  | OK            | ok     | ok         | ok        |
+----------------------------------------------------------------------+
                Table 2: Primitives and States at the Origin
             +======================================================+
             | Primitive       |   IDLE    |  PENDING   |   ESTBL   |
             |======================================================|
             | OPEN/ADD.ind    | ->PENDING | -          | -         |
             | OPEN/ADD.accept | -         | ->ESTBL    | -         |
             | OPEN/ADD.refuse | -         | ->IDLE     | -         |
             | JOIN.req        | ->PENDING | -          | -         |
             | JOIN.reject-ind |-          | ->IDLE     | -         |
             | DATA.ind        | -         | -          | ok        |
             | CHG.ind         | -         | -          | ok        |
             | CHG.accept      | -         | -          | ok        |
             | DROP.ind        | -         | ok         | ok        |
             | LEAVE.req       | -         | ok         | ok        |
             | CLOSE.ind       | -         | ok         | ok        |
             | CHG.ind         | -         | -          | ok        |
             +------------------------------------------------------+
                Table 3: Primitives and States at the Target
3.  The ST2 Data Transfer Protocol
   This section presents the ST2 data transfer protocol, ST. First, data
   transfer is described in Section 3.1, then, the data transfer
   protocol functions are illustrated in Section 3.2.
3.1  Data Transfer with ST
   Data transmission with ST is unreliable. An application is not
   guaranteed that the data reaches its destinations and ST makes no
   attempts to recover from packet loss, e.g., due to the underlying
   network. However, if the data reaches its destination, it should do
   so according to the quality of service associated with the stream.
   Additionally, ST may deliver data corrupted in transmission. Many
   types of real-time data, such as digital audio and video, require
   partially correct delivery only. In many cases, retransmitted packets
   would arrive too late to meet their real-time delivery requirements.
   On the other hand, depending on the data encoding and the particular
   application, a small number of errors in stream data are acceptable.
   In any case, reliability can be provided by layers on top of ST2 if
   needed.
   Also, no data fragmentation is supported during the data transfer
   phase. The application is expected to segment its data PDUs according
   to the minimum MTU over all paths in the stream. The application
   receives information on the MTUs relative to the paths to the targets
   as part of the ACCEPT message, see Section 8.6. The minimum MTU over
   all paths can be calculated from the MTUs relative to the single
   paths. ST agents silently discard too long data packets, see also
   Section 5.1.1.
   An ST agent forwards the data only along already established paths to
   targets. A path is considered to be established once the next-hop ST
   agent on the path sends an ACCEPT message, see Section 2.2. This
   implies that the target and all other intermediate ST agents on the
   path to the target are ready to handle the incoming data packets. In
   no cases will an ST agent forward data to a next-hop ST agent that
   has not explicitly accepted the stream.
   To be reasonably sure that all targets receive the data with the
   desired quality of service, an application should send the data only
   after the whole stream has been established. Depending on the local
   API, an application may not be prevented from sending data before the
   completion of stream setup, but it should be aware that the data
   could be lost or not reach all intended targets. This behavior may
   actually be desirable to applications, such as those application that
   have multiple targets which can each process data as soon as it is
   available (e.g., a lecture or distributed gaming).
   It is desirable for implementations to take advantage of networks
   that support multicast. If a network does not support multicast, or
   for the case where the next-hops are on different networks, multiple
   copies of the data packet must be sent.
3.2  ST Protocol Functions
   The ST protocol provides two functions:
   o   stream identification
   o   data priority
3.2.1  Stream Identification
   ST data packets are encapsulated by an ST header containing the
   Stream IDentifier (SID). This SID is selected at the origin so that
   it is globally unique over the Internet. The SID must be known by the
   setup protocol as well. At stream establishment time, the setup
   protocol builds, at each agent traversed by the stream, an entry into
   its local database containing stream information. The SID can be used
   as a reference into this database, to obtain quickly the necessary
   replication and forwarding information.
   Stream IDentifiers are intended to be used to make the packet
   forwarding task most efficient. The time-critical operation is an
   intermediate ST agent receiving a packet from the previous-hop ST
   agent and forwarding it to the next-hop ST agents.
   The format of data PDUs including the SID is defined in Section 10.1.
   Stream IDentifier generation is discussed in Section 8.1.
3.2.2  Packet Discarding based on Data Priority
   ST provides a well defined quality of service to its applications.
   However, there may be cases where the network is temporarily
   congested and the ST agents have to discard certain packets to
   minimize the overall impact to other streams. The ST protocol
   provides a mechanism to discard data packets based on the Priority
   field in the data PDU, see Section 10.1. The application assigns each
   data packet with a discard-priority level, carried into the Priority
   field. ST agents will attempt to discard lower priority packets first
   during periods of network congestion. Applications may choose to send
   data at multiple priority levels so that less important data may be
   discarded first.
4.  SCMP Functional Description
   ST agents create and manage streams using the ST Control Message
   Protocol (SCMP). Conceptually, SCMP resides immediately above ST (as
   does ICMP above IP). SCMP follows a request-response model. SCMP
   messages are made reliable through the use of retransmission after
   timeout.
   This section contains a functional description of stream management
   with SCMP. To help clarify the SCMP exchanges used to setup and
   maintain ST streams, we include an example of a simple network
   topology, represented in Figure 8. Using the SCMP messages described
   in this section it will be possible for an ST application to:
   o   Create a stream from A to the peers at B, C and D,
   o   Add a peer at E,
   o   Drop peers B and C, and
   o   Let F join the stream
   o   Delete the stream.
                                               +---------+    +---+
                                               |         |----| B |
               +---------+      +----------+   |         |    +---+
               |         |------| Router 1 |---| Subnet2 |
               |         |      +----------+   |         |
               |         |                     |         |
               |         |                     +---------+
               |         |                         |
               | Subnet1 |                         |
               |         |                     +----------+
               |         |                     | Router 3 |
       +---+   |         |                     +----------+
       | A |---|         |    +----------+           |
       +---+   |         |----| Router 2 |           |
               |         |    +----------+           |
               +---------+         |                 |
                                   |                 |
                                   |          +----------+    +---+
                                   +----------|          |----| C |
                                              |          |    +---+
                         +---------+          |  Subnet3 |
                 +---+   |         |   +---+  |          |    +---+
                 | F |---| Subnet4 |---| E |--|          |----| D |
                 +---+   |         |   +---+  +----------+    +---+
                         +---------+
                Figure 8:  Sample Topology for an ST Stream
   We first describe the possible types of stream in Section 4.1;
   Section 4.2 introduces SCMP control message types; SCMP reliability
   is discussed in Section 4.3; stream options are covered in Section
   4.4; stream setup is presented in Section 4.5; Section 4.6
   illustrates stream modification including stream expansion,
   reduction, changes of the quality of service associated to a stream.
   Finally, stream deletion is handled in Section 4.7.
4.1  Types of Streams
   SCMP allows for the setup and management of different types of
   streams. Streams differ in the way they are built and the information
   maintained on connected targets.
4.1.1  Stream Building
   Streams may be built in a sender-oriented fashion, receiver-oriented
   fashion, or with a mixed approach:
o   in the sender-oriented fashion, the application at the origin
    provides the ST agent with the list of receivers for the stream. New
    targets, if any, are also added from the origin.
o   in the receiver-oriented approach, the application at the origin
    creates an empty stream that contains no targets. Each target then
    joins the stream autonomously.
o   in the mixed approach, the application at the origin creates a
    stream that contains some targets and other targets join the stream
    autonomously.
   ST2 provides stream options to support sender-oriented and mixed
   approach steams. Receiver-oriented streams can be emulated through
   the use of mixed streams. The fashion by which targets may be added
   to a particular stream is controlled via join authorization levels.
   Join authorization levels are described in Section 4.4.2.
4.1.2  Knowledge of Receivers
   When streams are built in the sender-oriented fashion, all ST agents
   will have full information on all targets down stream of a particular
   agent. In this case, target information is relayed down stream from
   agent-to-agent during stream set-up.
   When targets add themselves to mixed approach streams, upstream ST
   agents may or may not be informed. Propagation of information on
   targets that "join" a stream is also controlled via join
   authorization levels. As previously mentioned, join authorization
   levels are described in Section 4.4.2.
   This leads to two types of streams:
o   full target information is propagated in a full-state stream. For
    such streams, all agents are aware of all downstream targets
    connected to the stream. This results in target information being
    maintained at the origin and at intermediate agents. Operations on
    single targets are always possible, i.e., change a certain target,
    or, drop that target from the stream. It is also always possible for
    any ST agent to attempt recovery of all downstream targets.
o   in light-weight streams, it is possible that the origin and other
    upstream agents have no knowledge about some targets. This results
    in less maintained state and easier stream management, but it limits
    operations on specific targets. Special actions may be required to
    support change and drop operations on unknown targets, see Section
    5.7. Also, stream recovery may not be possible. Of course, generic
    functions such as deleting the whole stream, are still possible. It
    is expected that applications that will have a large number of
    targets will use light-weight streams in order to limit state in
    agents and the number of targets per control message.
   Full-state streams serve well applications as video conferencing or
   distributed gaming, where it is important to have knowledge on the
   connected receivers, e.g., to limit who participates. Light-weight
   streams may be exploited by applications such as remote lecturing or
   playback applications of radio and TV broadcast where the receivers
   do not need to be known by the sender. Section 4.4.2 defines join
   authorization levels, which support two types of full-state streams
   and one type of light-weight stream.
4.2  Control PDUs
   SCMP defines the following PDUs (the main purpose of each PDU is also
   indicated):
1.      ACCEPT          to accept a new stream
2.      ACK             to acknowledge an incoming message
3.      CHANGE          to change the quality of service associated with
                                a stream
4.      CONNECT         to establish a new stream or add new targets to
                                an existing stream
5.      DISCONNECT      to remove some or all of the stream's targets
6.      ERROR           to indicate an error contained in an incoming
                                message
7.      HELLO           to detect failures of neighbor ST agents
8.      JOIN            to request stream joining from a target
9.      JOIN-REJECT     to reject a stream joining request from a target
10.     NOTIFY          to inform an ST agent of a significant event
11.     REFUSE          to refuse the establishment of a new stream
12.     STATUS          to query an ST agent on a specific stream
13.     STATUS-RESPONSE to reply queries on a specific stream
   SCMP follows a request-response model with all requests expecting
   responses. Retransmission after timeout is used to allow for lost or
   ignored messages. Control messages do not extend across packet
   boundaries; if a control message is too large for the MTU of a hop,
   its information is partitioned and a control message per partition is
   sent, as described in Section 5.1.2.
   CONNECT and CHANGE request messages are answered with ACCEPT messages
   which indicate success, and with REFUSE messages which indicate
   failure. JOIN messages are answered with either a CONNECT message
   indicating success, or with a JOIN-REJECT message indicating failure.
   Targets may be removed from a stream by either the origin or the
   target via the DISCONNECT and REFUSE messages.
   The ACCEPT, CHANGE, CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY
   and REFUSE messages must always be explicitly acknowledged:
o   with an ACK message, if the message was received correctly and it
    was possible to parse and correctly extract and interpret its
    header, fields and parameters,
o   with an ERROR message, if a syntax error was detected in the header,
    fields, or parameters included in the message. The errored PDU may
    be optionally returned as part of the ERROR message. An ERROR
    message indicates a syntax error only. If any other errors are
    detected, it is necessary to first acknowledge with ACK and then
    take appropriate actions. For instance, suppose a CHANGE message
    contains an unknown SID: first, an ACK message has to be sent, then
    a REFUSE message with ReasonCode (SIDUnknown) follows.
   If no ACK or ERROR message are received before the correspondent
   timer expires, a timeout failure occurs. The way an ST agent should
   handle timeout failures is described in Section 5.2.
   ACK, ERROR, and STATUS-RESPONSE messages are never acknowledged.
   HELLO messages are a special case. If they contain a syntax error, an
   ERROR message should be generated in response. Otherwise, no
   acknowledgment or response should be generated. Use of HELLO messages
   is discussed in Section 6.1.2.
   STATUS messages containing a syntax error should be answered with an
   ERROR message. Otherwise, a STATUS-RESPONSE message should be sent
   back in response. Use of STATUS and STATUS-RESPONSE are discussed in
   Section 8.4.
4.3  SCMP Reliability
   SCMP is made reliable through the use of retransmission when a
   response is not received in a timely manner. The ACCEPT, CHANGE,
   CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY, and REFUSE messages
   all must be answered with an ACK message, see Section 4.2. In
   general, when sending a SCMP message which requires an ACK response,
   the sending ST agent needs to set the Toxxxx timer (where xxxx is the
   SCMP message type, e.g., ToConnect). If it does not receive an ACK
   before the Toxxxx timer expires, the ST agent should retransmit the
   SCMP message. If no ACK has been received within Nxxxx
   retransmissions, then a SCMP timeout condition occurs and the ST
   agent enters its SCMP timeout recovery state. The actions performed
   by the ST agent as the result of the SCMP timeout condition differ
   for different SCMP messages and are described in Section 5.2.
   For some SCMP messages (CONNECT, CHANGE, JOIN, and STATUS) the
   sending ST agent also expects a response back (ACCEPT/REFUSE,
   CONNECT/JOIN- REJECT) after ACK has been received. For these cases,
   the ST agent needs to set the ToxxxxResp timer after it receives the
   ACK. (As before, xxxx is the initiating SCMP message type, e.g.,
   ToConnectResp).  If it does not receive the appropriate response back
   when ToxxxxResp expires, the ST agent updates its state and performs
   appropriate recovery action as described in Section 5.2. Suggested
   constants are given in Section 10.5.4.
   The timeout and retransmission algorithm is implementation dependent
   and it is outside the scope of this document. Most existing
   algorithms are based on an estimation of the Round Trip Time (RTT)
   between two agents. Therefore, SCMP contains a mechanism, see Section
   8.5, to estimate this RTT. Note that the timeout related variable
   names described above are for reference purposes only, implementors
   may choose to combine certain variables.
4.4  Stream Options
   An application may select among some stream options. The desired
   options are indicated to the ST agent at the origin when a new stream
   is created. Options apply to single streams and are valid during the
   whole stream's lifetime. The options chosen by the application at the
   origin are included into the initial CONNECT message, see Section
   4.5.3. When a CONNECT message reaches a target, the application at
   the target is notified of the stream options that have been selected,
   see Section 4.5.5.
4.4.1  No Recovery
   When a stream failure is detected, an ST agent would normally attempt
   stream recovery, as described in Section 6.2. The NoRecovery option
   is used to indicate that ST agents should not attempt recovery for
   the stream. The protocol behavior in the case that the NoRecovery
   option has been selected is illustrated in Section 6.2. The
   NoRecovery option is specified by setting the S-bit in the CONNECT
   message, see Section 10.4.4. The S-bit can be set only by the origin
   and it is never modified by intermediate and target ST agents.
4.4.2  Join Authorization Level
   When a new stream is created, it is necessary to define the join
   authorization level associated with the stream. This level determines
   the protocol behavior in case of stream joining, see Section 4.1 and
   Section 4.6.3. The join authorization level for a stream is defined
   by the J-bit and N-bit in the CONNECT message header, see Section
   10.4.4.  One of the following authorization levels has to be
   selected:
   o   Level 0 - Refuse Join (JN = 00): No targets are allowed to join this
       stream.
   o   Level 1 - OK, Notify Origin (JN = 01): Targets are allowed to join
       the stream. The origin is notified that the target has joined.
   o   Level 2 - OK (JN = 10): Targets are allowed to join the stream. No
       notification is sent to the stream origin.
   Some applications may choose to maintain tight control on their
   streams and will not permit any connections without the origin's
   permission. For such streams, target applications may request to be
   added by sending an out-of-band, i.e., via regular IP, request to the
   origin. The origin, if it so chooses, can then add the target
   following the process described in Section 4.6.1.
   The selected authorization level impacts stream handling and the
   state that is maintained for the stream, as described in Section 4.1.
4.4.3  Record Route
   The RecordRoute option can be used to request the route between the
   origin and a target be recorded and delivered to the application.
   This option may be used while connecting, accepting, changing, or
   refusing a stream. The results of a RecordRoute option requested by
   the origin, i.e., as part of the CONNECT or CHANGE messages, are
   delivered to the target. The results of a RecordRoute option
   requested by the target, i.e., as part of the ACCEPT or REFUSE
   messages, are delivered to the origin.
   The RecordRoute option is specified by adding the RecordRoute
   parameter to the mentioned SCMP messages. The format of the
   RecordRoute parameter is shown in Section 10.3.5. When adding this
   parameter, the ST agent at the origin must determine the number of
   entries that may be recorded as explained in Section 10.3.5.
4.4.4  User Data
   The UserData option can be used by applications to transport
   application specific data along with some SCMP control messages. This
   option can be included with ACCEPT, CHANGE, CONNECT, DISCONNECT, and
   REFUSE messages. The format of the UserData parameter is shown in
   Section 10.3.7. This option may be included by the origin, or the
   target, by adding the UserData parameter to the mentioned SCMP
   messages. This option may only be included once per SCMP message.
4.5  Stream Setup
   This section presents a description of stream setup. For simplicity,
   we assume that everything succeeds, e.g., any required resources are
   available, messages are properly delivered, and the routing is
   correct. Possible failures in the setup phase are handled in Section
   5.2.
4.5.1  Information from the Application
   Before stream setup can be started, the application has to collect
   the necessary information to determine the characteristics for the
   connection. This includes identifying the participants and selecting
   the QoS parameters of the data flow. Information passed to the ST
   agent by the application includes:
o   the list of the stream's targets (Section 10.3.6). The list may be
    empty (Section 4.5.3.1),
o   the flow specification containing the desired quality of service for
    the stream (Section 9),
o   information on the groups in which the stream is a member, if any
    (Section 7),
o   information on the options selected for the stream (Section 4.4).
4.5.2  Initial Setup at the Origin
   The ST agent at the origin then performs the following operations:
o   allocates a stream ID (SID) for the stream (Section 8.1),
o   invokes the routing function to determine the set of next-hops for
    the stream (Section 4.5.2.1),
o   invokes the Local Resource Manager (LRM) to reserve resources
    (Section 4.5.2.2),
o   creates local database entries to store information on the new
    stream,
o   propagates the stream creation request to the next-hops determined
    by the routing function (Section 4.5.3).
4.5.2.1  Invoking the Routing Function
   An ST agent that is setting up a stream invokes the routing function
   to find the next-hop to reach each of the targets specified by the
   target list provided by the application. This is similar to the
   routing decision in IP. However, in this case the route is to a
   multitude of targets with QoS requirements rather than to a single
   destination.
   The result of the routing function is a set of next-hop ST agents.
   The set of next-hops selected by the routing function is not
   necessarily the same as the set of next-hops that IP would select
   given a number of independent IP datagrams to the same destinations.
   The routing algorithm may attempt to optimize parameters other than
   the number of hops that the packets will take, such as delay, local
   network bandwidth consumption, or total internet bandwidth
   consumption.  Alternatively, the routing algorithm may use a simple
   route lookup for each target.
   Once a next-hop is selected by the routing function, it persists for
   the whole stream lifetime, unless a network failure occurs.
4.5.2.2  Reserving Resources
   The ST agent invokes the Local Resource Manager (LRM) to perform the
   appropriate reservations. The ST agent presents the LRM with
   information including:
o   the flow specification with the desired quality of service for the
    stream (Section 9),
o   the version number associated with the flow specification
    (Section 9).
o   information on the groups the stream is member in, if any
    (Section 7),
   The flow specification contains information needed by the LRM to
   allocate resources. The LRM updates the flow specification contents
   information before returning it to the ST agent. Section 9.2.3
   defines the fields of the flow specification to be updated by the
   LRM.
   The membership of a stream in a group may affect the amount of
   resources that have to be allocated by the LRM, see Section 7.
4.5.3  Sending CONNECT Messages
   The ST agent sends a CONNECT message to each of the next-hop ST
   agents identified by the routing function. Each CONNECT message
   contains the SID, the selected stream options, the FlowSpec, and a
   TargetList. The format of the CONNECT message is defined by Section
   10.4.4. In general, the FlowSpec and TargetList depend on both the
   next-hop and the intervening network. Each TargetList is a subset of
   the original TargetList, identifying the targets that are to be
   reached through the next-hop to which the CONNECT message is being
   sent.
   The TargetList may be empty, see Section 4.5.3.1; if the TargetList
   causes a too long CONNECT message to be generated, the CONNECT
   message is partitioned as explained in Section 5.1.2. If multiple
   next-hops are to be reached through a network that supports network
   level multicast, a different CONNECT message must nevertheless be
   sent to each next-hop since each will have a different TargetList.
4.5.3.1  Empty Target List
   An application at the origin may request the local ST agent to create
   an empty stream. It does so by passing an empty TargetList to the
   local ST agent during the initial stream setup. When the local ST
   agent receives a request to create an empty stream, it allocates the
   stream ID (SID), updates its local database entries to store
   information on the new stream and notifies the application that
   stream setup is complete. The local ST agent does not generate any
   CONNECT message for streams with an empty TargetList. Targets may be
   later added by the origin, see Section 4.6.1, or they may
   autonomously join the stream, see Section 4.6.3.
4.5.4  CONNECT Processing by an Intermediate ST agent
   An ST agent receiving a CONNECT message, assuming no errors, responds
   to the previous-hop with an ACK. The ACK message must identify the
   CONNECT message to which it corresponds by including the reference
   number indicated by the Reference field of the CONNECT message. The
   intermediate ST agent calls the routing function, invokes the LRM to
   reserve resources, and then propagates the CONNECT messages to its
   next-hops, as described in the previous sections.
4.5.5  CONNECT Processing at the Targets
   An ST agent that is the target of a CONNECT message, assuming no
   errors, responds to the previous-hop with an ACK. The ST agent
   invokes the LRM to reserve local resources and then queries the
   specified application process whether or not it is willing to accept
   the connection.
   The application is presented with parameters from the CONNECT message
   including the SID, the selected stream options, Origin, FlowSpec,
   TargetList, and Group, if any, to be used as a basis for its
   decision.  The application is identified by a combination of the
   NextPcol field, from the Origin parameter, and the service access
   point, or SAP, field included in the correspondent (usually single
   remaining) Target of the TargetList. The contents of the SAP field
   may specify the port or other local identifier for use by the
   protocol layer above the host ST layer. Subsequently received data
   packets will carry the SID, that can be mapped into this information
   and be used for their delivery.
   Finally, based on the application's decision, the ST agent sends to
   the previous-hop from which the CONNECT message was received either
   an ACCEPT or REFUSE message. Since the ACCEPT (or REFUSE) message has
   to be acknowledged by the previous-hop, it is assigned a new
   Reference number that will be returned in the ACK. The CONNECT
   message to which ACCEPT (or REFUSE) is a reply is identified by
   placing the CONNECT's Reference number in the LnkReference field of
   ACCEPT (or REFUSE). The ACCEPT message contains the FlowSpec as
   accepted by the application at the target.
4.5.6  ACCEPT Processing by an Intermediate ST agent
   When an intermediate ST agent receives an ACCEPT, it first verifies
   that the message is a response to an earlier CONNECT. If not, it
   responds to the next-hop ST agent with an ERROR message, with
   ReasonCode (LnkRefUnknown). Otherwise, it responds to the next-hop ST
   agent with an ACK, and propagates the individual ACCEPT message to
   the previous-hop along the same path traced by the CONNECT but in the
   reverse direction toward the origin.
   The FlowSpec is included in the ACCEPT message so that the origin and
   intermediate ST agents can gain access to the information that was
   accumulated as the CONNECT traversed the internet. Note that the
   resources, as specified in the FlowSpec in the ACCEPT message, may
   differ from the resources that were reserved when the CONNECT was
   originally processed. Therefore, the ST agent presents the LRM with
   the FlowSpec included in the ACCEPT message. It is expected that each
   LRM adjusts local reservations releasing any excess resources. The
   LRM may choose not to adjust local reservations when that adjustment
   may result in the loss of needed resources. It may also choose to
   wait to adjust allocated resources until all targets in transition
   have been accepted or refused.
   In the case where the intermediate ST agent is acting as the origin
   with respect to this target, see Section 4.6.3.1, the ACCEPT message
   is not propagated upstream.
4.5.7  ACCEPT Processing by the Origin
   The origin will eventually receive an ACCEPT (or REFUSE) message from
   each of the targets. As each ACCEPT is received, the application is
   notified of the target and the resources that were successfully
   allocated along the path to it, as specified in the FlowSpec
   contained in the ACCEPT message. The application may then use the
   information to either adopt or terminate the portion of the stream to
   each target.
   When an ACCEPT is received by the origin, the path to the target is
   considered to be established and the ST agent is allowed to forward
   the data along this path as explained in Section 2 and in Section
   3.1.
4.5.8  REFUSE Processing by the Intermediate ST agent
   If an application at a target does not wish to participate in the
   stream, it sends a REFUSE message back to the origin with ReasonCode
   (ApplDisconnect). An intermediate ST agent that receives a REFUSE
   message with ReasonCode (ApplDisconnect) acknowledges it by sending
   an ACK to the next-hop, invokes the LRM to adjusts reservations as
   appropriate, deletes the target entry from the internal database, and
   propagates the REFUSE message back to the previous-hop ST agent.
   In the case where the intermediate ST agent is acting as the origin
   with respect to this target, see Section 4.6.3.1, the REFUSE message
   is only propagated upstream when there are no more downstream agents
   participating in the stream. In this case, the agent indicates that
   the agent is to be removed from the stream propagating the REFUSE
   message with the G-bit set (1).
4.5.9  REFUSE Processing by the Origin
   When the REFUSE message reaches the origin, the ST agent at the
   origin sends an ACK and notifies the application that the target is
   no longer part of the stream and also if the stream has no remaining
   targets. If there are no remaining targets, the application may wish
   to terminate the stream, or keep the stream active to allow addition
   of targets or stream joining as described in Section 4.6.3.
4.5.10  Other Functions during Stream Setup
   Some other functions have to be accomplished by an ST agent as
   CONNECT messages travel downstream and ACCEPT (or REFUSE) messages
   travel upstream during the stream setup phase. They were not
   mentioned in the previous sections to keep the discussion as simple
   as possible. These functions include:
   o   computing the smallest Maximum Transmission Unit size over the path
       to the targets, as part of the MTU discovery mechanism presented in
       Section 8.6. This is done by updating the MaxMsgSize field of the
       CONNECT message, see Section 10.4.4. This value is carried back to
       origin in the MaxMsgSize field of the ACCEPT message, see Section
       10.4.1.
   o   counting the number of IP clouds to be traversed to reach the
       targets, if any. IP clouds are traversed when the IP encapsulation
       mechanism is used. This mechanism described in Section 8.7.
       Encapsulating agents update the IPHops field of the CONNECT message,
       see Section 10.4.4. The resulting value is carried back to origin in
       the IPHops field of the ACCEPT message, see Section 10.4.1.
   o   updating the RecoveryTimeout value for the stream based on what can
       the agent can support. This is part of the stream recovery
       mechanism, in Section 6.2. This is done by updating the
       RecoveryTimeout field of the CONNECT message, see Section 10.4.4.
       This value is carried back to origin in the RecoveryTimeout field of
       the ACCEPT message, see Section 10.4.1.
4.6  Modifying an Existing Stream
   Some applications may wish to modify a stream after it has been
   created. Possible changes include expanding a stream, reducing it,
   and changing its FlowSpec. The origin may add or remove targets as
   described in Section 4.6.1 and Section 4.6.2. Targets may request to
   join the stream as described in Section 4.6.3 or, they may decide to
   leave a stream as described in Section 4.6.4. Section 4.6.5 explains
   how to change a stream's FlowSpec.
   As defined by Section 2, an ST agent can handle only one stream
   modification at a time. If a stream modification operation is already
   underway, further requests are queued and handled when the previous
   operation has been completed. This also applies to two subsequent
   requests of the same kind, e.g., two subsequent changes to the
   FlowSpec.
4.6.1  The Origin Adding New Targets
   It is possible for an application at the origin to add new targets to
   an existing stream any time after the stream has been established.
   Before new targets are added, the application has to collect the
   necessary information on the new targets. Such information is passed
   to the ST agent at the origin.
   The ST agent at the origin issues a CONNECT message that contains the
   SID, the FlowSpec, and the TargetList specifying the new targets.
   This is similar to sending a CONNECT message during stream
   establishment, with the following exceptions: the origin checks that
   a) the SID is valid, b) the targets are not already members of the
   stream, c) that the LRM evaluates the FlowSpec of the new target to
   be the same as the FlowSpec of the existing stream, i.e., it requires
   an equal or smaller amount of resources to be allocated. If the
   FlowSpec of the new target does not match the FlowSpec of the
   existing stream, an error is generated with ReasonCode
   (FlowSpecMismatch). Functions to compare flow specifications are
   provided by the LRM, see Section 1.4.5.
   An intermediate ST agent that is already a participant in the stream
   looks at the SID and StreamCreationTime, and verifies that the stream
   is the same. It then checks if the intersection of the TargetList and
   the targets of the established stream is empty. If this is not the
   case, it responds with a REFUSE message with ReasonCode
   (TargetExists) that contains a TargetList of those targets that were
   duplicates. To indicate that the stream exists, and includes the
   listed targets, the ST agent sets to one (1) the E-bit of the REFUSE
   message, see Section 10.4.11.  The agent then proceeds processing
   each new target in the TargetList.
   For each new target in the TargetList, processing is much the same as
   for the original CONNECT. The CONNECT is acknowledged, propagated,
   and network resources are reserved. Intermediate or target ST agents
   that are not already participants in the stream behave as in the case
   of stream setup (see Section 4.5.4 and Section 4.5.5).
4.6.2  The Origin Removing a Target
   It is possible for an application at the origin to remove existing
   targets of a stream any time after the targets have accepted the
   stream. The application at the origin specifies the set of targets
   that are to be removed and informs the local ST agent. Based on this
   information, the ST agent sends DISCONNECT messages with the
   ReasonCode (ApplDisconnect) to the next-hops relative to the targets.
   An ST agent that receives a DISCONNECT message must acknowledge it by
   sending an ACK to the previous-hop. The ST agent updates its state
   and notifies the LRM of the target deletion so that the LRM can
   modify reservations as appropriate. When the DISCONNECT message
   reaches the target, the ST agent also notifies the application that
   the target is no longer part of the stream. When there are no
   remaining targets that can be reached through a particular next-hop,
   the ST agent informs the LRM and it deletes the next-hop from its
   next-hops set.
   SCMP also provides a flooding mechanism to delete targets that joined
   the stream without notifying the origin. The special case of target
   deletion via flooding is described in Section 5.7.
4.6.3  A Target Joining a Stream
   An application may request to join an existing stream. It has to
   collect information on the stream including the stream ID (SID) and
   the IP address of the stream's origin. This can be done out-of-band,
   e.g., via regular IP. The information is then passed to the local ST
   agent. The ST agent generates a JOIN message containing the
   application's request to join the stream and sends it toward the
   stream origin.
   An ST agent receiving a JOIN message, assuming no errors, responds
   with an ACK. The ACK message must identify the JOIN message to which
   it corresponds by including the Reference number indicated by the
   Reference field of the JOIN message. If the ST agent is not traversed
   by the stream that has to be joined, it propagates the JOIN message
   toward the stream's origin. Once a JOIN message has been
   acknowledged, ST agents do not retain any state information related
   to the JOIN message.
   Eventually, an ST agent traversed by the stream or the stream's
   origin itself is reached. This agent must respond to a received JOIN
   first with an ACK to the ST agent from which the message was
   received, then, it issues either a CONNECT or a JOIN-REJECT message
   and sends it toward the target. The response to the join request is
   based on the join authorization level associated with the stream, see
   Section 4.4.2:
o   If the stream has authorization level #0 (refuse join):
    The ST agent sends a JOIN-REJECT message toward the target with
    ReasonCode (JoinAuthFailure).
o   If the stream has authorization level #1 (ok, notify origin):
    The ST agent sends a CONNECT message toward the target with a
    TargetList including the target that requested to join the stream.
    This eventually results in adding the target to the stream. When
    the ST agent receives the ACCEPT message indicating that the new
    target has been added, it does not propagate the ACCEPT message
    backwards (Section 4.5.6). Instead, it issues a NOTIFY message
    with ReasonCode (TargetJoined) so that upstream agents, including
    the origin, may add the new target to maintained state
    information. The NOTIFY message includes all target specific
    information.
o   If the stream has authorization level #2 (ok):
    The ST agent sends a CONNECT message toward the target with a
    TargetList including the target that requested to join the stream.
    This eventually results in adding the target to the stream. When
    the ST agent receives the ACCEPT message indicating that the new
    target has been added, it does not propagate the ACCEPT message
    backwards (Section 4.5.6), nor does it notify the origin. A NOTIFY
    message is generated with ReasonCode (TargetJoined) if the target
    specific information needs to be propagated back to the origin. An
    example of such information is change in MTU, see Section 8.6.
4.6.3.1  Intermediate Agent (Router) as Origin
   When a stream has join authorization level #2, see Section 4.4.2, it
   is possible that the stream origin is unaware of some targets
   participating in the stream. In this case, the ST intermediate agent
   that first sent a CONNECT message to this target has to act as the
   stream origin for the given target. This includes:
o   if the whole stream is deleted, the intermediate agent must
    disconnect the target.
o   if the stream FlowSpec is changed, the intermediate agent must
    change the FlowSpec for the target as appropriate.
o   proper handling of ACCEPT and REFUSE messages, without propagation
    to upstream ST agents.
o   generation of NOTIFY messages when needed. (As described above.)
   The intermediate agent behaves normally for all other targets added
   to the stream as a consequence of a CONNECT message issued by the
   origin.
4.6.4  A Target Deleting Itself
   The application at the target may inform the local ST agent that it
   wants to be removed from the stream. The ST agent then forms a REFUSE
   message with the target itself as the only entry in the TargetList
   and with ReasonCode (ApplDisconnect). The REFUSE message is sent back
   to the origin via the previous-hop. If a stream has multiple targets
   and one target leaves the stream using this REFUSE mechanism, the
   stream to the other targets is not affected; the stream continues to
   exist.
   An ST agent that receives a REFUSE message acknowledges it by sending
   an ACK to the next-hop. The target is deleted and the LRM is notified
   so that it adjusts reservations as appropriate. The REFUSE message is
   also propagated back to the previous-hop ST agent except in the case
   where the agent is acting as the origin. In this case a NOTIFY may be
   propagated instead, see Section 4.6.3.
   When the REFUSE reaches the origin, the origin sends an ACK and
   notifies the application that the target is no longer part of the
   stream.
4.6.5  Changing a Stream's FlowSpec
   The application at the origin may wish to change the FlowSpec of an
   established stream. Changing the FlowSpec is a critical operation and
   it may even lead in some cases to the deletion of the affected
   targets. Possible problems with FlowSpec changes are discussed in
   Section 5.6.
   To change the stream's FlowSpec, the application informs the ST agent
   at the origin of the new FlowSpec and of the list of targets relative
   to the change. The ST agent at the origin then issues one CHANGE
   message per next-hop including the new FlowSpec and sends it to the
   relevant next-hop ST agents. If the G-bit field of the CHANGE message
   is set (1), the change affects all targets in the stream.
   The CHANGE message contains a bit called I-bit, see Section 10.4.3.
   By default, the I-bit is set to zero (0) to indicate that the LRM is
   expected to try and perform the requested FlowSpec change without
   risking to tear down the stream. Applications that desire a higher
   probability of success and are willing to take the risk of breaking
   the stream can indicate this by setting the I-bit to one (1).
   Applications that require the requested modification in order to
   continue operating are expected to set this bit.
   An intermediate ST agent that receives a CHANGE message first sends
   an ACK to the previous-hop and then provides the FlowSpec to the LRM.
   If the LRM can perform the change, the ST agent propagates the CHANGE
   messages along the established paths.
   If the whole process succeeds, the CHANGE messages will eventually
   reach the targets. Targets respond with an ACCEPT (or REFUSE) message
   that is propagated back to the origin. In processing the ACCEPT
   message on the way back to the origin, excess resources may be
   released by the LRM as described in Section 4.5.6. The REFUSE message
   must have the ReasonCode (ApplRefused).
   SCMP also provides a flooding mechanism to change targets that joined
   the stream without notifying the origin. The special case of target
   change via flooding is described in Section 5.7.
4.7  Stream Tear Down
   A stream is usually terminated by the origin when it has no further
   data to send. A stream is also torn down if the application should
   terminate abnormally or if certain network failures are encountered.
   Processing in this case is identical to the previous descriptions
   except that the ReasonCode (ApplAbort, NetworkFailure, etc.) is
   different.
   When all targets have left a stream, the origin notifies the
   application of that fact, and the application is then responsible for
   terminating the stream. Note, however, that the application may
   decide to add targets to the stream instead of terminating it, or may
   just leave the stream open with no targets in order to permit stream
   joins.
5.  Exceptional Cases
   The previous descriptions covered the simple cases where everything
   worked. We now discuss what happens when things do not succeed.
   Included are situations where messages exceed a network MTU, are
   lost, the requested resources are not available, the routing fails or
   is inconsistent.
5.1  Long ST Messages
   It is possible that an ST agent, or an application, will need to send
   a message that exceeds a network's Maximum Transmission Unit (MTU).
   This case must be handled but not via generic fragmentation, since
   ST2 does not support generic fragmentation of either data or control
   messages.
5.1.1  Handling of Long Data Packets
   ST agents discard data packets that exceed the MTU of the next-hop
   network. No error message is generated. Applications should avoid
   sending data packets larger than the minimum MTU supported by a given
   stream. The application, both at the origin and targets, can learn
   the stream minimum MTU through the MTU discovery mechanism described
   in Section 8.6.
5.1.2  Handling of Long Control Packets
   Each ST agent knows the MTU of the networks to which it is connected,
   and those MTUs restrict the size of the SCMP message it can send. An
   SCMP message size can exceed the MTU of a given network for a number
   of reasons:
o   the TargetList parameter (Section 10.3.6) may be too long;
o   the RecordRoute parameter (Section 10.3.5) may be too long.
o   the UserData parameter (Section 10.3.7) may be too long;
o   the PDUInError field of the ERROR message (Section 10.4.6) may be
    too long;
   An ST agent receiving or generating a too long SCMP message should:
o   break the message into multiple messages, each carrying part of the
    TargetList. Any RecordRoute and UserData parameters are replicated
    in each message for delivery to all targets. Applications that
    support a large number of targets may avoid using long TargetList
    parameters, and are expected to do so, by exploiting the stream
    joining functions, see Section 4.6.3. One exception to this rule
    exists. In the case of a long TargetList parameter to be included in
    a STATUS-RESPONSE message, the TargetList parameter is just
    truncated to the point where the list can fit in a single message,
    see Section 8.4.
o   for down stream agents: if the TargetList parameter contains a
    single Target element and the message size is still too long, the ST
    agent should issue a REFUSE message with ReasonCode
    (RecordRouteSize) if the size of the RecordRoute parameter causes
    the SCMP message size to exceed the network MTU, or with ReasonCode
    (UserDataSize) if the size of the UserData parameter causes the SCMP
    message size to exceed the network MTU. If both RecordRoute and
    UserData parameters are present the ReasonCode (UserDataSize) should
    be sent. For messages generated at the target: the target ST agent
    must check for SCMP messages that may exceed the MTU on the complete
    target-to-origin path, and inform the application that a too long
    SCMP messages has been generated. The format for the error reporting
    is a local implementation issue. The error codes are the same as
    previously stated.
   ST agents generating too long ERROR messages, simply truncate the
   PDUInError field to the point where the message is smaller than the
   network MTU.
5.2  Timeout Failures
   As described in Section 4.3, SCMP message delivery is made reliable
   through the use of acknowledgments, timeouts, and retransmission. The
   ACCEPT, CHANGE, CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY, and
   REFUSE messages must always be acknowledged, see Section 4.2. In
   addition, for some SCMP messages (CHANGE, CONNECT, JOIN) the sending
   ST agent also expects a response back (ACCEPT/REFUSE, CONNECT/JOIN-
   REJECT) after an ACK has been received. Also, the STATUS message must
   be answered with a STATUS-RESPONSE message.
   The following sections describe the handling of each of the possible
   failure cases due to timeout situations while waiting for an
   acknowledgment or a response. The timeout related variables, and
   their names, used in the next sections are for reference purposes
   only. They may be implementation specific. Different implementations
   are not required to share variable names, or even the mechanism by
   which the timeout and retransmission behavior is implemented.
5.2.1  Failure due to ACCEPT Acknowledgment Timeout
   An ST agent that sends an ACCEPT message upstream expects an ACK from
   the previous-hop ST agent. If no ACK is received before the ToAccept
   timeout expires, the ST agent should retry and send the ACCEPT
   message again. After NAccept unsuccessful retries, the ST agent sends
   a REFUSE message toward the origin, and a DISCONNECT message toward
   the targets. Both REFUSE and DISCONNECT must identify the affected
   targets and specify the ReasonCode (RetransTimeout).
5.2.2  Failure due to CHANGE Acknowledgment Timeout
   An ST agent that sends a CHANGE message downstream expects an ACK
   from the next-hop ST agent. If no ACK is received before the ToChange
   timeout expires, the ST agent should retry and send the CHANGE
   message again. After NChange unsuccessful retries, the ST agent
   aborts the change attempt by sending a REFUSE message toward the
   origin, and a DISCONNECT message toward the targets. Both REFUSE and
   DISCONNECT must identify the affected targets and specify the
   ReasonCode (RetransTimeout).
5.2.3  Failure due to CHANGE Response Timeout
   Only the origin ST agent implements this timeout. After correctly
   receiving the ACK to a CHANGE message, an ST agent expects to receive
   an ACCEPT, or REFUSE message in response. If one of these messages is
   not received before the ToChangeResp timer expires, the ST agent at
   the origin aborts the change attempt, and behaves as if a REFUSE
   message with the E-bit set and with ReasonCode (ResponseTimeout) is
   received.
5.2.4  Failure due to CONNECT Acknowledgment Timeout
   An ST agent that sends a CONNECT message downstream expects an ACK
   from the next-hop ST agent. If no ACK is received before the
   ToConnect timeout expires, the ST agent should retry and send the
   CONNECT message again. After NConnect unsuccessful retries, the ST
   agent sends a REFUSE message toward the origin, and a DISCONNECT
   message toward the targets. Both REFUSE and DISCONNECT must identify
   the affected targets and specify the ReasonCode (RetransTimeout).
5.2.5  Failure due to CONNECT Response Timeout
   Only the origin ST agent implements this timeout. After correctly
   receiving the ACK to a CONNECT message, an ST agent expects to
   receive an ACCEPT or REFUSE message in response. If one of these
   messages is not received before the ToConnectResp timer expires, the
   origin ST agent aborts the connection setup attempt, acts as if a
   REFUSE message is received, and it sends a DISCONNECT message toward
   the targets.  Both REFUSE and DISCONNECT must identify the affected
   targets and specify the ReasonCode (ResponseTimeout).
5.2.6  Failure due to DISCONNECT Acknowledgment Timeout
   An ST agent that sends a DISCONNECT message downstream expects an ACK
   from the next-hop ST agent. If no ACK is received before the
   ToDisconnect timeout expires, the ST agent should retry and send the
   DISCONNECT message again. After NDisconnect unsuccessful retries, the
   ST agent simply gives up and it assumes the next-hop ST agent is not
   part in the stream any more.
5.2.7  Failure due to JOIN Acknowledgment Timeout
   An ST agent that sends a JOIN message toward the origin expects an
   ACK from a neighbor ST agent. If no ACK is received before the ToJoin
   timeout expires, the ST agent should retry and send the JOIN message
   again. After NJoin unsuccessful retries, the ST agent sends a JOIN-
   REJECT message back in the direction of the target with ReasonCode
   (RetransTimeout).
5.2.8  Failure due to JOIN Response Timeout
   Only the target agent implements this timeout. After correctly
   receiving the ACK to a JOIN message, the ST agent at the target
   expects to receive a CONNECT or JOIN-REJECT message in response. If
   one of these message is not received before the ToJoinResp timer
   expires, the ST agent aborts the stream join attempt and returns an
   error corresponding with ReasonCode (RetransTimeout) to the
   application.
   Note that, after correctly receiving the ACK to a JOIN message,
   intermediate ST agents do not maintain any state on the stream
   joining attempt. As a consequence, they do not set the ToJoinResp
   timer and do not wait for a CONNECT or JOIN-REJECT message. This is
   described in Section 4.6.3.
5.2.9  Failure due to JOIN-REJECT Acknowledgment Timeout
   An ST agent that sends a JOIN-REJECT message toward the target
   expects an ACK from a neighbor ST agent. If no ACK is received before
   the ToJoinReject timeout expires, the ST agent should retry and send
   the JOIN-REJECT message again. After NJoinReject unsuccessful
   retries, the ST agent simply gives up.
5.2.10  Failure due to NOTIFY Acknowledgment Timeout
   An ST agent that sends a NOTIFY message to a neighbor ST agent
   expects an ACK from that neighbor ST agent. If no ACK is received
   before the ToNotify timeout expires, the ST agent should retry and
   send the NOTIFY message again. After NNotify unsuccessful retries,
   the ST agent simply gives up and behaves as if the ACK message was
   received.
5.2.11  Failure due to REFUSE Acknowledgment Timeout
   An ST agent that sends a REFUSE message upstream expects an ACK from
   the previous-hop ST agent. If no ACK is received before the ToRefuse
   timeout expires, the ST agent should retry and send the REFUSE
   message again. After NRefuse unsuccessful retries, the ST agent gives
   up and it assumes it is not part in the stream any more.
5.2.12  Failure due to STATUS Response Timeout
   After sending a STATUS message to a neighbor ST agent, an ST agent
   expects to receive a STATUS-RESPONSE message in response. If this
   message is not received before the ToStatusResp timer expires, the ST
   agent sends the STATUS message again. After NStatus unsuccessful
   retries, the ST agent gives up and assumes that the neighbor ST agent
   is not active.
5.3  Setup Failures due to Routing Failures
   It is possible for an ST agent to receive a CONNECT message that
   contains a known SID, but from an ST agent other than the previous-
   hop ST agent of the stream with that SID. This may be:
   1.  that two branches of the tree forming the stream have joined
       back together,
   2.  the result of an attempted recovery of a partially failed
       stream, or
   3.  a routing loop.
   The TargetList contained in the CONNECT is used to distinguish the
   different cases by comparing each newly received target with those of
   the previously existing stream:
o   if the IP address of the target(s) differ, it is case #1;
o   if the target matches a target in the existing stream, it may be
    case #2 or #3.
   Case #1 is handled in Section 5.3.1, while the other cases are
   handled in Section 5.3.2.
5.3.1  Path Convergence
   It is possible for an ST agent to receive a CONNECT message that
   contains a known SID, but from an ST agent other than the previous-
   hop ST agent of the stream with that SID. This might be the result of
   two branches of the tree forming the stream have joined back
   together.  Detection of this case and other possible sources was
   discussed in Section 5.2.
   SCMP does not allow for streams which have converged paths, i.e.,
   streams are always tree-shaped and not graph-like. At the point of
   convergence, the ST agent which detects the condition generates a
   REFUSE message with ReasonCode (PathConvergence). Also, as a help to
   the upstream ST agent, the detecting agent places the IP address of
   one of the stream's connected targets in the ValidTargetIPAddress
   field of the REFUSE message. This IP address will be used by upstream
   ST agents to avoid splitting the stream.
   An upstream ST agent that receives the REFUSE with ReasonCode
   (PathConvergence) will check to see if the listed IP address is one
   of the known stream targets. If it is not, the REFUSE is propagated
   to the previous-hop agent. If the listed IP address is known by the
   upstream ST agent, this ST agent is the ST agent that caused the
   split in the stream. (This agent may even be the origin.) This agent
   then avoids splitting the stream by using the next-hop of that known
   target as the next-hop for the refused targets. It sends a CONNECT
   with the affected targets to the existing valid next-hop.
   The above process will proceed, hop by hop, until the
   ValidTargetIPAddress matches the IP address of a known target. The
   only case where this process will fail is when the known target is
   deleted prior to the REFUSE propagating to the origin. In this case
   the origin can just reissue the CONNECT and start the whole process
   over again.
5.3.2  Other Cases
   The remaining cases including a partially failed stream and a routing
   loop, are not easily distinguishable. In attempting recovery of a
   failed stream, an ST agent may issue new CONNECT messages to the
   affected targets. Such a CONNECT may reach an ST agent downstream of
   the failure before that ST agent has received a DISCONNECT from the
   neighborhood of the failure. Until that ST agent receives the
   DISCONNECT, it cannot distinguish between a failure recovery and an
   erroneous routing loop. That ST agent must therefore respond to the
   CONNECT with a REFUSE message with the affected targets specified in
   the TargetList and an appropriate ReasonCode (StreamExists).
   The ST agent immediately preceding that point, i.e., the latest ST
   agent to send the CONNECT message, will receive the REFUSE message.
   It must release any resources reserved exclusively for traffic to the
   listed targets. If this ST agent was not the one attempting the
   stream recovery, then it cannot distinguish between a failure
   recovery and an erroneous routing loop. It should repeat the CONNECT
   after a ToConnect timeout, see Section 5.2.4. If after NConnect
   retransmissions it continues to receive REFUSE messages, it should
   propagate the REFUSE message toward the origin, with the TargetList
   that specifies the affected targets, but with a different ReasonCode
   (RouteLoop).
   The REFUSE message with this ReasonCode (RouteLoop) is propagated by
   each ST agent without retransmitting any CONNECT messages. At each ST
   agent, it causes any resources reserved exclusively for the listed
   targets to be released. The REFUSE will be propagated to the origin
   in the case of an erroneous routing loop. In the case of stream
   recovery, it will be propagated to the ST agent that is attempting
   the recovery, which may be an intermediate ST agent or the origin
   itself. In the case of a stream recovery, the ST agent attempting the
   recovery may issue new CONNECT messages to the same or to different
   next-hops.
   If an ST agent receives both a REFUSE message and a DISCONNECT
   message with a target in common then it can, for the each target in
   common, release the relevant resources and propagate neither the
   REFUSE nor the DISCONNECT.
   If the origin receives such a REFUSE message, it should attempt to
   send a new CONNECT to all the affected targets. Since routing errors
   in an internet are assumed to be temporary, the new CONNECTs will
   eventually find acceptable routes to the targets, if one exists. If
   no further routes exist after NRetryRoute tries, the application
   should be informed so that it may take whatever action it seems
   necessary.
5.4  Problems due to Routing Inconsistency
   When an intermediate ST agent receives a CONNECT, it invokes the
   routing algorithm to select the next-hop ST agents based on the
   TargetList and the networks to which it is connected. If the
   resulting next-hop to any of the targets is across the same network
   from which it received the CONNECT (but not the previous-hop itself),
   there may be a routing problem. However, the routing algorithm at the
   previous- hop may be optimizing differently than the local algorithm
   would in the same situation. Since the local ST agent cannot
   distinguish the two cases, it should permit the setup but send back
   to the previous- hop ST agent an informative NOTIFY message with the
   appropriate ReasonCode (RouteBack), pertinent TargetList, and in the
   NextHopIPAddress element the address of the next-hop ST agent
   returned by its routing algorithm.
   The ST agent that receives such a NOTIFY should ACK it. If the ST
   agent is using an algorithm that would produce such behavior, no
   further action is taken; if not, the ST agent should send a
   DISCONNECT to the next-hop ST agent to correct the problem.
   Alternatively, if the next-hop returned by the routing function is in
   fact the previous-hop, a routing inconsistency has been detected. In
   this case, a REFUSE is sent back to the previous-hop ST agent
   containing an appropriate ReasonCode (RouteInconsist), pertinent
   TargetList, and in the NextHopIPAddress element the address of the
   previous-hop. When the previous-hop receives the REFUSE, it will
   recompute the next-hop for the affected targets. If there is a
   difference in the routing databases in the two ST agents, they may
   exchange CONNECT and REFUSE messages again. Since such routing errors
   in the internet are assumed to be temporary, the situation should
   eventually stabilize.
5.5  Problems in Reserving Resources
   As mentioned in Section 1.4.5, resource reservation is handled by the
   LRM. The LRM may not be able to satisfy a particular request during
   stream setup or modification for a number of reasons, including a
   mismatched FlowSpec, an unknown FlowSpec version, an error in
   processing a FlowSpec, and an inability to allocate the requested
   resource. This section discusses these cases and specifies the
   ReasonCodes that should be used when these error cases are
   encountered.
5.5.1  Mismatched FlowSpecs
   In some cases the LRM may require a requested FlowSpec to match an
   existing FlowSpec, e.g., when adding new targets to an existing
   stream, see Section 4.6.1. In case of FlowSpec mismatch the LRM
   notifies the processing ST agent which should respond with ReasonCode
   (FlowSpecMismatch).
5.5.2  Unknown FlowSpec Version
   When the LRM is invoked, it is passed information including the
   version of the FlowSpec, see Section 4.5.2.2. If this version is not
   known by the LRM, the LRM notifies the ST agent. The ST agent should
   respond with a REFUSE message with ReasonCode (FlowVerUnknown).
5.5.3  LRM Unable to Process FlowSpec
   The LRM may encounter an LRM or FlowSpec specific error while
   attempting to satisfy a request. An example of such an error is given
   in Section 9.2.1. These errors are implementation specific and will
   not be enumerated with ST ReasonCodes. They are covered by a single,
   generic ReasonCode. When an LRM encounters such an error, it should
   notify the ST agent which should respond with the generic ReasonCode
   (FlowSpecError).
5.5.4  Insufficient Resources
   If the LRM cannot make the necessary reservations because sufficient
   resources are not available, an ST agent may:
o   try alternative paths to the targets: the ST agent calls the routing
    function to find a different path to the targets. If an alternative
    path is found, stream connection setup continues in the usual way,
    as described in Section 4.5.
o   refuse to establish the stream along this path: the origin ST agent
    informs the application of the stream setup failure; intermediate
    and target ST agents issue a REFUSE message (as described in Section
    4.5.8) with ReasonCode (CantGetResrc).
   It depends on the local implementations whether an ST agent tries
   alternative paths or refuses to establish the stream. In any case, if
   enough resources cannot be found over different paths, the ST agent
   has to explicitly refuse to establish the stream.
5.6  Problems Caused by CHANGE Messages
   A CHANGE might fail for several reasons, including:
o   insufficient resources: the request may be for a larger amount of
    network resources when those resources are not available, ReasonCode
    (CantGetResrc);
o   a target application not agreeing to the change, ReasonCode
    (ApplRefused);
   The affected stream can be left in one of two states as a result of
   change failures: a) the stream can revert back to the state it was in
   prior to the CHANGE message being processed, or b) the stream may be
   torn down.
   The expected common case of failure will be when the requested change
   cannot be satisfied, but the pre-change resources remain allocated
   and available for use by the stream. In this case, the ST agent at
   the point where the failure occurred must inform upstream ST agents
   of the failure. (In the case where this ST agent is the target, there
   may not actually be a failure, the application may merely have not
   agreed to the change). The ST agent informs upstream ST agents by
   sending a REFUSE message with ReasonCode (CantGetResrc or
   ApplRefused). To indicate that the pre-change FlowSpec is still
   available and that the stream still exists, the ST agent sets the E-
   bit of the REFUSE message to one (1), see Section 10.4.11. Upstream
   ST agents receiving the REFUSE message inform the LRM so that it can
   attempt to revert back to the pre-change FlowSpec. It is permissible,
   but not desirable, for excess resources to remain allocated.
   For the case when the attempt to change the stream results in the
   loss of previously reserved resources, the stream is torn down. This
   can happen, for instance, when the I-bit is set (Section 4.6.5) and
   the LRM releases pre-change stream resources before the new ones are
   reserved, and neither new nor former resources are available. In this
   case, the ST agent where the failure occurs must inform other ST
   agents of the break in the affected portion of the stream. This is
   done by the ST agent by sending a REFUSE message upstream and a
   DISCONNECT message downstream, both with the ReasonCode
   (CantGetResrc). To indicate that pre-change stream resources have
   been lost, the E-bit of the REFUSE message is set to zero (0).
   Note that a failure to change the resources requested for specific
   targets should not cause other targets in the stream to be deleted.
5.7  Unknown Targets in DISCONNECT and CHANGE
   The handling of unknown targets listed in a DISCONNECT or CHANGE
   message is dependent on a stream's join authorization level, see
   Section 4.4.2. For streams with join authorization levels #0 and #1,
   see Section 4.4.2, all targets must be known. In this case, when
   processing a CHANGE message, the agent should generate a REFUSE
   message with ReasonCode (TargetUnknown). When processing a DISCONNECT
   message, it is possible that the DISCONNECT is a duplicate of an old
   request so the agent should respond as if it has successfully
   disconnected the target. That is, it should respond with an ACK
   message.
   For streams with join authorization level #2, it is possible that the
   origin is not aware of some targets that participate in the stream.
   The origin may delete or change these targets via the following
   flooding mechanism.
   If no next-hop ST agent can be associated with a target, the CHANGE/
   DISCONNECT message including the target is replicated to all known
   next-hop ST agents. This has the effect of propagating the CHANGE/
   DISCONNECT message to all downstream ST agents. Eventually, the ST
   agent that acts as the origin for the target (Section 4.6.3.1) is
   reached and the target is deleted.
   Target deletion/change via flooding is not expected to be the normal
   case. It is included to present the applications with uniform
   capabilities for all stream types. Flooding only applies to streams
   with join authorization level #2.
6.  Failure Detection and Recovery
6.1  Failure Detection
   The SCMP failure detection mechanism is based on two assumptions:
1.  If a neighbor of an ST agent is up, and has been up without a
    disruption, and has not notified the ST agent of a problem with
    streams that pass through both, then the ST agent can assume that
    there has not been any problem with those streams.
2.  A network through which an ST agent has routed a stream will notify
    the ST agent if there is a problem that affects the stream data
    packets but does not affect the control packets.
   The purpose of the robustness protocol defined here is for ST agents
   to determine that the streams through a neighbor have been broken by
   the failure of the neighbor or the intervening network. This protocol
   should detect the overwhelming majority of failures that can occur.
   Once a failure is detected, the recovery procedures described in
   Section 6.2 are initiated by the ST agents.
6.1.1  Network Failures
   An ST agent can detect network failures by two mechanisms:
   o   the network can report a failure, or
   o   the ST agent can discover a failure by itself.
   They differ in the amount of information that an ST agent has
   available to it in order to make a recovery decision. For example, a
   network may be able to report that reserved bandwidth has been lost
   and the reason for the loss and may also report that connectivity to
   the neighboring ST agent remains intact. On the other hand, an ST
   agent may discover that communication with a neighboring ST agent has
   ceased because it has not received any traffic from that neighbor in
   some time period. If an ST agent detects a failure, it may not be
   able to determine if the failure was in the network while the
   neighbor remains available, or the neighbor has failed while the
   network remains intact.
6.1.2  Detecting ST Agents Failures
   Each ST agent periodically sends each neighbor with which it shares
   one or more streams a HELLO message. This message exchange is between
   ST agents, not entities representing streams or applications. That
   is, an ST agent need only send a single HELLO message to a neighbor
   regardless of the number of streams that flow between them. All ST
   agents (host as well as intermediate) must participate in this
   exchange. However, only ST agents that share active streams can
   participate in this exchange and it is an error to send a HELLO
   message to a neighbor ST agent with no streams in common, e.g., to
   check whether it is active. STATUS messages can be used to poll the
   status of neighbor ST agents, see Section 8.4.
   For the purpose of HELLO message exchange, stream existence is
   bounded by ACCEPT and DISCONNECT/REFUSE processing and is defined for
   both the upstream and downstream case. A stream to a previous-hop is
   defined to start once an ACCEPT message has been forwarded upstream.
   A stream to a next-hop is defined to start once the received ACCEPT
   message has been acknowledged. A stream is defined to terminate once
   an acknowledgment is sent for a received DISCONNECT or REFUSE
   message, and an acknowledgment for a sent DISCONNECT or REFUSE
   message has been received.
   The HELLO message has two fields:
   o   a HelloTimer field that is in units of milliseconds modulo the
       maximum for the field size, and
   o   a Restarted-bit specifying that the ST agent has been restarted
       recently.
   The HelloTimer must appear to be incremented every millisecond
   whether a HELLO message is sent or not. The HelloTimer wraps around
   to zero after reaching the maximum value. Whenever an ST agent
   suffers a catastrophic event that may result in it losing ST state
   information, it must reset its HelloTimer to zero and must set the
   Restarted-bit in all HELLO messages sent in the following
   HelloTimerHoldDown seconds.
   If an ST agent receives a HELLO message that contains the Restarted-
   bit set, it must assume that the sending ST agent has lost its state.
   If it shares streams with that neighbor, it must initiate stream
   recovery activity, see Section 6.2. If it does not share streams with
   that neighbor, it should not attempt to create one until that bit is
   no longer set. If an ST agent receives a CONNECT message from a
   neighbor whose Restarted-bit is still set, the agent must respond
   with an ERROR message with the appropriate ReasonCode
   (RestartRemote). If an agent receives a CONNECT message while the
   agent's own Restarted- bit is set, the agent must respond with an
   ERROR message with the appropriate ReasonCode (RestartLocal).
   Each ST stream has an associated RecoveryTimeout value. This value is
   assigned by the origin and carried in the CONNECT message, see
   Section 4.5.10. Each agent checks to see if it can support the
   requested value. If it can not, it updates the value to the smallest
   timeout interval it can support. The RecoveryTimeout used by a
   particular stream is obtained from the ACCEPT message, see Section
   4.5.10, and is the smallest value seen across all ACCEPT messages
   from participating targets.
   An ST agent must send HELLO messages to its neighbor with a period
   shorter than the smallest RecoveryTimeout of all the active streams
   that pass between the two ST agents, regardless of direction. This
   period must be smaller by a factor, called HelloLossFactor, which is
   at least as large as the greatest number of consecutive HELLO
   messages that could credibly be lost while the communication between
   the two ST agents is still viable.
   An ST agent may send simultaneous HELLO messages to all its neighbors
   at the rate necessary to support the smallest RecoveryTimeout of any
   active stream. Alternately, it may send HELLO messages to different
   neighbors independently at different rates corresponding to
   RecoveryTimeouts of individual streams.
   An ST agent must expect to receive at least one new HELLO message
   from each neighbor at least as frequently as the smallest
   RecoveryTimeout of any active stream in common with that neighbor.
   The agent can detect duplicate or delayed HELLO messages by comparing
   the HelloTimer field of the most recent valid HELLO message from that
   neighbor with the HelloTimer field of an incoming HELLO message.
   Valid incoming HELLO messages will have a HelloTimer field that is
   greater than the field contained in the previously received valid
   HELLO message by the time elapsed since the previous message was
   received. Actual evaluation of the elapsed time interval should take
   into account the maximum likely delay variance from that neighbor.
   If the ST agent does not receive a valid HELLO message within the
   RecoveryTimeout period of a stream, it must assume that the
   neighboring ST agent or the communication link between the two has
   failed and it must initiate stream recovery activity, as described
   below in Section 6.2.
6.2  Failure Recovery
   If an intermediate ST agent fails or a network or part of a network
   fails, the previous-hop ST agent and the various next-hop ST agents
   will discover the fact by the failure detection mechanism described
   in Section 6.1.
   The recovery of an ST stream is a relatively complex and time
   consuming effort because it is designed in a general manner to
   operate across a large number of networks with diverse
   characteristics.  Therefore, it may require information to be
   distributed widely, and may require relatively long timers. On the
   other hand, since a network is typically a homogeneous system,
   failure recovery in the network may be a relatively faster and
   simpler operation. Therefore an ST agent that detects a failure
   should attempt to fix the network failure before attempting recovery
   of the ST stream. If the stream that existed between two ST agents
   before the failure cannot be reconstructed by network recovery
   mechanisms alone, then the ST stream recovery mechanism must be
   invoked.
   If stream recovery is necessary, the different ST agents will need to
   perform different functions, depending on their relation to the
   failure:
o   An ST agent that is a next-hop from a failure should first verify
    that there was a failure. It can do this using STATUS messages to
    query its upstream neighbor. If it cannot communicate with that
    neighbor, then for each active stream from that neighbor it should
    first send a REFUSE message upstream with the appropriate ReasonCode
    (STAgentFailure). This is done to the neighbor to speed up the
    failure recovery in case the hop is unidirectional, i.e., the
    neighbor can hear the ST agent but the ST agent cannot hear the
    neighbor. The ST agent detecting the failure must then, for each
    active stream from that neighbor, send DISCONNECT messages with the
    same ReasonCode toward the targets. All downstream ST agents process
    this DISCONNECT message just like the DISCONNECT that tears down the
    stream. If recovery is successful, targets will receive new CONNECT
    messages.
o   An ST agent that is the previous-hop before the failed component
    first verifies that there was a failure by querying the downstream
    neighbor using STATUS messages. If the neighbor has lost its state
    but is available, then the ST agent may try and reconstruct
    (explained below) the affected streams, for those streams that do
    not have the NoRecovery option selected. If it cannot communicate
    with the next-hop, then the ST agent detecting the failure sends a
    DISCONNECT message, for each affected stream, with the appropriate
    ReasonCode (STAgentFailure) toward the affected targets. It does so
    to speed up failure recovery in case the communication may be
    unidirectional and this message might be delivered successfully.
   Based on the NoRecovery option, the ST agent that is the previous-hop
   before the failed component takes the following actions:
o   If the NoRecovery option is selected, then the ST agent sends, per
    affected stream, a REFUSE message with the appropriate ReasonCode
    (STAgentFailure) to the previous-hop. The TargetList in these
    messages contains all the targets that were reached through the
    broken branch. As discussed in Section 5.1.2, multiple REFUSE
    messages may be required if the PDU is too long for the MTU of the
    intervening network. The REFUSE message is propagated all the way to
    the origin. The application at the origin can attempt recovery of
    the stream by sending a new CONNECT to the affected targets. For
    established streams, the new CONNECT will be treated by intermediate
    ST agents as an addition of new targets into the established stream.
o   If the NoRecovery option is not selected, the ST agent can attempt
    recovery of the affected streams. It does so one a stream by stream
    basis by issuing a new CONNECT message to the affected targets. If
    the ST agent cannot find new routes to some targets, or if the only
    route to some targets is through the previous-hop, then it sends one
    or more REFUSE messages to the previous-hop with the appropriate
    ReasonCode (CantRecover) specifying the affected targets in the
    TargetList. The previous-hop can then attempt recovery of the stream
    by issuing a CONNECT to those targets. If it cannot find an
    appropriate route, it will propagate the REFUSE message toward the
    origin.
   Regardless of which ST agent attempts recovery of a damaged stream,
   it will issue one or more CONNECT messages to the affected targets.
   These CONNECT messages are treated by intermediate ST agents as
   additions of new targets into the established stream. The FlowSpecs
   of the new CONNECT messages are the same as the ones contained in the
   most recent CONNECT or CHANGE messages that the ST agent had sent
   toward the affected targets when the stream was operational.
   Upon receiving an ACCEPT during the a stream recovery, the agent
   reconstructing the stream must ensure that the FlowSpec and other
   stream attributes (e.g., MaxMsgSize and RecoveryTimeout) of the re-
   established stream are equal to, or are less restrictive, than the
   pre-failure stream. If they are more restrictive, the recovery
   attempt must be aborted. If they are equal, or are less restrictive,
   then the recovery attempt is successful. When the attempt is a
   success, failure recovery related ACCEPTs are not forwarded upstream
   by the recovering agent.
   Any ST agent that decides that enough recovery attempts have been
   made, or that recovery attempts have no chance of succeeding, may
   indicate that no further attempts at recovery should be made. This is
   done by setting the N-bit in the REFUSE message, see Section 10.4.11.
   This bit must be set by agents, including the target, that know that
   there is no chance of recovery succeeding. An ST agent that receives
   a REFUSE message with the N-bit set (1) will not attempt recovery,
   regardless of the NoRecovery option, and it will set the N-bit when
   propagating the REFUSE message upstream.
6.2.1  Problems in Stream Recovery
   The reconstruction of a broken stream may not proceed smoothly. Since
   there may be some delay while the information concerning the failure
   is propagated throughout an internet, routing errors may occur for
   some time after a failure. As a result, the ST agent attempting the
   recovery may receive ERROR messages for the new CONNECTs that are
   caused by internet routing errors. The ST agent attempting the
   recovery should be prepared to resend CONNECTs before it succeeds in
   reconstructing the stream. If the failure partitions the internet and
   a new set of routes cannot be found to the targets, the REFUSE
   messages will eventually be propagated to the origin, which can then
   inform the application so it can decide whether to terminate or to
   continue to attempt recovery of the stream.
   The new CONNECT may at some point reach an ST agent downstream of the
   failure before the DISCONNECT does. In this case, the ST agent that
   receives the CONNECT is not yet aware that the stream has suffered a
   failure, and will interpret the new CONNECT as resulting from a
   routing failure. It will respond with an ERROR message with the
   appropriate ReasonCode (StreamExists). Since the timeout that the ST
   agents immediately preceding the failure and immediately following
   the failure are approximately the same, it is very likely that the
   remnants of the broken stream will soon be torn down by a DISCONNECT
   message. Therefore, the ST agent that receives the ERROR message with
   ReasonCode (StreamExists) should retransmit the CONNECT message after
   the ToConnect timeout expires. If this fails again, the request will
   be retried for NConnect times. Only if it still fails will the ST
   agent send a REFUSE message with the appropriate ReasonCode
   (RouteLoop) to its previous-hop. This message will be propagated back
   to the ST agent that is attempting recovery of the damaged stream.
   That ST agent can issue a new CONNECT message if it so chooses. The
   REFUSE is matched to a CONNECT message created by a recovery
   operation through the LnkReference field in the CONNECT.
   ST agents that have propagated a CONNECT message and have received a
   REFUSE message should maintain this information for some period of
   time. If an ST agent receives a second CONNECT message for a target
   that recently resulted in a REFUSE, that ST agent may respond with a
   REFUSE immediately rather than attempting to propagate the CONNECT.
   This has the effect of pruning the tree that is formed by the
   propagation of CONNECT messages to a target that is not reachable by
   the routes that are selected first. The tree will pass through any
   given ST agent only once, and the stream setup phase will be
   completed faster.
   If a CONNECT message reaches a target, the target should as
   efficiently as possible use the state that it has saved from before
   the stream failed during recovery of the stream. It will then issue
   an ACCEPT message toward the origin. The ACCEPT message will be
   intercepted by the ST agent that is attempting recovery of the
   damaged stream, if not the origin. If the FlowSpec contained in the
   ACCEPT specifies the same selection of parameters as were in effect
   before the failure, then the ST agent that is attempting recovery
   will not propagate the ACCEPT. FlowSpec comparison is done by the
   LRM. If the selections of the parameters are different, then the ST
   agent that is attempting recovery will send the origin a NOTIFY
   message with the appropriate ReasonCode (FailureRecovery) that
   contains a FlowSpec that specifies the new parameter values. The
   origin may then have to change its data generation characteristics
   and the stream's parameters with a CHANGE message to use the newly
   recovered subtree.
6.3  Stream Preemption
   As mentioned in Section 1.4.5, it is possible that the LRM decides to
   break a stream intentionally. This is called stream preemption.
   Streams are expected to be preempted in order to free resources for a
   new stream which has a higher priority.
   If the LRM decides that it is necessary to preempt one or more of the
   stream traversing it, the decision on which streams have to be
   preempted has to be made. There are two ways for an application to
   influence such decision:
   1.  based on FlowSpec information. For instance, with the ST2+
       FlowSpec, streams can be assigned a precedence value from 0
       (least important) to 256 (most important). This value is
       carried in the FlowSpec when the stream is setup, see Section
       9.2, so that the LRM is informed about it.
   2.  with the group mechanism. An application may specify that a set
       of streams are related to each other and that they are all
       candidate for preemption if one of them gets preempted. It can
       be done by using the fate-sharing relationship defined in
       Section 7.1.2. This helps the LRM making a good choice when
       more than one stream have to be preempted, because it leads to
       breaking a single application as opposed to as many
       applications as the number of preempted streams.
   If the LRM preempts a stream, it must notify the local ST agent. The
   following actions are performed by the ST agent:
o   The ST agent at the host where the stream was preempted sends
    DISCONNECT messages with the appropriate ReasonCode
    (StreamPreempted) toward the affected targets. It sends a REFUSE
    message with the appropriate ReasonCode (StreamPreempted) to the
    previous-hop.
o   A previous-hop ST agent of the preempted stream acts as in case of
    failure recovery, see Section 6.2.
o   A next-hop ST agent of the preempted stream acts as in case of
    failure recovery, see Section 6.2.
   Note that, as opposite to failure recovery, there is no need to
   verify that the failure actually occurred, because this is explicitly
   indicated by the ReasonCode (StreamPreempted).
7.  A Group of Streams
   There may be need to associate related streams. The group mechanism
   is simply an association technique that allows ST agents to identify
   the different streams that are to be associated.
   A group consists of a set of streams and a relationship. The set of
   streams may be empty. The relationship applies to all group members.
   Each group is identified by a group name. The group name must be
   globally unique.
   Streams belong to the same group if they have the same GroupName in
   the GroupName field of the Group parameter, see Section 10.3.2. The
   relationship is defined by the Relationship field. Group membership
   must be specified at stream creation time and persists for the whole
   stream lifetime. A single stream may belong to multiple groups.
   The ST agent that creates a new group is called group initiator. Any
   ST agent can be a group initiator. The initiator allocates the
   GroupName and the Relationship among group members. The initiator may
   or may not be the origin of a stream belonging to the group.
   GroupName generation is described in Section 8.2.
7.1  Basic Group Relationships
   This version of ST defines four basic group relationships. An ST2+
   implementation must support all four basic relationships. Adherence
   to specified relationships are usually best effort. The basic
   relationships are described in detail below in Section 7.1.1 -
   Section 7.1.4.
7.1.1  Bandwidth Sharing
   Streams associated with the same group share the same network
   bandwidth. The intent is to support applications such as audio
   conferences where, of all participants, only some are allowed to
   speak at one time. In such a scenario, global bandwidth utilization
   can be lowered by allocating only those resources that can be used at
   once, e.g., it is sufficient to reserve bandwidth for a small set of
   audio streams.
   The basic concept of a shared bandwidth group is that the LRM will
   allocate up to some specified multiplier of the most demanding stream
   that it knows about in the group. The LRM will allocate resources
   incrementally, as stream setup requests are received, until the total
   group requirements are satisfied. Subsequent setup requests will
   share the group's resources and will not need any additional
   resources allocated. The procedure will result in standard allocation
   where only one stream in a group traverses an agent, and shared
   allocations where multiple streams traverse an agent.
   To illustrate, let's call the multiplier mentioned above "N", and the
   most demanding stream that an agent knows about in a group Bmax. For
   an application that intends to allow three participants to speak at
   the same time, N has a value of three and each LRM will allocate for
   the group an amount of bandwidth up to 3*Bmax even when there are
   many more steams in the group. The LRM will reserve resources
   incrementally, per stream request, until N*Bmax resources are
   allocated. Each agent may be traversed by a different set and number
   of streams all belonging to the same group.
   An ST agent receiving a stream request presents the LRM with all
   necessary group information, see Section 4.5.2.2. If maximum
   bandwidth, N*Bmax, for the group has already been allocated and a new
   stream with a bandwidth demand less than Bmax is being established,
   the LRM won't allocate any further bandwidth.
   If there is less than N*Bmax resources allocated, the LRM will expand
   the resources allocated to the group by the amount requested in the
   new FlowSpec, up to N*Bmax resources. The LRM will update the
   FlowSpec based on what resources are available to the stream, but not
   the total resources allocated for the group.
   It should be noted that ST agents and LRMs become aware of a group's
   requirements only when the streams belonging to the group are
   created.  In case of the bandwidth sharing relationship, an
   application should attempt to establish the most demanding streams
   first to minimize stream setup efforts. If on the contrary the less
   demanding streams are built first, it will be always necessary to
   allocate additional bandwidth in consecutive steps as the most
   demanding streams are built. It is also up to the applications to
   coordinate their different FlowSpecs and decide upon an appropriate
   value for N.
7.1.2  Fate Sharing
   Streams belonging to this group share the same fate. If a stream is
   deleted, the other members of the group are also deleted. This is
   intended to support stream preemption by indicating which streams are
   mutually related. If preemption of multiple streams is necessary,
   this information can be used by the LRM to delete a set of related
   streams, e.g., with impact on a single application, instead of making
   a random choice with the possible effect of interrupting several
   different applications. This attribute does not apply to normal
   stream shut down, i.e., ReasonCode (ApplDisconnect). On normal
   disconnect, other streams belonging to such groups remain active.
   This relationship provides a hint on which streams should be
   preempted. Still, the LRM responsible for the preemption is not
   forced to behave accordingly, and other streams could be preempted
   first based on different criteria.
7.1.3  Route Sharing
   Streams belonging to this group share the same paths as much as is
   possible. This can be desirable for several reasons, e.g., to exploit
   the same allocated resources or in the attempt to maintain the
   transmission order. An ST agent attempts to select the same path
   although the way this is implemented depends heavily on the routing
   algorithm which is used.
   If the routing algorithm is sophisticated enough, an ST agent can
   suggest that a stream is routed over an already established path.
   Otherwise, it can ask the routing algorithm for a set of legal routes
   to the destination and check whether the desired path is included in
   those feasible.
   Route sharing is a hint to the routing algorithm used by ST. Failing
   to route a stream through a shared path should not prevent the
   creation of a new stream or result in the deletion of an existing
   stream.
7.1.4  Subnet Resources Sharing
   This relationship provides a hint to the data link layer functions.
   Streams belonging to this group may share the same MAC layer
   resources. As an example, the same MAC layer multicast address may be
   used for all the streams in a given group. This mechanism allows for
   a better utilization of MAC layer multicast addresses and it is
   especially useful when used with network adapters that offer a very
   small number of MAC layer multicast addresses.
7.2  Relationships Orthogonality
   The four basic relationships, as they have been defined, are
   orthogonal. This means, any combinations of the basic relationships
   are allowed. For instance, let's consider an application that
   requires full-duplex service for a stream with multiple targets.
   Also, let's suppose that only N targets are allowed to send data back
   to the origin at the same time. In this scenario, all the reverse
   streams could belong to the same group. They could be sharing both
   the paths and the bandwidth attributes. The Path&Bandwidth sharing
   relationship is obtained from the basic set of relationships. This
   example is important because it shows how full-duplex service can be
   efficiently obtained in ST.
8.  Ancillary Functions
   Certain functions are required by ST host and intermediate agent
   implementations. Such functions are described in this section.
8.1  Stream ID Generation
   The stream ID, or SID, is composed of 16-bit unique identifier and
   the stream origin's 32-bit IP address. Stream IDs must be globally
   unique.  The specific definition and format of the 16 -bit field is
   left to the implementor. This field is expected to have only local
   significance.
   An ST implementation has to provide a stream ID generator facility,
   so that an application or higher layer protocol can obtain a unique
   IDs from the ST layer. This is a mechanism for the application to
   request the allocation of stream ID that is independent of the
   request to create a stream. The Stream ID is used by the application
   or higher layer protocol when creating the streams.
   For instance, the following two functions could be made available:
   o   AllocateStreamID() -> result, StreamID
   o   ReleaseStreamID(StreamID) -> result
   An implementation may also provide a StreamID deletion function.
8.2  Group Name Generator
   GroupName generation is similar to Stream ID generation. The
   GroupName includes a 16-bit unique identifier, a 32-bit creation
   timestamp, and a 32-bit IP address. Group names are globally unique.
   A GroupName includes the creator's IP address, so this reduces a
   global uniqueness problem to a simple local problem. The specific
   definitions and formats of the 16-bit field and the 32-bit creation
   timestamp are left to the implementor. These fields must be locally
   unique, and only have local significance.
   An ST implementation has to provide a group name generator facility,
   so that an application or higher layer protocol can obtain a unique
   GroupName from the ST layer. This is a mechanism for the application
   to request the allocation of a GroupName that is independent of the
   request to create a stream. The GroupName is used by the application
   or higher layer protocol when creating the streams that are to be
   part of the group.
   For instance, the following two functions could be made available:
   o   AllocateGroupName() -> result, GroupName
   o   ReleaseGroupName(GroupName) -> result
   An implementation may also provide a GroupName deletion function.
8.3  Checksum Computation
   The standard Internet checksum algorithm is used for ST: "The
   checksum field is the 16-bit one's complement of the one's complement
   sum of all 16-bit words in the header. For purposes of computing the
   checksum, the value of the checksum field is zero (0)." See
   [RFC1071], [RFC1141], and [RFC791] for suggestions for efficient
   checksum algorithms.
8.4  Neighbor ST Agent Identification and Information Collection
   The STATUS message can be used to collect information about neighbor
   ST agents, streams the neighbor supports, and specific targets of
   streams the neighbor supports. An agent receiving a STATUS message
   provides the requested information via a STATUS-RESPONSE message.
   The STATUS message can be used to collect different information from
   a neighbor. It can be used to:
o   identify ST capable neighbors. If an ST agent wishes to check if
    a neighbor is ST capable, it should generate a STATUS message with
    an SID which has all its fields set to zero. An agent receiving a
    STATUS message with such SID should answer with a STATUS-RESPONSE
    containing the same SID, and no other stream information. The
    receiving ST agent must answer as soon as possible to aid in Round
    Trip Time estimation, see Section 8.5;
o   obtain information on a particular stream. If an ST agent wishes to
    check a neighbor's general information related to a specific
    stream, it should generate a STATUS message containing the stream's
    SID. An ST agent receiving such a message, will first check to see
    if the stream is known. If not known, the receiving ST agent sends a
    STATUS-RESPONSE containing the same SID, and no other stream
    information. If the stream is known, the receiving ST agent sends a
    STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group
    membership (if any), and as many targets as can be included in a
    single message as limited by MTU, see Section 5.1.2. Note that all
    targets may not be included in a response to a request for general
    stream information. If information on a specific target in a stream
    is desired, the mechanism described next should be used.
o   obtain information on particular targets in a stream. If an ST agent
    wishes to check a neighbor's information related to one or more
    specific targets of a specific stream, it should generate a STATUS
    message containing the stream's SID and a TargetList parameter
    listing the relevant targets. An ST agent receiving such a message,
    will first check to see if the stream and target are known. If the
    stream is not known, the agent follows the process described above.
    If both the stream and targets are known, the agent responds with
    STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group
    membership (if any), and the requested targets that are known. If
    the stream is known but the target is not, the agent responds with a
    STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group
    membership (if any), but no targets.
   The specific formats for STATUS and STATUS-RESPONSE messages are
   defined in Section 10.4.12 and Section 10.4.13.
8.5  Round Trip Time Estimation
   SCMP is made reliable through use of retransmission when an expected
   acknowledgment is not received in a timely manner. Timeout and
   retransmission algorithms are implementation dependent and are
   outside the scope of this document. However, it must be reasonable
   enough not to cause excessive retransmission of SCMP messages while
   maintaining the robustness of the protocol. Algorithms on this
   subject are described in [WoHD95], [Jaco88], [KaPa87].
   Most existing algorithms are based on an estimation of the Round Trip
   Time (RTT) between two hosts. With SCMP, if an ST agent wishes to
   have an estimate of the RTT to and from a neighbor, it should
   generate a STATUS message with an SID which has all its fields set to
   zero. An ST agent receiving a STATUS message with such SID should
   answer as rapidly as possible with a STATUS-RESPONSE message
   containing the same SID, and no other stream information. The time
   interval between the send and receive operations can be used as an
   estimate of the RTT to and from the neighbor.
8.6  Network MTU Discovery
   At connection setup, the application at the origin asks the local ST
   agent to create streams with certain QoS requirements. The local ST
   agent fills out its network MTU value in the MaxMsgSize parameter in
   the CONNECT message and forwards it to the next-hop ST agents. Each
   ST agent in the path checks to see if it's network MTU is smaller
   than the one specified in the CONNECT message and, if it is, the ST
   agent updates the MaxMsgSize in the CONNECT message to it's network
   MTU. If the target application decides to accept the stream, the ST
   agent at the target copies the MTU value in the CONNECT message to
   the MaxMsgSize field in the ACCEPT message and sends it back to the
   application at the origin. The MaxMsgSize field in the ACCEPT message
   is the minimum MTU of the intervening networks to that target. If the
   application has multiple targets then the minimum MTU of the stream
   is the smallest MaxMsgSize received from all the ACCEPT messages. It
   is the responsibility of the application to segment its PDUs
   according to the minimum MaxMsgSize of the stream since no data
   fragmentation is supported during the data transfer phase. If a
   particular target's MaxMsgSize is unacceptable to an application, it
   may disconnect the target from the stream and assume that the target
   cannot be supported.  When evaluating a particular target's
   MaxMsgSize, the application or the application interface will need to
   take into account the size of the ST data header.
8.7  IP Encapsulation of ST
   ST packets may be encapsulated in IP to allow them to pass through
   routers that don't support the ST Protocol. Of course, ST resource
   management is precluded over such a path, and packet overhead is
   increased by encapsulation, but if the performance is reasonably
   predictable this may be better than not communicating at all.
   IP-encapsulated ST packets begin with a normal IP header. Most fields
   of the IP header should be filled in according to the same rules that
   apply to any other IP packet. Three fields of special interest are:
o   Protocol is 5, see [RFC1700], to indicate an ST packet is enclosed,
    as opposed to TCP or UDP, for example.
o   Destination Address is that of the next-hop ST agent. This may or
    may not be the target of the ST stream. There may be an intermediate
    ST agent to which the packet should be routed to take advantage of
    service guarantees on the path past that agent. Such an intermediate
    agent would not be on a directly-connected network (or else IP
    encapsulation wouldn't be needed), so it would probably not be
    listed in the normal routing table. Additional routing mechanisms,
    not defined here, will be required to learn about such agents.
o   Type-of-Service may be set to an appropriate value for the service
    being requested, see [RFC1700]. This feature is not implemented
    uniformly in the Internet, so its use can't be precisely defined
    here.
   IP encapsulation adds little difficulty for the ST agent that
   receives the packet. However, when IP encapsulation is performed it
   must be done in both directions. To process the encapsulated IP
   message, the ST agents simply remove the IP header and proceed with
   ST header as usual.
   The more difficult part is during setup, when the ST agent must
   decide whether or not to encapsulate. If the next-hop ST agent is on
   a remote network and the route to that network is through a router
   that supports IP but not ST, then encapsulation is required. The
   routing function provides ST agents with the route and capability
   information needed to support encapsulation.
   On forwarding, the (mostly constant) IP Header must be inserted and
   the IP checksum appropriately updated.
   Applications are informed about the number of IP hops traversed on
   the path to each target. The IPHops field of the CONNECT message, see
   Section 10.4.4, carries the number of traversed IP hops to the target
   application. The field is incremented by each ST agent when IP
   encapsulation will be used to reach the next-hop ST agent. The number
   of IP hops traversed is returned to the origin in the IPHops field of
   the ACCEPT message, Section 10.4.1.
   When using IP Encapsulation, the MaxMsgSize field will not reflect
   the MTU of the IP encapsulated segments. This means that IP
   fragmentation and reassembly may be needed in the IP cloud to support
   a message of MaxMsgSize. IP fragmentation can only occur when the MTU
   of the IP cloud, less IP header length, is the smallest MTU in a
   stream's network path.
8.8  IP Multicasting
   If an ST agent must use IP encapsulation to reach multiple next-hops
   toward different targets, then either the packet must be replicated
   for transmission to each next-hop, or IP multicasting may be used if
   it is implemented in the next-hop ST agents and in the intervening IP
   routers.
   When the stream is established, the collection of next-hop ST agents
   must be set up as an IP multicast group. The ST agent must allocate
   an appropriate IP multicast address (see Section 10.3.3) and fill
   that address in the IPMulticastAddress field of the CONNECT message.
   The IP multicast address in the CONNECT message is used to inform the
   next-hop ST agents that they should join the multicast group to
   receive subsequent PDUs. Obviously, the CONNECT message itself must
   be sent using unicast. The next-hop ST agents must be able to receive
   on the specified multicast address in order to accept the connection.
   If the next-hop ST agent can not receive on the specified multicast
   address, it sends a REFUSE message with ReasonCode (BadMcastAddress).
   Upon receiving the REFUSE, the upstream agent can choose to retry
   with a different multicast address. Alternatively, it can choose to
   lose the efficiency of multicast and use unicast delivery.
   The following permanent IP multicast addresses have been assigned to
   ST:
           224.0.0.7 All ST routers (intermediate agents)
           224.0.0.8 All ST hosts (agents)
   In addition, a block of transient IP multicast addresses, 224.1.0.0 -
   224.1.255.255, has been allocated for ST multicast groups. For
   instance, the following two functions could be made available:
   o   AllocateMcastAddr() -> result, McastAddr
   o   ListenMcastAddr(McastAddr) -> result
   o   ReleaseMcastAddr(McastAddr) -> result
9.  The ST2+ Flow Specification
   This section defines the ST2+ flow specification. The flow
   specification contains the user application requirements in terms of
   quality of service. Its contents are LRM dependent and are
   transparent to the ST2 setup protocol. ST2 carries the flow
   specification as part of the FlowSpec parameter, which is described
   in Section 10.3.1. The required ST2+ flow specification is included
   in the protocol only to support interoperability. ST2+ also defines a
   "null" flow specification to be used only to support testing.
   ST2 is not dependent on a particular flow specification format and it
   is expected that other versions of the flow specification will be
   needed in the future. Different flow specification formats are
   distinguished by the value of the Version field of the FlowSpec
   parameter, see Section 10.3.1. A single stream is always associated
   with a single flow specification format, i.e., the Version field is
   consistent throughout the whole stream. The following Version field
   values are defined:
   0 - Null FlowSpec       /* must be supported */
   1 - ST Version 1
   2 - ST Version 1.5
   3 - RFC 1190 FlowSpec
   4 - HeiTS FlowSpec
   5 - BerKom FlowSpec
   6 - RFC 1363 FlowSpec
   7 - ST2+ FlowSpec       /* must be supported */
   FlowSpecs version #0 and #7 must be supported by ST2+
   implementations.  Version numbers in the range 1-6 indicate flow
   specifications are currently used in existing ST2 implementations.
   Values in the 128-255 range are reserved for private and experimental
   use.
   In general, a flow specification may support sophisticated flow
   descriptions. For example, a flow specification could represent sub-
   flows of a particular stream. This could then be used to by a
   cooperating application and LRM to forward designated packets to
   specific targets based on the different sub-flows. The reserved bits
   in the ST2 Data PDU, see Section 10.1, may be used with such a flow
   specification to designate packets associated with different sub-
   flows. The ST2+ FlowSpec is not so sophisticated, and is intended for
   use with applications that generate traffic at a single rate for
   uniform delivery to all targets.
9.1  FlowSpec Version #0 - (Null FlowSpec)
   The flow specification identified by a #0 value of the Version field
   is called the Null FlowSpec. This flow specification causes no
   resources to be allocated. It is ignored by the LRMs. Its contents
   are never updated. Stream setup takes place in the usual way leading
   to successful stream establishment, but no resources are actually
   reserved.
   The purpose of the Null FlowSpec is that of facilitating
   interoperability tests by allowing streams to be built without
   actually allocating the correspondent amount of resources. The Null
   FlowSpec may also be used for testing and debugging purposes.
   The Null FlowSpec comprises the 4-byte FlowSpec parameter only, see
   Section 10.3.1. The third byte (Version field) must be set to 0.
9.2  FlowSpec Version #7 - ST2+ FlowSpec
   The flow specification identified by a #7 value of the Version field
   is the ST2+ FlowSpec, to be used by all ST2+ implementations. It
   allows the user applications to express their real-time requirements
   in the form of a QoS class, precedence, and three basic QoS
   parameters:
   o   message size,
   o   message rate,
   o   end-to-end delay.
   The QoS class indicates what kind of QoS guarantees are expected by
   the application, e.g., strict guarantees or predictive, see Section
   9.2.1. QoS parameters are expressed via a set of values:
o   the "desired" values indicate the QoS desired by the application.
    These values are assigned by the application and never modified by
    the LRM.
o   the "limit" values indicate the lowest QoS the application is
    willing to accept. These values are also assigned by the application
    and never modified by the LRM.
o   the "actual" values indicate the QoS that the system is able to
    provide. They are updated by the LRM at each node. The "actual"
    values are always bounded by the "limit" and "desired" values.
9.2.1  QoS Classes
   Two QoS classes are defined:
   1 - QOS_PREDICTIVE      /* QoSClass field value = 0x01, must be
                              supported*/
   2 - QOS_GUARANTEED      /* QoSClass field value = 0x10, optional */
o   The QOS_PREDICTIVE class implies that the negotiated QoS may be
    violated for short time intervals during the data transfer. An
    application has to provide values that take into account the
    "normal" case, e.g., the "desired" message rate is the allocated rate
    for the transmission. Reservations are done for the "normal" case as
    opposite to the peak case required by the QOS_GUARANTEED service
    class. This QoS class must be supported by all implementations.
o   The QOS_GUARANTEED class implies that the negotiated QoS for the
    stream is never violated during the data transfer. An application
    has to provide values that take into account the worst possible
    case, e.g., the "desired" message rate is the peak rate for the
    transmission. As a result, sufficient resources to handle the peak
    rate are reserved. This strategy may lead to overbooking of
    resources, but it provides strict real-time guarantees. Support of
    this QoS class is optional.
   If a LRM that doesn't support class QOS_GUARANTEED receives a
   FlowSpec containing QOS_GUARANTEED class, it informs the local ST
   agent. The ST agent may try different paths or delete the
   correspondent portion of the stream as described in Section 5.5.3,
   i.e., ReasonCode (FlowSpecError).
9.2.2  Precedence
   Precedence is the importance of the connection being established.
   Zero represents the lowest precedence. The lowest level is expected
   to be used by default. In general, the distinction between precedence
   and priority is that precedence specifies streams that are permitted
   to take previously committed resources from another stream, while
   priority identifies those PDUs that a stream is most willing to have
   dropped.
9.2.3  Maximum Data Size
   This parameter is expressed in bytes. It represents the maximum
   amount of data, excluding ST and other headers, allowed to be sent in
   a messages as part of the stream. The LRM first checks whether it is
   possible to get the value desired by the application (DesMaxSize). If
   not, it updates the actual value (ActMaxSize) with the available size
   unless this value is inferior to the minimum allowed by the
   application (LimitMaxSize), in which case it informs the local ST
   agent that it is not possible to build the stream along this path.
9.2.4  Message Rate
   This parameter is expressed in messages/second. It represents the
   transmission rate for the stream. The LRM first checks whether it is
   possible to get the value desired by the application (DesRate). If
   not, it updates the actual value (ActRate) with the available rate
   unless this value is inferior to the minimum allowed by the
   application (LimitRate), in which case it informs the local ST agent
   that it is not possible to build the stream along this path.
9.2.5  Delay and Delay Jitter
   The delay parameter is expressed in milliseconds. It represents the
   maximum end-to-end delay for the stream. The LRM first checks whether
   it is possible to get the value desired by the application
   (DesMaxDelay). If not, it updates the actual value (ActMaxDelay) with
   the available delay unless this value is greater than the maximum
   delay allowed by the application (LimitMaxDelay), in which case it
   informs the local ST agent that it is not possible to build the
   stream along this path.
   The LRM also updates at each node the MinDelay field by incrementing
   it by the minimum possible delay to the next-hop. Information on the
   minimum possible delay allows to calculate the maximum end-to-end
   delay range, i.e., the time interval in which a data packet can be
   received. This interval should not exceed the DesMaxDelayRange value
   indicated by the application. The maximum end-to-end delay range is
   an upper bound of the delay jitter.
9.2.6  ST2+ FlowSpec Format
   The ST2+ FlowSpec has the following format:
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |    QosClass   |  Precedence   |            0(unused)          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                             DesRate                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                            LimitRate                          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                             ActRate                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            DesMaxSize         |           LimitMaxSize        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            ActMaxSize         |           DesMaxDelay         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            LimitMaxDelay      |           ActMaxDelay         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            DesMaxDelayRange   |           ActMinDelay         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                        Figure 9: The ST2+ FlowSpec.
   The LRM modifies only "actual" fields, i.e., those beginning with
   "Act". The user application assigns values to all other fields.
o   QoSClass indicates which of the two defined classes of service
    applies. The two classes are: QOS_PREDICTIVE (QoSClass = 1) and
    QOS_GUARANTEED (QoSClass = 2).
o   Precedence indicates the stream's precedence. Zero represents the
    lowest precedence, and should be the default value.
o   DesRate is the desired transmission rate for the stream in messages/
    second. This field is set by the origin and is not modified by
    intermediate agents.
o   LimitRate is the minimum acceptable transmission rate in messages/
    second. This field is set by the origin and is not modified by
    intermediate agents.
o   ActRate is the actual transmission rate allocated for the stream in
    messages/second. Each agent updates this field with the available
    rate unless this value is less than LimitRate, in which case a
    REFUSE is generated.
o   DesMaxSize is the desired maximum data size in bytes that will be
    sent in a message in the stream. This field is set by the origin.
o   LimitMaxSize is the minimum acceptable data size in bytes. This
    field is set by the origin
o   ActMaxSize is the actual maximum data size that may be sent in a
    message in the stream. This field is updated by each agent based on
    MTU and available resources. If available maximum size is less than
    LimitMaxSize, the connection must be refused with ReasonCode
    (CantGetResrc).
o   DesMaxDelay is the desired maximum end-to-end delay for the stream
    in milliseconds. This field is set by the origin.
o   LimitMaxDelay is the upper-bound of acceptable end-to-end delay for
    the stream in milliseconds. This field is set by the origin.
o   ActMaxDelay is the maximum end-to-end delay that will be seen by
    data in the stream. Each ST agent adds to this field the maximum
    delay that will be introduced by the agent, including transmission
    time to the next-hop ST agent. If the actual maximum exceeds
    LimitMaxDelay, then the connection is refused with ReasonCode
    (CantGetResrc).
o   DesMaxDelayRange is the desired maximum delay range that may be
    encountered end-to-end by stream data in milliseconds. This value is
    set by the application at the origin.
o   ActMinDelay is the actual minimum end-to-end delay that will be
    encountered by stream data in milliseconds. Each ST agent adds to
    this field the minimum delay that will be introduced by the agent,
    including transmission time to the next-hop ST agent. Each agent
    must add at least 1 millisecond. The delay range for the stream can
    be calculated from the actual maximum and minimum delay fields. It
    is expected that the range will be important to some applications.
10.  ST2 Protocol Data Units Specification
10.1  Data PDU
   IP and ST packets can be distinguished by the IP Version Number
   field, i.e., the first four (4) bits of the packet; ST has been
   assigned the value 5 (see [RFC1700]). There is no requirement for
   compatibility between IP and ST packet headers beyond the first four
   bits. (IP uses value 4.)
   The ST PDUs sent between ST agents consist of an ST Header
   encapsulating either a higher layer PDU or an ST Control Message.
   Data packets are distinguished from control messages via the D-bit
   (bit 8) in the ST header.
   The ST Header also includes an ST Version Number, a total length
   field, a header checksum, a unique id, and the stream origin 32-bit
   IP address. The unique id and the stream origin 32-bit IP address
   form the stream id (SID). This is shown in Figure 10. Please refer to
   Section 10.6 for an explanation of the notation.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  ST=5 | Ver=3 |D| Pri |   0   |            TotalBytes         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          HeaderChecksum       |            UniqueID           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         OriginIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                            Figure 10: ST Header
o   ST is the IP Version Number assigned to identify ST packets. The
    value for ST is 5.
o   Ver is the ST Version Number. The value for the current ST2+ version
    is 3.
o   D (bit 8) is set to 1 in all ST data packets and to 0 in all SCMP
    control messages.
o   Pri (bits 9-11) is the packet-drop priority field with zero (0)
    being lowest priority and seven the highest. The field is to be used
    as described in Section 3.2.2.
o   TotalBytes is the length, in bytes, of the entire ST packet, it
    includes the ST Header but does not include any local network
    headers or trailers. In general, all length fields in the ST
    Protocol are in units of bytes.
o   HeaderChecksum covers only the ST Header (12 bytes). The ST Protocol
    uses 16-bit checksums here in the ST Header and in each Control
    Message. For checksum computation, see Section 8.3.
o   UniqueID is the first element of the stream ID (SID). It is locally
    unique at the stream origin, see Section 8.1.
o   OriginIPAddress is the second element of the SID. It is the 32-bit
    IP address of the stream origin, see Section 8.1.
   Bits 12-15 must be set to zero (0) when using the flow specifications
   defined in this document, see Section 9. They may be set accordingly
   when other flow specifications are used, e.g., as described in
   [WoHD95].
10.1.1  ST Data Packets
   ST packets whose D-bit is non-zero are data packets. Their
   interpretation is a matter for the higher layer protocols and
   consequently is not specified here. The data packets are not
   protected by an ST checksum and will be delivered to the higher layer
   protocol even with errors. ST agents will not pass data packets over
   a new hop whose setup is not complete.
10.2  Control PDUs
   SCMP control messages are exchanged between neighbor ST agents using
   a D-bit of zero (0). The control protocol follows a request-response
   model with all requests expecting responses. Retransmission after
   timeout (see Section 4.3) is used to allow for lost or ignored
   messages. Control messages do not extend across packet boundaries; if
   a control message is too large for the MTU of a hop, its information
   is partitioned and a control message per partition is sent (see
   Section 5.1.2). All control messages have the following format
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode       |     Options   |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          Reference            |          LnkReference         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |            ReasonCode         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                      OpCodeSpecificData                       :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    Figure 11: ST Control Message Format
o   OpCode identifies the type of control message.
o   Options is used to convey OpCode-specific variations for a control
    message.
o   TotalBytes is the length of the control message, in bytes, including
    all OpCode specific fields and optional parameters. The value is
    always divisible by four (4).
o   Reference is a transaction number. Each sender of a request control
    message assigns a Reference number to the message that is unique
    with respect to the stream. The Reference number is used by the
    receiver to detect and discard duplicates. Each acknowledgment
    carries the Reference number of the request being acknowledged.
    Reference zero (0) is never used, and Reference numbers are assumed
    to be monotonically increasing with wraparound so that the older-
    than and more-recent-than relations are well defined.
o   LnkReference contains the Reference field of the request control
    message that caused this request control message to be created. It
    is used in situations where a single request leads to multiple
    responses from the same ST agent. Examples are CONNECT and CHANGE
    messages that are first acknowledged hop-by-hop and then lead to an
    ACCEPT or REFUSE response from each target.
o   SenderIPAddress is the 32-bit IP address of the network interface
    that the ST agent used to send the control message. This value
    changes each time the packet is forwarded by an ST agent (hop-by-
    hop).
o   Checksum is the checksum of the control message. Because the control
    messages are sent in packets that may be delivered with bits in
    error, each control message must be checked to be error free before
    it is acted upon.
o   ReasonCode is set to zero (0 = NoError) in most SCMP messages.
    Otherwise, it can be set to an appropriate value to indicate an
    error situation as defined in Section 10.5.3.
o   OpCodeSpecificData contains any additional information that is
    associated with the control message. It depends on the specific
    control message and is explained further below. In some response
    control messages, fields of zero (0) are included to allow the
    format to match that of the corresponding request message. The
    OpCodeSpecificData may also contain optional parameters. The
    specifics of OpCodeSpecificData are defined in Section 10.3.
10.3  Common SCMP Elements
   Several fields and parameters (referred to generically as elements)
   are common to two or more PDUs. They are described in detail here
   instead of repeating their description several times. In many cases,
   the presence of a parameter is optional. To permit the parameters to
   be easily defined and parsed, each is identified with a PCode byte
   that is followed by a PBytes byte indicating the length of the
   parameter in bytes (including the PCode, PByte, and any padding
   bytes). If the length of the information is not a multiple of four
   (4) bytes, the parameter is padded with one to three zero (0) bytes.
   PBytes is thus always a multiple of four (4). Parameters can be
   present in any order.
10.3.1  FlowSpec
   The FlowSpec parameter (PCode = 1) is used in several SCMP messages
   to convey the ST2 flow specification. The FlowSpec parameter has the
   following format:
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   PCode = 1   |    PBytes     |   Version     |       0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                        FlowSpec detail                        :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       Figure 12: FlowSpec Parameter
o   the Version field contains the FlowSpec version.
o   the FlowSpec detail field contains the flow specification and is
    transparent to the ST agent. It is the data structure to be passed
    to the LRM. It must be 4-byte aligned.
   The Null FlowSpec, see Section 9.1, has no FlowSpec detail field.
   PBytes is set to four (4), and Version is set to zero (0). The ST2+
   FlowSpec, see Section 9.2, is a 32-byte data structure. PBytes is set
   to 36, and Version is set to seven (7).
10.3.2  Group
   The Group parameter (PCode = 2) is an optional argument used to
   indicate that the stream is a member in the specified group.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  PCode = 2    |   PBytes = 16 |           GroupUniqueID       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        GroupCreationTime                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     GroupInitiatorIPAddress                   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Relationship       |                 N             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                         Figure 13: Group Parameter
o   GroupUniqueID, GroupInitiatorIPAddress, and GroupCreationTime
    together form the GroupName field. They are allocated by the group
    name generator function, see Section 8.2. GroupUniqueID and
    GroupCreationTime are implementation specific and have only local
    definitions.
o   Relationship has the following format:
                                            0
                        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       |    0 (unused)         |S|P|F|B|
                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       Figure 14: Relationship Field
   The B, F, P, S bits correspond to Bandwidth, Fate, Path, and Subnet
   resources sharing, see Section 7. A value of 1 indicates that the
   relationship exists for this group. All combinations of the four bits
   are allowed. Bits 0-11 of the Relationship field are reserved for
   future use and must be set to 0.
o   N contains a legal value only if the B-bit is set. It is the value
    of the N parameter to be used as explained in Section 7.1.1.
10.3.3  MulticastAddress
   The MulticastAddress parameter (PCode = 3) is an optional parameter
   that is used when using IP encapsulation and setting up an IP
   multicast group. This parameter is used to communicate the desired IP
   multicast address to next-hop ST agents that should become members of
   the group, see Section 8.8.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  PCode = 3    |   PBytes = 8  |                0              |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        IPMulticastAddress                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                        Figure 15:  MulticastAddress
o   IPMulticastAddress is the 32-bit IP multicast address to be used to
    receive data packets for the stream.
10.3.4  Origin
   The Origin parameter (PCode = 4) is used to identify the next higher
   protocol, and the SAP being used in conjunction with that protocol.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  PCode = 5    |   PBytes      | NextPcol      |OriginSAPBytes |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                OriginSAP                      :     Padding   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                             Figure 16: Origin
o   NextPcol is an 8-bit field used in demultiplexing operations to
    identify the protocol to be used above ST. The values of NextPcol
    are in the same number space as the IP header's Protocol field and
    are consequently defined in the Assigned Numbers RFC [RFC1700].
o   OriginSAPBytes specifies the length of the OriginSAP, exclusive of
    any padding required to maintain 32-bit alignment.
o   OriginSAP identifies the origin's SAP associated with the NextPcol
    protocol.
   Note that the 32-bit IP address of the stream origin is not included
   in this parameter because it is always available as part of the ST
   header.
10.3.5  RecordRoute
   The RecordRoute parameter (PCode = 5) is used to request that the
   route between the origin and a target be recorded and delivered to
   the user application. The ST agent at the origin (or target)
   including this parameter, has to determine the parameter's length,
   indicated by the PBytes field. ST agents processing messages
   containing this parameter add their receiving IP address in the
   position indicated by the FreeOffset field, space permitting. If no
   space is available, the parameter is passed unchanged. When included
   by the origin, all agents between the origin and the target add their
   IP addresses and this information is made available to the
   application at the target. When included by the target, all agents
   between the target and the origin, inclusive, add their IP addresses
   and this information is made available to the application at the
   origin.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   PCode = 5   |     PBytes    |       0       |  FreeOffset   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          IP Address 1                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                              ...                              :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          IP Address N                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                           Figure 17: RecordRoute
o   PBytes is the length of the parameter in bytes. Length is determined
    by the agent (target or origin) that first introduces the parameter.
    Once set, the length of the parameter remains unchanged.
o   FreeOffset indicates the offset, relative to the start of the
    parameter, for the next IP address to be recorded. When the
    FreeOffset is greater than, or equal to, PBytes the RecordRoute
    parameter is full.
o   IP Address is filled in, space permitting, by each ST agent
    processing this parameter.
10.3.6  Target and TargetList
   Several control messages use a parameter called TargetList (PCode =
   6), which contains information about the targets to which the message
   pertains. For each Target in the TargetList, the information includes
   the 32-bit IP address of the target, the SAP applicable to the next
   higher layer protocol, and the length of the SAP (SAPBytes).
   Consequently, a Target structure can be of variable length. Each
   entry has the format shown in Figure 18.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        Target IP Address                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  TargetBytes  |  SAPBytes     |     SAP       :    Padding    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                             Figure 18: Target
o   TargetIPAddress is the 32-bit IP Address of the Target.
o   TargetBytes is the length of the Target structure, beginning with
    the TargetIPAddress.
o   SAPBytes is the length of the SAP, excluding any padding required to
    maintain 32-bit alignment.
o   SAP may be longer than 2 bytes and it includes a padding when
    required. There would be no padding required for SAPs with lengths
    of 2, 6, 10, etc., bytes.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  PCode = 6    |   PBytes      |           TargetCount = N     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           Target 1                            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                               :                               :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           Target N                            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                           Figure 19: TargetList
10.3.7  UserData
   The UserData parameter (PCode = 7) is an optional parameter that may
   be used by the next higher protocol or an application to convey
   arbitrary information to its peers. This parameter is propagated in
   some control messages and its contents have no significance to ST
   agents. Note that since the size of control messages is limited by
   the smallest MTU in the path to the targets, the maximum size of this
   parameter cannot be specified a priori. If the size of this parameter
   causes a message to exceed the network MTU, an ST agent behaves as
   described in Section 5.1.2. The parameter must be padded to a
   multiple of 32 bits.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  PCode = 7    |   PBytes      |           UserBytes           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                      UserInfo                 :   Padding     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                            Figure 20:  UserData
o   UserBytes specifies the number of valid UserInfo bytes.
o   UserInfo is arbitrary data meaningful to the next higher protocol
    layer or application.
10.3.8  Handling of Undefined Parameters
   An ST agent must be able to handle all parameters listed above. To
   support possible future uses, parameters with unknown PCodes must
   also be supported. If an agent receives a message containing a
   parameter with an unknown Pcode value, the agent should handle the
   parameter as if it was a UserData parameter. That is, the contents of
   the parameter should be ignored, and the message should be
   propagated, as appropriate, along with the related control message.
10.4  ST Control Message PDUs
   ST Control messages are described in the following section. Please
   refer to Section 10.6 for an explanation of the notation.
10.4.1  ACCEPT
   ACCEPT (OpCode = 1) is issued by a target as a positive response to a
   CONNECT message. It implies that the target is prepared to accept
   data from the origin along the stream that was established by the
   CONNECT.  ACCEPT is also issued as a positive response to a CHANGE
   message. It implies that the target accepts the proposed stream
   modification.
   ACCEPT is relayed by the ST agents from the target to the origin
   along the path established by CONNECT (or CHANGE) but in the reverse
   direction. ACCEPT must be acknowledged with ACK at each hop.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 1   |      0        |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |         LnkReference          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          MaxMsgSize           |          RecoveryTimeout      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      StreamCreationTime                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   IPHops      |                        0                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           FlowSpec                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           TargetList                          :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           RecordRoute                         :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           UserData                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     Figure 21: ACCEPT Control Message
o   Reference contains a number assigned by the ST agent sending ACCEPT
    for use in the acknowledging ACK.
o   LnkReference is the Reference number from the corresponding CONNECT
    (or CHANGE)
o   MaxMsgSize indicates the smallest MTU along the path traversed by
    the stream. This field is only set when responding to a CONNECT
    request.
o   RecoveryTimeout reflects the nominal number of milliseconds that the
    application is willing to wait for a failed system component to be
    detected and any corrective action to be taken. This field
    represents what can actually be supported by each participating
    agent, and is only set when responding to a CONNECT request.
o   StreamCreationTime is the 32- bits system dependent timestamp copied
    from the corresponding CONNECT request.
o   IPHops is the number of IP encapsulated hops traversed by the
    stream. This field is set to zero by the origin, and is incremented
    at each IP encapsulating agent.
10.4.2  ACK
   ACK (OpCode = 2) is used to acknowledge a request. The ACK message is
   not propagated beyond the previous-hop or next-hop ST agent.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 2   |     0         |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |       Reference               |           LnkReference = 0    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |       Checksum                |           ReasonCode          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       Figure 22: ACK Control Message
o   Reference is the Reference number of the control message being
    acknowledged.
o   ReasonCode is usually NoError, but other possibilities exist, e.g.,
    DuplicateIgn.
10.4.3  CHANGE
   CHANGE (OpCode = 3) is used to change the FlowSpec of an established
   stream. The CHANGE message is processed similarly to CONNECT, except
   that it travels along the path of an established stream. CHANGE must
   be propagated until it reaches the related stream's targets. CHANGE
   must be acknowledged with ACK at each hop.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 3   |G|I|     0     |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           Reference           |          LnkReference = 0     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        SenderIPAddress                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                            FlowSpec                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           TargetList                          :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           RecordRoute                         :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                            UserData                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     Figure 23: CHANGE Control Message
o   G (bit 8) is used to request a global, stream-wide change; the
    TargetList parameter should be omitted when the G bit is specified.
o   I (bit 7) is used to indicate that the LRM is permitted to interrupt
    and, if needed, break the stream in the process of trying to satisfy
    the requested change.
o   Reference contains a number assigned by the ST agent sending CHANGE
    for use in the acknowledging ACK.
10.4.4  CONNECT
   CONNECT (OpCode = 4) requests the setup of a new stream or an
   addition to or recovery of an existing stream. Only the origin can
   issue the initial set of CONNECTs to setup a stream, and the first
   CONNECT to each next-hop is used to convey the SID.
   The next-hop initially responds with an ACK, which implies that the
   CONNECT was valid and is being processed. The next-hop will later
   relay back either an ACCEPT or REFUSE from each target. An
   intermediate ST agent that receives a CONNECT behaves as explained in
   Section 4.5.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 4   |J N|S|    0    |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           Reference           |          LnkReference = 0     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           Checksum            |          ReasonCode = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           MaxMsgSize          |          RecoveryTimeout      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                        StreamCreationTime                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   IPHops      |                        0                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                             Origin                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           FlowSpec                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                          TargetList                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                          RecordRoute                          :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                             Group                             :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                        MulticastAddress                       :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                            UserData                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     Figure 24: CONNECT Control Message
o   JN (bits 8 and 9) indicate the join authorization level for the
    stream, see Section 4.4.2.
o   S (bit 10) indicates the NoRecovery option (Section 4.4.1). When the
    S-bit is set (1), the NoRecovery option is specified for the stream.
o   Reference contains a number assigned by the ST agent sending CONNECT
    for use in the acknowledging ACK.
o   MaxMsgSize indicates the smallest MTU along the path traversed by
    the stream. This field is initially set to the network MTU of the
    agent issues the CONNECT.
o   RecoveryTimeout is the nominal number of milliseconds that the
    application is willing to wait for failed system component to be
    detected and any corrective action to be taken.
o   StreamCreationTime is the 32- bits system dependent timestamp
    generated by the ST agent issuing the CONNECT.
o   IPHops is the number of IP encapsulated hops traversed by the
    stream. This field is set to zero by the origin, and is incremented
    at each IP encapsulating agent.
10.4.5  DISCONNECT
   DISCONNECT (OpCode = 5) is used by an origin to tear down an
   established stream or part of a stream, or by an intermediate ST
   agent that detects a failure between itself and its previous-hop, as
   distinguished by the ReasonCode. The DISCONNECT message specifies the
   list of targets that are to be disconnected. An ACK is required in
   response to a DISCONNECT message. The DISCONNECT message is
   propagated all the way to the specified targets. The targets are
   expected to terminate their participation in the stream.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 5   |G|    0        |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |     LnkReference = 0          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      GeneratorIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           TargetList                          :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                            UserData                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   Figure 25: DISCONNECT Control Message
o   G (bit 8) is used to request a DISCONNECT of all the stream's
    targets. TargetList should be omitted when the G-bit is set (1). If
    TargetList is present, it is ignored.
o   Reference contains a number assigned by the ST agent sending
    DISCONNECT for use in the acknowledging ACK.
o   ReasonCode reflects the event that initiated the message.
o   GeneratorIPAddress is the 32-bit IP address of the host that first
    generated the DISCONNECT message.
10.4.6  ERROR
   ERROR (OpCode = 6) is sent in acknowledgment to a request in which an
   error is detected. No action is taken on the erroneous request. No
   ACK is expected. The ERROR message is not propagated beyond the
   previous-hop or next-hop ST agent. An ERROR is never sent in response
   to another ERROR. The receiver of an ERROR is encouraged to try again
   without waiting for a retransmission timeout.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 6   |       0       |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |     LnkReference = 0          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |        ReasonCode             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           PDUInError                          :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                      Figure 26: ERROR Control Message
o   Reference is the Reference number of the erroneous request.
o   ReasonCode indicates the error that triggered the message.
o   PDUInError is the PDU in error, beginning with the ST Header. This
    parameter is optional. Its length is limited by network MTU, and may
    be truncated when too long.
10.4.7  HELLO
   HELLO (OpCode = 7) is used as part of the ST failure detection
   mechanism, see Section 6.1.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 7   |R|    0        |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |       Reference = 0           |        LnkReference = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |         Checksum              |          ReasonCode = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                          HelloTimer                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                      Figure 27: HELLO Control Message
o   R (bit 8) is used for the Restarted-bit.
o   HelloTimer represents the time in millisecond since the agent was
    restarted, modulo the precision of the field. It is used to detect
    duplicate or delayed HELLO messages.
10.4.8  JOIN
   JOIN (OpCode = 8) is used as part of the ST steam joining mechanism,
   see Section 4.6.3.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 8   |      0        |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |         LnkReference = 0      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      GeneratorIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                          TargetList                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                      Figure 28: JOIN Control Message
o   Reference contains a number assigned by the ST agent sending JOIN
    for use in the acknowledging ACK.
o   GeneratorIPAddress is the 32-bit IP address of the host that
    generated the JOIN message.
o   TargetList is the information associated with the target to be added
    to the stream.
10.4.9  JOIN-REJECT
   JOIN-REJECT (OpCode = 9) is used as part of the ST steam joining
   mechanism, see Section 4.6.3.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 9   |      0        |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |          LnkReference         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      GeneratorIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   Figure 29: JOIN-REJECT Control Message
o   Reference contains a number assigned by the ST agent sending the
    REFUSE for use in the acknowledging ACK.
o   LnkReference is the Reference number from the corresponding JOIN
    message.
o   ReasonCode reflects the reason why the JOIN request was rejected.
o   GeneratorIPAddress is the 32-bit IP address of the host that first
    generated the JOIN-REJECT message.
10.4.10  NOTIFY
   NOTIFY (OpCode = 10) is issued by an ST agent to inform other ST
   agents of events that may be significant. NOTIFY may be propagated
   beyond the previous-hop or next-hop ST agent depending on the
   ReasonCode, see Section 10.5.3; NOTIFY must be acknowledged with an
   ACK.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 10  |      0        |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |         LnkReference = 0      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      DetectorIPAddress                        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          MaxMsgSize           |          RecoveryTimeout      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           FlowSpec                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           TargetList                          :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           UserData                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     Figure 30: NOTIFY Control Message
o   Reference contains a number assigned by the ST agent sending the
    NOTIFY for use in the acknowledging ACK.
o   ReasonCode identifies the reason for the notification.
o   DetectorIPAddress is the 32-bit IP address of the ST agent that
    detects the event.
o   MaxMsgSize is set when the MTU of the listed targets has changed
    (e.g., due to recovery), or when the notification is generated after
    a successful JOIN. Otherwise it is set to zero (0).
o   RecoveryTimeout is set when the notification is generated after a
    successful JOIN. Otherwise it is set to zero (0).
o   FlowSpec is present when the notification is generated after a
    successful JOIN.
o   TargetList is present when the notification is related to one or
    more targets, or when MaxMsgSize is set
o   UserData is present if the notification is generated after a
    successful JOIN and the UserData parameter was set in the ACCEPT
    message.
10.4.11  REFUSE
   REFUSE (OpCode = 11) is issued by a target that either does not wish
   to accept a CONNECT message or wishes to remove itself from an
   established stream. It might also be issued by an intermediate ST
   agent in response to a CONNECT or CHANGE either to terminate a
   routing loop, or when a satisfactory next-hop to a target cannot be
   found. It may also be a separate command when an existing stream has
   been preempted by a higher precedence stream or an ST agent detects
   the failure of a previous-hop, next-hop, or the network between them.
   In all cases, the TargetList specifies the targets that are affected
   by the condition. Each REFUSE must be acknowledged by an ACK.
   The REFUSE is relayed back by the ST agents to the origin (or
   intermediate ST agent that created the CONNECT or CHANGE) along the
   path traced by the CONNECT. The ST agent receiving the REFUSE will
   process it differently depending on the condition that caused it, as
   specified in the ReasonCode field. No special effort is made to
   combine multiple REFUSE messages since it is considered most unlikely
   that separate REFUSEs will happen to both pass through an ST agent at
   the same time and be easily combined, e.g., have identical
   ReasonCodes and parameters.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 11  |G|E|N|    0    |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |         LnkReference          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       DetectorIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       ValidTargetIPAddress                    |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                          TargetList                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                         RecordRoute                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                            UserData                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     Figure 31: REFUSE Control Message
o   G (bit 8) is used to indicate that all targets down stream from the
    sender are refusing. It is expected that this will be set most
    commonly due to network failures. The TargetList parameter is
    ignored or not present when this bit is set, and must be included
    when not set.
o   E (bit 9) is set by an ST agent to indicate that the request failed
    and that the pre-change stream attributes, including resources, and
    the stream itself still exist.
o   N (bit 10) is used to indicate that no further attempts to recover
    the stream should be made. This bit must be set when stream recovery
    should not be attempted, even in the case where the target
    application has shut down normally (ApplDisconnect).
o   Reference contains a number assigned by the ST agent sending the
    REFUSE for use in the acknowledging ACK.
o   LnkReference is either the Reference number from the corresponding
    CONNECT or CHANGE, if it is the result of such a message, or zero
    when the REFUSE was originated as a separate command.
o   DetectorIPAddress is the 32-bit IP address of the host that first
    generated the REFUSE message.
o   ValidTargetIPAddress is the 32-bit IP address of a host that is
    properly connected as part of the stream. This parameter is only
    used when recovering from stream convergence, otherwise it is set to
    zero (0).
10.4.12  STATUS
   STATUS (OpCode = 12) is used to inquire about the existence of a
   particular stream identified by the SID. Use of STATUS is intended
   for collecting information from an neighbor ST agent, including
   general and specific stream information, and round trip time
   estimation. The use of this message type is described in Section 8.4.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | OpCode = 12   |       0       |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |       LnkReference = 0        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |          ReasonCode = 0       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                          TargetList                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     Figure 32: STATUS Control Message
o   Reference contains a number assigned by the ST agent sending STATUS
    for use in the replying STATUS-RESPONSE.
o   TargetList is an optional parameter that when present indicates that
    only information related to the specific targets should be relayed
    in the STATUS-RESPONSE.
10.4.13  STATUS-RESPONSE
   STATUS-RESPONSE (OpCode = 13) is the reply to a STATUS message. If
   the stream specified in the STATUS message is not known, the STATUS-
   RESPONSE will contain the specified SID but no other parameters. It
   will otherwise contain the current SID, FlowSpec, TargetList, and
   possibly Groups of the stream. It the full target list can not fit in
   a single message, only those targets that can be included in one
   message will be included. As mentioned in Section 10.4.12, it is
   possible to request information on a specific target.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |  OpCode = 13  |    0          |           TotalBytes          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |      Reference                |       LnkReference = 0        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SenderIPAddress                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |            Checksum           |       ReasonCode = 0          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           FlowSpec                            :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                           Groups                              :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       :                          TargetList                           :
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                 Figure 33: STATUS-RESPONSE Control Message
o   Reference contains a number assigned by the ST agent sending the
    STATUS.
10.5  Suggested Protocol Constants
   The ST Protocol uses several fields that must have specific values
   for the protocol to work, and also several values that an
   implementation must select. This section specifies the required
   values and suggests initial values for others. It is recommended that
   the latter be implemented as variables so that they may be easily
   changed when experience indicates better values. Eventually, they
   should be managed via the normal network management facilities.
   ST uses IP Version Number 5.
   When encapsulated in IP, ST uses IP Protocol Number 5.
10.5.1  SCMP Messages
   1)      ACCEPT
   2)      ACK
   3)      CHANGE
   4)      CONNECT
   5)      DISCONNECT
   6)      ERROR
   7)      HELLO
   8)      JOIN
   9)      JOIN-REJECT
   10)     NOTIFY
   11)     REFUSE
   12)     STATUS
   13)     STATUS-RESPONSE
10.5.2  SCMP Parameters
   1)      FlowSpec
   2)      Group
   3)      MulticastAddress
   4)      Origin
   5)      RecordRoute
   6)      TargetList
   7)      UserData
10.5.3  ReasonCode
   Several errors may occur during protocol processing. All ST error
   codes are taken from a single number space. The currently defined
   values and their meaning is presented in the list below. Note that
   new error codes may be defined from time to time. All implementations
   are expected to handle new codes in a graceful manner. If an unknown
   ReasonCode is encountered, it should be assumed to be fatal. The
   ReasonCode is an 8-bit field. Following values are defined:
1       NoError         No error has occurred.
2       ErrorUnknown    An error not contained in this list has been
                        detected.
3       AccessDenied    Access denied.
4       AckUnexpected   An unexpected ACK was received.
5       ApplAbort       The application aborted the stream abnormally.
6       ApplDisconnect  The application closed the stream normally.
7       ApplRefused     Applications refused requested connection or
                        change.
8       AuthentFailed   The authentication function failed.
9       BadMcastAddress IP Multicast address is unacceptable in CONNECT
10      CantGetResrc    Unable to acquire (additional) resources.
11      CantRelResrc    Unable to release excess resources.
12      CantRecover     Unable to recover failed stream.
13      CksumBadCtl     Control PDU has a bad message checksum.
14      CksumBadST      PDU has a bad ST Header checksum.
15      DuplicateIgn    Control PDU is a duplicate and is being
                        acknowledged.
16      DuplicateTarget Control PDU contains a duplicate target, or an
                        attempt to add an existing target.
17      FlowSpecMismatch        FlowSpec in request does not match
                                existing FlowSpec.
18      FlowSpecError   An error occurred while processing the FlowSpec
19      FlowVerUnknown  Control PDU has a FlowSpec Version Number that
                        is not supported.
20      GroupUnknown    Control PDU contains an unknown Group Name.
21      InconsistGroup  An inconsistency has been detected with the
                        streams forming a group.
22      IntfcFailure    A network interface failure has been detected.
23      InvalidSender   Control PDU has an invalid SenderIPAddress
                        field.
24      InvalidTotByt   Control PDU has an invalid TotalBytes field.
25      JoinAuthFailure Join failed due to stream authorization level.
26      LnkRefUnknown   Control PDU contains an unknown LnkReference.
27      NetworkFailure  A network failure has been detected.
28      NoRouteToAgent  Cannot find a route to an ST agent.
29      NoRouteToHost   Cannot find a route to a host.
30      NoRouteToNet    Cannot find a route to a network.
31      OpCodeUnknown   Control PDU has an invalid OpCode field.
32      PCodeUnknown    Control PDU has a parameter with an invalid
                        PCode.
33      ParmValueBad    Control PDU contains an invalid parameter value.
34      PathConvergence Two branches of the stream join during the
                        CONNECT setup.
35      ProtocolUnknown Control PDU contains an unknown next-higher
                        layer protocol identifier.
36      RecordRouteSize RecordRoute parameter is too long to permit
                        message to fit a network's MTU.
37      RefUnknown      Control PDU contains an unknown Reference.
38      ResponseTimeout Control message has been acknowledged but not
                        answered by an appropriate control message.
39      RestartLocal    The local ST agent has recently restarted.
40      RestartRemote   The remote ST agent has recently restarted.
41      RetransTimeout  An acknowledgment has not been received after
                        several retransmissions.
42      RouteBack       Route to next-hop through same interface as
                        previous-hop and is not previous-hop.
43      RouteInconsist  A routing inconsistency has been detected.
44      RouteLoop       A routing loop has been detected.
45      SAPUnknown      Control PDU contains an unknown next-higher
                        layer SAP (port).
46      SIDUnknown      Control PDU contains an unknown SID.
47      STAgentFailure  An ST agent failure has been detected.
48      STVer3Bad       A received PDU is not ST Version 3.
49      StreamExists    A stream with the given SID already exists.
50      StreamPreempted The stream has been preempted by one with a
                        higher precedence.
51      TargetExists    A CONNECT was received that specified an
                        existing target.
52      TargetUnknown   A target is not a member of the specified
                        stream.
53      TargetMissing   A target parameter was expected and is not
                        included, or is empty.
54      TruncatedCtl    Control PDU is shorter than expected.
55      TruncatedPDU    A received ST PDU is shorter than the ST Header
                        indicates.
56      UserDataSize    UserData parameter too large to permit a
                        message to fit into a network's MTU.
10.5.4  Timeouts and Other Constants
   SCMP uses retransmission to effect reliability and thus has several
   "retransmission timers". Each "timer" is modeled by an initial time
   interval (ToXxx), which may get updated dynamically through
   measurement of control traffic, and a number of times (NXxx) to
   retransmit a message before declaring a failure. All time intervals
   are in units of milliseconds. Note that the variables are described
   for reference purposes only, different implementations may not
   include the identical variables.
Value   Timeout Name    Meaning
------------------------------------------------------------------------
  500   ToAccept        Initial hop-by-hop timeout for acknowledgment of
                        ACCEPT
    3   NAccept         ACCEPT retries before failure
  500   ToChange        Initial hop-by-hop timeout for acknowledgment of
                        CHANGE
    3   NChange         CHANGE retries before failure
 5000   ToChangeResp    End-to-End CHANGE timeout for receipt of ACCEPT
                        or REFUSE
  500   ToConnect       Initial hop-by-hop timeout for acknowledgment of
                        CONNECT
    5   NConnect        CONNECT retries before failure
 5000   ToConnectResp   End-to-End CONNECT timeout for receipt of ACCEPT
                        or REFUSE from targets by origin
  500   ToDisconnect    Initial hop-by-hop timeout for acknowledgment of
                        DISCONNECT
    3   NDisconnect     DISCONNECT retries before failure
  500   ToJoin          Initial hop-by-hop timeout for acknowledgment of
                        JOIN
    3   NJoin           JOIN retries before failure
  500   ToJoinReject    Initial hop-by-hop timeout for acknowledgment of
                        JOIN-REJECT
    3   NJoinReject     JOIN-REJECT retries before failure
 5000   ToJoinResp      Timeout for receipt of CONNECT or JOIN-REJECT
                        from origin or intermediate hop
  500   ToNotify        Initial hop-by-hop timeout for acknowledgment of
                        NOTIFY
    3   NNotify         NOTIFY retries before failure
  500   ToRefuse        Initial hop-by-hop timeout for acknowledgment of
                        REFUSE
    3   NRefuse         REFUSE retries before failure
  500   ToRetryRoute    Timeout for receipt of ACCEPT or REFUSE from
                        targets during failure recovery
    5   NRetryRoute     CONNECT retries before failure
 1000   ToStatusResp    Timeout for receipt of STATUS-RESPONSE
    3   NStatus         STATUS retries before failure
10000   HelloTimerHoldDown      Interval that Restarted bit must be set
                                after ST restart
    5   HelloLossFactor         Number of consecutively missed HELLO
                                messages before declaring link failure
 2000   DefaultRecoveryTimeout  Interval between successive HELLOs
                                to/from active neighbors
10.6  Data Notations
   The convention in the documentation of Internet Protocols is to
   express numbers in decimal and to picture data with the most
   significant octet on the left and the least significant octet on the
   right.
   The order of transmission of the header and data described in this
   document is resolved to the octet level. Whenever a diagram shows a
   group of octets, the order of transmission of those octets is the
   normal order in which they are read in English. For example, in the
   following diagram the octets are transmitted in the order they are
   numbered.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |       1       |       2       |       3       |       4       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |       5       |       6       |       7       |       8       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |       9       |      10       |      11       |      12       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  Figure 34:  Transmission Order of Bytes
   Whenever an octet represents a numeric quantity the left most bit in
   the diagram is the high order or most significant bit. That is, the
   bit labeled 0 is the most significant bit. For example, the following
   diagram represents the value 170 (decimal).
                                0 1 2 3 4 5 6 7
                               +-+-+-+-+-+-+-+-+
                               |1 0 1 0 1 0 1 0|
                               +-+-+-+-+-+-+-+-+
                      Figure 35: Significance of Bits
   Similarly, whenever a multi-octet field represents a numeric quantity
   the left most bit of the whole field is the most significant bit.
   When a multi-octet quantity is transmitted the most significant octet
   is transmitted first.
   Fields whose length is fixed and fully illustrated are shown with a
   vertical bar (|) at the end; fixed fields whose contents are
   abbreviated are shown with an exclamation point (!); variable fields
   are shown with colons (:). Optional parameters are separated from
   control messages with a blank line. The order of parameters is not
   meaningful.
11.  References
[RFC1071]       Braden, R., Borman, D., and C. Partridge,
                "Computing the Internet Checksum", RFC 1071,
                USC/Information Sciences Institute,
                Cray Research, BBN Laboratories, September 1988.
[RFC1112]       Deering, S., "Host Extensions for IP Multicasting",
                STD 5, RFC 1112, Stanford University, August 1989.
RFC 1819              ST2+ Protocol Specification            August 1995
[WoHD95]        L. Wolf, R. G. Herrtwich, L. Delgrossi: Filtering
                Multimedia Data in Reservation-based Networks,
                Kommunikation in Verteilten Systemen 1995 (KiVS),
                Chemnitz-Zwickau, Germany, February 1995.
[RFC1122]       Braden, R., "Requirements for Internet Hosts --
                Communication Layers", STD 3, RFC 1122,
                USC/Information Sciences Institute, October 1989.
[Jaco88]        Jacobson, V.: Congestion Avoidance and Control, ACM
                SIGCOMM-88, August 1988.
[KaPa87]        Karn, P. and C. Partridge: Round Trip Time Estimation,
                ACM SIGCOMM-87, August 1987.
[RFC1141]       Mallory, T., and A. Kullberg, "Incremental Updating
                of the Internet Checksum", RFC 1141, BBN, January 1990.
[RFC1363]       Partridge, C., "A Proposal Flow Specification",
                RFC 1363, BBN, September 1992.
[RFC791]        Postel, J., "Internet Protocol", STD 5, RFC 791,
                DARPA, September 1981.
[RFC1700]       Reynolds, J., and J. Postel, "Assigned Numbers",
                STD 2, RFC 1700, USC/Information Sciences Institute,
                October 1994.
[RFC1190]       Topolcic C., "Internet Stream Protocol Version 2
                (ST-II)", RFC 1190, CIP Working Group, October 1990.
[RFC1633]       Braden, R., Clark, D., and S. Shenker, "Integrated
                Services in the Internet Architecture: an Overview",
                RFC 1633, USC/Information Sciences Institute,
                MIT, Xerox PARC, June 1994.
[VoHN93]        C. Vogt, R. G. Herrtwich, R. Nagarajan: HeiRAT: the
                Heidelberg Resource Administration Technique - Design
                Philosophy and Goals, Kommunikation In Verteilten
                Systemen, Munich, Informatik Aktuell, Springer-Verlag,
                Heidelberg, 1993.
[Cohe81]        D. Cohen: A Network Voice Protocol NVP-II, University of
                Southern California, Los Angeles, 1981.
[Cole81]        R. Cole: PVP - A Packet Video Protocol, University of
                Southern California, Los Angeles, 1981.
[DeAl92]        L. Delgrossi (Ed.) The BERKOM-II Multimedia Transport
                System, Version 1, BERKOM Working Document, October,
                1992.
[DHHS92]        L. Delgrossi, C. Halstrick, R. G. Herrtwich, H.
                Stuettgen: HeiTP: a Transport Protocol for ST-II,
                GLOBECOM'92, Orlando (Florida), December 1992.
[Schu94]        H. Schulzrinne: RTP: A Transport Protocol for Real-Time
                Applications. Work in Progress, 1994.
12.  Security Considerations
   Security issues are not discussed in this memo.
13.  Acknowledgments and Authors' Addresses
   Many individuals have contributed to the work described in this memo.
   We thank the participants in the ST Working Group for their input,
   review, and constructive comments. George Mason University C3I Center
   for hosting an interim meeting. Murali Rajagopal for his efforts on
   ST2+ state machines. Special thanks are due to Steve DeJarnett, who
   served as working group co-chair until summer 1993.
   We would also like to acknowledge the authors of [RFC1190]. All
   authors of [RFC1190] should be considered authors of this document
   since this document contains much of their text and ideas.
   Louis Berger
   BBN Systems and Technologies
   1300 North 17th Street, Suite 1200
   Arlington, VA 22209
   Phone: 703-284-4651
   EMail: lberger@bbn.com
   Luca Delgrossi
   Andersen Consulting Technology Park
   449, Route des Cretes
   06902 Sophia Antipolis, France
   Phone: +33.92.94.80.92
   EMail: luca@andersen.fr
   Dat Duong
   BBN Systems and Technologies
   1300 North 17th Street, Suite 1200
   Arlington, VA 22209
   Phone: 703-284-4760
   EMail: dat@bbn.com
   Steve Jackowski
   Syzygy Communications Incorporated
   269 Mt. Hermon Road
   Scotts Valley, CA 95066
   Phone: 408-439-6834
   EMail: stevej@syzygycomm.com
   Sibylle Schaller
   IBM ENC
   Broadband Multimedia Communications
   Vangerowstr. 18
   D69020 Heidelberg, Germany
   Phone: +49-6221-5944553
   EMail: schaller@heidelbg.ibm.com