
How to set up an IPv6 enabled FTP server – vsftpd

IPv6 & IoT editor — Tue, 21 Apr 2015 12:11:18 +0000

Many people are looking for configuration examples showing how to set up an IPv6-enabled FTP server such as vsftpd. Examples for a number of popular FTP servers are available here; this article covers vsftpd on Linux.

vsftpd

Let’s have a look at vsftpd. vsftpd is a stable, GPL-licensed FTP server for UNIX systems, including Linux, designed with security and speed in mind. The configuration example is based on an Ubuntu installation and assumes you have an active IPv6 network stack:

Install the vsftpd package. First, refresh the package index:

apt-get update

Then install vsftpd and any packages it requires:

apt-get -y install vsftpd

Configure vsftpd:

Use your favorite editor to edit the configuration file for vsftpd:

vim /etc/vsftpd.conf

First, prevent anonymous (unidentified) users from accessing files via FTP by changing the anonymous_enable setting to NO:

anonymous_enable=NO

Allow local users to log in by changing the local_enable setting to YES:

local_enable=YES

If you want a local user to have write permissions, then change the write_enable setting to YES:

write_enable=YES

You probably want local users to be ‘chroot jailed’ so they only have access to their own home directory and cannot see anything else on the system; change the chroot_local_user setting to YES:

chroot_local_user=YES

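Disable the IPv4-only standalone listener. vsftpd treats listen and listen_ipv6 as mutually exclusive, so only one of them may be set to YES:

listen=NO

Now make the server listen on an IPv6 socket. On Linux, an IPv6 socket bound to the wildcard address :: normally also accepts IPv4 clients through IPv4-mapped addresses, unless the net.ipv6.bindv6only sysctl is set:

listen_ipv6=YES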

You may want to review the other options, which fall outside the scope of this example. Then save the file and exit; in vim, type:

:wq

Restart the vsftpd service:

service vsftpd restart
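If the restart fails or the server does not behave as expected, a typo in /etc/vsftpd.conf is a likely culprit. The Python sketch below is a hypothetical helper (check_vsftpd_conf is not part of vsftpd). It checks only the settings used in this article:

- listen and listen_ipv6 are not both YES,
- listen_ipv6 is YES,
- anonymous_enable is NO,
- no spaces surround the = sign, which vsftpd.conf(5) does not allow.

As an assumption, a missing option is treated as NO, except anonymous_enable, which is conservatively treated as YES. Check the man page on your system for the real defaults.

```python
def check_vsftpd_conf(text):
    """Return a list of problems found in vsftpd.conf-style text.

    Hypothetical helper: only the settings used in this article are checked.
    """
    problems = []
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {lineno}: not an option=value directive")
            continue
        if key != key.rstrip() or value != value.lstrip():
            # vsftpd.conf(5) forbids spaces between option, '=' and value
            problems.append(f"line {lineno}: space around '='")
        options[key.strip()] = value.strip().upper()

    # Assumption: missing options count as NO, except anonymous_enable,
    # which is conservatively treated as YES.
    listen = options.get("listen", "NO") == "YES"
    listen_ipv6 = options.get("listen_ipv6", "NO") == "YES"
    if listen and listen_ipv6:
        problems.append("listen and listen_ipv6 are mutually exclusive")
    if not listen_ipv6:
        problems.append("listen_ipv6 is not YES, so no IPv6 socket is opened")
    if options.get("anonymous_enable", "YES") != "NO":
        problems.append("anonymous_enable is not NO")
    return problems
```

Run it as check_vsftpd_conf(open("/etc/vsftpd.conf").read()). An empty list means none of these particular problems were found.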

And done.

Check whether vsftpd is listening on an IPv6 socket:

netstat -an6 | grep ':21 '

This should produce output similar to:

tcp6       0      0 :::21                   :::*                    LISTEN
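Grepping for 21 alone also matches ports such as 2121, or addresses that contain those digits. The hypothetical Python function below is a sketch of a stricter check. It parses netstat -an style output and returns the ports in LISTEN state on tcp6 sockets. It assumes the Linux column order Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State.

```python
def listening_ipv6_ports(netstat_output):
    """Return the set of TCP ports in LISTEN state on tcp6 sockets."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, non-tcp6 sockets and non-listening sockets.
        if len(fields) < 6 or fields[0] != "tcp6" or fields[5] != "LISTEN":
            continue
        # The port follows the last colon: ":::21" -> "21".
        _, _, port = fields[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return ports
```

For the sample line shown above, it returns {21}.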

The post How to set up an IPv6 enabled FTP server – vsftpd appeared first on IPv6.net.

RFC 2401 – Security Architecture for the Internet Protocol

IPv6 & IoT editor — Sat, 01 Aug 2009 18:05:27 +0000

Network Working Group                                            S. Kent
Request for Comments: 2401                                      BBN Corp
Obsoletes: 1825                                              R. Atkinson
Category: Standards Track                                  @Home Network
                                                           November 1998

            Security Architecture for the Internet Protocol

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

Table of Contents

1. Introduction
  1.1 Summary of Contents of Document
  1.2 Audience
  1.3 Related Documents
2. Design Objectives
  2.1 Goals/Objectives/Requirements/Problem Description
  2.2 Caveats and Assumptions
3. System Overview
  3.1 What IPsec Does
  3.2 How IPsec Works
  3.3 Where IPsec May Be Implemented
4. Security Associations
  4.1 Definition and Scope
  4.2 Security Association Functionality
  4.3 Combining Security Associations
  4.4 Security Association Databases
     4.4.1 The Security Policy Database (SPD)
     4.4.2 Selectors
     4.4.3 Security Association Database (SAD)
  4.5 Basic Combinations of Security Associations
  4.6 SA and Key Management
     4.6.1 Manual Techniques
     4.6.2 Automated SA and Key Management
     4.6.3 Locating a Security Gateway
  4.7 Security Associations and Multicast
5. IP Traffic Processing
  5.1 Outbound IP Traffic Processing
     5.1.1 Selecting and Using an SA or SA Bundle
     5.1.2 Header Construction for Tunnel Mode
        5.1.2.1 IPv4 -- Header Construction for Tunnel Mode
        5.1.2.2 IPv6 -- Header Construction for Tunnel Mode
  5.2 Processing Inbound IP Traffic
     5.2.1 Selecting and Using an SA or SA Bundle
     5.2.2 Handling of AH and ESP tunnels
6. ICMP Processing (relevant to IPsec)
  6.1 PMTU/DF Processing
     6.1.1 DF Bit
     6.1.2 Path MTU Discovery (PMTU)
        6.1.2.1 Propagation of PMTU
        6.1.2.2 Calculation of PMTU
        6.1.2.3 Granularity of PMTU Processing
        6.1.2.4 PMTU Aging
7. Auditing
8. Use in Systems Supporting Information Flow Security
  8.1 Relationship Between Security Associations and Data Sensitivity
  8.2 Sensitivity Consistency Checking
  8.3 Additional MLS Attributes for Security Association Databases
  8.4 Additional Inbound Processing Steps for MLS Networking
  8.5 Additional Outbound Processing Steps for MLS Networking
  8.6 Additional MLS Processing for Security Gateways
9. Performance Issues
10. Conformance Requirements
11. Security Considerations
12. Differences from RFC 1825
Acknowledgements
Appendix A -- Glossary
Appendix B -- Analysis/Discussion of PMTU/DF/Fragmentation Issues
  B.1 DF bit
  B.2 Fragmentation
  B.3 Path MTU Discovery
     B.3.1 Identifying the Originating Host(s)
     B.3.2 Calculation of PMTU
     B.3.3 Granularity of Maintaining PMTU Data
     B.3.4 Per Socket Maintenance of PMTU Data
     B.3.5 Delivery of PMTU Data to the Transport Layer
     B.3.6 Aging of PMTU Data
Appendix C -- Sequence Space Window Code Example
Appendix D -- Categorization of ICMP messages
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

1.1 Summary of Contents of Document

   This memo specifies the base architecture for IPsec compliant
   systems.  The goal of the architecture is to provide various security
   services for traffic at the IP layer, in both the IPv4 and IPv6
   environments.  This document describes the goals of such systems,
   their components and how they fit together with each other and into
   the IP environment.  It also describes the security services offered
   by the IPsec protocols, and how these services can be employed in the
   IP environment.  This document does not address all aspects of IPsec
   architecture.  Subsequent documents will address additional
   architectural details of a more advanced nature, e.g., use of IPsec
   in NAT environments and more complete support for IP multicast.  The
   following fundamental components of the IPsec security architecture
   are discussed in terms of their underlying, required functionality.
   Additional RFCs (see Section 1.3 for pointers to other documents)
   define the protocols in (a), (c), and (d).

        a. Security Protocols -- Authentication Header (AH) and
           Encapsulating Security Payload (ESP)
        b. Security Associations -- what they are and how they work,
           how they are managed, associated processing
        c. Key Management -- manual and automatic (The Internet Key
           Exchange (IKE))
        d. Algorithms for authentication and encryption

   This document is not an overall Security Architecture for the
   Internet; it addresses security only at the IP layer, provided
   through the use of a combination of cryptographic and protocol
   security mechanisms.

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in RFC 2119 [Bra97].

1.2 Audience

   The target audience for this document includes implementers of this
   IP security technology and others interested in gaining a general
   background understanding of this system.  In particular, prospective
   users of this technology (end users or system administrators) are
   part of the target audience.  A glossary is provided as an appendix
   to help fill in gaps in background/vocabulary.  This document assumes
   that the reader is familiar with the Internet Protocol, related
   networking technology, and general security terms and concepts.

1.3 Related Documents

   As mentioned above, other documents provide detailed definitions of
   some of the components of IPsec and of their inter-relationship.
   They include RFCs on the following topics:

        a. "IP Security Document Roadmap" [TDG97] -- a document
           providing guidelines for specifications describing encryption
           and authentication algorithms used in this system.
        b. security protocols -- RFCs describing the Authentication
           Header (AH) [KA98a] and Encapsulating Security Payload (ESP)
           [KA98b] protocols.
        c. algorithms for authentication and encryption -- a separate
           RFC for each algorithm.
        d. automatic key management -- RFCs on "The Internet Key
           Exchange (IKE)" [HC98], "Internet Security Association and
           Key Management Protocol (ISAKMP)" [MSST97], "The OAKLEY Key
           Determination Protocol" [Orm97], and "The Internet IP
           Security Domain of Interpretation for ISAKMP" [Pip98].

2. Design Objectives

2.1 Goals/Objectives/Requirements/Problem Description

   IPsec is designed to provide interoperable, high quality,
   cryptographically-based security for IPv4 and IPv6.  The set of
   security services offered includes access control, connectionless
   integrity, data origin authentication, protection against replays (a
   form of partial sequence integrity), confidentiality (encryption),
   and limited traffic flow confidentiality.  These services are
   provided at the IP layer, offering protection for IP and/or upper
   layer protocols.

   These objectives are met through the use of two traffic security
   protocols, the Authentication Header (AH) and the Encapsulating
   Security Payload (ESP), and through the use of cryptographic key
   management procedures and protocols.  The set of IPsec protocols
   employed in any context, and the ways in which they are employed,
   will be determined by the security and system requirements of users,
   applications, and/or sites/organizations.

   When these mechanisms are correctly implemented and deployed, they
   ought not to adversely affect users, hosts, and other Internet
   components that do not employ these security mechanisms for
   protection of their traffic.  These mechanisms also are designed to
   be algorithm-independent.  This modularity permits selection of
   different sets of algorithms without affecting the other parts of the
   implementation.  For example, different user communities may select
   different sets of algorithms (creating cliques) if required.

   A standard set of default algorithms is specified to facilitate
   interoperability in the global Internet.  The use of these
   algorithms, in conjunction with IPsec traffic protection and key
   management protocols, is intended to permit system and application
   developers to deploy high quality, Internet layer, cryptographic
   security technology.

2.2 Caveats and Assumptions

   The suite of IPsec protocols and associated default algorithms are
   designed to provide high quality security for Internet traffic.
   However, the security offered by use of these protocols ultimately
   depends on the quality of their implementation, which is outside
   the scope of this set of standards.  Moreover, the security of a
   computer system or network is a function of many factors, including
   personnel, physical, procedural, compromising emanations, and
   computer security practices.  Thus IPsec is only one part of an
   overall system security architecture.

   Finally, the security afforded by the use of IPsec is critically
   dependent on many aspects of the operating environment in which the
   IPsec implementation executes.  For example, defects in OS security,
   poor quality of random number sources, sloppy system management
   protocols and practices, etc. can all degrade the security provided
   by IPsec.  As above, none of these environmental attributes are
   within the scope of this or other IPsec standards.

3. System Overview

   This section provides a high level description of how IPsec works,
   the components of the system, and how they fit together to provide
   the security services noted above.  The goal of this description is
   to enable the reader to "picture" the overall process/system, see how
   it fits into the IP environment, and to provide context for later
   sections of this document, which describe each of the components in
   more detail.

   An IPsec implementation operates in a host or a security gateway
   environment, affording protection to IP traffic.  The protection
   offered is based on requirements defined by a Security Policy
   Database (SPD) established and maintained by a user or system
   administrator, or by an application operating within constraints
   established by either of the above.  In general, packets are selected
   for one of three processing modes based on IP and transport layer
   header information (Selectors, Section 4.4.2) matched against entries
   in the database (SPD).  Each packet is either afforded IPsec security
   services, discarded, or allowed to bypass IPsec, based on the
   applicable database policies identified by the Selectors.
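The three-way decision above can be sketched in Python as an ordered list of policy entries, each with a few selectors and an action. This is a simplified illustration, not the SPD format this document specifies. The entry fields (source and destination prefixes, protocol number, destination port), the first-match rule, and discarding traffic that matches no entry are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address, ip_network
from typing import Optional


class Action(Enum):
    DISCARD = "discard"
    BYPASS = "bypass IPsec"
    APPLY = "apply IPsec"


@dataclass(frozen=True)
class SPDEntry:
    src: str                 # source prefix, e.g. "2001:db8:1::/48"
    dst: str                 # destination prefix
    protocol: Optional[int]  # IP protocol number (6 = TCP); None = any
    dst_port: Optional[int]  # destination port; None = any
    action: Action


def spd_lookup(spd, src, dst, protocol, dst_port):
    """Return the action of the first entry whose selectors all match."""
    for entry in spd:
        if ip_address(src) not in ip_network(entry.src):
            continue
        if ip_address(dst) not in ip_network(entry.dst):
            continue
        if entry.protocol is not None and entry.protocol != protocol:
            continue
        if entry.dst_port is not None and entry.dst_port != dst_port:
            continue
        return entry.action
    return Action.DISCARD  # assumption: unmatched traffic is dropped


spd = [
    # Protect FTP control traffic between two sites.
    SPDEntry("2001:db8:1::/48", "2001:db8:2::/48", 6, 21, Action.APPLY),
    # Let ICMPv6 (protocol 58) through without IPsec.
    SPDEntry("::/0", "::/0", 58, None, Action.BYPASS),
]
```

Listing the entries in order makes the policy easy to read: more specific rules go first, and the fallback at the end decides what happens to everything else.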

3.1 What IPsec Does

   IPsec provides security services at the IP layer by enabling a system
   to select required security protocols, determine the algorithm(s) to
   use for the service(s), and put in place any cryptographic keys
   required to provide the requested services.  IPsec can be used to
   protect one or more "paths" between a pair of hosts, between a pair
   of security gateways, or between a security gateway and a host.  (The
   term "security gateway" is used throughout the IPsec documents to
   refer to an intermediate system that implements IPsec protocols.  For
   example, a router or a firewall implementing IPsec is a security
   gateway.)

   The set of security services that IPsec can provide includes access
   control, connectionless integrity, data origin authentication,
   rejection of replayed packets (a form of partial sequence integrity),
   confidentiality (encryption), and limited traffic flow
   confidentiality.  Because these services are provided at the IP
   layer, they can be used by any higher layer protocol, e.g., TCP, UDP,
   ICMP, BGP, etc.

   The IPsec DOI also supports negotiation of IP compression [SMPT98],
   motivated in part by the observation that when encryption is employed
   within IPsec, it prevents effective compression by lower protocol
   layers.
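A quick Python experiment shows why compression has to happen before encryption. Here os.urandom stands in for ciphertext, on the assumption that output from a good cipher is statistically indistinguishable from random bytes; no real cipher is used. zlib shrinks the repetitive plaintext dramatically, but it typically makes the random bytes slightly larger.

```python
import os
import zlib

# Repetitive plaintext, typical of protocol headers, compresses well.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 50

# Stand-in for ciphertext (assumption): random bytes of the same length.
ciphertext_like = os.urandom(len(plaintext))

compressed_plain = zlib.compress(plaintext)
compressed_cipher = zlib.compress(ciphertext_like)

print("plaintext:  ", len(plaintext), "->", len(compressed_plain))
print("ciphertext: ", len(ciphertext_like), "->", len(compressed_cipher))
```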

3.2 How IPsec Works

   IPsec uses two protocols to provide traffic security --
   Authentication Header (AH) and Encapsulating Security Payload (ESP).
   Both protocols are described in more detail in their respective RFCs
   [KA98a, KA98b].

        o The IP Authentication Header (AH) [KA98a] provides
          connectionless integrity, data origin authentication, and an
          optional anti-replay service.
        o The Encapsulating Security Payload (ESP) protocol [KA98b] may
          provide confidentiality (encryption), and limited traffic flow
          confidentiality.  It also may provide connectionless
          integrity, data origin authentication, and an anti-replay
          service.  (One or the other set of these security services
          must be applied whenever ESP is invoked.)
        o Both AH and ESP are vehicles for access control, based on the
          distribution of cryptographic keys and the management of
          traffic flows relative to these security protocols.

   These protocols may be applied alone or in combination with each
   other to provide a desired set of security services in IPv4 and IPv6.
   Each protocol supports two modes of use: transport mode and tunnel
   mode.  In transport mode the protocols provide protection primarily
   for upper layer protocols; in tunnel mode, the protocols are applied
   to tunneled IP packets.  The differences between the two modes are
   discussed in Section 4.

   IPsec allows the user (or system administrator) to control the
   granularity at which a security service is offered.  For example, one
   can create a single encrypted tunnel to carry all the traffic between
   two security gateways or a separate encrypted tunnel can be created
   for each TCP connection between each pair of hosts communicating
   across these gateways.  IPsec management must incorporate facilities
   for specifying:

        o which security services to use and in what combinations
        o the granularity at which a given security protection should be
          applied
        o the algorithms used to effect cryptographic-based security

   Because these security services use shared secret values
   (cryptographic keys), IPsec relies on a separate set of mechanisms
   for putting these keys in place. (The keys are used for
   authentication/integrity and encryption services.)  This document
   requires support for both manual and automatic distribution of keys.
   It specifies a particular public-key based approach (IKE -- [MSST97,
   Orm97, HC98]) for automatic key management, but other automated key
   distribution techniques MAY be used.  For example, KDC-based systems
   such as Kerberos and other public-key systems such as SKIP could be
   employed.

3.3 Where IPsec May Be Implemented

   There are several ways in which IPsec may be implemented in a host or
   in conjunction with a router or firewall (to create a security
   gateway).  Several common examples are provided below:

        a. Integration of IPsec into the native IP implementation.  This
           requires access to the IP source code and is applicable to
           both hosts and security gateways.

        b. "Bump-in-the-stack" (BITS) implementations, where IPsec is
           implemented "underneath" an existing implementation of an IP
           protocol stack, between the native IP and the local network
           drivers.  Source code access for the IP stack is not required
           in this context, making this implementation approach
           appropriate for use with legacy systems.  This approach, when
           it is adopted, is usually employed in hosts.

        c. The use of an outboard crypto processor is a common design
           feature of network security systems used by the military, and
           of some commercial systems as well.  It is sometimes referred
           to as a "Bump-in-the-wire" (BITW) implementation.  Such
           implementations may be designed to serve either a host or a
           gateway (or both).  Usually the BITW device is IP
           addressable.  When supporting a single host, it may be quite
           analogous to a BITS implementation, but in supporting a
           router or firewall, it must operate like a security gateway.

4. Security Associations

   This section defines Security Association management requirements for
   all IPv6 implementations and for those IPv4 implementations that
   implement AH, ESP, or both.  The concept of a "Security Association"
   (SA) is fundamental to IPsec.  Both AH and ESP make use of SAs and a
   major function of IKE is the establishment and maintenance of
   Security Associations.  All implementations of AH or ESP MUST support
   the concept of a Security Association as described below.  The
   remainder of this section describes various aspects of Security
   Association management, defining required characteristics for SA
   policy management, traffic processing, and SA management techniques.

4.1 Definition and Scope

   A Security Association (SA) is a simplex "connection" that affords
   security services to the traffic carried by it.  Security services
   are afforded to an SA by the use of AH, or ESP, but not both.  If
   both AH and ESP protection is applied to a traffic stream, then two
   (or more) SAs are created to afford protection to the traffic stream.
   To secure typical, bi-directional communication between two hosts, or
   between two security gateways, two Security Associations (one in each
   direction) are required.

   A security association is uniquely identified by a triple consisting
   of a Security Parameter Index (SPI), an IP Destination Address, and a
   security protocol (AH or ESP) identifier.  In principle, the
   Destination Address may be a unicast address, an IP broadcast
   address, or a multicast group address.  However, IPsec SA management
   mechanisms currently are defined only for unicast SAs.  Hence, in the
   discussions that follow, SAs will be described in the context of
   point-to-point communication, even though the concept is applicable
   in the point-to-multipoint case as well.
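Identifying an SA by this triple can be sketched as a dictionary keyed on (SPI, destination address, protocol). The names below (SAKey, make_key, add_sa, find_sa, the sad dictionary) are hypothetical. A real SA also carries keys, algorithms, sequence counters and lifetimes, which this sketch omits. The example installs two SAs, one per direction, because an SA is simplex.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class SAKey:
    spi: int       # Security Parameter Index (a 32-bit value)
    dst: str       # IP destination address, normalized text form
    protocol: str  # "AH" or "ESP"


def make_key(spi, dst, protocol):
    if protocol not in ("AH", "ESP"):
        raise ValueError(f"unknown security protocol: {protocol}")
    # Normalize so "2001:DB8:0::B" and "2001:db8::b" map to the same key.
    return SAKey(spi, str(ip_address(dst)), protocol)


@dataclass
class SecurityAssociation:
    key: SAKey
    mode: str  # "transport" or "tunnel"; keys, algorithms etc. omitted


sad = {}  # hypothetical in-memory SA database: SAKey -> SecurityAssociation


def add_sa(spi, dst, protocol, mode):
    key = make_key(spi, dst, protocol)
    if key in sad:
        raise ValueError(f"duplicate SA: {key}")
    sad[key] = SecurityAssociation(key, mode)


def find_sa(spi, dst, protocol):
    """The (SPI, destination, protocol) triple selects at most one SA."""
    return sad.get(make_key(spi, dst, protocol))


# SAs are simplex: two-way traffic between hosts A and B needs one per direction.
add_sa(0x1001, "2001:db8::b", "ESP", "transport")  # A -> B
add_sa(0x2002, "2001:db8::a", "ESP", "transport")  # B -> A
```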

   As noted above, two types of SAs are defined: transport mode and
   tunnel mode.  A transport mode SA is a security association between
   two hosts.  In IPv4, a transport mode security protocol header
   appears immediately after the IP header and any options, and before
   any higher layer protocols (e.g., TCP or UDP).  In IPv6, the security
   protocol header appears after the base IP header and extensions, but
   may appear before or after destination options, and before higher
   layer protocols.  In the case of ESP, a transport mode SA provides
   security services only for these higher layer protocols, not for the
   IP header or any extension headers preceding the ESP header.  In the
   case of AH, the protection is also extended to selected portions of
   the IP header, selected portions of extension headers, and selected
   options (contained in the IPv4 header, IPv6 Hop-by-Hop extension
   header, or IPv6 Destination extension headers).  For more details on
   the coverage afforded by AH, see the AH specification [KA98a].

   A tunnel mode SA is essentially an SA applied to an IP tunnel.
   Whenever either end of a security association is a security gateway,
   the SA MUST be tunnel mode.  Thus an SA between two security gateways
   is always a tunnel mode SA, as is an SA between a host and a security
   gateway.  Note that for the case where traffic is destined for a
   security gateway, e.g., SNMP commands, the security gateway is acting
   as a host and transport mode is allowed.  But in that case, the
   security gateway is not acting as a gateway, i.e., not transiting
   traffic.  Two hosts MAY establish a tunnel mode SA between
   themselves.  The requirement for any (transit traffic) SA involving a
   security gateway to be a tunnel SA arises due to the need to avoid
   potential problems with regard to fragmentation and reassembly of
   IPsec packets, and in circumstances where multiple paths (e.g., via
   different security gateways) exist to the same destination behind the
   security gateways.
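The mode rule reduces to a small decision function. In this hypothetical Python sketch, transit_traffic is an assumed flag. It means the gateway endpoint is forwarding traffic for other systems, rather than receiving traffic addressed to itself (such as SNMP commands, where it acts as a host).

```python
def permitted_modes(a_is_gateway, b_is_gateway, transit_traffic):
    """Return the SA modes allowed between endpoints A and B.

    a_is_gateway / b_is_gateway: whether each endpoint is a security gateway.
    transit_traffic: True if a gateway endpoint forwards traffic for others;
    False if the traffic is addressed to the gateway itself.
    """
    if (a_is_gateway or b_is_gateway) and transit_traffic:
        # Any transit SA involving a security gateway MUST be tunnel mode.
        return {"tunnel"}
    # Host-to-host (or a gateway acting as a host): transport mode is
    # allowed, and two hosts MAY also use tunnel mode.
    return {"transport", "tunnel"}
```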

   For a tunnel mode SA, there is an "outer" IP header that specifies
   the IPsec processing destination, plus an "inner" IP header that
   specifies the (apparently) ultimate destination for the packet.  The
   security protocol header appears after the outer IP header, and
   before the inner IP header.  If AH is employed in tunnel mode,
   portions of the outer IP header are afforded protection (as above),
   as well as all of the tunneled IP packet (i.e., all of the inner IP
   header is protected, as well as higher layer protocols).  If ESP is
   employed, the protection is afforded only to the tunneled packet, not
   to the outer header.
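The placement of the security header in each mode, and what each protocol protects, can be modeled as simple header lists. This Python sketch is deliberately simplified. It assumes an IPv4 packet carrying TCP and ignores IPv4 options, IPv6 extension headers and the ESP trailer. It represents AH's coverage of the preceding IP header only as "selected fields". The function names are hypothetical.

```python
def header_order(mode, protocol, upper="TCP"):
    """Return the header sequence of a protected packet, outermost first."""
    if protocol not in ("AH", "ESP"):
        raise ValueError(f"unknown security protocol: {protocol}")
    if mode == "transport":
        return ["IP", protocol, upper]
    if mode == "tunnel":
        return ["outer IP", protocol, "inner IP", upper]
    raise ValueError(f"unknown mode: {mode}")


def protected_by(mode, protocol, upper="TCP"):
    """Headers covered by the security protocol, per the description above."""
    headers = header_order(mode, protocol, upper)
    start = headers.index(protocol)
    covered = headers[start + 1:]  # everything after the security header
    if protocol == "AH":
        # AH also covers selected portions of the IP header before it.
        covered = [f"{headers[start - 1]} (selected fields)"] + covered
    return covered
```

For example, protected_by("tunnel", "ESP") covers only the inner IP header and TCP, matching the statement that ESP in tunnel mode does not protect the outer header.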

   In summary,
           a) A host MUST support both transport and tunnel mode.
           b) A security gateway is required to support only tunnel
              mode.  If it supports transport mode, that should be used
              only when the security gateway is acting as a host, e.g.,
              for network management.

4.2 Security Association Functionality

   The set of security services offered by an SA depends on the security
   protocol selected, the SA mode, the endpoints of the SA, and on the
   election of optional services within the protocol.  For example, AH
   provides data origin authentication and connectionless integrity for
   IP datagrams (hereafter referred to as just "authentication").  The
   "precision" of the authentication service is a function of the
   granularity of the security association with which AH is employed, as
   discussed in Section 4.4.2, "Selectors".

   AH also offers an anti-replay (partial sequence integrity) service at
   the discretion of the receiver, to help counter denial of service
   attacks.  AH is an appropriate protocol to employ when
   confidentiality is not required (or is not permitted, e.g., due to
   government restrictions on use of encryption).  AH also provides
   authentication for selected portions of the IP header, which may be
   necessary in some contexts.  For example, if the integrity of an IPv4
   option or IPv6 extension header must be protected en route between
   sender and receiver, AH can provide this service (except for the
   non-predictable but mutable parts of the IP header.)
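The receiver-side anti-replay service is commonly implemented as a sliding window over sequence numbers. The ReplayWindow class below is a minimal Python sketch of that technique. The 64-entry window and sequence numbers starting at 1 are assumed here. A real implementation updates the window only after the packet's integrity check has succeeded, which this sketch does not model.

```python
class ReplayWindow:
    """Sliding-window replay check over packet sequence numbers (sketch)."""

    def __init__(self, size=64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence (highest - i) was seen

    def check_and_update(self, seq):
        """Return True if seq is new and inside the window, and record it."""
        if seq == 0:
            return False  # sequence numbers start at 1 in this sketch
        if seq > self.highest:
            shift = seq - self.highest
            # Slide the window forward; bits falling off the end are dropped.
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # too old: left of the window
        if self.bitmap & (1 << offset):
            return False  # duplicate: a replayed packet
        self.bitmap |= 1 << offset
        return True
```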

   ESP optionally provides confidentiality for traffic.  (The strength
   of the confidentiality service depends, in part, on the encryption
   algorithm employed.)  ESP also may optionally provide authentication
   (as defined above).  If authentication is negotiated for an ESP SA,
   the receiver also may elect to enforce an anti-replay service with
   the same features as the AH anti-replay service.  The scope of the
   authentication offered by ESP is narrower than for AH, i.e., the IP
   header(s) "outside" the ESP header is(are) not protected.  If only
   the upper layer protocols need to be authenticated, then ESP
   authentication is an appropriate choice and is more space efficient
   than use of AH encapsulating ESP.  Note that although both
   confidentiality and authentication are optional, they cannot both be
   omitted. At least one of them MUST be selected.
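The ESP service rules above can be expressed as a validation step. At least one of confidentiality and authentication must be selected, and anti-replay is only available when authentication is negotiated. validate_esp_services is a hypothetical helper name, not part of any IPsec API.

```python
def validate_esp_services(encrypt, authenticate, anti_replay=False):
    """Check an ESP SA's selected services; return them if consistent."""
    if not (encrypt or authenticate):
        raise ValueError("ESP: confidentiality and authentication cannot both be omitted")
    if anti_replay and not authenticate:
        raise ValueError("ESP: anti-replay requires authentication to be negotiated")
    return {"confidentiality": encrypt, "authentication": authenticate,
            "anti_replay": anti_replay}
```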

   If confidentiality service is selected, then an ESP (tunnel mode) SA
   between two security gateways can offer partial traffic flow
   confidentiality.  The use of tunnel mode allows the inner IP headers
   to be encrypted, concealing the identities of the (ultimate) traffic
   source and destination.  Moreover, ESP payload padding also can be
   invoked to hide the size of the packets, further concealing the
   external characteristics of the traffic.  Similar traffic flow
   confidentiality services may be offered when a mobile user is
   assigned a dynamic IP address in a dialup context, and establishes a
   (tunnel mode) ESP SA to a corporate firewall (acting as a security
   gateway).  Note that fine granularity SAs generally are more
   vulnerable to traffic analysis than coarse granularity ones which are
   carrying traffic from many subscribers.
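One way to use ESP padding to hide packet sizes is to round every packet up to a fixed-size bucket. The Python sketch below computes such a padding length. The bucket strategy, the 16-byte cipher block and the 64-byte bucket are assumptions for illustration. The two extra bytes correspond to ESP's Pad Length and Next Header trailer fields. Because Pad Length is a single byte, the sketch falls back to plain block alignment when more than 255 padding bytes would be needed.

```python
def tfc_pad_length(payload_len, block_size=16, bucket=64, max_pad=255):
    """Padding bytes so payload + padding + 2-byte trailer fills a bucket."""
    if bucket % block_size:
        raise ValueError("bucket must be a multiple of the cipher block size")
    total = payload_len + 2                 # payload + Pad Length + Next Header
    target = -(-total // bucket) * bucket   # round up to a multiple of bucket
    pad = target - total
    if pad > max_pad:
        # Pad Length is one byte: fall back to minimal block alignment.
        pad = -total % block_size
    return pad
```

Larger buckets hide more size information, but they cost bandwidth on every packet.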

4.3 Combining Security Associations

   The IP datagrams transmitted over an individual SA are afforded
   protection by exactly one security protocol, either AH or ESP, but
   not both.  Sometimes a security policy may call for a combination of
   services for a particular traffic flow that is not achievable with a
   single SA.  In such instances it will be necessary to employ multiple
   SAs to implement the required security policy.  The term "security
   association bundle" or "SA bundle" is applied to a sequence of SAs
   through which traffic must be processed to satisfy a security policy.
   The order of the sequence is defined by the policy.  (Note that the
   SAs that comprise a bundle may terminate at different endpoints. For
   example, one SA may extend between a mobile host and a security
   gateway and a second, nested SA may extend to a host behind the
   gateway.)

   Security associations may be combined into bundles in two ways:
   transport adjacency and iterated tunneling.

           o Transport adjacency refers to applying more than one
             security protocol to the same IP datagram, without invoking
             tunneling.  This approach to combining AH and ESP allows
             for only one level of combination; further nesting yields
             no added benefit (assuming use of adequately strong
             algorithms in each protocol) since the processing is
             performed at one IPsec instance at the (ultimate)
             destination.

             Host 1 --- Security ---- Internet -- Security --- Host 2
              | |        Gwy 1                      Gwy 2        | |
              | |                                                | |
              | -----Security Association 1 (ESP transport)------- |
              |                                                    |
              -------Security Association 2 (AH transport)----------
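Transport adjacency can be modeled as inserting each security header directly after the single IP header, with no new IP header added. In this hypothetical Python sketch, the bundle lists protocols in the order they are applied on output, so the protocol applied last ends up outermost. Applying ESP and then AH reproduces the layering shown in the figure.

```python
def apply_transport_adjacency(bundle, upper="TCP"):
    """Return the header stack (outermost first) after applying a bundle.

    bundle: security protocols in the order they are applied on output.
    """
    headers = ["IP", upper]
    for protocol in bundle:
        if protocol not in ("AH", "ESP"):
            raise ValueError(f"unknown security protocol: {protocol}")
        # No tunneling: each header goes right after the one IP header.
        headers.insert(1, protocol)
    return headers
```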

           o Iterated tunneling refers to the application of multiple
             layers of security protocols effected through IP tunneling.
             This approach allows for multiple levels of nesting, since
             each tunnel can originate or terminate at a different IPsec
             site along the path.  No special treatment is expected for
             ISAKMP traffic at intermediate security gateways other than
             what can be specified through appropriate SPD entries (see
             Case 3 in Section 4.5).

             There are 3 basic cases of iterated tunneling -- support is
             required only for cases 2 and 3:

             1. both endpoints for the SAs are the same -- The inner and
                outer tunnels could each be either AH or ESP, though it
                is unlikely that Host 1 would specify both to be the
                same, i.e., AH inside of AH or ESP inside of ESP.

                Host 1 --- Security ---- Internet -- Security --- Host 2
                 | |        Gwy 1                      Gwy 2        | |
                 | |                                                | |
                 | -------Security Association 1 (tunnel)---------- | |
                 |                                                    |
                 ---------Security Association 2 (tunnel)--------------

             2. one endpoint of the SAs is the same -- The inner and
                outer tunnels could each be either AH or ESP.

                Host 1 --- Security ---- Internet -- Security --- Host 2
                 | |        Gwy 1                      Gwy 2         |
                 | |                                     |           |
                 | ----Security Association 1 (tunnel)----           |
                 |                                                   |
                 ---------Security Association 2 (tunnel)-------------

             3. neither endpoint is the same -- The inner and outer
                tunnels could each be either AH or ESP.

                Host 1 --- Security ---- Internet -- Security --- Host 2
                 |          Gwy 1                      Gwy 2         |
                 |            |                          |           |
                 |            --Security Assoc 1 (tunnel)-           |
                 |                                                   |
                 -----------Security Association 2 (tunnel)-----------

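   The three cases differ only in which endpoints the two tunnel SAs
   share.  As a minimal sketch, assuming each tunnel-mode SA is reduced
   to a hypothetical Tunnel(src, dst) pair of endpoint names (the names
   "host1", "gwy2", etc. are illustrative only), the case number can be
   computed by comparing endpoints:

```python
from typing import NamedTuple


class Tunnel(NamedTuple):
    """Hypothetical model: a tunnel-mode SA reduced to its two endpoints."""
    src: str
    dst: str


def iterated_tunnel_case(sa_a: Tunnel, sa_b: Tunnel) -> int:
    """Classify two nested tunnel SAs as case 1, 2, or 3."""
    same_src = sa_a.src == sa_b.src
    same_dst = sa_a.dst == sa_b.dst
    if same_src and same_dst:
        return 1  # both endpoints are the same
    if same_src or same_dst:
        return 2  # exactly one endpoint is the same
    return 3      # neither endpoint is the same


def support_required(case: int) -> bool:
    """Support is required only for cases 2 and 3."""
    return case in (2, 3)


# Case 2 from the diagram: one SA runs Host 1 -> Gwy 2, the other Host 1 -> Host 2.
case = iterated_tunnel_case(Tunnel("host1", "gwy2"), Tunnel("host1", "host2"))
print(case, support_required(case))  # 2 True
```

   The classification is symmetric in its two arguments, so it does not
   need to know which SA is the inner and which is the outer tunnel.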
   These two approaches also can be combined, e.g., an SA bundle could
   be constructed from one tunnel mode SA and one or two transport mode
   SAs, applied in sequence.  (See Section 4.5 "Basic Combinations of
   Security Associations.") Note that nested tunnels can also occur
   where neither the source nor the destination endpoints of any of the
   tunnels are the same.  In that case, there would be no host or
   security gateway with a bundle corresponding to the nested tunnels.

   For transport mode SAs, only one ordering of security protocols seems
   appropriate.  AH is applied to both the upper layer protocols and
   (parts of) the IP header.  Thus if AH is used in a transport mode, in
   conjunction with ESP, AH SHOULD appear as the first header after IP,
   prior to the appearance of ESP.  In that context, AH is applied to
   the ciphertext output of ESP.  In contrast, for tunnel mode SAs, one
   can imagine uses for various orderings of AH and ESP.  The required
   set of SA bundle types that MUST be supported by a compliant IPsec
   implementation is described in Section 4.5.
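   The transport-mode ordering rule can be illustrated with a purely
   symbolic sketch (no real cryptography: header names are plain
   strings and "ciphertext(...)" is a placeholder).  ESP is applied to
   the upper layer data first, and AH is then applied to the ESP output,
   which leaves AH as the first header after IP:

```python
from typing import List


def apply_transport_adjacency(upper_layer: str) -> List[str]:
    """Return the header sequence on the wire, outermost first.

    Symbolic model only: strings stand in for headers and payloads.
    """
    # Step 1: ESP protects the upper-layer data, producing ciphertext.
    esp_output = ["ESP", f"ciphertext({upper_layer})"]
    # Step 2: AH is applied to the ESP output (ciphertext included),
    # so AH ends up immediately after the IP header.
    ah_output = ["AH"] + esp_output
    return ["IP"] + ah_output


print(apply_transport_adjacency("TCP"))
# ['IP', 'AH', 'ESP', 'ciphertext(TCP)']
```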

4.4 Security Association Databases

   Many of the details associated with processing IP traffic in an IPsec
   implementation are largely a local matter, not subject to
   standardization.  However, some external aspects of the processing
   must be standardized, to ensure interoperability and to provide a
   minimum management capability that is essential for productive use of
   IPsec.  This section describes a general model for processing IP
   traffic relative to security associations, in support of these
   interoperability and functionality goals.  The model described below
   is nominal; compliant implementations need not match details of this
   model as presented, but the external behavior of such implementations
   must be mappable to the externally observable characteristics of this
   model.

   There are two nominal databases in this model: the Security Policy
   Database and the Security Association Database.  The former specifies
   the policies that determine the disposition of all IP traffic inbound
   or outbound from a host, security gateway, or BITS or BITW IPsec
   implementation.  The latter database contains parameters that are
   associated with each (active) security association.  This section
   also defines the concept of a Selector, a set of IP and upper layer
   protocol field values that is used by the Security Policy Database to
   map traffic to a policy, i.e., an SA (or SA bundle).

   Each interface for which IPsec is enabled requires nominally separate
   inbound vs. outbound databases (SAD and SPD), because of the
   directionality of many of the fields that are used as selectors.
   Typically there is just one such interface, for a host or security
   gateway (SG).  Note that an SG would always have at least 2
   interfaces, but the "internal" one, to the corporate net, usually
   would not have IPsec enabled and so only one pair of SADs and one
   pair of SPDs would be needed.  On the other hand, if a host had
   multiple interfaces or an SG had multiple external interfaces, it
   might be necessary to have separate SAD and SPD pairs for each
   interface.

4.4.1 The Security Policy Database (SPD)

   Ultimately, a security association is a management construct used to
   enforce a security policy in the IPsec environment.  Thus an
   essential element of SA processing is an underlying Security Policy
   Database (SPD) that specifies what services are to be offered to IP
   datagrams and in what fashion.  The form of the database and its
   interface are outside the scope of this specification.  However, this
   section does specify certain minimum management functionality that
   must be provided, to allow a user or system administrator to control
   how IPsec is applied to traffic transmitted or received by a host or
   transiting a security gateway.

   The SPD must be consulted during the processing of all traffic
   (INBOUND and OUTBOUND), including non-IPsec traffic.  In order to
   support this, the SPD requires distinct entries for inbound and
   outbound traffic.  One can think of this as separate SPDs (inbound
   vs.  outbound).  In addition, a nominally separate SPD must be
   provided for each IPsec-enabled interface.

   An SPD must discriminate among traffic that is afforded IPsec
   protection and traffic that is allowed to bypass IPsec.  This applies
   to the IPsec protection to be applied by a sender and to the IPsec
   protection that must be present at the receiver.  For any outbound or
   inbound datagram, three processing choices are possible: discard,
   bypass IPsec, or apply IPsec.  The first choice refers to traffic
   that is not allowed to exit the host, traverse the security gateway,
   or be delivered to an application at all.  The second choice refers
   to traffic that is allowed to pass without additional IPsec
   protection.  The third choice refers to traffic that is afforded
   IPsec protection, and for such traffic the SPD must specify the
   security services to be provided, protocols to be employed,
   algorithms to be used, etc.

   For every IPsec implementation, there MUST be an administrative
   interface that allows a user or system administrator to manage the
   SPD.  Specifically, every inbound or outbound packet is subject to
   processing by IPsec and the SPD must specify what action will be
   taken in each case.  Thus the administrative interface must allow the
   user (or system administrator) to specify the security processing to
   be applied to any packet entering or exiting the system, on a packet
   by packet basis.  (In a host IPsec implementation making use of a
   socket interface, the SPD may not need to be consulted on a per
   packet basis, but the effect is still the same.)  The management
   interface for the SPD MUST allow creation of entries consistent with
   the selectors defined in Section 4.4.2, and MUST support (total)
   ordering of these entries.  It is expected that through the use of
   wildcards in various selector fields, and because all packets on a
   single UDP or TCP connection will tend to match a single SPD entry,
   this requirement will not impose an unreasonably detailed level of
   SPD specification.  The selectors are analogous to what are found in
   a stateless firewall or filtering router and which are currently
   manageable this way.

   In host systems, applications MAY be allowed to select what security
   processing is to be applied to the traffic they generate and consume.
   (Means of signalling such requests to the IPsec implementation are
   outside the scope of this standard.)  However, the system
   administrator MUST be able to specify whether or not a user or
   application can override (default) system policies.  Note that
   application specified policies may satisfy system requirements, so
   that the system may not need to do additional IPsec processing beyond
   that needed to meet an application's requirements.  The form of the
   management interface is not specified by this document and may differ
   for hosts vs. security gateways, and within hosts the interface may
   differ for socket-based vs.  BITS implementations.  However, this
   document does specify a standard set of SPD elements that all IPsec
   implementations MUST support.

   The SPD contains an ordered list of policy entries.  Each policy
   entry is keyed by one or more selectors that define the set of IP
   traffic encompassed by this policy entry.  (The required selector
   types are defined in Section 4.4.2.)  These define the granularity of
   policies or SAs.  Each entry includes an indication of whether
   traffic matching this policy will be bypassed, discarded, or subject
   to IPsec processing.  If IPsec processing is to be applied, the entry
   includes an SA (or SA bundle) specification, listing the IPsec
   protocols, modes, and algorithms to be employed, including any
   nesting requirements.  For example, an entry may call for all
   matching traffic to be protected by ESP in transport mode using
   3DES-CBC with an explicit IV, nested inside of AH in tunnel mode
   using HMAC/SHA-1.  For each selector, the policy entry specifies how
   to derive the corresponding values for a new Security Association
   Database (SAD, see Section 4.4.3) entry from those in the SPD and the
   packet (Note that at present, ranges are only supported for IP
   addresses; but wildcarding can be expressed for all selectors):

           a. use the value in the packet itself -- This will limit use
              of the SA to those packets which have this packet's value
              for the selector even if the selector for the policy entry
              has a range of allowed values or a wildcard for this
              selector.
           b. use the value associated with the policy entry -- If this
              were to be just a single value, then there would be no
              difference between (b) and (a).  However, if the allowed
              values for the selector are a range (for IP addresses) or a
              wildcard, then in the case of a range, (b) would enable use
              of the SA by any packet with a selector value within the
              range, not just by packets with the selector value of the
              packet that triggered the creation of the SA.  In the case
              of a wildcard, (b) would allow use of the SA by packets
              with any value for this selector.

   For example, suppose there is an SPD entry where the allowed value
   for source address is any of a range of hosts (192.168.2.1 to
   192.168.2.10).  And suppose that a packet is to be sent that has a
   source address of 192.168.2.3.  The value to be used for the SA could
   be any of the sample values below depending on what the policy entry
   for this selector says is the source of the selector value:

           source for the  example of
           value to be     new SAD
           used in the SA  selector value
           --------------- ------------
           a. packet       192.168.2.3 (one host)
           b. SPD entry    192.168.2.1 to 192.168.2.10 (range of hosts)

   Note that if the SPD entry had an allowed value of wildcard for the
   source address, then the SAD selector value could be wildcard (any
   host).  Case (a) can be used to prohibit sharing, even among packets
   that match the same SPD entry.
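   The example above can be sketched as follows, assuming a hypothetical
   AddrRange type for an inclusive IPv4 range and a ValueSource flag that
   records whether the policy entry says to take the value from the
   packet (a) or from the policy entry (b).  Only the source address
   selector is modeled; Python's standard ipaddress module handles the
   address comparisons.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum


class ValueSource(Enum):
    PACKET = "a"  # use the value in the packet itself
    POLICY = "b"  # use the value associated with the policy entry


@dataclass(frozen=True)
class AddrRange:
    """Inclusive IPv4 address range (hypothetical representation)."""
    low: ipaddress.IPv4Address
    high: ipaddress.IPv4Address

    def contains(self, addr: ipaddress.IPv4Address) -> bool:
        return self.low <= addr <= self.high

    def __str__(self) -> str:
        return f"{self.low} to {self.high}"


def derive_sad_source(spd_value: AddrRange, packet_src: ipaddress.IPv4Address,
                      source: ValueSource) -> AddrRange:
    """Derive the source-address selector for a new SAD entry."""
    if not spd_value.contains(packet_src):
        raise ValueError("packet does not match this SPD entry")
    if source is ValueSource.PACKET:
        # (a): the SA is limited to this one host, so it is not shared.
        return AddrRange(packet_src, packet_src)
    # (b): any packet whose source lies in the SPD range may share the SA.
    return spd_value


spd = AddrRange(ipaddress.IPv4Address("192.168.2.1"),
                ipaddress.IPv4Address("192.168.2.10"))
pkt = ipaddress.IPv4Address("192.168.2.3")
print(derive_sad_source(spd, pkt, ValueSource.PACKET))  # 192.168.2.3 to 192.168.2.3
print(derive_sad_source(spd, pkt, ValueSource.POLICY))  # 192.168.2.1 to 192.168.2.10
```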

   As described below in Section 4.4.3, selectors may include "wildcard"
   entries and hence the selectors for two entries may overlap.  (This
   is analogous to the overlap that arises with ACLs or filter entries
   in routers or packet filtering firewalls.)  Thus, to ensure
   consistent, predictable processing, SPD entries MUST be ordered and
   the SPD MUST always be searched in the same order, so that the first
   matching entry is consistently selected.  (This requirement is
   necessary as the effect of processing traffic against SPD entries
   must be deterministic, but there is no way to canonicalize SPD
   entries given the use of wildcards for some selectors.)  More detail
   on matching of packets against SPD entries is provided in Section 5.
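   A minimal sketch of ordered, first-match SPD processing follows.  It
   assumes a hypothetical entry layout with only three selectors
   (destination address, transport protocol, destination port), uses
   None to stand for a wildcard, and maps each entry to one of the three
   processing choices (discard, bypass, apply IPsec).  Treating traffic
   that matches no entry as discarded is an assumption of the sketch,
   consistent with the ISAKMP remark later in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    DISCARD = "discard"
    BYPASS = "bypass IPsec"
    APPLY = "apply IPsec"


@dataclass(frozen=True)
class Packet:
    dst: str
    proto: int  # e.g. 6 = TCP, 17 = UDP
    dport: int


@dataclass(frozen=True)
class SpdEntry:
    # None stands for a wildcard selector (hypothetical representation).
    dst: Optional[str]
    proto: Optional[int]
    dport: Optional[int]
    action: Action

    def matches(self, p: Packet) -> bool:
        pairs = ((self.dst, p.dst), (self.proto, p.proto), (self.dport, p.dport))
        return all(sel is None or sel == val for sel, val in pairs)


def spd_lookup(spd: List[SpdEntry], p: Packet) -> Action:
    """Search the ordered SPD and return the action of the FIRST match."""
    for entry in spd:
        if entry.matches(p):
            return entry.action
    # Assumption: traffic matching no entry is discarded.
    return Action.DISCARD


spd = [
    SpdEntry("10.0.0.5", 6, 22, Action.APPLY),   # SSH to one server: protect
    SpdEntry(None, 17, 500, Action.BYPASS),      # ISAKMP (UDP 500): let through
    SpdEntry(None, 6, None, Action.DISCARD),     # all other TCP: drop
]
print(spd_lookup(spd, Packet("10.0.0.5", 6, 22)))  # Action.APPLY
```

   The SSH packet matches both the first and the third entry; only the
   fixed search order makes the result deterministic.  Reversing the
   list would discard the same packet.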

   Note that if ESP is specified, either (but not both) authentication
   or encryption can be omitted.  So it MUST be possible to configure
   the SPD value for the authentication or encryption algorithms to be
   "NULL".  However, at least one of these services MUST be selected,
   i.e., it MUST NOT be possible to configure both of them as "NULL".
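   The constraint amounts to a single check at configuration time.  In
   this sketch the algorithm names are plain illustrative strings, and
   "NULL" marks an omitted service:

```python
NULL = "NULL"


def esp_config_is_valid(encryption: str, authentication: str) -> bool:
    """Either ESP service may be NULL, but not both."""
    return not (encryption == NULL and authentication == NULL)


print(esp_config_is_valid("3DES-CBC", NULL))       # True: encryption only
print(esp_config_is_valid(NULL, "HMAC-SHA-1-96"))  # True: authentication only
print(esp_config_is_valid(NULL, NULL))             # False: must be rejected
```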

   The SPD can be used to map traffic to specific SAs or SA bundles.
   Thus it can function both as the reference database for security
   policy and as the map to existing SAs (or SA bundles).  (To
   accommodate the bypass and discard policies cited above, the SPD also
   MUST provide a means of mapping traffic to these functions, even
   though they are not, per se, IPsec processing.)  The way in which the
   SPD operates is different for inbound vs. outbound traffic and it
   also may differ for host vs.  security gateway, BITS, and BITW
   implementations.  Sections 5.1 and 5.2 describe the use of the SPD
   for outbound and inbound processing, respectively.

   Because a security policy may require that more than one SA be
   applied to a specified set of traffic, in a specific order, the
   policy entry in the SPD must preserve these ordering requirements,
   when present.  Thus, it must be possible for an IPsec implementation
   to determine that an outbound or inbound packet must be processed
   through a sequence of SAs.  Conceptually, for outbound processing,
   one might imagine links (to the SAD) from an SPD entry for which
   there are active SAs, and each entry would consist of either a single
   SA or an ordered list of SAs that comprise an SA bundle.  When a
   packet is matched against an SPD entry and there is an existing SA or
   SA bundle that can be used to carry the traffic, the processing of
   the packet is controlled by the SA or SA bundle entry on the list.
   For an inbound IPsec packet for which multiple IPsec SAs are to be
   applied, the lookup based on destination address, IPsec protocol, and
   SPI should identify a single SA.

   The SPD is used to control the flow of ALL traffic through an IPsec
   system, including security and key management traffic (e.g., ISAKMP)
   from/to entities behind a security gateway.  This means that ISAKMP
   traffic must be explicitly accounted for in the SPD, else it will be
   discarded.  Note that a security gateway could prohibit traversal of
   encrypted packets in various ways, e.g., having a DISCARD entry in
   the SPD for ESP packets or providing proxy key exchange.  In the
   latter case, the traffic would be internally routed to the key
   management module in the security gateway.

4.4.2  Selectors

   An SA (or SA bundle) may be fine-grained or coarse-grained, depending
   on the selectors used to define the set of traffic for the SA.  For
   example, all traffic between two hosts may be carried via a single
   SA, and afforded a uniform set of security services.  Alternatively,
   traffic between a pair of hosts might be spread over multiple SAs,
   depending on the applications being used (as defined by the Next
   Protocol and Port fields), with different security services offered
   by different SAs.  Similarly, all traffic between a pair of security
   gateways could be carried on a single SA, or one SA could be assigned
   for each communicating host pair.  The following selector parameters
   MUST be supported for SA management to facilitate control of SA
   granularity.  Note that in the case of receipt of a packet with an
   ESP header, e.g., at an encapsulating security gateway or BITW
   implementation, the transport layer protocol, source/destination
   ports, and Name (if present) may be "OPAQUE", i.e., inaccessible
   because of encryption or fragmentation.  Note also that the Source
   and Destination addresses should be either both IPv4 or both IPv6.

      - Destination IP Address (IPv4 or IPv6): this may be a single IP
        address (unicast, anycast, broadcast (IPv4 only), or multicast
        group), a range of addresses (high and low values, inclusive),
        address + mask, or a wildcard address.  The last three are used
        to support more than one destination system sharing the same SA
        (e.g., behind a security gateway).  Note that this selector is
        conceptually different from the "Destination IP Address" field
        in the <Destination IP Address, IPsec Protocol, SPI> tuple used
        to uniquely identify an SA.  When a tunneled packet arrives at
        the tunnel endpoint, its SPI/Destination address/Protocol are
        used to look up the SA for this packet in the SAD.  This
        destination address comes from the encapsulating IP header.
        Once the packet has been processed according to the tunnel SA
        and has come out of the tunnel, its selectors are "looked up" in
        the Inbound SPD.  The Inbound SPD has a selector called
        destination address.  This IP destination address is the one in
        the inner (encapsulated) IP header.  In the case of a transport
        mode packet, there will be only one IP header and this
        ambiguity does not exist.  [REQUIRED for all implementations]

      - Source IP Address(es) (IPv4 or IPv6): this may be a single IP
        address (unicast, anycast, broadcast (IPv4 only), or multicast
        group), range of addresses (high and low values inclusive),
        address + mask, or a wildcard address.  The last three are used
        to support more than one source system sharing the same SA
        (e.g., behind a security gateway or in a multihomed host).
        [REQUIRED for all implementations]

      - Name: There are 2 cases (Note that these name forms are
        supported in the IPsec DOI.)
                1. User ID
                    a. a fully qualified user name string (DNS), e.g.,
                       mozart@foo.bar.com
                    b. X.500 distinguished name, e.g., C = US, SP = MA,
                       O = GTE Internetworking, CN = Stephen T. Kent.
                2. System name (host, security gateway, etc.)
                    a. a fully qualified DNS name, e.g., foo.bar.com
                    b. X.500 distinguished name
                    c. X.500 general name

        NOTE: One of the possible values of this selector is "OPAQUE".

        [REQUIRED for the following cases.  Note that support for name
        forms other than addresses is not required for manually keyed
        SAs.
                o User ID
                    - native host implementations
                    - BITW and BITS implementations acting as HOSTS
                      with only one user
                    - security gateway implementations for INBOUND
                      processing.
                o System names -- all implementations]

      - Data sensitivity level: (IPSO/CIPSO labels)
        [REQUIRED for all systems providing information flow security as
        per Section 8, OPTIONAL for all other systems.]

      - Transport Layer Protocol: Obtained from the IPv4 "Protocol" or
        the IPv6 "Next Header" fields.  This may be an individual
        protocol number.  These packet fields may not contain the
        Transport Protocol due to the presence of IP extension headers,
        e.g., a Routing Header, AH, ESP, Fragmentation Header,
        Destination Options, Hop-by-hop options, etc.  Note that the
        Transport Protocol may not be available in the case of receipt
        of a packet with an ESP header, thus a value of "OPAQUE" SHOULD
        be supported.
        [REQUIRED for all implementations]

        NOTE: To locate the transport protocol, a system has to chain
        through the packet headers checking the "Protocol" or "Next
        Header" field until it encounters either one it recognizes as a
        transport protocol, or until it reaches one that isn't on its
        list of extension headers, or until it encounters an ESP header
        that renders the transport protocol opaque.

      - Source and Destination (e.g., TCP/UDP) Ports: These may be
        individual UDP or TCP port values or a wildcard port.  (The use
        of the Next Protocol field and the Source and/or Destination
        Port fields (in conjunction with the Source and/or Destination
        Address fields) as an SA selector is sometimes referred to as
        "session-oriented keying".)  Note that the source and
        destination ports may not be available in the case of receipt of
        a packet with an ESP header, thus a value of "OPAQUE" SHOULD be
        supported.

        The following table summarizes the relationship between the
        "Next Header" value in the packet and SPD and the derived Port
        Selector value for the SPD and SAD.

          Next Hdr        Transport Layer   Derived Port Selector Field
          in Packet       Protocol in SPD   Value in SPD and SAD
          --------        ---------------   ---------------------------
          ESP             ESP or ANY        ANY (i.e., don't look at it)
          -don't care-    ANY               ANY (i.e., don't look at it)
          specific value  specific value    NOT ANY (i.e., drop packet)
             fragment
          specific value  specific value    actual port selector field
             not fragment

        If the packet has been fragmented, then the port information may
        not be available in the current fragment.  If so, discard the
        fragment.  An ICMP PMTU should be sent for the first fragment,
        which will have the port information.  [MAY be supported]
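   The header-chaining NOTE and the port selector table above can be
   sketched together.  This is a simplified model: it assumes the
   Next Header values have already been extracted into a list (starting
   with the IP header's own Protocol/Next Header field), whereas a real
   parser must also step over each extension header using its length.
   The protocol numbers are the IANA-assigned values; the OPAQUE, ANY,
   and DROP markers are illustrative strings, and combinations that the
   table does not list raise an error rather than guess a behavior.

```python
from typing import Iterable, Union

# IANA-assigned protocol numbers.
HOP_BY_HOP, TCP, UDP, ROUTING, FRAGMENT, ESP, AH, DEST_OPTS = 0, 6, 17, 43, 44, 50, 51, 60
EXTENSION_HEADERS = {HOP_BY_HOP, ROUTING, FRAGMENT, AH, DEST_OPTS}
OPAQUE, ANY, DROP = "OPAQUE", "ANY", "DROP"


def find_transport_protocol(chain: Iterable[int]) -> Union[int, str, None]:
    """Walk Next Header values in packet order (the IP header's field first)."""
    for nh in chain:
        if nh == ESP:
            return OPAQUE      # everything after ESP is encrypted
        if nh not in EXTENSION_HEADERS:
            return nh          # first value that is not a known extension header
    return None                # chain ended without an upper-layer protocol


def derive_port_selector(next_hdr: int, spd_protocol: Union[int, str],
                         is_fragment: bool, port: int) -> Union[int, str]:
    """Apply the Next Header / port selector table row by row."""
    if spd_protocol == ANY:
        return ANY             # SPD says ANY: don't look at the ports
    if next_hdr == ESP and spd_protocol == ESP:
        return ANY             # ESP in packet and in SPD: don't look at them
    if next_hdr == spd_protocol:
        # Specific protocol: ports are only usable in a non-fragment.
        return DROP if is_fragment else port
    raise ValueError("combination not covered by the table")


print(find_transport_protocol([HOP_BY_HOP, ROUTING, TCP]))  # 6
print(find_transport_protocol([AH, ESP]))                   # OPAQUE
print(derive_port_selector(TCP, TCP, False, 443))           # 443
print(derive_port_selector(TCP, TCP, True, 443))            # DROP
```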

   The IPsec implementation context determines how selectors are used.
   For example, a host implementation integrated into the stack may make
   use of a socket interface.  When a new connection is established the
   SPD can be consulted and an SA (or SA bundle) bound to the socket.
   Thus traffic sent via that socket need not result in additional
   lookups to the SPD/SAD.  In contrast, a BITS, BITW, or security
   gateway implementation needs to look at each packet and perform an
   SPD/SAD lookup based on the selectors. The allowable values for the
   selector fields differ between the traffic flow, the security
   association, and the security policy.

   The following table summarizes the kinds of entries that one needs to
   be able to express in the SPD and SAD.  It shows how they relate to
   the fields in data traffic being subjected to IPsec screening.
   (Note: the "wild" or "wildcard" entry for src and dst addresses
   includes a mask, range, etc.)

 Field         Traffic Value       SAD Entry            SPD Entry
 --------      -------------   ----------------   --------------------
 src addr      single IP addr  single,range,wild  single,range,wildcard
 dst addr      single IP addr  single,range,wild  single,range,wildcard
 xpt protocol* xpt protocol    single,wildcard    single,wildcard
 src port*     single src port single,wildcard    single,wildcard
 dst port*     single dst port single,wildcard    single,wildcard
 user id*      single user id  single,wildcard    single,wildcard
 sec. labels   single value    single,wildcard    single,wildcard

       * The SAD and SPD entries for these fields could be "OPAQUE"
         because the traffic value is encrypted.

   NOTE: In principle, one could have selectors and/or selector values
   in the SPD which cannot be negotiated for an SA or SA bundle.
   Examples might include selector values used to select traffic for
   discarding or enumerated lists which cause a separate SA to be
   created for each item on the list.  For now, this is left for future
   versions of this document and the list of required selectors and
   selector values is the same for the SPD and the SAD.  However, it is
   acceptable to have an administrative interface that supports use of
   selector values which cannot be negotiated provided that it does not
   mislead the user into believing it is creating an SA with these
   selector values.  For example, the interface may allow the user to
   specify an enumerated list of values but would result in the creation
   of a separate policy and SA for each item on the list.  A vendor
   might support such an interface to make it easier for its customers
   to specify clear and concise policy specifications.
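   As a minimal sketch of such an administrative convenience, assuming a
   hypothetical Policy record with single-valued selectors, an
   enumerated port list entered by the user is expanded into one policy
   (and hence one SA) per item instead of a single SA carrying a
   non-negotiable list:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Policy:
    """One SPD policy entry with single-valued selectors (hypothetical)."""
    dst: str
    proto: int
    dport: int


def expand_enumerated_ports(dst: str, proto: int, ports: Iterable[int]) -> List[Policy]:
    """Turn an admin-entered port list into one policy per port."""
    return [Policy(dst, proto, port) for port in ports]


for policy in expand_enumerated_ports("192.0.2.10", 6, [25, 110, 143]):
    print(policy)
```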

4.4.3 Security Association Database (SAD)

   In each IPsec implementation there is a nominal Security Association
   Database, in which each entry defines the parameters associated with
   one SA.  Each SA has an entry in the SAD.  For outbound processing,
   entries are pointed to by entries in the SPD.  Note that if an SPD
   entry does not currently point to an SA that is appropriate for the
   packet, the implementation creates an appropriate SA (or SA Bundle)
   and links the SPD entry to the SAD entry (see Section 5.1.1).  For
   inbound processing, each entry in the SAD is indexed by a destination
   IP address, IPsec protocol type, and SPI.  The following parameters
   are associated with each entry in the SAD.  This description does not
   purport to be a MIB, but only a specification of the minimal data
   items required to support an SA in an IPsec implementation.

   For inbound processing: The following packet fields are used to look
   up the SA in the SAD:

         o Outer Header's Destination IP address: the IPv4 or IPv6
           Destination address.
           [REQUIRED for all implementations]
         o IPsec Protocol: AH or ESP, used as an index for SA lookup
           in this database.  Specifies the IPsec protocol to be
           applied to the traffic on this SA.
           [REQUIRED for all implementations]
         o SPI: the 32-bit value used to distinguish among different
           SAs terminating at the same destination and using the same
           IPsec protocol.
           [REQUIRED for all implementations]

   For each of the selectors defined in Section 4.4.2, the SA entry in
   the SAD MUST contain the value or values which were negotiated at the
   time the SA was created.  For the sender, these values are used to
   decide whether a given SA is appropriate for use with an outbound
   packet.  This is part of checking to see if there is an existing SA
   that can be used.  For the receiver, these values are used to check
   that the selector values in an inbound packet match those for the SA
   (and thus indirectly those for the matching policy).  For the
   receiver, this is part of verifying that the SA was appropriate for
   this packet.  (See Section 6 for rules for ICMP messages.)  These
   fields can have the form of specific values, ranges, wildcards, or
   "OPAQUE" as described in section 4.4.2, "Selectors".  Note that for
   an ESP SA, the encryption algorithm or the authentication algorithm
   could be "NULL".  However, they MUST NOT both be "NULL".
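   The receiver side of this can be sketched as a dictionary keyed by
   the (outer destination IP address, IPsec protocol, SPI) triple,
   followed by a comparison of the inner packet's selectors with the
   values negotiated for the SA.  The SadEntry layout is hypothetical:
   it keeps only three selectors, uses None for a wildcard, and omits
   ranges, "OPAQUE" values, and the ICMP rules of Section 6.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Inbound SAD index: (outer destination IP, IPsec protocol, SPI).
SadKey = Tuple[str, str, int]


@dataclass(frozen=True)
class SadEntry:
    # Negotiated selector values; None stands for a wildcard (hypothetical).
    src: Optional[str]
    dst: Optional[str]
    dport: Optional[int]


def inbound_ok(sad: Dict[SadKey, SadEntry], outer_dst: str, proto: str, spi: int,
               inner_src: str, inner_dst: str, inner_dport: int) -> bool:
    """Look up the SA by its triple, then verify the packet's selectors."""
    sa = sad.get((outer_dst, proto, spi))
    if sa is None:
        return False  # no matching SA
    checks = ((sa.src, inner_src), (sa.dst, inner_dst), (sa.dport, inner_dport))
    return all(negotiated is None or negotiated == actual
               for negotiated, actual in checks)


sad = {("198.51.100.1", "ESP", 0x1001): SadEntry(None, "10.1.1.5", 22)}
print(inbound_ok(sad, "198.51.100.1", "ESP", 0x1001, "203.0.113.7", "10.1.1.5", 22))  # True
print(inbound_ok(sad, "198.51.100.1", "ESP", 0x1001, "203.0.113.7", "10.1.1.5", 23))  # False
```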

   The following SAD fields are used in doing IPsec processing:

         o Sequence Number Counter: a 32-bit value used to generate the
           Sequence Number field in AH or ESP headers.
           [REQUIRED for all implementations, but used only for outbound
           traffic.]
         o Sequence Counter Overflow: a flag indicating whether overflow
           of the Sequence Number Counter should generate an auditable
           event and prevent transmission of additional packets on the
           SA.
           [REQUIRED for all implementations, but used only for outbound
           traffic.]
         o Anti-Replay Window: a 32-bit counter and a bit-map (or
           equivalent) used to determine whether an inbound AH or ESP
           packet is a replay.
           [REQUIRED for all implementations but used only for inbound
           traffic. NOTE: If anti-replay has been disabled by the
           receiver, e.g., in the case of a manually keyed SA, then the
           Anti-Replay Window is not used.]
         o AH Authentication algorithm, keys, etc.
           [REQUIRED for AH implementations]
         o ESP Encryption algorithm, keys, IV mode, IV, etc.
           [REQUIRED for ESP implementations]
         o ESP authentication algorithm, keys, etc. If the
           authentication service is not selected, this field will be
           null.
           [REQUIRED for ESP implementations]
         o Lifetime of this Security Association: a time interval after
           which an SA must be replaced with a new SA (and new SPI) or
           terminated, plus an indication of which of these actions
           should occur.  This may be expressed as a time or byte count,
           or a simultaneous use of both, the first lifetime to expire
           taking precedence. A compliant implementation MUST support
           both types of lifetimes, and must support a simultaneous use
           of both.  If time is employed, and if IKE employs X.509
           certificates for SA establishment, the SA lifetime must be
           constrained by the validity intervals of the certificates,
           and the NextIssueDate of the CRLs used in the IKE exchange
           for the SA.  Both initiator and responder are responsible for
           constraining SA lifetime in this fashion.
           [REQUIRED for all implementations]

           NOTE: The details of how to handle the refreshing of keys
           when SAs expire is a local matter.  However, one reasonable
           approach is:
             (a) If byte count is used, then the implementation
                 SHOULD count the number of bytes to which the IPsec
                 algorithm is applied.  For ESP, this is the encryption
                 algorithm (including Null encryption) and for AH,
                 this is the authentication algorithm.  This includes
                 pad bytes, etc.  Note that implementations SHOULD be
                 able to handle having the counters at the ends of an
                 SA get out of synch, e.g., because of packet loss or
                 because the implementations at each end of the SA
                 aren't doing things the same way.
             (b) There SHOULD be two kinds of lifetime -- a soft
                 lifetime which warns the implementation to initiate
                 action such as setting up a replacement SA and a
                 hard lifetime when the current SA ends.
             (c) If the entire packet does not get delivered during
                 the SA's lifetime, the packet SHOULD be discarded.

         o IPsec protocol mode: tunnel, transport or wildcard.
           Indicates which mode of AH or ESP is applied to traffic on
           this SA.  Note that if this field is "wildcard" at the
           sending end of the SA, then the application has to specify
           the mode to the IPsec implementation.  This use of wildcard
           allows the same SA to be used for either tunnel or transport
           mode traffic on a per packet basis, e.g., by different
           sockets.  The receiver does not need to know the mode in
           order to properly process the packet's IPsec headers.

           [REQUIRED as follows, unless implicitly defined by context:
                   - host implementations must support all modes
                   - gateway implementations must support tunnel mode]

           NOTE: The use of wildcard for the protocol mode of an inbound
           SA may add complexity to the situation in the receiver (host
           only).  Since the packets on such an SA could be delivered in
           either tunnel or transport mode, the security of an incoming
           packet could depend in part on which mode had been used to
           deliver it.  If, as a result, an application cared about the
           SA mode of a given packet, then the application would need a
           mechanism to obtain this mode information.

         o Path MTU: any observed path MTU and aging variables.  See
           Section 6.1.2.4.
           [REQUIRED for all implementations but used only for outbound
           traffic]
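   The lifetime handling described under the SA lifetime field above
   (soft and hard limits, by byte count and/or time, with the first
   limit reached taking precedence) might be tracked per SA roughly as
   follows.  This is a minimal sketch: the class name, the
   "ok"/"soft"/"hard" states, and the explicit `now` timestamp are
   illustrative assumptions, not part of this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SALifetime:
    created: float                          # when the SA was established (seconds)
    soft_bytes: Optional[int] = None        # None: byte-count lifetime not used
    hard_bytes: Optional[int] = None
    soft_seconds: Optional[float] = None    # None: time lifetime not used
    hard_seconds: Optional[float] = None
    # Bytes the IPsec algorithm was applied to, including padding (note (a)).
    bytes_processed: int = 0

    def account(self, nbytes: int) -> None:
        self.bytes_processed += nbytes

    def _reached(self, byte_limit: Optional[int], time_limit: Optional[float],
                 now: float) -> bool:
        # With both kinds of lifetime configured, whichever is reached first counts.
        if byte_limit is not None and self.bytes_processed >= byte_limit:
            return True
        return time_limit is not None and now - self.created >= time_limit

    def state(self, now: float) -> str:
        if self._reached(self.hard_bytes, self.hard_seconds, now):
            return "hard"   # the SA ends and must not be used any further
        if self._reached(self.soft_bytes, self.soft_seconds, now):
            return "soft"   # time to start setting up a replacement SA
        return "ok"
```

   For example, an SA created at time 0 with a 1000-byte soft limit
   reports "soft" once 1500 protected bytes have been accounted, and
   "hard" once a 120-second hard limit has passed.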

4.5 Basic Combinations of Security Associations

   This section describes four examples of combinations of security
   associations that MUST be supported by compliant IPsec hosts or
   security gateways.  Additional combinations of AH and/or ESP in
   tunnel and/or transport modes MAY be supported at the discretion of
   the implementor.  Compliant implementations MUST be capable of
   generating these four combinations and, on receipt, of processing
   them, but SHOULD be able to receive and process any combination.  The
   diagrams and text below describe the basic cases.  The legend for the
   diagrams is:

        ==== = one or more security associations (AH or ESP, transport
               or tunnel)
        ---- = connectivity (or if so labelled, administrative boundary)
        Hx   = host x
        SGx  = security gateway x
        X*   = X supports IPsec

   NOTE: The security associations below can be either AH or ESP.  The
   mode (tunnel vs transport) is determined by the nature of the
   endpoints.  For host-to-host SAs, the mode can be either transport or
   tunnel.

   Case 1.  The case of providing end-to-end security between 2 hosts
        across the Internet (or an Intranet).

                 ====================================
                 |                                  |
                H1* ------ (Inter/Intranet) ------ H2*

        Note that either transport or tunnel mode can be selected by the
        hosts.  So the headers in a packet between H1 and H2 could look
        like any of the following:

                  Transport                  Tunnel
             -----------------          ---------------------
             1. [IP1][AH][upper]        4. [IP2][AH][IP1][upper]
             2. [IP1][ESP][upper]       5. [IP2][ESP][IP1][upper]
             3. [IP1][AH][ESP][upper]

        Note that there is no requirement to support general nesting,
        but in transport mode, both AH and ESP can be applied to the
        packet.  In this event, the SA establishment procedure MUST
        ensure that first ESP, then AH are applied to the packet.

   Case 2.  This case illustrates simple virtual private network
        support.

                       ===========================
                       |                         |
  ---------------------|----                  ---|-----------------------
  |                    |   |                  |  |                      |
  |  H1 -- (Local --- SG1* |--- (Internet) ---| SG2* --- (Local --- H2  |
  |        Intranet)       |                  |          Intranet)      |
  --------------------------                  ---------------------------
      admin. boundary                               admin. boundary

        Only tunnel mode is required here.  So the headers in a packet
        between SG1 and SG2 could look like either of the following:

                        Tunnel
                ---------------------
                4. [IP2][AH][IP1][upper]
                5. [IP2][ESP][IP1][upper]

   Case 3.  This case combines cases 1 and 2, adding end-to-end security
        between the sending and receiving hosts.  It imposes no new
        requirements on the hosts or security gateways, other than a
        requirement for a security gateway to be configurable to pass
        IPsec traffic (including ISAKMP traffic) for hosts behind it.

     ===============================================================
     |                                                             |
     |                 =========================                   |
     |                 |                       |                   |
  ---|-----------------|----                ---|-------------------|---
  |  |                 |   |                |  |                   |  |
  | H1* -- (Local --- SG1* |-- (Internet) --| SG2* --- (Local --- H2* |
  |        Intranet)       |                |          Intranet)      |
  --------------------------                ---------------------------
       admin. boundary                            admin. boundary

   Case 4.  This covers the situation where a remote host (H1) uses the
        Internet to reach an organization's firewall (SG2) and to then
        gain access to some server or other machine (H2).  The remote
        host could be a mobile host (H1) dialing up to a local PPP/ARA
        server (not shown) on the Internet and then crossing the
        Internet to the home organization's firewall (SG2), etc.  The
        details of support for this case (how H1 locates SG2,
        authenticates it, and verifies its authorization to represent
        H2) are discussed in Section 4.6.3, "Locating a Security
        Gateway".

        ======================================================
        |                                                    |
        |==============================                      |
        ||                            |                      |
        ||                         ---|----------------------|---
        ||                         |  |                      |  |
        H1* ----- (Internet) ------| SG2* ---- (Local ----- H2* |
              ^                    |           Intranet)        |
              |                    ------------------------------
        could be dialup              admin. boundary (optional)
        to PPP/ARA server

        Only tunnel mode is required between H1 and SG2.  So the choices
        for the SA between H1 and SG2 would be one of the ones in case
        2.  The choices for the SA between H1 and H2 would be one of the
        ones in case 1.

        Note that in this case, the sender MUST apply the transport
        header before the tunnel header.  Therefore the management
        interface to the IPsec implementation MUST support configuration
        of the SPD and SAD to ensure this ordering of IPsec header
        application.

   As noted above, support for additional combinations of AH and ESP is
   optional.  Use of other, optional combinations may adversely affect
   interoperability.
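   The header orderings in Cases 1 and 4 can be modelled with a small
   sketch.  Headers are plain strings named as in the diagrams (IP1,
   IP2, AH, ESP, upper); the helper names `transport` and `tunnel` are
   hypothetical, and nothing here produces wire-format packets.

```python
from typing import List

Packet = List[str]  # headers listed outermost-first, e.g. ["IP1", "ESP", "upper"]


def transport(ip_hdr: str, protocols: List[str], upper: str) -> Packet:
    """Transport mode: IPsec headers sit between the IP header and the payload.

    `protocols` is given in order of application, so ["ESP", "AH"] applies
    ESP first and then AH, which leaves AH outermost (Case 1, row 3).
    """
    pkt = [upper]
    for proto in protocols:
        pkt = [proto] + pkt          # each newly applied header goes in front
    return [ip_hdr] + pkt


def tunnel(outer_ip: str, proto: str, inner: Packet) -> Packet:
    """Tunnel mode: the whole inner packet follows a new outer IP header."""
    return [outer_ip, proto] + inner
```

   For Case 4, applying the end-to-end SA first (transport mode is
   assumed here) and the H1-to-SG2 tunnel SA second,
   `tunnel("IP2", "ESP", transport("IP1", ["ESP"], "upper"))` yields
   `["IP2", "ESP", "IP1", "ESP", "upper"]`.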

4.6 SA and Key Management

   IPsec mandates support for both manual and automated SA and
   cryptographic key management.  The IPsec protocols, AH and ESP, are
   largely independent of the associated SA management techniques,
   although the techniques involved do affect some of the security
   services offered by the protocols.  For example, the optional anti-
   replay services available for AH and ESP require automated SA
   management.  Moreover, the granularity of key distribution employed
   with IPsec determines the granularity of authentication provided.
   (See also a discussion of this issue in Section 4.7.)  In general,
   data origin authentication in AH and ESP is limited by the extent to
   which secrets used with the authentication algorithm (or with a key
   management protocol that creates such secrets) are shared among
   multiple possible sources.

   The following text describes the minimum requirements for both types
   of SA management.

4.6.1 Manual Techniques

   The simplest form of management is manual management, in which a
   person manually configures each system with keying material and
   security association management data relevant to secure communication
   with other systems.  Manual techniques are practical in small, static
   environments but they do not scale well.  For example, a company
   could create a Virtual Private Network (VPN) using IPsec in security
   gateways at several sites.  If the number of sites is small, and
   since all the sites come under the purview of a single administrative
   domain, this is likely to be a feasible context for manual management
   techniques.  In this case, the security gateway might selectively
   protect traffic to and from other sites within the organization using
   a manually configured key, while not protecting traffic for other
   destinations.  It also might be appropriate when only selected
   communications need to be secured.  A similar argument might apply to
   use of IPsec entirely within an organization for a small number of
   hosts and/or gateways.  Manual management techniques often employ
   statically configured, symmetric keys, though other options also
   exist.

4.6.2 Automated SA and Key Management

   Widespread deployment and use of IPsec requires an Internet-standard,
   scalable, automated, SA management protocol.  Such support is
   required to facilitate use of the anti-replay features of AH and ESP,
   and to accommodate on-demand creation of SAs, e.g., for user- and
   session-oriented keying.  (Note that the notion of "rekeying" an SA
   actually implies creation of a new SA with a new SPI, a process that
   generally implies use of an automated SA/key management protocol.)

   The default automated key management protocol selected for use with
   IPsec is IKE [MSST97, Orm97, HC98] under the IPsec domain of
   interpretation [Pip98].  Other automated SA management protocols MAY
   be employed.

   When an automated SA/key management protocol is employed, the output
   from this protocol may be used to generate multiple keys, e.g., for a
   single ESP SA.  This may arise because:

       o the encryption algorithm uses multiple keys (e.g., triple DES)
       o the authentication algorithm uses multiple keys
       o both encryption and authentication algorithms are employed

   The Key Management System may provide a separate string of bits for
   each key or it may generate one string of bits from which all of them
   are extracted.  If a single string of bits is provided, care needs to
   be taken to ensure that the parts of the system that map the string
   of bits to the required keys do so in the same fashion at both ends
   of the SA.  To ensure that the IPsec implementations at each end of
   the SA use the same bits for the same keys, and irrespective of which
   part of the system divides the string of bits into individual keys,
   the encryption key(s) MUST be taken from the first (left-most, high-
   order) bits and the authentication key(s) MUST be taken from the
   remaining bits.  The number of bits for each key is defined in the
   relevant algorithm specification RFC.  In the case of multiple
   encryption keys or multiple authentication keys, the specification
   for the algorithm must specify the order in which they are to be
   selected from a single string of bits provided to the algorithm.
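   The split described above can be sketched as follows.  For
   simplicity the sketch works in whole bytes rather than bits
   (assuming every key length is a multiple of 8), and it takes keys
   in the order their lengths are listed, standing in for the order an
   algorithm specification would define.  The function name is
   hypothetical.

```python
from typing import List, Tuple


def split_keying_material(material: bytes, enc_key_lens: List[int],
                          auth_key_lens: List[int]) -> Tuple[List[bytes], List[bytes]]:
    """Split one string of key management output into encryption and auth keys.

    Encryption keys come from the left-most (high-order) bytes; authentication
    keys come from the bytes that remain.  Lengths are in bytes, e.g. three
    8-byte keys for triple DES.
    """
    needed = sum(enc_key_lens) + sum(auth_key_lens)
    if len(material) < needed:
        raise ValueError(f"need {needed} bytes of keying material, got {len(material)}")
    keys, offset = [], 0
    for length in enc_key_lens + auth_key_lens:
        keys.append(material[offset:offset + length])
        offset += length
    n_enc = len(enc_key_lens)
    return keys[:n_enc], keys[n_enc:]
```

   Because both ends apply the same rule to the same bits, they derive
   identical keys regardless of which component performs the split.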

4.6.3 Locating a Security Gateway

   This section discusses issues relating to how a host learns about the
   existence of relevant security gateways and, once a host has
   contacted these security gateways, how it knows that these are the
   correct security gateways.  Where the required information is stored
   is a local matter.

   Consider a situation in which a remote host (H1) is using the
   Internet to gain access to a server or other machine (H2) and there
   is a security gateway (SG2), e.g., a firewall, through which H1's
   traffic must pass.  An example of this situation would be a mobile
   host (Road Warrior) crossing the Internet to the home organization's
   firewall (SG2).  (See Case 4 in Section 4.5, "Basic Combinations of
   Security Associations".)  This situation raises several issues:

        1. How does H1 know/learn about the existence of the security
           gateway SG2?
        2. How does it authenticate SG2, and once it has authenticated
           SG2, how does it confirm that SG2 has been authorized to
           represent H2?
        3. How does SG2 authenticate H1 and verify that H1 is authorized
           to contact H2?
        4. How does H1 know/learn about backup gateways which provide
           alternate paths to H2?

   To address these problems, a host or security gateway MUST have an
   administrative interface that allows the user/administrator to
   configure the address of a security gateway for any set of
   destination addresses that require its use.  This includes the
   ability to configure:

        o the requisite information for locating and authenticating the
          security gateway and verifying its authorization to represent
          the destination host.
        o the requisite information for locating and authenticating any
          backup gateways and verifying their authorization to represent
          the destination host.

   It is assumed that the SPD is also configured with policy information
   that covers any other IPsec requirements for the path to the security
   gateway and the destination host.

   This document does not address the issue of how to automate the
   discovery/verification of security gateways.
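   One way to hold the per-destination configuration required above is
   a table keyed by destination prefix.  In the sketch below the
   `GatewayEntry` fields, the most-specific-prefix lookup, and the
   example addresses are assumptions for illustration; authentication
   and authorization data is reduced to an expected gateway identity
   string, and this document leaves storage and lookup to local
   implementation.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GatewayEntry:
    destinations: str            # destination prefix whose traffic must use the gateway
    gateway: str                 # address of the security gateway (e.g. SG2)
    gateway_id: str              # identity the gateway must authenticate as
    backups: List[str] = field(default_factory=list)  # alternate gateways, preferred first


def find_gateway(table: List[GatewayEntry], dest: str) -> Optional[GatewayEntry]:
    """Return the most specific entry covering `dest`, or None if none applies."""
    addr = ipaddress.ip_address(dest)
    best, best_len = None, -1
    for entry in table:
        net = ipaddress.ip_network(entry.destinations)
        # Membership is always False across IPv4/IPv6, so mixed tables are fine.
        if addr in net and net.prefixlen > best_len:
            best, best_len = entry, net.prefixlen
    return best
```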

4.7 Security Associations and Multicast

   The receiver-orientation of the Security Association implies that, in
   the case of unicast traffic, the destination system will normally
   select the SPI value.  By having the destination select the SPI
   value, there is no potential for manually configured Security
   Associations to conflict with automatically configured (e.g., via a
   key management protocol) Security Associations or for Security
   Associations from multiple sources to conflict with each other.  For
   multicast traffic, there are multiple destination systems per
   multicast group.  So some system or person will need to coordinate
   among all multicast groups to select an SPI or SPIs on behalf of each
   multicast group and then communicate the group's IPsec information to
   all of the legitimate members of that multicast group via mechanisms
   not defined here.

   Multiple senders to a multicast group SHOULD use a single Security
   Association (and hence Security Parameter Index) for all traffic to
   that group when a symmetric key encryption or authentication
   algorithm is employed.  In such circumstances, the receiver knows
   only that the message came from a system possessing the key for that
   multicast group, and generally will not be able to authenticate
   which system sent the multicast traffic.  Specifications for other,
   more general multicast cases are deferred to later IPsec documents.

   At the time this specification was published, automated protocols for
   multicast key distribution were not considered adequately mature for
   standardization.  For multicast groups having relatively few members,
   manual key distribution or multiple use of existing unicast key
   distribution algorithms such as modified Diffie-Hellman appears
   feasible.  For very large groups, new scalable techniques will be
   needed.  An example of current work in this area is the Group Key
   Management Protocol (GKMP) [HM97].

5. IP Traffic Processing

   As mentioned in Section 4.4.1 "The Security Policy Database (SPD)",
   the SPD must be consulted during the processing of all traffic
   (INBOUND and OUTBOUND), including non-IPsec traffic.  If no policy is
   found in the SPD that matches the packet (for either inbound or
   outbound traffic), the packet MUST be discarded.

   NOTE: All of the cryptographic algorithms used in IPsec expect their
   input in canonical network byte order (see Appendix in RFC 791) and
   generate their output in canonical network byte order.  IP packets
   are also transmitted in network byte order.

5.1 Outbound IP Traffic Processing

5.1.1 Selecting and Using an SA or SA Bundle

   In a security gateway or BITW implementation (and in many BITS
   implementations), each outbound packet is compared against the SPD to
   determine what processing is required for the packet.  If the packet
   is to be discarded, this is an auditable event.  If the traffic is
   allowed to bypass IPsec processing, the packet continues through
   "normal" processing for the environment in which the IPsec processing
   is taking place.  If IPsec processing is required, the packet is
   either mapped to an existing SA (or SA bundle), or a new SA (or SA
   bundle) is created for the packet.  Since a packet's selectors might
   match multiple policies or multiple extant SAs and since the SPD is
   ordered, but the SAD is not, IPsec MUST:

           1. Match the packet's selector fields against the outbound
              policies in the SPD to locate the first appropriate
              policy, which will point to zero or more SA bundles in the
              SAD.

           2. Match the packet's selector fields against those in the SA
              bundles found in (1) to locate the first SA bundle that
              matches.  If no SAs were found or none match, create an
              appropriate SA bundle and link the SPD entry to the SAD
              entry.  If no key management entity is found, drop the
              packet.

           3. Use the SA bundle found/created in (2) to do the required
              IPsec processing, e.g., authenticate and encrypt.
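   The three outbound steps can be sketched as follows, assuming SPD
   entries are kept in order and each entry holds a list of linked SA
   bundles.  `Policy`, `outbound`, and the `create_bundle` callback
   (standing in for the key management entity) are hypothetical names,
   and selector matching is reduced to caller-supplied predicates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Packet = dict  # selector name -> value, e.g. {"dst": "192.0.2.7", "dport": 80}


@dataclass
class Policy:
    match: Callable[[Packet], bool]    # does this SPD entry cover the packet?
    action: str                        # "discard", "bypass" or "apply"
    bundles: List[dict] = field(default_factory=list)  # linked SA bundles (SAD)


def outbound(spd: List[Policy], pkt: Packet,
             create_bundle: Optional[Callable[[Packet], dict]]) -> Tuple[str, Optional[dict]]:
    # Step 1: the SPD is ordered, so the first matching policy wins.
    policy = next((p for p in spd if p.match(pkt)), None)
    if policy is None or policy.action == "discard":
        return ("discard", None)       # no matching policy also means discard
    if policy.action == "bypass":
        return ("bypass", None)
    # Step 2: first linked SA bundle whose selectors match the packet.
    bundle = next((b for b in policy.bundles if b["selectors"](pkt)), None)
    if bundle is None:
        if create_bundle is None:      # no key management entity: drop
            return ("discard", None)
        bundle = create_bundle(pkt)
        policy.bundles.append(bundle)  # link the SPD entry to the new SAD entry
    # Step 3: the caller uses the bundle (e.g., to authenticate and encrypt).
    return ("apply", bundle)
```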

   In a host IPsec implementation based on sockets, the SPD will be
   consulted whenever a new socket is created, to determine what, if
   any, IPsec processing will be applied to the traffic that will flow
   on that socket.

   NOTE: A compliant implementation MUST NOT allow instantiation of an
   ESP SA that employs both a NULL encryption and a NULL authentication
   algorithm.  An attempt to negotiate such an SA is an auditable event.
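   The check above amounts to a guard at SA instantiation time.  In
   this sketch the algorithm names are plain strings, and `AuditEvent`
   is a hypothetical stand-in for whatever auditing mechanism an
   implementation actually uses.

```python
class AuditEvent(Exception):
    """Hypothetical stand-in for an implementation's audit mechanism."""


def check_esp_transform(encryption: str, authentication: str) -> None:
    # ESP may use NULL encryption or NULL authentication, but never both at once.
    if encryption.upper() == "NULL" and authentication.upper() == "NULL":
        raise AuditEvent("refusing ESP SA with NULL encryption and NULL authentication")
```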

5.1.2 Header Construction for Tunnel Mode

   This section describes the handling of the inner and outer IP
   headers, extension headers, and options for AH and ESP tunnels.  This
   includes how to construct the encapsulating (outer) IP header, how to
   handle fields in the inner IP header, and what other actions should
   be taken.  The general idea is modeled after the one used in RFC
   2003, "IP Encapsulation with IP":

        o The outer IP header Source Address and Destination Address
          identify the "endpoints" of the tunnel (the encapsulator and
          decapsulator).  The inner IP header Source Address and
          Destination Address identify the original sender and
          recipient of the datagram (from the perspective of this
          tunnel), respectively.  (See footnote 3 after the table in
          Section 5.1.2.1 for more details on the encapsulating source
          IP address.)
        o The inner IP header is not changed except to decrement the TTL
          as noted below, and remains unchanged during its delivery to
          the tunnel exit point.
        o No change to IP options or extension headers in the inner
          header occurs during delivery of the encapsulated datagram
          through the tunnel.
        o If need be, other protocol headers such as the IP
          Authentication header may be inserted between the outer IP
          header and the inner IP header.

   The tables in the following sub-sections show the handling for the
   different header/option fields (constructed = the value in the outer
   field is constructed independently of the value in the inner).

5.1.2.1 IPv4 -- Header Construction for Tunnel Mode

                        <-- How Outer Hdr Relates to Inner Hdr -->
                        Outer Hdr at                 Inner Hdr at
   IPv4                 Encapsulator                 Decapsulator
     Header fields:     --------------------         ------------
       version          4 (1)                        no change
       header length    constructed                  no change
       TOS              copied from inner hdr (5)    no change
       total length     constructed                  no change
       ID               constructed                  no change
       flags (DF,MF)    constructed, DF (4)          no change
       fragmt offset    constructed                  no change
       TTL              constructed (2)              decrement (2)
       protocol         AH, ESP, routing hdr         no change
       checksum         constructed                  constructed (2)
       src address      constructed (3)              no change
       dest address     constructed (3)              no change
     Options            never copied                 no change

        1. The IP version in the encapsulating header can be different
           from the value in the inner header.

        2. The TTL in the inner header is decremented by the
           encapsulator prior to forwarding and by the decapsulator if
           it forwards the packet.  (The checksum changes when the TTL
           changes.)

           Note: The decrementing of the TTL is one of the usual actions
           that takes place when forwarding a packet.  Packets
           originating from the same node as the encapsulator do not
           have their TTLs decremented, as the sending node is
           originating the packet rather than forwarding it.

        3. src and dest addresses depend on the SA, which is used to
           determine the dest address which in turn determines which src
           address (net interface) is used to forward the packet.

           NOTE: In principle, the encapsulating IP source address can
           be any of the encapsulator's interface addresses or even an
           address different from any of the encapsulator's IP
           addresses (e.g., if it's acting as a NAT box) so long as the
           address is reachable through the encapsulator from the
           environment into which the packet is sent.  This does not
           cause a problem because IPsec does not currently have any
           INBOUND processing requirement that involves the Source
           Address of the encapsulating IP header.  So while the
           receiving tunnel endpoint looks at the Destination Address in
           the encapsulating IP header, it only looks at the Source
           Address in the inner (encapsulated) IP header.

        4. configuration determines whether to copy from the inner
           header (IPv4 only), clear or set the DF.

        5. If Inner Hdr is IPv4 (Protocol = 4), copy the TOS.  If Inner
           Hdr is IPv6 (Protocol = 41), map the Class to TOS.
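   The outer-header rules in the table and notes 2 to 5 can be sketched
   for an IPv4 inner header as follows.  This is a model of the field
   choices only: length, ID, fragmentation fields, and checksums are
   omitted, the outer TTL default of 64 is an assumption, and the
   `IPv4Header` type and `encapsulate_v4` function are hypothetical.

```python
from dataclasses import dataclass, replace

PROTO_ESP, PROTO_AH = 50, 51   # IANA-assigned IP protocol numbers


@dataclass(frozen=True)
class IPv4Header:
    tos: int
    ttl: int
    df: bool
    protocol: int
    src: str
    dst: str


def encapsulate_v4(inner: IPv4Header, sa_src: str, sa_dst: str, ipsec_proto: int,
                   df_policy: str = "copy", outer_ttl: int = 64,
                   forwarding: bool = True):
    # Note 2: the encapsulator decrements the inner TTL only when forwarding.
    new_inner = replace(inner, ttl=inner.ttl - 1) if forwarding else inner
    # Note 4: DF is set, cleared, or copied from the inner header per configuration.
    df = {"set": True, "clear": False, "copy": inner.df}[df_policy]
    outer = IPv4Header(
        tos=inner.tos,          # note 5: TOS copied from an IPv4 inner header
        ttl=outer_ttl,          # constructed, not copied from the inner header
        df=df,
        protocol=ipsec_proto,   # AH or ESP follows the outer header
        src=sa_src,             # note 3: addresses are derived from the SA
        dst=sa_dst,
    )
    return outer, new_inner
```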

5.1.2.2 IPv6 -- Header Construction for Tunnel Mode

   See Section 5.1.2.1 for notes 1-5 indicated by (footnote number).

                        <-- How Outer Hdr Relates to Inner Hdr -->
                        Outer Hdr at                 Inner Hdr at
   IPv6                 Encapsulator                 Decapsulator
     Header fields:     --------------------         ------------
       version          6 (1)                        no change
       class            copied or configured (6)     no change
       flow id          copied or configured         no change
       len              constructed                  no change
       next header      AH,ESP,routing hdr           no change
       hop limit        constructed (2)              decrement (2)
       src address      constructed (3)              no change
       dest address     constructed (3)              no change
     Extension headers  never copied                 no change

        6. If Inner Hdr is IPv6 (Next Header = 41), copy the Class.  If
           Inner Hdr is IPv4 (Next Header = 4), map the TOS to Class.
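   Note 6, together with the "copied or configured" entry for the
   class field, might be expressed as follows.  The text does not
   specify how TOS is mapped to Class, so the identity mapping used
   here is an assumption, as are the function names.

```python
from typing import Optional

IPV4_IN, IPV6_IN = 4, 41   # Next Header values identifying the inner packet


def map_tos_to_class(tos: int) -> int:
    return tos  # assumed identity mapping; not defined by this document


def outer_v6_traffic_class(inner_next_header: int, inner_value: int,
                           configured: Optional[int] = None) -> int:
    """Choose the outer IPv6 Class value for an AH or ESP tunnel."""
    if configured is not None:          # "copied or configured": configuration wins
        return configured
    if inner_next_header == IPV6_IN:
        return inner_value              # copy the inner Class
    if inner_next_header == IPV4_IN:
        return map_tos_to_class(inner_value)
    raise ValueError("inner packet is neither IPv4 nor IPv6")
```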

5.2 Processing Inbound IP Traffic

   Prior to performing AH or ESP processing, any IP fragments are
   reassembled.  Each inbound IP datagram to which IPsec processing will
   be applied is identified by the appearance of the AH or ESP values in
   the IP Next Protocol field (or of AH or ESP as an extension header in
   the IPv6 context).

   Note: Appendix C contains sample code for a bitmask check for a
   32-packet window that can be used for implementing anti-replay
   service.
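   A Python rendering of the kind of 32-packet bitmask check mentioned
   in the note above might look like this.  It is a sketch, not the
   Appendix C code itself: the class and method names are hypothetical,
   and `update` is meant to be called only after `check` has passed and
   the packet's integrity check has succeeded.

```python
WINDOW_SIZE = 32              # packets, one bit per packet
MASK = (1 << WINDOW_SIZE) - 1


class ReplayWindow:
    """Bit i of `bitmap` records whether sequence number last_seq - i was seen."""

    def __init__(self) -> None:
        self.last_seq = 0     # highest sequence number accepted so far
        self.bitmap = 0

    def check(self, seq: int) -> bool:
        if seq == 0:
            return False                      # sequence numbers start at 1
        if seq > self.last_seq:
            return True                       # new, larger sequence number
        diff = self.last_seq - seq
        if diff >= WINDOW_SIZE:
            return False                      # too old: left of the window
        return ((self.bitmap >> diff) & 1) == 0   # reject duplicates

    def update(self, seq: int) -> None:
        if seq > self.last_seq:
            diff = seq - self.last_seq
            # Slide the window; a jump of a whole window or more starts it afresh.
            self.bitmap = ((self.bitmap << diff) | 1) & MASK if diff < WINDOW_SIZE else 1
            self.last_seq = seq
        else:
            self.bitmap |= 1 << (self.last_seq - seq)
```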

5.2.1 Selecting and Using an SA or SA Bundle

   Mapping the IP datagram to the appropriate SA is simplified because
   of the presence of the SPI in the AH or ESP header.  Note that the
   selector checks are made on the inner headers, not the outer (tunnel)
   headers.  The steps followed are:

           1. Use the packet's destination address (outer IP header),
              IPsec protocol, and SPI to look up the SA in the SAD.  If
              the SA lookup fails, drop the packet and log/report the
              error.

           2. Use the SA found in (1) to do the IPsec processing, e.g.,
              authenticate and decrypt.  This step includes matching the
              packet's (Inner Header if tunneled) selectors to the
              selectors in the SA.  Local policy determines the
              specificity of the SA selectors (single value, list,
              range, wildcard).  In general, a packet's source address
              MUST match the SA selector value.  However, an ICMP packet
              received on a tunnel mode SA may have a source address
              other than that bound to the SA and thus such packets
              should be permitted as exceptions to this check.  For an
              ICMP packet, the selectors from the enclosed problem
              packet (the source and destination addresses and ports
              should be swapped) should be checked against the selectors
              for the SA.  Note that some or all of these selectors may
              be inaccessible because of limitations on how many bits of
              the problem packet the ICMP packet is allowed to carry or
              due to encryption.  See Section 6.

              Do (1) and (2) for every IPsec header until a Transport
              Protocol Header or an IP header that is NOT for this
              system is encountered.  Keep track of what SAs have been
              used and their order of application.

           3. Find an incoming policy in the SPD that matches the
              packet.  This could be done, for example, by use of
              backpointers from the SAs to the SPD or by matching the
              packet's selectors (Inner Header if tunneled) against
              those of the policy entries in the SPD.

           4. Check whether the required IPsec processing has been
              applied, i.e., verify that the SAs found in (1) and (2)
              match the kind and order of SAs required by the policy
              found in (3).

              NOTE: The correct "matching" policy will not necessarily
              be the first inbound policy found.  If the check in (4)
              fails, steps (3) and (4) are repeated until all policy
              entries have been checked or until the check succeeds.
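   The inbound steps above can be sketched as follows.  SAs are looked
   up by the (outer destination address, IPsec protocol, SPI) triple,
   and the policy search continues past selector matches whose required
   SAs differ from those actually applied.  `InboundPolicy` and
   `inbound` are hypothetical names; the cryptographic processing and
   the per-SA selector check of step 2 are assumed to happen where the
   comment indicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

SAKey = Tuple[str, str, int]   # (outer destination address, "AH" or "ESP", SPI)


@dataclass
class InboundPolicy:
    match: Callable[[dict], bool]   # selector check on the (inner) header fields
    required: List[str]             # kind and order of SAs this policy demands


def inbound(sad: Dict[SAKey, str], ipsec_headers: List[SAKey],
            inner: dict, spd: List[InboundPolicy]) -> Optional[List[str]]:
    """Return the SAs used, in order of application, or None to drop the packet."""
    used: List[str] = []
    # Steps 1-2, repeated for every IPsec header addressed to this system.
    for key in ipsec_headers:
        sa = sad.get(key)
        if sa is None:
            return None             # SA lookup failed: drop, log/report the error
        used.append(sa)             # IPsec processing with this SA would happen here
    # Steps 3-4: the first matching policy is not necessarily the right one,
    # so keep looking until one also accepts the SAs actually applied.
    for policy in spd:
        if policy.match(inner) and policy.required == used:
            return used
    return None
```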

   At the end of these steps, pass the resulting packet to the Transport
   Layer or forward the packet.  Note that any IPsec headers processed
   in these steps may have been removed, but that this information,
   i.e., what SAs were used and the order of their application, may be
   needed for subsequent IPsec or firewall processing.

   Note that in the case of a security gateway, if forwarding causes a
   packet to exit via an IPsec-enabled interface, then additional IPsec
   processing may be applied.

5.2.2 Handling of AH and ESP tunnels

   The handling of the inner and outer IP headers, extension headers,
   and options for AH and ESP tunnels should be performed as described
   in the tables in Section 5.1.

6. ICMP Processing (relevant to IPsec)

   The focus of this section is on the handling of ICMP error messages.
   Other ICMP traffic, e.g., Echo/Reply, should be treated like other
   traffic and can be protected on an end-to-end basis using SAs in the
   usual fashion.

   An ICMP error message protected by AH or ESP and generated by a
   router SHOULD be processed and forwarded in a tunnel mode SA.  Local
   policy determines whether or not it is subjected to source address
   checks by the router at the destination end of the tunnel.  Note that
   if the router at the originating end of the tunnel is forwarding an
   ICMP error message from another router, the source address check
   would fail.  An ICMP message protected by AH or ESP and generated by
   a router MUST NOT be forwarded on a transport mode SA (unless the SA
   has been established to the router acting as a host, e.g., a Telnet
   connection used to manage a router).  An ICMP message generated by a
   host SHOULD be checked against the source IP address selectors bound
   to the SA in which the message arrives.  Note that even if the source
   of an ICMP error message is authenticated, the returned IP header
   could be invalid. Accordingly, the selector values in the IP header
   SHOULD also be checked to be sure that they are consistent with the
   selectors for the SA over which the ICMP message was received.
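   The selector consistency check recommended above, with the
   address and port swap described in step 2 of Section 5.2.1, can be
   sketched as follows.  The `Selectors` type, the use of None as a
   port wildcard, and the assumption that the quoted header's ports are
   available (they may be truncated or encrypted) are illustrative
   only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Selectors:
    src: str
    dst: str
    sport: Optional[int] = None   # None means "any" in the SA selectors
    dport: Optional[int] = None


def _matches(wanted, actual) -> bool:
    return wanted is None or wanted == actual


def icmp_error_consistent(sa: Selectors, quoted: Selectors) -> bool:
    """Check the header quoted in an ICMP error against the SA it arrived on.

    The quoted packet travelled in the opposite direction, so its addresses
    and ports are swapped before comparison.
    """
    swapped = Selectors(quoted.dst, quoted.src, quoted.dport, quoted.sport)
    return all(_matches(w, a) for w, a in [
        (sa.src, swapped.src), (sa.dst, swapped.dst),
        (sa.sport, swapped.sport), (sa.dport, swapped.dport),
    ])
```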

   The table in Appendix D characterizes ICMP messages as being either
   host generated, router generated, both, or unknown/unassigned.  ICMP
   messages falling into the last two categories should be handled as
   determined by the receiver's policy.

   An ICMP message not protected by AH or ESP is unauthenticated and its
   processing and/or forwarding may result in denial of service.  This
   suggests that, in general, it would be desirable to ignore such
   messages.  However, it is expected that many routers (vs. security
   gateways) will not implement IPsec for transit traffic and thus
   strict adherence to this rule would cause many ICMP messages to be
   discarded.  The result is that some critical IP functions would be
   lost, e.g., redirection and PMTU processing.  Thus it MUST be
   possible to configure an IPsec implementation to accept or reject
   (router) ICMP traffic as per local security policy.

   The remainder of this section addresses how PMTU processing MUST be
   performed at hosts and security gateways.  It addresses processing of
   both authenticated and unauthenticated ICMP PMTU messages.  However,
   as noted above, unauthenticated ICMP messages MAY be discarded based
   on local policy.

6.1 PMTU/DF Processing

6.1.1 DF Bit

   In cases where a system (host or gateway) adds an encapsulating
   header (ESP tunnel or AH tunnel), it MUST support the option of
   copying the DF bit from the original packet to the encapsulating
   header (and processing ICMP PMTU messages).  This means that it MUST
   be possible to configure the system's treatment of the DF bit (set,
   clear, copy from encapsulated header) for each interface.  (See
   Appendix B for rationale.)

6.1.2 Path MTU Discovery (PMTU)

   This section discusses IPsec handling for Path MTU Discovery
   messages.  ICMP PMTU is used here to refer to an ICMP message for:

           IPv4 (RFC 792):
                   - Type = 3 (Destination Unreachable)
                   - Code = 4 (Fragmentation needed and DF set)
                   - Next-Hop MTU in the low-order 16 bits of the second
                     word of the ICMP header (labelled "unused" in RFC
                     792), with high-order 16 bits set to zero

           IPv6 (RFC 1885):
                   - Type = 2 (Packet Too Big)
                   - Code = 0 (Fragmentation needed)
                   - Next-Hop MTU in the 32 bit MTU field of the ICMP6
                     message
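   The two message layouts above can be distinguished with a few lines
   of code.  The following Python sketch assumes that `msg` starts at
   the ICMP (or ICMPv6) Type field and that checksum verification is
   done elsewhere; it returns the Next-Hop MTU of an ICMP PMTU message
   and None for any other message.

```python
import struct
from typing import Optional


def pmtu_from_icmp(msg: bytes, ipv6: bool) -> Optional[int]:
    """Return the Next-Hop MTU if msg is an ICMP PMTU message, else None."""
    if len(msg) < 8:
        return None
    # Type (1 byte), Code (1), Checksum (2), then the 32-bit second word.
    icmp_type, code, _checksum, second_word = struct.unpack("!BBHI", msg[:8])
    if ipv6:
        # ICMPv6 Packet Too Big: the whole 32-bit word is the MTU field.
        return second_word if (icmp_type, code) == (2, 0) else None
    if (icmp_type, code) == (3, 4):
        # IPv4: MTU is in the low-order 16 bits; the high-order 16 bits are zero.
        return second_word & 0xFFFF
    return None
```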

6.1.2.1 Propagation of PMTU

   The amount of information returned with the ICMP PMTU message (IPv4
   or IPv6) is limited and this affects what selectors are available for
   use in further propagating the PMTU information.  (See Appendix B for
   more detailed discussion of this topic.)

   o PMTU message with 64 bits of IPsec header -- If the ICMP PMTU
     message contains only 64 bits of the IPsec header (minimum for
     IPv4), then a security gateway MUST support the following options
     on a per SPI/SA basis:

        a. if the originating host can be determined (or the possible
           sources narrowed down to a manageable number), send the PMTU
           information to all the possible originating hosts.
        b. if the originating host cannot be determined, store the PMTU
           with the SA and wait until the next packet(s) arrive from the
           originating host for the relevant security association.  If
           the packet(s) are bigger than the PMTU, drop the packet(s),
           and compose ICMP PMTU message(s) with the new packet(s) and
           the updated PMTU, and send the ICMP message(s) about the
           problem to the originating host. Retain the PMTU information
           for any message that might arrive subsequently (see Section
           6.1.2.4, "PMTU Aging").

   o PMTU message with >64 bits of IPsec header -- If the ICMP message
     contains more information from the original packet then there may
     be enough non-opaque information to immediately determine to which
     host to propagate the ICMP/PMTU message and to provide that system
     with the 5 fields (source address, destination address, source
     port, destination port, transport protocol) needed to determine
     where to store/update the PMTU.  Under such circumstances, a
     security gateway MUST generate an ICMP PMTU message immediately
     upon receipt of an ICMP PMTU from further down the path.

   o Distributing the PMTU to the Transport Layer -- The host mechanism
     for getting the updated PMTU to the transport layer is unchanged,
     as specified in RFC 1191 (Path MTU Discovery).
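   Option (b) above, for a security gateway that learns only the SPI
   from a 64-bit ICMP PMTU message, could be sketched in Python as
   follows.  The class names, the per-SA `overhead` field (the bytes
   this SA's IPsec processing adds), and the choice to report
   `pmtu - overhead` to the originating host (anticipating Section
   6.1.2.2) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SecurityAssociation:
    spi: int
    overhead: int               # bytes this SA's IPsec processing adds (assumed known)
    pmtu: Optional[int] = None  # PMTU stored with the SA (option b), retained for later


@dataclass
class SecurityGateway:
    sas: Dict[int, SecurityAssociation] = field(default_factory=dict)
    icmp_sent: List[Tuple[str, int]] = field(default_factory=list)  # (host, reported PMTU)

    def on_icmp_pmtu(self, spi: int, pmtu: int) -> None:
        """Only 64 bits of the IPsec header came back: the SPI is known, the host is not."""
        sa = self.sas.get(spi)
        if sa is not None:
            sa.pmtu = pmtu

    def forward(self, spi: int, src_host: str, packet_len: int) -> bool:
        """Return True if the packet can be protected and sent, False if it is dropped."""
        sa = self.sas[spi]
        if sa.pmtu is not None and packet_len + sa.overhead > sa.pmtu:
            # Drop, and send the originating host an ICMP PMTU built from this packet.
            self.icmp_sent.append((src_host, sa.pmtu - sa.overhead))
            return False
        return True
```

   Because the stored PMTU stays on the SA, later oversized packets
   from other hosts using the same SA are handled the same way until
   the value is aged out.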

6.1.2.2 Calculation of PMTU

   The calculation of PMTU from an ICMP PMTU MUST take into account the
   addition of any IPsec header -- AH transport, ESP transport, AH/ESP
   transport, ESP tunnel, AH tunnel.  (See Appendix B for discussion of
   implementation issues.)
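   As a worked example of this calculation for ESP tunnel and AH tunnel
   mode, the Python sketch below computes the largest inner packet that
   still fits within a given PMTU.  The fixed sizes follow the AH and
   ESP header formats (an 8-byte ESP header of SPI and Sequence Number,
   a 2-byte ESP trailer of Pad Length and Next Header, a 12-byte fixed
   AH header); the IV length, cipher block size, 12-byte ICV, and
   option-free outer IP header are assumed defaults, not values taken
   from this document.

```python
ESP_HEADER = 8   # SPI (4) + Sequence Number (4)
ESP_TRAILER = 2  # Pad Length (1) + Next Header (1)
AH_FIXED = 12    # Next Header, Payload Len, Reserved, SPI, Sequence Number


def esp_tunnel_effective_pmtu(pmtu: int, outer_ip_hdr: int = 20, iv_len: int = 16,
                              block_size: int = 16, icv_len: int = 12) -> int:
    """Largest inner IP packet that fits in one ESP tunnel packet of <= pmtu bytes."""
    # Space left for the encrypted part: inner packet + padding + ESP trailer.
    avail = pmtu - outer_ip_hdr - ESP_HEADER - iv_len - icv_len
    # The encrypted part is padded to a multiple of the block size (at least 4).
    align = max(block_size, 4)
    encrypted = (avail // align) * align
    return encrypted - ESP_TRAILER


def ah_tunnel_effective_pmtu(pmtu: int, outer_ip_hdr: int = 20, icv_len: int = 12) -> int:
    """AH adds its fixed header plus the ICV; tunnel mode adds an outer IP header."""
    return pmtu - outer_ip_hdr - AH_FIXED - icv_len
```

   With these defaults, a 1500-byte IPv4 path leaves 1438 bytes for the
   inner packet under ESP tunnel mode and 1456 bytes under AH tunnel
   mode.  Transport mode differs only in that no outer IP header is
   added.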

   Note: In some situations the addition of IPsec headers could result
   in an effective PMTU (as seen by the host or application) that is
   unacceptably small.  To avoid this problem, the implementation may
   establish a threshold below which it will not report a reduced PMTU.
   In such cases, the implementation would apply IPsec and then fragment
   the resulting packet according to the PMTU.  This would result in a
   more efficient use of the available bandwidth.
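   The threshold approach in the note above might look like the
   following Python sketch.  The 576-byte threshold is an assumed value
   chosen for illustration, and fragments_needed assumes IPv4
   fragmentation of the already-protected packet with an option-free
   20-byte header.

```python
def reported_pmtu(effective_pmtu: int, threshold: int = 576) -> int:
    """PMTU reported to the host or application: never below the threshold."""
    return max(effective_pmtu, threshold)


def fragments_needed(protected_len: int, path_mtu: int, ip_hdr: int = 20) -> int:
    """Number of IPv4 fragments for a packet after IPsec has been applied."""
    if protected_len <= path_mtu:
        return 1
    # Each fragment carries its own header; fragment data must be a multiple of 8.
    per_fragment = (path_mtu - ip_hdr) // 8 * 8
    payload = protected_len - ip_hdr
    return -(-payload // per_fragment)  # ceiling division
```

   For example, a host told to use 576 bytes whose packets grow past
   that after IPsec processing would have each protected packet split
   into a small number of fragments, rather than being forced to send
   tiny packets.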

6.1.2.3 Granularity of PMTU Processing

   In hosts, the granularity with which ICMP PMTU processing can be done
   differs depending on the implementation situation.  Looking at a
   host, there are 3 situations that are of interest with respect to
   PMTU issues (see Appendix B for additional details on this topic):

        a. Integration of IPsec into the native IP implementation
        b. Bump-in-the-stack implementations, where IPsec is implemented
           "underneath" an existing implementation of a TCP/IP protocol
           stack, between the native IP and the local network drivers
        c. No IPsec implementation -- This case is included because it
           is relevant in cases where a security gateway is sending PMTU
           information back to a host.

   Only in case (a) can the PMTU data be maintained at the same
   granularity as communication associations.  In (b) and (c), the IP
   layer will only be able to maintain PMTU data at the granularity of
   source and destination IP addresses (and optionally TOS), as
   described in RFC 1191.  This is an important difference, because more
   than one communication association may map to the same source and
   destination IP addresses, and each communication association may have
   a different amount of IPsec header overhead (e.g., due to use of
   different transforms or different algorithms).
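   The difference between case (a) and cases (b) and (c) can be made
   concrete with a small Python sketch.  The association names,
   addresses, and overhead figures are hypothetical, and keeping the
   smallest per-association value in the shared (source, destination)
   entry is one conservative choice an address-keyed cache could make,
   not something this document prescribes.

```python
from typing import Dict, Tuple

# Hypothetical associations sharing one source/destination pair but using
# different transforms, hence different (illustrative) IPsec overheads.
ASSOCIATIONS = {
    "assoc-1": {"src": "10.0.0.1", "dst": "10.0.1.1", "overhead": 24},
    "assoc-2": {"src": "10.0.0.1", "dst": "10.0.1.1", "overhead": 57},
}


def per_association_pmtu(path_mtu: int) -> Dict[str, int]:
    """Case (a): each communication association gets its own effective PMTU."""
    return {name: path_mtu - a["overhead"] for name, a in ASSOCIATIONS.items()}


def per_address_pmtu(path_mtu: int) -> Dict[Tuple[str, str], int]:
    """Cases (b)/(c): one entry per (src, dst), as in RFC 1191.

    The single entry must suit every association using that address pair,
    so this sketch keeps the smallest value.
    """
    cache: Dict[Tuple[str, str], int] = {}
    for a in ASSOCIATIONS.values():
        key = (a["src"], a["dst"])
        cache[key] = min(cache.get(key, path_mtu), path_mtu - a["overhead"])
    return cache
```

   With a 1500-byte path, case (a) keeps 1476 and 1443 bytes for the two
   associations, whereas the address-keyed cache holds only 1443 bytes,
   so the lower-overhead association sends smaller packets than it
   could.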

   Implementation of the calculation of PMTU and support for PMTUs at
   the granularity of individual communication associations is a local
   matter.  However, a socket-based implementation of IPsec in a host
   SHOULD maintain the information on a per socket basis.  Bump in the
   stack systems MUST pass an ICMP PMTU to the host IP implementation,
   after adjusting it for any IPsec header overhead added by these
   systems.  The calculation of the overhead SHOULD be determined by
   analysis of the SPI and any other selector information present in a
   returned ICMP PMTU message.
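   For a bump-in-the-stack system, the adjustment might be sketched in
   Python as below.  The sketch assumes the quoted bytes in the ICMP
   message begin at the AH or ESP header, so the SPI is at offset 0 for
   ESP and at offset 4 for AH, and it uses a hypothetical
   OVERHEAD_BY_SPI table maintained for the SAs the module created.
   Passing an unknown SPI through unchanged is likewise an assumed
   policy.

```python
import struct

# Hypothetical table kept by the bump-in-the-stack module:
# SPI -> bytes of IPsec overhead that the module adds for that SA.
OVERHEAD_BY_SPI = {0x00001001: 24, 0x00001002: 62}


def adjust_icmp_pmtu(reported_mtu: int, proto: str, quoted: bytes) -> int:
    """Return the MTU to pass up to the host IP layer.

    `quoted` is assumed to start at the AH or ESP header echoed in the ICMP
    message.  ESP carries the SPI in its first 32 bits; AH carries it after
    Next Header, Payload Len and Reserved (bytes 4 to 7).
    """
    offset = 0 if proto == "ESP" else 4
    (spi,) = struct.unpack_from("!I", quoted, offset)
    overhead = OVERHEAD_BY_SPI.get(spi)
    if overhead is None:
        return reported_mtu  # unknown SPI: pass through unchanged (assumed policy)
    return reported_mtu - overhead
```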

6.1.2.4 PMTU Aging

   In all systems (host or gateway) implementing IPsec and maintaining
   PMTU information, the PMTU associated with a security association
   (transport or tunnel) MUST be "aged" and some mechanism put in place
   for updating the PMTU in a timely manner, especially for discovering
   if the PMTU is smaller than it needs to be.  A given PMTU has to
   remain in place long enough for a packet to get from the source end
   of the security association to the system at the other end of the
   security association and propagate back an ICMP error message if the
   current PMTU is too big.  Note that if there are nested tunnels,
   multiple packets and round trip times might be required to get an
   ICMP message back to an encapsulator or originating host.

   Systems SHOULD use the approach described in the Path MTU Discovery
   document (RFC 1191, Section 6.3), which suggests periodically
   resetting the PMTU to the first-hop data-link MTU and then letting
   the normal PMTU Discovery processes update the PMTU as necessary.
   The period SHOULD be configurable.
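   A minimal Python sketch of the RFC 1191-style aging described above
   follows.  The 600-second period is an assumed default (the period is
   meant to be configurable), and the injectable clock exists only so
   the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class AgedPmtu:
    """PMTU kept for one SA, periodically reset to the first-hop MTU."""

    def __init__(self, first_hop_mtu: int, period_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.period_s = period_s  # configurable aging period (assumed default)
        self.clock = clock
        self.pmtu = first_hop_mtu
        self.reduced_at: Optional[float] = None

    def on_icmp_pmtu(self, mtu: int) -> None:
        """Record a smaller PMTU reported by an ICMP PMTU message."""
        if mtu < self.pmtu:
            self.pmtu = mtu
            self.reduced_at = self.clock()

    def current(self) -> int:
        """Return the PMTU to use now, aging it out if the period has elapsed."""
        if self.reduced_at is not None and self.clock() - self.reduced_at >= self.period_s:
            # Reset, and let normal PMTU Discovery shrink it again if needed.
            self.pmtu = self.first_hop_mtu
            self.reduced_at = None
        return self.pmtu
```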

7. Auditing

   Not all systems that implement IPsec will implement auditing.  For
   the most part, the granularity of auditing is a local matter.
   However, several auditable events are identified in the AH and ESP
   specifications and for each of these events a minimum set of
   information that SHOULD be included in an audit log is defined.
   Additional information also MAY be included in the audit log for each
   of these events, and additional events, not explicitly called out in
   this specification, also MAY result in audit log entries.  There is
   no requirement for the receiver to transmit any message to the
   purported transmitter in response to the detection of an auditable
   event, because of the potential to induce denial of service via such
   action.

8. Use in Systems Supporting Information Flow Security

   Information of various sensitivity levels may be carried over a
   single network.  Information labels (e.g., Unclassified, Company
   Proprietary, Secret) [DoD85, DoD87] are often employed to distinguish
   such information.  The use of labels facilitates segregation of
   information, in support of information flow security models, e.g.,
   the Bell-LaPadula model [BL73].  Such models, and corresponding
   supporting technology, are designed to prevent the unauthorized flow
   of sensitive information, even in the face of Trojan Horse attacks.
   Conventional, discretionary access control (DAC) mechanisms, e.g.,
   based on access control lists, generally are not sufficient to
   support such policies, and thus facilities such as the SPD do not
   suffice in such environments.

   In the military context, technology that supports such models is
   often referred to as multi-level security (MLS).  Computers and
   networks often are designated "multi-level secure" if they support
   the separation of labelled data in conjunction with information flow
   security policies.  Although such technology is more broadly
   applicable than just military applications, this document uses the
   acronym "MLS" to designate the technology, consistent with much
   extant literature.

   IPsec mechanisms can easily support MLS networking.  MLS networking
   requires the use of strong Mandatory Access Controls (MAC), which
   unprivileged users or unprivileged processes are incapable of
   controlling or violating.  This section pertains only to the use of
   these IP security mechanisms in MLS (information flow security
   policy) environments.  Nothing in this section applies to systems not
   claiming to provide MLS.

   As used in this section, "sensitivity information" might include
   implementation-defined hierarchic levels, categories, and/or
   releasability information.

   AH can be used to provide strong authentication in support of
   mandatory access control decisions in MLS environments.  If explicit
   IP sensitivity information (e.g., IPSO [Ken91]) is used and
   confidentiality is not considered necessary within the particular
   operational environment, AH can be used to authenticate the binding
   between sensitivity labels in the IP header and the IP payload
   (including user data).  This is a significant improvement over
   labeled IPv4 networks where the sensitivity information is trusted
   even though there is no authentication or cryptographic binding of
   the information to the IP header and user data.  IPv4 networks might
   or might not use explicit labelling.  IPv6 will normally use implicit
   sensitivity information that is part of the IPsec Security
   Association but not transmitted with each packet instead of using
   explicit sensitivity information.  All explicit IP sensitivity
   information MUST be authenticated using either ESP, AH, or both.

   Encryption is useful and can be desirable even when all of the hosts
   are within a protected environment, for example, behind a firewall or
   disjoint from any external connectivity.  ESP can be used, in
   conjunction with appropriate key management and encryption
   algorithms, in support of both DAC and MAC.  (The choice of
   encryption and authentication algorithms, and the assurance level of
   an IPsec implementation will determine the environments in which an
   implementation may be deemed sufficient to satisfy MLS requirements.)
   Key management can make use of sensitivity information to provide
   MAC.  IPsec implementations on systems claiming to provide MLS SHOULD
   be capable of using IPsec to provide MAC for IP-based communications.

8.1 Relationship Between Security Associations and Data Sensitivity

   Both the Encapsulating Security Payload and the Authentication Header
   can be combined with appropriate Security Association policies to
   provide multi-level secure networking.  In this case each SA (or SA
   bundle) is normally used for only a single instance of sensitivity
   information.  For example, "PROPRIETARY - Internet Engineering" must
   be associated with a different SA (or SA bundle) from "PROPRIETARY -
   Finance".

8.2 Sensitivity Consistency Checking

   An MLS implementation (both host and router) MAY associate
   sensitivity information, or a range of sensitivity information with
   an interface, or a configured IP address with its associated prefix
   (the latter is sometimes referred to as a logical interface, or an
   interface alias).  If such properties exist, an implementation SHOULD
   compare the sensitivity information associated with the packet
   against the sensitivity information associated with the interface or
   address/prefix from which the packet arrived, or through which the
   packet will depart.  This check will either verify that the
   sensitivities match, or that the packet's sensitivity falls within
   the range of the interface or address/prefix.

   The checking SHOULD be done on both inbound and outbound processing.
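   One way to express the check above is sketched below in Python.  It
   assumes that sensitivity information is a hierarchic level plus a set
   of categories and that "falls within the range" means dominance in
   the usual lattice sense; the Label type and its integer level
   encoding are hypothetical, since the exact syntax is implementation
   defined.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    """Hypothetical sensitivity label: hierarchic level plus categories."""
    level: int                                                # encoding assumed
    categories: frozenset = field(default_factory=frozenset)

    def dominates(self, other: "Label") -> bool:
        return self.level >= other.level and self.categories >= other.categories


def within_range(packet: Label, low: Label, high: Label) -> bool:
    """True if the packet's label lies within [low, high]; low == high means an exact match."""
    return packet.dominates(low) and high.dominates(packet)
```

   The same function serves both inbound and outbound processing; only
   the interface or address/prefix whose range is consulted changes.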

8.3 Additional MLS Attributes for Security Association Databases

   Section 4.4 discussed two Security Association databases (the
   Security Policy Database (SPD) and the Security Association Database
   (SAD)) and the associated policy selectors and SA attributes.  MLS
   networking introduces an additional selector/attribute:

           - Sensitivity information.

   The Sensitivity information aids in selecting the appropriate
   algorithms and key strength, so that the traffic gets a level of
   protection appropriate to its importance or sensitivity as described
   in section 8.1.  The exact syntax of the sensitivity information is
   implementation defined.

8.4 Additional Inbound Processing Steps for MLS Networking

   After an inbound packet has passed through IPsec processing, an MLS
   implementation SHOULD first check the packet's sensitivity (as
   defined by the SA (or SA bundle) used for the packet) with the
   interface or address/prefix as described in section 8.2 before
   delivering the datagram to an upper-layer protocol or forwarding it.

   The MLS system MUST retain the binding between the data received in
   an IPsec protected packet and the sensitivity information in the SA
   or SAs used for processing, so appropriate policy decisions can be
   made when delivering the datagram to an application or forwarding
   engine.  The means for maintaining this binding are implementation
   specific.

8.5 Additional Outbound Processing Steps for MLS Networking

   An MLS implementation of IPsec MUST perform two additional checks
   besides the normal steps detailed in section 5.1.1.  When consulting
   the SPD or the SAD to find an outbound security association, the MLS
   implementation MUST use the sensitivity of the data to select an
   appropriate outbound SA or SA bundle.  The second check comes before
   forwarding the packet out to its destination, and is the sensitivity
   consistency checking described in section 8.2.
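   The two outbound checks might be combined as in the Python sketch
   below.  The tables, addresses, and interface names are hypothetical,
   and the sketch models an interface's permitted sensitivity as a
   simple set of labels rather than a range.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical outbound SA table: one SA per (destination, sensitivity label),
# in line with section 8.1 (one sensitivity per SA or SA bundle).
OUTBOUND_SAS: Dict[Tuple[str, str], str] = {
    ("192.0.2.10", "PROPRIETARY - Internet Engineering"): "sa-eng",
    ("192.0.2.10", "PROPRIETARY - Finance"): "sa-fin",
}

# Hypothetical sensitivity permitted on each outgoing interface.
INTERFACE_LABELS: Dict[str, Set[str]] = {
    "eth0": {"PROPRIETARY - Internet Engineering", "PROPRIETARY - Finance"},
    "eth1": {"PROPRIETARY - Internet Engineering"},
}


def select_outbound_sa(dst: str, label: str, interface: str) -> Optional[str]:
    # Check 1: the data's sensitivity is part of the SA selection.
    sa = OUTBOUND_SAS.get((dst, label))
    if sa is None:
        return None  # no suitable SA; key management would need to create one
    # Check 2: sensitivity consistency with the outgoing interface (section 8.2).
    if label not in INTERFACE_LABELS.get(interface, set()):
        return None  # packet must not leave through this interface
    return sa
```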

8.6 Additional MLS Processing for Security Gateways

   An MLS security gateway MUST follow the previously mentioned inbound
   and outbound processing rules as well as perform some additional
   processing specific to the intermediate protection of packets in an
   MLS environment.

   A security gateway MAY act as an outbound proxy, creating SAs for MLS
   systems that originate packets forwarded by the gateway.  These MLS
   systems may explicitly label the packets to be forwarded, or the
   whole originating network may have sensitivity characteristics
   associated with it.  The security gateway MUST create and use
   appropriate SAs for AH, ESP, or both, to protect such traffic it
   forwards.

   Similarly such a gateway SHOULD accept and process inbound AH and/or
   ESP packets and forward appropriately, using explicit packet
   labeling, or relying on the sensitivity characteristics of the
   destination network.

9. Performance Issues

   The use of IPsec imposes computational performance costs on the hosts
   or security gateways that implement these protocols.  These costs are
   associated with the memory needed for IPsec code and data structures,
   and the computation of integrity check values, encryption and
   decryption, and added per-packet handling.  The per-packet
   computational costs will be manifested by increased latency and,
   possibly, reduced throughput.  Use of SA/key management protocols,
   especially ones that employ public key cryptography, also adds
   computational performance costs to use of IPsec.  These per-
   association computational costs will be manifested in terms of
   increased latency in association establishment.  For many hosts, it
   is anticipated that software-based cryptography will not appreciably
   reduce throughput, but hardware may be required for security gateways
   (since they represent aggregation points), and for some hosts.

   The use of IPsec also imposes bandwidth utilization costs on
   transmission, switching, and routing components of the Internet
   infrastructure, components not implementing IPsec.  This is due to
   the increase in the packet size resulting from the addition of AH
   and/or ESP headers, AH and ESP tunneling (which adds a second IP
   header), and the increased packet traffic associated with key
   management protocols.  It is anticipated that, in most instances,
   this increased bandwidth demand will not noticeably affect the
   Internet infrastructure.  However, in some instances, the effects may
   be significant, e.g., transmission of ESP encrypted traffic over a
   dialup link that otherwise would have compressed the traffic.

   Note: The initial SA establishment overhead will be felt in the first
   packet.  This delay could impact the transport layer and application.
   For example, it could cause TCP to retransmit the SYN before the
   ISAKMP exchange is done.  The effect of the delay would be different
   on UDP than TCP because TCP shouldn't transmit anything other than
   the SYN until the connection is set up whereas UDP will go ahead and
   transmit data beyond the first packet.

   Note: As discussed earlier, compression can still be employed at
   layers above IP.  There is an IETF working group (IP Payload
   Compression Protocol (ippcp)) working on "protocol specifications
   that make it possible to perform lossless compression on individual
   payloads before the payload is processed by a protocol that encrypts
   it. These specifications will allow for compression operations to be
   performed prior to the encryption of a payload by IPsec protocols."

10. Conformance Requirements

   All IPv4 systems that claim to implement IPsec MUST comply with all
   requirements of the Security Architecture document.  All IPv6 systems
   MUST comply with all requirements of the Security Architecture
   document.

11. Security Considerations

   The focus of this document is security; hence security considerations
   permeate this specification.

12. Differences from RFC 1825

   This architecture document differs substantially from RFC 1825 in
   detail and in organization, but the fundamental notions are
   unchanged.  This document provides considerable additional detail in
   terms of compliance specifications.  It introduces the SPD and SAD,
   and the notion of SA selectors.  It is aligned with the new versions
   of AH and ESP, which also differ from their predecessors.  Specific
   requirements for supported combinations of AH and ESP are newly
   added, as are details of PMTU management.

Acknowledgements

   Many of the concepts embodied in this specification were derived from
   or influenced by the US Government's SP3 security protocol, ISO/IEC's
   NLSP, the proposed swIPe security protocol [SDNS, ISO, IB93, IBK93],
   and the work done for SNMP Security and SNMPv2 Security.

   For over 3 years (although it sometimes seems *much* longer), this
   document has evolved through multiple versions and iterations.
   During this time, many people have contributed significant ideas and
   energy to the process and the documents themselves.  The authors
   would like to thank Karen Seo for providing extensive help in the
   review, editing, background research, and coordination for this
   version of the specification.  The authors would also like to thank
   the members of the IPsec and IPng working groups, with special
   mention of the efforts of (in alphabetic order): Steve Bellovin,
   Steve Deering, James Hughes, Phil Karn, Frank Kastenholz, Perry
   Metzger, David Mihelcic, Hilarie Orman, Norman Shulman, William
   Simpson, Harry Varnis, and Nina Yuan.

Appendix A -- Glossary

   This section provides definitions for several key terms that are
   employed in this document.  Other documents provide additional
   definitions and background information relevant to this technology,
   e.g., [VK83, HA94].  Included in this glossary are generic security
   service and security mechanism terms, plus IPsec-specific terms.

     Access Control
        Access control is a security service that prevents unauthorized
        use of a resource, including the prevention of use of a resource
        in an unauthorized manner.  In the IPsec context, the resource
        to which access is being controlled is often:
                o for a host, computing cycles or data
                 o for a security gateway, a network behind the gateway
                   or bandwidth on that network.

     Anti-replay
        [See "Integrity" below]

     Authentication
        This term is used informally to refer to the combination of two
        nominally distinct security services, data origin authentication
        and connectionless integrity.  See the definitions below for
        each of these services.

     Availability
        Availability, when viewed as a security service, addresses the
        security concerns engendered by attacks against networks that
        deny or degrade service.  For example, in the IPsec context, the
        use of anti-replay mechanisms in AH and ESP supports
        availability.

     Confidentiality
        Confidentiality is the security service that protects data from
        unauthorized disclosure.  The primary confidentiality concern in
        most instances is unauthorized disclosure of application level
        data, but disclosure of the external characteristics of
        communication also can be a concern in some circumstances.
        Traffic flow confidentiality is the service that addresses this
        latter concern by concealing source and destination addresses,
        message length, or frequency of communication.  In the IPsec
        context, using ESP in tunnel mode, especially at a security
        gateway, can provide some level of traffic flow confidentiality.
        (See also traffic analysis, below.)

     Encryption
        Encryption is a security mechanism used to transform data from
        an intelligible form (plaintext) into an unintelligible form
        (ciphertext), to provide confidentiality.  The inverse
        transformation process is designated "decryption".  Oftentimes the
        term "encryption" is used to generically refer to both
        processes.

     Data Origin Authentication
        Data origin authentication is a security service that verifies
        the identity of the claimed source of data.  This service is
        usually bundled with connectionless integrity service.

     Integrity
        Integrity is a security service that ensures that modifications
        to data are detectable.  Integrity comes in various flavors to
        match application requirements.  IPsec supports two forms of
        integrity: connectionless and a form of partial sequence
        integrity.  Connectionless integrity is a service that detects
        modification of an individual IP datagram, without regard to the
        ordering of the datagram in a stream of traffic.  The form of
        partial sequence integrity offered in IPsec is referred to as
        anti-replay integrity, and it detects arrival of duplicate IP
        datagrams (within a constrained window).  This is in contrast to
        connection-oriented integrity, which imposes more stringent
        sequencing requirements on traffic, e.g., to be able to detect
        lost or re-ordered messages.  Although authentication and
        integrity services often are cited separately, in practice they
        are intimately connected and almost always offered in tandem.

     Security Association (SA)
        A simplex (uni-directional) logical connection, created for
        security purposes.  All traffic traversing an SA is provided the
        same security processing.  In IPsec, an SA is an internet layer
        abstraction implemented through the use of AH or ESP.

     Security Gateway
        A security gateway is an intermediate system that acts as the
        communications interface between two networks.  The set of hosts
        (and networks) on the external side of the security gateway is
        viewed as untrusted (or less trusted), while the networks and
        hosts on the internal side are viewed as trusted (or more
        trusted).  The internal subnets and hosts served by a security
        gateway are presumed to be trusted by virtue of sharing a
        common, local, security administration.  (See "Trusted
        Subnetwork" below.)  In the IPsec context, a security gateway is
        a point at which AH and/or ESP is implemented in order to serve
        a set of internal hosts, providing security services for these
        hosts when they communicate with external hosts also employing
        IPsec (either directly or via another security gateway).

     SPI
        Acronym for "Security Parameters Index".  The combination of a
        destination address, a security protocol, and an SPI uniquely
        identifies a security association (SA, see above).  The SPI is
        carried in AH and ESP protocols to enable the receiving system
        to select the SA under which a received packet will be
        processed.  An SPI has only local significance, as defined by
        the creator of the SA (usually the receiver of the packet
        carrying the SPI); thus an SPI is generally viewed as an opaque
        bit string.  However, the creator of an SA may choose to
        interpret the bits in an SPI to facilitate local processing.

     Traffic Analysis
        The analysis of network traffic flow for the purpose of deducing
        information that is useful to an adversary.  Examples of such
        information are frequency of transmission, the identities of the
        conversing parties, sizes of packets, flow identifiers, etc.
        [Sch94]

     Trusted Subnetwork
        A subnetwork containing hosts and routers that trust each other
        not to engage in active or passive attacks.  There also is an
        assumption that the underlying communications channel (e.g., a
        LAN or CAN) isn't being attacked by other means.
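   The SPI and anti-replay definitions above can be tied together in a
   short Python sketch of inbound lookup: the SA is found by the
   (destination address, security protocol, SPI) triple, and a sliding
   window over sequence numbers rejects duplicates and packets that are
   too old.  The 64-packet window, the SAD dictionary, and the
   assumption that the integrity check has already succeeded before the
   window is updated are illustrative choices; the normative
   replay-check procedure is given in the AH and ESP specifications.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

WINDOW = 64  # assumed anti-replay window size, in packets


@dataclass
class SecurityAssociation:
    top: int = 0     # highest sequence number accepted so far
    bitmap: int = 0  # bit i set => sequence number (top - i) already received

    def accept(self, seq: int) -> bool:
        """Sliding-window replay check; call only after integrity is verified."""
        if seq == 0:
            return False  # sequence numbers start at 1
        if seq > self.top:  # newer than anything seen: slide the window forward
            shift = seq - self.top
            if shift >= WINDOW:
                self.bitmap = 1
            else:
                self.bitmap = ((self.bitmap << shift) | 1) & ((1 << WINDOW) - 1)
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= WINDOW or self.bitmap & (1 << offset):
            return False  # too old, or a duplicate
        self.bitmap |= 1 << offset
        return True


# SAD keyed by (destination address, security protocol, SPI).
SAD: Dict[Tuple[str, str, int], SecurityAssociation] = {}


def inbound(dst: str, proto: str, spi: int, seq: int) -> str:
    sa = SAD.get((dst, proto, spi))
    if sa is None:
        return "no-sa"
    return "accept" if sa.accept(seq) else "replay"
```

   Note that the SPI is only meaningful together with the destination
   address and protocol: the same SPI value under AH and under ESP
   selects two different SAs.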

Appendix B -- Analysis/Discussion of PMTU/DF/Fragmentation Issues

B.1 DF bit

   In cases where a system (host or gateway) adds an encapsulating
   header (e.g., ESP tunnel), should/must the DF bit in the original
   packet be copied to the encapsulating header?

   Fragmenting seems correct for some situations, e.g., it might be
   appropriate to fragment packets over a network with a very small MTU,
   e.g., a packet radio network, or a cellular phone hop to mobile node,
   rather than propagate back a very small PMTU for use over the rest of
   the path.  In other situations, it might be appropriate to set the DF
   bit in order to get feedback from later routers about PMTU
   constraints which require fragmentation.  The existence of both of
   these situations argues for enabling a system to decide whether or
   not to fragment over a particular network "link", i.e., for requiring
   an implementation to be able to copy the DF bit (and to process ICMP
   PMTU messages), but making it an option to be selected on a per
   interface basis.  In other words, an administrator should be able to
   configure the router's treatment of the DF bit (set, clear, copy from
   encapsulated header) for each interface.

   Note: If a bump-in-the-stack implementation of IPsec attempts to
   apply different IPsec algorithms based on source/destination ports,
   it will be difficult to apply Path MTU adjustments.

B.2 Fragmentation

   If required, IP fragmentation occurs after IPsec processing within an
   IPsec implementation.  Thus, transport mode AH or ESP is applied only
   to whole IP datagrams (not to IP fragments).  An IP packet to which
   AH or ESP has been applied may itself be fragmented by routers en
   route, and such fragments MUST be reassembled prior to IPsec
   processing at a receiver.  In tunnel mode, AH or ESP is applied to an
   IP packet, the payload of which may be a fragmented IP packet.  For
   example, a security gateway, "bump-in-the-stack" (BITS), or "bump-
   in-the-wire" (BITW) IPsec implementation may apply tunnel mode AH to
   such fragments.  Note that BITS or BITW implementations are examples
   of where a host IPsec implementation might receive fragments to which
   tunnel mode is to be applied.  However, if transport mode is to be
   applied, then these implementations MUST reassemble the fragments
   prior to applying IPsec.

   NOTE: IPsec always has to figure out what the encapsulating IP header
   fields are.  This is independent of where you insert IPsec and is
   intrinsic to the definition of IPsec.  Therefore any IPsec
   implementation that is not integrated into an IP implementation must
   include code to construct the necessary IP headers (e.g., IP2):

        o AH-tunnel --> IP2-AH-IP1-Transport-Data
        o ESP-tunnel -->  IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer

   *********************************************************************

   Overall, the fragmentation/reassembly approach described above works
   for all cases examined.

                              AH Xport   AH Tunnel  ESP Xport  ESP Tunnel
 Implementation approach      IPv4 IPv6  IPv4 IPv6  IPv4 IPv6  IPv4 IPv6
 -----------------------      ---- ----  ---- ----  ---- ----  ---- ----
 Hosts (integr w/ IP stack)     Y    Y     Y    Y     Y    Y     Y    Y
 Hosts (betw/ IP and drivers)   Y    Y     Y    Y     Y    Y     Y    Y
 S. Gwy (integr w/ IP stack)               Y    Y                Y    Y
 Outboard crypto processor *

        * If the crypto processor system has its own IP address, then it
          is covered by the security gateway case.  This box receives
          the packet from the host and performs IPsec processing.  It
          has to be able to handle the same AH, ESP, and related
          IPv4/IPv6 tunnel processing that a security gateway would have
          to handle.  If it doesn't have its own address, then it is
          similar to the bump-in-the-stack implementation between IP and
          the network drivers.

   The following analysis assumes that:

        1. There is only one IPsec module in a given system's stack.
           There isn't an IPsec module A (adding ESP/encryption and
           thus) hiding the transport protocol, SRC port, and DEST port
           from IPsec module B.
        2. There are several places where IPsec could be implemented (as
           shown in the table above).
                a. Hosts with integration of IPsec into the native IP
                   implementation.  Implementer has access to the source
                   for the stack.
                b. Hosts with bump-in-the-stack implementations, where
                   IPsec is implemented between IP and the local network
                   drivers.  Source access for stack is not available;
                   but there are well-defined interfaces that allow the
                   IPsec code to be incorporated into the system.
                c. Security gateways and outboard crypto processors with
                   integration of IPsec into the stack.
        3. Not all of the above approaches are feasible in all hosts.
           But it was assumed that for each approach, there are some
           hosts for whom the approach is feasible.

   For each of the above 3 categories, there are IPv4 and IPv6, AH
   transport and tunnel modes, and ESP transport and tunnel modes -- for
   a total of 24 cases (3 x 2 x 4).

   Some header fields and interface fields are listed here for ease of
   reference -- they're not in the header order, but instead listed to
   allow comparison between the columns.  (* = not covered by AH
   authentication.  ESP authentication doesn't cover any headers that
   precede it.)

                                             IP/Transport Interface
             IPv4            IPv6            (RFC 1122 -- Sec 3.4)
             ----            ----            ----------------------
             Version = 4     Version = 6
             Header Len
             *TOS            Class,Flow Lbl  TOS
             Packet Len      Payload Len     Len
             ID                              ID (optional)
             *Flags                          DF
             *Offset
             *TTL            *Hop Limit      TTL
             Protocol        Next Header
             *Checksum
             Src Address     Src Address     Src Address
             Dst Address     Dst Address     Dst Address
             Options?        Options?        Opt

             ? = AH covers Option-Type and Option-Length, but
                 might not cover Option-Data.

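   The results for each of the 20 applicable cases are shown below;
   the 4 security gateway transport-mode cases are omitted because
   security gateways do not use transport mode.  "Works" means the case
   will work if the system fragments after outbound IPsec processing
   and reassembles before inbound IPsec processing.  Notes indicate
   implementation issues.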
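   The case count can be checked mechanically.  The C sketch below
   (all type and function names are illustrative, not from this
   document) enumerates the three implementation categories, both IP
   versions, and the four AH/ESP modes, and optionally drops the
   security gateway transport-mode combinations, since security
   gateways do not use transport mode.

```c
/* Illustrative enumeration of the cases analysed in this appendix. */
typedef enum { HOST_NATIVE, HOST_BITS, SECURITY_GATEWAY, N_IMPL } Impl;
typedef enum { AH_TRANSPORT, AH_TUNNEL, ESP_TRANSPORT, ESP_TUNNEL, N_MODE } Mode;

static int is_transport(Mode m) {
    return m == AH_TRANSPORT || m == ESP_TRANSPORT;
}

/* Returns 24 (3 x 2 x 4) when every combination is counted, or 20 when
   security gateways are restricted to tunnel mode. */
int count_cases(int gateway_tunnel_only) {
    int n = 0;
    for (int impl = 0; impl < N_IMPL; impl++)
        for (int ip_version = 4; ip_version <= 6; ip_version += 2)
            for (int m = 0; m < N_MODE; m++) {
                if (gateway_tunnel_only && impl == SECURITY_GATEWAY &&
                    is_transport((Mode)m))
                    continue;      /* gateways never use transport mode */
                n++;
            }
    return n;
}
```

   count_cases(0) gives the 24 combinations; count_cases(1) gives the 20
   cases actually listed.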

    a. Hosts (integrated into IP stack)
          o AH-transport  --> (IP1-AH-Transport-Data)
                    - IPv4 -- works
                    - IPv6 -- works

          o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
                    - IPv4 -- works
                    - IPv6 -- works

          o ESP-transport --> (IP1-ESP_hdr-Transport-Data-ESP_trailer)
                    - IPv4 -- works
                    - IPv6 -- works
          o ESP-tunnel -->  (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
                    - IPv4 -- works
                    - IPv6 -- works

    b. Hosts (Bump-in-the-stack) -- put IPsec between IP layer and
       network drivers.  In this case, the IPsec module would have to do
       something like one of the following for fragmentation and
       reassembly.
            - do the fragmentation/reassembly work itself and
              send/receive the packet directly to/from the network
              layer.  In AH or ESP transport mode, this is fine.  In AH
              or ESP tunnel mode where the tunnel end is at the ultimate
              destination, this is fine.  But in AH or ESP tunnel modes
              where the tunnel end is different from the ultimate
              destination and where the source host is multi-homed, this
              approach could result in sub-optimal routing because the
              IPsec module may be unable to obtain the information
              needed (LAN interface and next-hop gateway) to direct the
              packet to the appropriate network interface.  This is not
              a problem if the interface and next-hop gateway are the
              same for the ultimate destination and for the tunnel end.
              But if they are different, then IPsec would need to know
              the LAN interface and the next-hop gateway for the tunnel
              end.  (Note: The tunnel end (security gateway) is highly
              likely to be on the regular path to the ultimate
              destination.  But there could also be more than one path
              to the destination, e.g., the host could be at an
              organization with 2 firewalls.  And the path being used
              could involve the less commonly chosen firewall.)
                                    OR
            - pass the IPsec'd packet back to the IP layer where an
              extra IP header would end up being prepended and the
              IPsec module would have to check and let IPsec'd fragments
              go by.
                                    OR
            - pass the packet contents to the IP layer in a form such
              that the IP layer recreates an appropriate IP header.

       At the network layer, the IPsec module will have access to the
       following selectors from the packet -- SRC address, DST address,
       Next Protocol, and if there's a transport layer header --> SRC
       port and DST port.  One cannot assume IPsec has access to the
       Name selector (User ID or System Name).  It is assumed that the
       available selector information is sufficient to figure out the
       relevant Security Policy entry and Security Association(s).

          o AH-transport  --> (IP1-AH-Transport-Data)
                    - IPv4 -- works
                    - IPv6 -- works
          o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
                    - IPv4 -- works
                    - IPv6 -- works
          o ESP-transport --> (IP1-ESP_hdr-Transport-Data-ESP_trailer)
                    - IPv4 -- works
                    - IPv6 -- works
          o ESP-tunnel -->  (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
                    - IPv4 -- works
                    - IPv6 -- works

    c. Security gateways -- integrate IPsec into the IP stack

       NOTE: The IPsec module will have access to the following
       selectors from the packet -- SRC address, DST address, Next
       Protocol, and if there's a transport layer header --> SRC port
       and DST port.  It won't have access to the User ID (only Hosts
       have access to User ID information.)  Unlike some Bump-in-the-
       stack implementations, security gateways may be able to look up
       the Source Address in the DNS to provide a System Name, e.g., in
       situations involving use of dynamically assigned IP addresses in
       conjunction with dynamically updated DNS entries.  It also won't
       have access to the transport layer information if there is an ESP
       header, or if it's not the first fragment of a fragmented
       message.  It is assumed that the available selector information
       is sufficient to figure out the relevant Security Policy entry
       and Security Association(s).

          o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
                    - IPv4 -- works
                    - IPv6 -- works
          o ESP-tunnel -->  (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
                    - IPv4 -- works
                    - IPv6 -- works
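   The selector limitations noted above for bump-in-the-stack
   implementations and security gateways can be illustrated with a
   minimal C sketch for IPv4 only (the Selectors type and
   extract_selectors() are hypothetical names).  It reads SRC address,
   DST address, and Protocol from the IP header, skipping any IPv4
   options via the IHL field, and reads ports only when the packet is
   the first fragment and carries a TCP or UDP header in the clear.
   With ESP (Protocol 50) the ports are reported as unavailable.  The
   Name selector is not extracted because it never appears in the
   packet; checksum and total length are not verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector tuple; field names are illustrative. */
typedef struct {
    uint32_t src, dst;    /* IPv4 addresses, host byte order */
    uint8_t  proto;       /* IPv4 Protocol field (Next Protocol selector) */
    int      have_ports;  /* 0 if ports are unavailable */
    uint16_t sport, dport;
} Selectors;

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | p[3];
}
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

/* Returns 0 on success, -1 if pkt is not a plausible IPv4 header. */
int extract_selectors(const uint8_t *pkt, size_t len, Selectors *s) {
    size_t ihl;
    uint16_t frag_offset;
    if (len < 20 || (pkt[0] >> 4) != 4) return -1;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;       /* header length incl. options */
    if (ihl < 20 || len < ihl) return -1;
    s->proto = pkt[9];
    s->src = rd32(pkt + 12);
    s->dst = rd32(pkt + 16);
    frag_offset = rd16(pkt + 6) & 0x1fff;    /* 13-bit fragment offset */
    s->have_ports = 0;
    s->sport = s->dport = 0;
    /* Ports exist only in the first fragment and only for TCP (6) or
       UDP (17); under ESP (50) the transport header is encrypted. */
    if (frag_offset == 0 && (s->proto == 6 || s->proto == 17) &&
        len >= ihl + 4) {
        s->sport = rd16(pkt + ihl);
        s->dport = rd16(pkt + ihl + 2);
        s->have_ports = 1;
    }
    return 0;
}
```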

   **********************************************************************

B.3 Path MTU Discovery

   As mentioned earlier, "ICMP PMTU" refers to an ICMP message used for
   Path MTU Discovery.

   The legend for the diagrams below in B.3.1 and B.3.3 (but not B.3.2)
   is:

        ==== = security association (AH or ESP, transport or tunnel)
        ---- = connectivity (or if so labelled, administrative boundary)
        .... = ICMP message (hereafter referred to as ICMP PMTU) for

                IPv4:
                - Type = 3 (Destination Unreachable)
                - Code = 4 (Fragmentation needed and DF set)
                - Next-Hop MTU in the low-order 16 bits of the second
                  word of the ICMP header (labelled unused in RFC 792),
                  with high-order 16 bits set to zero

                IPv6 (RFC 1885):
                - Type = 2 (Packet Too Big)
                - Code = 0 (IPv6 has no DF bit, since routers never
                  fragment IPv6 packets in transit)
                - Next-Hop MTU in the 32 bit MTU field of the ICMPv6
                  message

        Hx   = host x
        Rx   = router x
        SGx  = security gateway x
        X*   = X supports IPsec
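   The two ICMP PMTU layouts in the legend can be decoded as sketched
   below.  icmp_pmtu_value() is a hypothetical helper that expects a
   pointer to the ICMP or ICMPv6 header and returns 0 when the message
   is not an ICMP PMTU.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the Next-Hop MTU carried by an ICMP PMTU message, or 0 if the
   message is not one.  'icmp' points at the ICMP (is_v6 == 0) or ICMPv6
   (is_v6 != 0) header; 'len' is the number of octets available. */
uint32_t icmp_pmtu_value(const uint8_t *icmp, size_t len, int is_v6) {
    if (len < 8) return 0;                    /* need the first two words */
    if (!is_v6) {
        /* IPv4: Type 3 (Destination Unreachable), Code 4 (Fragmentation
           needed and DF set); MTU in the low-order 16 bits of word 2. */
        if (icmp[0] != 3 || icmp[1] != 4) return 0;
        return (uint32_t)icmp[6] << 8 | icmp[7];
    }
    /* IPv6: Type 2 (Packet Too Big), Code 0; 32-bit MTU field in word 2. */
    if (icmp[0] != 2 || icmp[1] != 0) return 0;
    return (uint32_t)icmp[4] << 24 | (uint32_t)icmp[5] << 16 |
           (uint32_t)icmp[6] << 8 | icmp[7];
}
```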

B.3.1 Identifying the Originating Host(s)

   The amount of information returned with the ICMP message is limited
   and this affects what selectors are available to identify security
   associations, originating hosts, etc. for use in further propagating
   the PMTU information.

   In brief...  An ICMP message must contain the following information
   from the "offending" packet:
        - IPv4 (RFC 792) --  IP header plus a minimum of 64 bits
        - IPv6 (RFC 1885) -- as much of the offending packet as will
          fit without the ICMPv6 message exceeding 576 octets

   Accordingly, in the IPv4 context, an ICMP PMTU may identify only the
   first (outermost) security association.  This is because the ICMP
   PMTU may contain only 64 bits of the "offending" packet beyond the IP
   header, which would capture only the first SPI from AH or ESP.  In
   the IPv6 context, an ICMP PMTU will probably provide all the SPIs and
   the selectors in the IP header, but maybe not the SRC/DST ports (in
   the transport header) or the encapsulated (TCP, UDP, etc.) protocol.
   Moreover, if ESP is used, the transport ports and protocol selectors
   may be encrypted.

   Looking at the diagram below of a security gateway tunnel (as
   mentioned elsewhere, security gateways do not use transport mode)...

     H1   ===================           H3
       \  |                 |          /
   H0 -- SG1* ---- R1 ---- SG2* ---- R2 -- H5
       /  ^        |                   \
     H2   |........|                    H4

   Suppose that the security policy for SG1 is to use a single SA to SG2
   for all the traffic between hosts H0, H1, and H2 and hosts H3, H4,
   and H5.  And suppose H0 sends a data packet to H5 which causes R1 to
   send an ICMP PMTU message to SG1.  If the PMTU message has only the
   SPI, SG1 will be able to look up the SA and find the list of possible
   hosts (H0, H1, H2, wildcard); but SG1 will have no way to figure out
   that H0 sent the traffic that triggered the ICMP PMTU message.

      original        after IPsec     ICMP
      packet          processing      packet
      --------        -----------     ------
                                      IP-3 header (S = R1, D = SG1)
                                      ICMP header (includes PMTU)
                      IP-2 header     IP-2 header (S = SG1, D = SG2)
                      ESP header      minimum of 64 bits of ESP hdr (*)
      IP-1 header     IP-1 header
      TCP header      TCP header
      TCP data        TCP data
                      ESP trailer

      (*) The 64 bits will include enough of the ESP (or AH) header to
          include the SPI.
              - ESP -- SPI (32 bits), Seq number (32 bits)
              - AH -- Next header (8 bits), Payload Len (8 bits),
                Reserved (16 bits), SPI (32 bits)
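   Recovering the SPI from the 64 bits quoted after the outer IP header
   depends only on which IPsec protocol follows that header, as the
   footnote above shows.  A C sketch using the IP protocol numbers 50
   (ESP) and 51 (AH); spi_from_icmp_quote() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* 'quoted' points at the first octet after the quoted outer IP header in
   the ICMP body; 'proto' is that header's Protocol value.  Returns 1 and
   stores the SPI, or 0 if no SPI can be recovered. */
int spi_from_icmp_quote(const uint8_t *quoted, size_t len, uint8_t proto,
                        uint32_t *spi) {
    size_t off;
    if (proto == 50) off = 0;        /* ESP: SPI is the first 32 bits */
    else if (proto == 51) off = 4;   /* AH: Next Hdr, Len, Reserved, SPI */
    else return 0;
    if (len < off + 4) return 0;     /* 64 quoted bits suffice for both */
    *spi = (uint32_t)quoted[off] << 24 | (uint32_t)quoted[off + 1] << 16 |
           (uint32_t)quoted[off + 2] << 8 | quoted[off + 3];
    return 1;
}
```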

   This limitation on the amount of information returned with an ICMP
   message creates a problem in identifying the originating hosts for
   the packet (so as to know where to further propagate the ICMP PMTU
   information).  If the ICMP message contains only 64 bits of the IPsec
   header (minimum for IPv4), then the IPsec selectors (e.g., Source and
   Destination addresses, Next Protocol, Source and Destination ports,
   etc.) will have been lost.  But the ICMP error message will still
   provide SG1 with the SPI, the PMTU information and the source and
   destination gateways for the relevant security association.

   The destination security gateway and SPI uniquely define a security
   association which in turn defines a set of possible originating
   hosts.  At this point, SG1 could:

   a. send the PMTU information to all the possible originating hosts.
      This would not work well if the host list is a wild card or if
      many/most of the hosts weren't sending to SG1; but it might work
      if the SPI/destination/etc mapped to just one or a small number of
      hosts.
   b. store the PMTU with the SPI/etc and wait until the next packet(s)
      arrive from the originating host(s) for the relevant security
      association.  If it/they are bigger than the PMTU, drop the
      packet(s), and compose ICMP PMTU message(s) with the new packet(s)
      and the updated PMTU, and send the originating host(s) the ICMP
      message(s) about the problem.  This involves a delay in notifying
      the originating host(s), but avoids the problems of (a).
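   Approach (b) amounts to caching the PMTU in the SA record and
   checking later packets against it.  A minimal C sketch, assuming a
   fixed per-packet IPsec overhead for the SA; GatewaySA,
   sa_store_pmtu(), and sa_check_outbound() are hypothetical names, and
   building and sending the ICMP PMTU itself is left to the caller.

```c
#include <stdint.h>

/* Hypothetical per-SA record kept by a security gateway. */
typedef struct {
    uint32_t spi;
    uint32_t overhead;  /* octets added by IPsec processing (assumed fixed) */
    uint32_t pmtu;      /* 0 = no PMTU learned yet */
} GatewaySA;

/* Record the PMTU from an ICMP PMTU that identified this SA by its SPI. */
void sa_store_pmtu(GatewaySA *sa, uint32_t pmtu) { sa->pmtu = pmtu; }

/* Called for each outbound packet mapped to 'sa'.  Returns 1 if the
   packet fits and may be forwarded.  Returns 0 if it must be dropped;
   *report_mtu then holds the MTU to place in the ICMP PMTU sent to the
   packet's originating host. */
int sa_check_outbound(const GatewaySA *sa, uint32_t pkt_len,
                      uint32_t *report_mtu) {
    if (sa->pmtu == 0 || pkt_len + sa->overhead <= sa->pmtu) return 1;
    *report_mtu = sa->pmtu > sa->overhead ? sa->pmtu - sa->overhead : 0;
    return 0;
}
```

   With an assumed overhead of 56 octets and a stored PMTU of 1400, a
   1500-octet packet is dropped and its source is told to use 1344.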

   Since only the latter approach is feasible in all instances, a
   security gateway MUST provide such support, as an option.  However,
   if the ICMP message contains more information from the original
   packet, then there may be enough information to immediately determine
   to which host to propagate the ICMP/PMTU message and to provide that
   system with the 5 fields (source address, destination address, source
   port, destination port, and transport protocol) needed to determine
   where to store/update the PMTU.  Under such circumstances, a security
   gateway MUST generate an ICMP PMTU message immediately upon receipt
   of an ICMP PMTU from further down the path.  NOTE: The Next Protocol
   field may not be contained in the ICMP message and the use of ESP
   encryption may hide the selector fields that have been encrypted.

B.3.2 Calculation of PMTU

   The calculation of PMTU from an ICMP PMTU has to take into account
   the addition of any IPsec header by H1 -- AH and/or ESP transport, or
   ESP or AH tunnel.  Within a single host, multiple applications may
   share an SPI and nesting of security associations may occur.  (See
   Section 4.5 Basic Combinations of Security Associations for
   a description of the combinations that MUST be supported).  The diagram
   below illustrates an example of security associations between a pair
   of hosts (as viewed from the perspective of one of the hosts.)  (ESPx
   or AHx = transport mode)

           Socket 1 -------------------------|
                                             |
           Socket 2 (ESPx/SPI-A) ---------- AHx (SPI-B) -- Internet

   In order to figure out the PMTU for each socket that maps to SPI-B,
   it will be necessary to have backpointers from SPI-B to each of the 2
   paths that lead to it -- Socket 1 and Socket 2/SPI-A.
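   The backpointers can be modelled directly.  In the C sketch below,
   the SA for SPI-B keeps a list of the sockets whose traffic ends in
   it, and each socket records the chain of SAs applied to its packets.
   The struct layouts are assumptions for illustration, as are the
   overhead figures in the usage note.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Socket Socket;

/* Hypothetical SA record.  'users' holds the backpointers: every socket
   whose outbound traffic has this SA as its outermost transform. */
typedef struct {
    uint32_t spi;
    uint32_t overhead;        /* per-packet octets added by this SA (assumed) */
    Socket  *users[4];
    size_t   nusers;
} SA;

/* Hypothetical socket record: the SAs applied to its traffic, innermost
   first, and the PMTU currently reported to the transport layer. */
struct Socket {
    const SA *chain[4];
    size_t    nchain;
    uint32_t  pmtu;
};

/* Called when an ICMP PMTU identifies 'outer' (e.g. SPI-B).  Walks the
   backpointers and gives each socket the path MTU minus the overhead of
   every SA on that socket's path. */
void propagate_pmtu(SA *outer, uint32_t path_mtu) {
    for (size_t i = 0; i < outer->nusers; i++) {
        Socket *s = outer->users[i];
        uint32_t total = 0;
        for (size_t j = 0; j < s->nchain; j++)
            total += s->chain[j]->overhead;
        s->pmtu = path_mtu > total ? path_mtu - total : 0;
    }
}
```

   If AH on SPI-B adds 24 octets and ESP on SPI-A adds an assumed 32, a
   path MTU of 1500 yields 1476 for Socket 1 and 1444 for Socket 2.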

B.3.3 Granularity of Maintaining PMTU Data

   In hosts, the granularity with which PMTU ICMP processing can be done
   differs depending on the implementation situation.  Looking at a
   host, there are three situations that are of interest with respect to
   PMTU issues:

   a. Integration of IPsec into the native IP implementation
   b. Bump-in-the-stack implementations, where IPsec is implemented
      "underneath" an existing implementation of a TCP/IP protocol
      stack, between the native IP and the local network drivers
   c. No IPsec implementation -- This case is included because it is
      relevant in cases where a security gateway is sending PMTU
      information back to a host.

   Only in case (a) can the PMTU data be maintained at the same
   granularity as communication associations.  In the other cases, the
   IP layer will maintain PMTU data at the granularity of Source and
   Destination IP addresses (and optionally TOS/Class), as described in
   RFC 1191.  This is an important difference, because more than one
   communication association may map to the same source and destination
   IP addresses, and each communication association may have a different
   amount of IPsec header overhead (e.g., due to use of different
   transforms or different algorithms).  The examples below illustrate
   this.

   In cases (a) and (b)...  Suppose you have the following situation.
   H1 is sending to H2 and the packet to be sent from R1 to R2 exceeds
   the PMTU of the network hop between them.

                 ==================================
                 |                                |
                H1* --- R1 ----- R2 ---- R3 ---- H2*
                 ^       |
                 |.......|

   If R1 is configured to not fragment subscriber traffic, then R1 sends
   an ICMP PMTU message with the appropriate PMTU to H1.  H1's
   processing would vary with the nature of the implementation.  In case
   (a) (native IP), the security services are bound to sockets or the
   equivalent.  Here the IP/IPsec implementation in H1 can store/update
   the PMTU for the associated socket.  In case (b), the IP layer in H1
   can store/update the PMTU but only at the granularity of Source and
   Destination addresses and possibly TOS/Class, as noted above.  So the
   result may be sub-optimal, since the PMTU for a given
   SRC/DST/TOS/Class will be the path MTU minus the largest IPsec
   header overhead used by any communication association between that
   source and destination.
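   The sub-optimal result follows from keeping a single PMTU per address
   pair, which forces the worst-case overhead onto every association.
   A C sketch; pair_pmtu() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* PMTU kept per SRC/DST pair: subtract the largest IPsec overhead used by
   any of the n communication associations between the two addresses. */
uint32_t pair_pmtu(uint32_t path_mtu, const uint32_t *overheads, size_t n) {
    uint32_t worst = 0;
    for (size_t i = 0; i < n; i++)
        if (overheads[i] > worst) worst = overheads[i];
    return path_mtu > worst ? path_mtu - worst : 0;
}
```

   With associations adding 24, 56, and 40 octets, a 1500-octet path MTU
   gives 1444 for all of them, even though the 24-octet association
   alone could use 1476.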

   In case (c), there has to be a security gateway to have any IPsec
   processing.  So suppose you have the following situation.  H1 is
   sending to H2 and the packet to be sent from SG1 to R exceeds the
   PMTU of the network hop between them.

                         ================
                         |              |
                H1 ---- SG1* --- R --- SG2* ---- H2
                         ^       |
                         |.......|

   As described above for case (b), the IP layer in H1 can store/update
   the PMTU but only at the granularity of Source and Destination
   addresses, and possibly TOS/Class.  So the result may be sub-optimal,
   since the PMTU for a given SRC/DST/TOS/Class will be the path MTU
   minus the largest IPsec header overhead used by any communication
   association between that source and destination.

B.3.4 Per Socket Maintenance of PMTU Data

   Implementation of the calculation of PMTU (Section B.3.2) and support
   for PMTUs at the granularity of individual "communication
   associations" (Section B.3.3) is a local matter.  However, a socket-
   based implementation of IPsec in a host SHOULD maintain the
   information on a per-socket basis.  Bump-in-the-stack systems MUST
   pass an ICMP PMTU to the host IP implementation, after adjusting it
   for any IPsec header overhead added by these systems.  The overhead
   SHOULD be determined by analysis of the SPI and any other selector
   information present in a returned ICMP PMTU message.
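   As one way to derive that adjustment from the SA identified by the
   SPI, the C sketch below computes the largest packet the host IP layer
   can hand to an ESP tunnel-mode SA so that the result still fits the
   reported PMTU.  It assumes the ESP layout of RFC 2406 (8 octets of SPI
   and Sequence Number, a cipher-dependent IV, padding so that the
   payload plus Pad Length and Next Header end on a block boundary, then
   the ICV).  EspTunnelSA and esp_inner_mtu() are hypothetical names,
   transport mode is not covered, and the numbers in the usage note are
   examples only.

```c
#include <stdint.h>

/* Hypothetical parameters of one ESP tunnel-mode SA, looked up by SPI. */
typedef struct {
    uint32_t outer_hdr;  /* outer IP header added by the tunnel */
    uint32_t iv_len;     /* IV carried at the start of the payload */
    uint32_t block;      /* padding alignment: cipher block size, >= 4 */
    uint32_t icv_len;    /* authentication data, 0 if not used */
} EspTunnelSA;

/* Largest inner IP packet that still fits in 'pmtu' after ESP tunnel-mode
   processing, i.e. the value to pass up to the host IP layer.
   Layout: outer IP | SPI+Seq (8) | IV | inner packet | padding |
   Pad Length (1) | Next Header (1) | ICV.  The span from the inner packet
   through Next Header must be a multiple of 'block'. */
uint32_t esp_inner_mtu(uint32_t pmtu, const EspTunnelSA *sa) {
    uint32_t fixed = sa->outer_hdr + 8 + sa->iv_len + sa->icv_len;
    uint32_t room;
    if (sa->block == 0 || pmtu <= fixed + 2) return 0;
    room = pmtu - fixed;          /* octets left for inner packet + pad + 2 */
    room -= room % sa->block;     /* round down to a block boundary */
    return room > 2 ? room - 2 : 0;  /* minus Pad Length and Next Header */
}
```

   With a 20-octet outer IPv4 header, an 8-octet IV and block size, and
   a 12-octet ICV, a reported PMTU of 1500 becomes 1446 for the host.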

B.3.5 Delivery of PMTU Data to the Transport Layer

   The host mechanism for getting the updated PMTU to the transport
   layer is unchanged, as specified in RFC 1191 (Path MTU Discovery).

B.3.6 Aging of PMTU Data

   This topic is covered in Section 6.1.2.4.

Appendix C -- Sequence Space Window Code Example

   This appendix contains a routine that implements a bitmask check for
   a 32-packet window.  It was provided by James Hughes
   (jim_hughes@stortek.com) and Harry Varnis (hgv@anubis.network.com)
   and is intended as an implementation example.  Note that this code
   both checks for a replay and updates the window.  Thus the algorithm,
   as shown, should only be called AFTER the packet has been
   authenticated.  Implementers might wish to consider splitting the
   code to do the check for replays before computing the ICV.  If the
   packet is not a replay, the code would then compute the ICV (and
   discard any bad packets) and, if the packet is OK, update the window.

#include <stdio.h>   /* printf, fgets, sscanf */
#include <stdlib.h>  /* exit */
typedef unsigned long u_long;

enum {
    ReplayWindowSize = 32
};

u_long bitmap = 0;                 /* session state - at least 32 bits */
u_long lastSeq = 0;                     /* session state */

/* Returns 0 if packet disallowed, 1 if packet permitted */
int ChkReplayWindow(u_long seq);

int ChkReplayWindow(u_long seq) {
    u_long diff;

    if (seq == 0) return 0;             /* first == 0 or wrapped */
    if (seq > lastSeq) {                /* new larger sequence number */
        diff = seq - lastSeq;
        if (diff < ReplayWindowSize) {  /* In window */
            bitmap <<= diff;
            bitmap |= 1;                /* set bit for this packet */
        } else bitmap = 1;          /* "way larger" jump: restart window */
        lastSeq = seq;
        return 1;                       /* larger is good */
    }
    diff = lastSeq - seq;
    if (diff >= ReplayWindowSize) return 0; /* too old or wrapped */
    if (bitmap & ((u_long)1 << diff)) return 0; /* already seen */
    bitmap |= ((u_long)1 << diff);              /* mark as seen */
    return 1;                           /* out of order but good */
}

char string_buffer[512];

#define STRING_BUFFER_SIZE sizeof(string_buffer)

int main() {
    int result;
    u_long last, current, bits;

    printf("Input initial state (bits in hex, last msgnum):\n");
    if (!fgets(string_buffer, STRING_BUFFER_SIZE, stdin)) exit(0);
    if (sscanf(string_buffer, "%lx %lu", &bits, &last) != 2) exit(1);
    if (last != 0)
        bits |= 1;
    bitmap = bits;
    lastSeq = last;
    printf("bits:%08lx last:%lu\n", bitmap, lastSeq);
    printf("Input value to test (current):\n");

    while (1) {
        if (!fgets(string_buffer, STRING_BUFFER_SIZE, stdin)) break;
        if (sscanf(string_buffer, "%lu", &current) != 1) continue;
        result = ChkReplayWindow(current);
        printf("%-3s", result ? "OK" : "BAD");
        printf(" bits:%08lx last:%lu\n", bitmap, lastSeq);
    }
    return 0;
}
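   Following the suggestion above, the routine can be split so that a
   replay pre-check runs before the ICV is computed and the window is
   updated only after the packet authenticates.  This is a sketch, not
   the contributors' code: ReplayState, replay_check(), and
   replay_update() are hypothetical names, and uint32_t is used so that
   the bitmap is exactly 32 bits wide.

```c
#include <stdint.h>

enum { REPLAY_WINDOW = 32 };

/* Hypothetical per-SA replay state. */
typedef struct {
    uint32_t bitmap;    /* bit i set => (last_seq - i) already accepted */
    uint32_t last_seq;  /* highest authenticated sequence number so far */
} ReplayState;

/* Step 1, before computing the ICV: read-only check.
   Returns 1 if the packet may be new, 0 if it must be discarded. */
int replay_check(const ReplayState *s, uint32_t seq) {
    uint32_t diff;
    if (seq == 0) return 0;                  /* first == 0 or wrapped */
    if (seq > s->last_seq) return 1;         /* ahead of the window */
    diff = s->last_seq - seq;
    if (diff >= REPLAY_WINDOW) return 0;     /* too old */
    return (s->bitmap & ((uint32_t)1 << diff)) ? 0 : 1;  /* already seen? */
}

/* Step 2: call only after the ICV verified and replay_check() returned 1. */
void replay_update(ReplayState *s, uint32_t seq) {
    if (seq > s->last_seq) {
        uint32_t diff = seq - s->last_seq;
        s->bitmap = (diff < REPLAY_WINDOW) ? (s->bitmap << diff) | 1 : 1;
        s->last_seq = seq;
    } else {
        s->bitmap |= (uint32_t)1 << (s->last_seq - seq);  /* mark as seen */
    }
}
```

   Because replay_check() never modifies state, a forged packet that
   passes the cheap check but fails the ICV leaves the window untouched.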

Appendix D -- Categorization of ICMP messages

   The tables below characterize ICMP messages as being host generated,
   router generated, both, or unassigned/unknown.  The first set of
   tables covers IPv4; the second set covers IPv6.

                                IPv4

Type    Name/Codes                                             Reference
========================================================================
HOST GENERATED:
  3     Destination Unreachable
         2  Protocol Unreachable                               [RFC792]
         3  Port Unreachable                                   [RFC792]
         8  Source Host Isolated                               [RFC792]
        14  Host Precedence Violation                          [RFC1812]
 10     Router Selection                                       [RFC1256]

Type    Name/Codes                                             Reference
========================================================================
ROUTER GENERATED:
  3     Destination Unreachable
         0  Net Unreachable                                    [RFC792]
         4  Fragmentation Needed, Don't Fragment was Set       [RFC792]
         5  Source Route Failed                                [RFC792]
         6  Destination Network Unknown                        [RFC792]
         7  Destination Host Unknown                           [RFC792]
         9  Comm. w/Dest. Net. is Administratively Prohibited  [RFC792]
        11  Destination Network Unreachable for Type of Service[RFC792]
  5     Redirect
         0  Redirect Datagram for the Network (or subnet)      [RFC792]
         2  Redirect Datagram for the Type of Service & Network[RFC792]
  9     Router Advertisement                                   [RFC1256]
 18     Address Mask Reply                                     [RFC950]

Type    Name/Codes                                             Reference
========================================================================
BOTH ROUTER AND HOST GENERATED:
  0     Echo Reply                                             [RFC792]
  3     Destination Unreachable
         1  Host Unreachable                                   [RFC792]
        10  Comm. w/Dest. Host is Administratively Prohibited  [RFC792]
        12  Destination Host Unreachable for Type of Service   [RFC792]
        13  Communication Administratively Prohibited          [RFC1812]
        15  Precedence cutoff in effect                        [RFC1812]
  4     Source Quench                                          [RFC792]
  5     Redirect
         1  Redirect Datagram for the Host                     [RFC792]
         3  Redirect Datagram for the Type of Service and Host [RFC792]
  6     Alternate Host Address                                 [JBP]
  8     Echo                                                   [RFC792]
 11     Time Exceeded                                          [RFC792]
 12     Parameter Problem                              [RFC792,RFC1108]
 13     Timestamp                                              [RFC792]
 14     Timestamp Reply                                        [RFC792]
 15     Information Request                                    [RFC792]
 16     Information Reply                                      [RFC792]
 17     Address Mask Request                                   [RFC950]
 30     Traceroute                                             [RFC1393]
 31     Datagram Conversion Error                              [RFC1475]
 32     Mobile Host Redirect                                   [Johnson]
 39     SKIP                                                   [Markson]
 40     Photuris                                               [Simpson]

Type    Name/Codes                                             Reference
========================================================================
UNASSIGNED TYPE OR UNKNOWN GENERATOR:
  1     Unassigned                                             [JBP]
  2     Unassigned                                             [JBP]
  7     Unassigned                                             [JBP]
 19     Reserved (for Security)                                [Solo]
 20-29  Reserved (for Robustness Experiment)                   [ZSu]
 33     IPv6 Where-Are-You                                     [Simpson]
 34     IPv6 I-Am-Here                                         [Simpson]
 35     Mobile Registration Request                            [Simpson]
 36     Mobile Registration Reply                              [Simpson]
 37     Domain Name Request                                    [Simpson]
 38     Domain Name Reply                                      [Simpson]
 41-255 Reserved                                               [JBP]

                                IPv6

Type    Name/Codes                                             Reference
========================================================================
HOST GENERATED:
  1     Destination Unreachable                                [RFC1885]
         4  Port Unreachable

Type    Name/Codes                                             Reference
========================================================================
ROUTER GENERATED:
  1     Destination Unreachable                                [RFC1885]
         0  No Route to Destination
         1  Comm. w/Destination is Administratively Prohibited
         2  Not a Neighbor
         3  Address Unreachable
  2     Packet Too Big                                         [RFC1885]
         0
  3     Time Exceeded                                          [RFC1885]
         0  Hop Limit Exceeded in Transit
         1  Fragment reassembly time exceeded

Type    Name/Codes                                             Reference
========================================================================
BOTH ROUTER AND HOST GENERATED:
  4     Parameter Problem                                      [RFC1885]
         0  Erroneous Header Field Encountered
         1  Unrecognized Next Header Type Encountered
         2  Unrecognized IPv6 Option Encountered
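   The IPv6 table above maps directly onto a lookup function.  A C
   sketch (IcmpGenerator and icmpv6_generator() are hypothetical names);
   type/code pairs that the table does not list, such as the ICMPv6
   informational messages, are reported as unknown.

```c
/* Generator categories used in the tables above. */
typedef enum { GEN_HOST, GEN_ROUTER, GEN_BOTH, GEN_UNKNOWN } IcmpGenerator;

/* Classifies an ICMPv6 type/code pair using the IPv6 table above. */
IcmpGenerator icmpv6_generator(unsigned type, unsigned code) {
    switch (type) {
    case 1:                                   /* Destination Unreachable */
        if (code == 4) return GEN_HOST;       /* Port Unreachable */
        return code <= 3 ? GEN_ROUTER : GEN_UNKNOWN;
    case 2:                                   /* Packet Too Big */
        return code == 0 ? GEN_ROUTER : GEN_UNKNOWN;
    case 3:                                   /* Time Exceeded */
        return code <= 1 ? GEN_ROUTER : GEN_UNKNOWN;
    case 4:                                   /* Parameter Problem */
        return code <= 2 ? GEN_BOTH : GEN_UNKNOWN;
    default:
        return GEN_UNKNOWN;
    }
}
```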

References

   [BL73]    Bell, D.E. & LaPadula, L.J., "Secure Computer Systems:
             Mathematical Foundations and Model", Technical Report M74-
             244, The MITRE Corporation, Bedford, MA, May 1973.

   [Bra97]   Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Level", BCP 14, RFC 2119, March 1997.

   [DoD85]   US National Computer Security Center, "Department of
             Defense Trusted Computer System Evaluation Criteria", DoD
             5200.28-STD, US Department of Defense, Ft. Meade, MD.,
             December 1985.

   [DoD87]   US National Computer Security Center, "Trusted Network
             Interpretation of the Trusted Computer System Evaluation
             Criteria", NCSC-TG-005, Version 1, US Department of
             Defense, Ft. Meade, MD., 31 July 1987.

   [HA94]    Haller, N., and R. Atkinson, "On Internet Authentication",
             RFC 1704, October 1994.

   [HC98]    Harkins, D., and D. Carrel, "The Internet Key Exchange
             (IKE)", RFC 2409, November 1998.

   [HM97]    Harney, H., and C.  Muckenhirn, "Group Key Management
             Protocol (GKMP) Architecture", RFC 2094, July 1997.

   [ISO]     ISO/IEC JTC1/SC6, Network Layer Security Protocol, ISO-IEC
             DIS 11577, International Standards Organisation, Geneva,
             Switzerland, 29 November 1992.

   [IB93]    John Ioannidis and Matt Blaze, "Architecture and
             Implementation of Network-layer Security Under Unix",
             Proceedings of USENIX Security Symposium, Santa Clara, CA,
             October 1993.

   [IBK93]   John Ioannidis, Matt Blaze, & Phil Karn, "swIPe: Network-
             Layer Security for IP", presentation at the Spring 1993
             IETF Meeting, Columbus, Ohio.

   [KA98a]   Kent, S., and R. Atkinson, "IP Authentication Header", RFC
             2402, November 1998.

   [KA98b]   Kent, S., and R. Atkinson, "IP Encapsulating Security
             Payload (ESP)", RFC 2406, November 1998.

   [Ken91]   Kent, S., "US DoD Security Options for the Internet
             Protocol", RFC 1108, November 1991.

   [MSST97]  Maughan, D., Schertler, M., Schneider, M., and J. Turner,
             "Internet Security Association and Key Management Protocol
             (ISAKMP)", RFC 2408, November 1998.

   [Orm97]   Orman, H., "The OAKLEY Key Determination Protocol", RFC
             2412, November 1998.

   [Pip98]   Piper, D., "The Internet IP Security Domain of
             Interpretation for ISAKMP", RFC 2407, November 1998.

   [Sch94]   Bruce Schneier, Applied Cryptography, Section 8.6, John
             Wiley & Sons, New York, NY, 1994.

   [SDNS]    SDNS Secure Data Network System, Security Protocol 3, SP3,
             Document SDN.301, Revision 1.5, 15 May 1989, published in
             NIST Publication NIST-IR-90-4250, February 1990.

   [SMPT98]  Shacham, A., Monsour, R., Pereira, R., and M. Thomas, "IP
             Payload Compression Protocol (IPComp)", RFC 2393, August
             1998.

   [TDG97]   Thayer, R., Doraswamy, N., and R. Glenn, "IP Security
             Document Roadmap", RFC 2411, November 1998.

   [VK83]    V.L. Voydock & S.T. Kent, "Security Mechanisms in High-
             level Networks", ACM Computing Surveys, Vol. 15, No. 2,
             June 1983.

Disclaimer

   The views and specification expressed in this document are those of
   the authors and are not necessarily those of their employers.  The
   authors and their employers specifically disclaim responsibility for
   any problems arising from correct or incorrect implementation or use
   of this design.

Author Information

   Stephen Kent
   BBN Corporation
   70 Fawcett Street
   Cambridge, MA  02140
   USA

   Phone: +1 (617) 873-3988
   EMail: kent@bbn.com

   Randall Atkinson
   @Home Network
   425 Broadway
   Redwood City, CA 94063
   USA

   Phone: +1 (415) 569-5000
   EMail: rja@corp.home.net

Copyright (C) The Internet Society (1998).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

The post RFC 2401 – Security Architecture for the Internet Protocol appeared first on IPv6.net.

RFC 1661 – The Point-to-Point Protocol (PPP)

IPv6 & IoT editor — Sat, 01 Aug 2009 17:56:20 +0000

Network Working Group                                 W. Simpson, Editor
Request for Comments: 1661                                    Daydreamer
STD: 51                                                        July 1994
Obsoletes: 1548
Category: Standards Track

                   The Point-to-Point Protocol (PPP)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   The Point-to-Point Protocol (PPP) provides a standard method for
   transporting multi-protocol datagrams over point-to-point links.  PPP
   consists of three main components:

      1. A method for encapsulating multi-protocol datagrams.

      2. A Link Control Protocol (LCP) for establishing, configuring,
         and testing the data-link connection.

      3. A family of Network Control Protocols (NCPs) for establishing
         and configuring different network-layer protocols.

   This document defines the PPP organization and methodology, and the
   PPP encapsulation, together with an extensible option negotiation
   mechanism which is able to negotiate a rich assortment of
   configuration parameters and provides additional management
   functions.  The PPP Link Control Protocol (LCP) is described in terms
   of this mechanism.

Table of Contents

     1.     Introduction
        1.1       Specification of Requirements
        1.2       Terminology

     2.     PPP Encapsulation

     3.     PPP Link Operation
        3.1       Overview
        3.2       Phase Diagram
        3.3       Link Dead (physical-layer not ready)
        3.4       Link Establishment Phase
        3.5       Authentication Phase
        3.6       Network-Layer Protocol Phase
        3.7       Link Termination Phase

     4.     The Option Negotiation Automaton
        4.1       State Transition Table
        4.2       States
        4.3       Events
        4.4       Actions
        4.5       Loop Avoidance
        4.6       Counters and Timers

     5.     LCP Packet Formats
        5.1       Configure-Request
        5.2       Configure-Ack
        5.3       Configure-Nak
        5.4       Configure-Reject
        5.5       Terminate-Request and Terminate-Ack
        5.6       Code-Reject
        5.7       Protocol-Reject
        5.8       Echo-Request and Echo-Reply
        5.9       Discard-Request

     6.     LCP Configuration Options
        6.1       Maximum-Receive-Unit (MRU)
        6.2       Authentication-Protocol
        6.3       Quality-Protocol
        6.4       Magic-Number
        6.5       Protocol-Field-Compression (PFC)
        6.6       Address-and-Control-Field-Compression (ACFC)

     SECURITY CONSIDERATIONS
     REFERENCES
     ACKNOWLEDGEMENTS
     CHAIR'S ADDRESS
     EDITOR'S ADDRESS

1.  Introduction

   The Point-to-Point Protocol is designed for simple links which
   transport packets between two peers.  These links provide full-duplex
   simultaneous bi-directional operation, and are assumed to deliver
   packets in order.  It is intended that PPP provide a common solution
   for easy connection of a wide variety of hosts, bridges and routers
   [1].

   Encapsulation

      The PPP encapsulation provides for multiplexing of different
      network-layer protocols simultaneously over the same link.  The
      PPP encapsulation has been carefully designed to retain
      compatibility with most commonly used supporting hardware.

      Only 8 additional octets are necessary to form the encapsulation
      when used within the default HDLC-like framing.  In environments
      where bandwidth is at a premium, the encapsulation and framing may
      be shortened to 2 or 4 octets.

      To support high speed implementations, the default encapsulation
      uses only simple fields, only one of which needs to be examined
      for demultiplexing.  The default header and information fields
      fall on 32-bit boundaries, and the trailer may be padded to an
      arbitrary boundary.

   Link Control Protocol

      In order to be sufficiently versatile to be portable to a wide
      variety of environments, PPP provides a Link Control Protocol
      (LCP).  The LCP is used to automatically agree upon the
      encapsulation format options, handle varying limits on sizes of
      packets, detect a looped-back link and other common
      misconfiguration errors, and terminate the link.  Other optional
      facilities provided are authentication of the identity of its peer
      on the link, and determination when a link is functioning properly
      and when it is failing.

   Network Control Protocols

      Point-to-Point links tend to exacerbate many problems with the
      current family of network protocols.  For instance, assignment and
      management of IP addresses, which is a problem even in LAN
      environments, is especially difficult over circuit-switched
      point-to-point links (such as dial-up modem servers).  These
      problems are handled by a family of Network Control Protocols
      (NCPs), which each manage the specific needs required by their
      respective network-layer protocols.  These NCPs are defined in
      companion documents.

   Configuration

      It is intended that PPP links be easy to configure.  By design,
      the standard defaults handle all common configurations.  The
      implementor can specify improvements to the default configuration,
      which are automatically communicated to the peer without operator
      intervention.  Finally, the operator may explicitly configure
      options for the link which enable the link to operate in
      environments where it would otherwise be impossible.

      This self-configuration is implemented through an extensible
      option negotiation mechanism, wherein each end of the link
      describes to the other its capabilities and requirements.
      Although the option negotiation mechanism described in this
      document is specified in terms of the Link Control Protocol (LCP),
      the same facilities are designed to be used by other control
      protocols, especially the family of NCPs.

1.1.  Specification of Requirements

   In this document, several words are used to signify the requirements
   of the specification.  These words are often capitalized.

   MUST      This word, or the adjective "required", means that the
             definition is an absolute requirement of the specification.

   MUST NOT  This phrase means that the definition is an absolute
             prohibition of the specification.

   SHOULD    This word, or the adjective "recommended", means that there
             may exist valid reasons in particular circumstances to
             ignore this item, but the full implications must be
             understood and carefully weighed before choosing a
             different course.

   MAY       This word, or the adjective "optional", means that this
             item is one of an allowed set of alternatives.  An
             implementation which does not include this option MUST be
             prepared to interoperate with another implementation which
             does include the option.

1.2.  Terminology

   This document frequently uses the following terms:

   datagram  The unit of transmission in the network layer (such as IP).
             A datagram may be encapsulated in one or more packets
             passed to the data link layer.

   frame     The unit of transmission at the data link layer.  A frame
             may include a header and/or a trailer, along with some
             number of units of data.

   packet    The basic unit of encapsulation, which is passed across the
             interface between the network layer and the data link
             layer.  A packet is usually mapped to a frame; the
             exceptions are when data link layer fragmentation is being
             performed, or when multiple packets are incorporated into a
             single frame.

   peer      The other end of the point-to-point link.

   silently discard
             The implementation discards the packet without further
             processing.  The implementation SHOULD provide the
             capability of logging the error, including the contents of
             the silently discarded packet, and SHOULD record the event
             in a statistics counter.
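   As an illustration only, "silently discard" might look like the
   following Python sketch: no reply goes to the peer, but the event is
   counted and the packet contents are logged.  The function name, the
   logger name "ppp" and the reason strings are hypothetical.

```python
import logging
from collections import Counter

log = logging.getLogger("ppp")      # hypothetical logger name
discard_counts = Counter()          # statistics counter, keyed by reason


def silently_discard(packet: bytes, reason: str) -> None:
    """Drop a packet without any response to the peer, keeping a record."""
    discard_counts[reason] += 1                            # SHOULD count
    log.debug("discarded (%s): %s", reason, packet.hex())  # SHOULD log contents
```

   The reason string lets a single counter object keep separate
   statistics for each discard rule.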

2.  PPP Encapsulation

   The PPP encapsulation is used to disambiguate multiprotocol
   datagrams.  This encapsulation requires framing to indicate the
   beginning and end of the encapsulation.  Methods of providing framing
   are specified in companion documents.

   A summary of the PPP encapsulation is shown below.  The fields are
   transmitted from left to right.

           +----------+-------------+---------+
           | Protocol | Information | Padding |
           | 8/16 bits|      *      |    *    |
           +----------+-------------+---------+

   Protocol Field

      The Protocol field is one or two octets, and its value identifies
      the datagram encapsulated in the Information field of the packet.
      The field is transmitted and received most significant octet
      first.

      The structure of this field is consistent with the ISO 3309
      extension mechanism for address fields.  All Protocols MUST be
      odd; the least significant bit of the least significant octet MUST
      equal "1".  Also, all Protocols MUST be assigned such that the
      least significant bit of the most significant octet equals "0".
      Frames received which don't comply with these rules MUST be
      treated as having an unrecognized Protocol.
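      As an illustration only, the following Python sketch reads a
      Protocol field using the extension rule above and checks the two
      bit constraints.  The function names are hypothetical.  It accepts
      the one- and two-octet forms unconditionally; a real receiver
      would also consider the options negotiated by LCP.

```python
def decode_protocol(packet: bytes) -> tuple[int, int]:
    """Return (protocol, header_length) for the start of a PPP packet.

    Under the ISO 3309 extension rule, an octet whose least significant
    bit is 1 ends the field, so an odd first octet is a complete
    one-octet Protocol field.
    """
    if not packet:
        raise ValueError("empty packet")
    first = packet[0]
    if first & 0x01:
        # One-octet field; the omitted most significant octet is 0x00.
        return first, 1
    if len(packet) < 2:
        raise ValueError("truncated Protocol field")
    # Two-octet field, transmitted most significant octet first.
    return (first << 8) | packet[1], 2


def is_valid_protocol(protocol: int) -> bool:
    """Protocols MUST be odd, and the least significant bit of the most
    significant octet MUST be 0; anything else is unrecognized."""
    return (protocol & 0x0001) == 1 and (protocol & 0x0100) == 0
```

      For example, the octets C0 21 decode to (0xC021, 2), the LCP
      value, and a lone 21 octet decodes to (0x21, 1).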

      Protocol field values in the "0***" to "3***" range identify the
      network-layer protocol of specific packets, and values in the
      "8***" to "b***" range identify packets belonging to the
      associated Network Control Protocols (NCPs), if any.

      Protocol field values in the "4***" to "7***" range are used for
      protocols with low volume traffic which have no associated NCP.
      Protocol field values in the "c***" to "f***" range identify
      packets as link-layer Control Protocols (such as LCP).
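      The four ranges depend only on the first hex digit of the
      Protocol value.  The following Python sketch (the function name
      and the returned labels are hypothetical) shows that mapping.  It
      does not repeat the odd/even validity checks.

```python
def classify_protocol(protocol: int) -> str:
    """Map a 16-bit Protocol value to the range it falls in."""
    if not 0 <= protocol <= 0xFFFF:
        raise ValueError("Protocol must fit in 16 bits")
    top = protocol >> 12          # first hex digit, e.g. 0xC021 -> 0xC
    if top <= 0x3:
        return "network-layer protocol"
    if top <= 0x7:
        return "low-volume protocol without NCP"
    if top <= 0xB:
        return "network control protocol"
    return "link-layer control protocol"
```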

      Up-to-date values of the Protocol field are specified in the most
      recent "Assigned Numbers" RFC [2].  This specification reserves
      the following values:

      Value (in hex)  Protocol Name

      0001            Padding Protocol
      0003 to 001f    reserved (transparency inefficient)
      007d            reserved (Control Escape)
      00cf            reserved (PPP NLPID)
      00ff            reserved (compression inefficient)

      8001 to 801f    unused
      807d            unused
      80cf            unused
      80ff            unused

      c021            Link Control Protocol
      c023            Password Authentication Protocol
      c025            Link Quality Report
      c223            Challenge Handshake Authentication Protocol

      Developers of new protocols MUST obtain a number from the Internet
      Assigned Numbers Authority (IANA), at IANA@isi.edu.

   Information Field

      The Information field is zero or more octets.  The Information
      field contains the datagram for the protocol specified in the
      Protocol field.

      The maximum length for the Information field, including Padding,
      but not including the Protocol field, is termed the Maximum
      Receive Unit (MRU), which defaults to 1500 octets.  By
      negotiation, consenting PPP implementations may use other values
      for the MRU.

   Padding

      On transmission, the Information field MAY be padded with an
      arbitrary number of octets up to the MRU.  It is the
      responsibility of each protocol to distinguish padding octets from
      real information.
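   The following Python sketch (hypothetical function and parameter
   names) assembles Protocol, Information and Padding and enforces the
   MRU rule: Information plus Padding may not exceed the MRU, and the
   Protocol field does not count toward it.  It always emits the
   uncompressed two-octet Protocol field, omits framing, and fills
   padding with zero octets, which is an arbitrary choice.

```python
DEFAULT_MRU = 1500  # default Maximum-Receive-Unit, in octets


def encapsulate(protocol: int, information: bytes, padding: int = 0,
                mru: int = DEFAULT_MRU) -> bytes:
    """Build Protocol | Information | Padding (framing not included)."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    if len(information) + padding > mru:
        raise ValueError("Information plus Padding exceeds the MRU")
    header = protocol.to_bytes(2, "big")   # most significant octet first
    return header + information + bytes(padding)
```

   A full 1500-octet Information field therefore produces a 1502-octet
   packet before framing is added.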

3.  PPP Link Operation

3.1.  Overview

   In order to establish communications over a point-to-point link, each
   end of the PPP link MUST first send LCP packets to configure and test
   the data link.  After the link has been established, the peer MAY be
   authenticated.

   Then, PPP MUST send NCP packets to choose and configure one or more
   network-layer protocols.  Once each of the chosen network-layer
   protocols has been configured, datagrams from each network-layer
   protocol can be sent over the link.

   The link will remain configured for communications until explicit LCP
   or NCP packets close the link down, or until some external event
   occurs (an inactivity timer expires or network administrator
   intervention).

3.2.  Phase Diagram

   In the process of configuring, maintaining and terminating the
   point-to-point link, the PPP link goes through several distinct
   phases which are specified in the following simplified state diagram:

   +------+        +-----------+           +--------------+
   |      | UP     |           | OPENED    |              | SUCCESS/NONE
   | Dead |------->| Establish |---------->| Authenticate |--+
   |      |        |           |           |              |  |
   +------+        +-----------+           +--------------+  |
      ^               |                        |             |
      |          FAIL |                   FAIL |             |
      +<--------------+             +----------+             |
      |                             |                        |
      |            +-----------+    |           +---------+  |
      |       DOWN |           |    |   CLOSING |         |  |
      +------------| Terminate |<---+<----------| Network |<-+
                   |           |                |         |
                   +-----------+                +---------+
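   As an illustration only, the arrows drawn in the diagram above can
   be written as a Python lookup table.  The enum and function names
   are hypothetical.  Because the diagram is simplified, the table
   omits transitions that are not drawn, such as the return to Link
   Establishment when an LCP Configure-Request arrives.

```python
from enum import Enum


class Phase(Enum):
    DEAD = "Dead"
    ESTABLISH = "Establish"
    AUTHENTICATE = "Authenticate"
    NETWORK = "Network"
    TERMINATE = "Terminate"


# (current phase, arrow label from the diagram) -> next phase
TRANSITIONS = {
    (Phase.DEAD, "UP"): Phase.ESTABLISH,
    (Phase.ESTABLISH, "OPENED"): Phase.AUTHENTICATE,
    (Phase.ESTABLISH, "FAIL"): Phase.DEAD,
    (Phase.AUTHENTICATE, "SUCCESS/NONE"): Phase.NETWORK,
    (Phase.AUTHENTICATE, "FAIL"): Phase.TERMINATE,
    (Phase.NETWORK, "CLOSING"): Phase.TERMINATE,
    (Phase.TERMINATE, "DOWN"): Phase.DEAD,
}


def next_phase(phase: Phase, label: str) -> Phase:
    """Follow one drawn arrow; undrawn transitions raise ValueError."""
    try:
        return TRANSITIONS[(phase, label)]
    except KeyError:
        raise ValueError(f"{label} is not drawn for {phase.value}") from None
```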

   Not all transitions are specified in this diagram.  The following
   semantics MUST be followed.

3.3.  Link Dead (physical-layer not ready)

   The link necessarily begins and ends with this phase.  When an
   external event (such as carrier detection or network administrator
   configuration) indicates that the physical-layer is ready to be used,
   PPP will proceed to the Link Establishment phase.

   During this phase, the LCP automaton (described later) will be in the
   Initial or Starting states.  The transition to the Link Establishment
   phase will signal an Up event to the LCP automaton.

   Implementation Note:

      Typically, a link will return to this phase automatically after
      the disconnection of a modem.  In the case of a hard-wired link,
      this phase may be extremely short -- merely long enough to detect
      the presence of the device.

3.4.  Link Establishment Phase

   The Link Control Protocol (LCP) is used to establish the connection
   through an exchange of Configure packets.  This exchange is complete,
   and the LCP Opened state entered, once a Configure-Ack packet
   (described later) has been both sent and received.

   All Configuration Options are assumed to be at default values unless
   altered by the configuration exchange.  See the chapter on LCP
   Configuration Options for further discussion.

   It is important to note that only Configuration Options which are
   independent of particular network-layer protocols are configured by
   LCP.  Configuration of individual network-layer protocols is handled
   by separate Network Control Protocols (NCPs) during the Network-Layer
   Protocol phase.

   Any non-LCP packets received during this phase MUST be silently
   discarded.

   The receipt of the LCP Configure-Request causes a return to the Link
   Establishment phase from the Network-Layer Protocol phase or
   Authentication phase.

3.5.  Authentication Phase

   On some links it may be desirable to require a peer to authenticate
   itself before allowing network-layer protocol packets to be
   exchanged.

   By default, authentication is not mandatory.  If an implementation
   desires that the peer authenticate with some specific authentication
   protocol, then it MUST request the use of that authentication
   protocol during Link Establishment phase.

   Authentication SHOULD take place as soon as possible after link
   establishment.  However, link quality determination MAY occur
   concurrently.  An implementation MUST NOT allow the exchange of link
   quality determination packets to delay authentication indefinitely.

   Advancement from the Authentication phase to the Network-Layer
   Protocol phase MUST NOT occur until authentication has completed.  If
   authentication fails, the authenticator SHOULD proceed instead to the
   Link Termination phase.

   Only Link Control Protocol, authentication protocol, and link quality
   monitoring packets are allowed during this phase.  All other packets
   received during this phase MUST be silently discarded.

   Implementation Notes:

      An implementation SHOULD NOT fail authentication simply due to
      timeout or lack of response.  The authentication SHOULD allow some
      method of retransmission, and proceed to the Link Termination
      phase only after a number of authentication attempts has been
      exceeded.

      The implementation responsible for commencing Link Termination
      phase is the implementation which has refused authentication to
      its peer.
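   These notes can be sketched in Python as a bounded retry loop.  The
   attempt callback, its three result strings and max_attempts are
   hypothetical; the point is that a timeout leads to another attempt,
   while an explicit failure or an exhausted attempt budget leads to
   Link Termination.

```python
from typing import Callable


def authenticate(attempt: Callable[[], str], max_attempts: int) -> str:
    """Return the next phase: "network" or "terminate".

    attempt() is assumed to return "success", "failure" or "timeout".
    """
    for _ in range(max_attempts):
        result = attempt()
        if result == "success":
            return "network"
        if result == "failure":
            return "terminate"      # authenticator refuses the peer
        # "timeout": do not fail yet; retransmit by trying again
    return "terminate"              # number of attempts exceeded
```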

3.6.  Network-Layer Protocol Phase

   Once PPP has finished the previous phases, each network-layer
   protocol (such as IP, IPX, or AppleTalk) MUST be separately
   configured by the appropriate Network Control Protocol (NCP).

   Each NCP MAY be Opened and Closed at any time.

   Implementation Note:

      Because an implementation may initially use a significant amount
      of time for link quality determination, implementations SHOULD
      avoid fixed timeouts when waiting for their peers to configure an
      NCP.

   After an NCP has reached the Opened state, PPP will carry the
   corresponding network-layer protocol packets.  Any supported
   network-layer protocol packets received when the corresponding NCP is
   not in the Opened state MUST be silently discarded.

   Implementation Note:

      While LCP is in the Opened state, any protocol packet which is
      unsupported by the implementation MUST be returned in a Protocol-
      Reject (described later).  Only protocols which are supported are
      silently discarded.
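   The receive rules for network-layer packets while LCP is Opened
   reduce to three outcomes.  The following Python sketch shows them;
   the function name, its parameters and the result strings are
   hypothetical, and 0x0021 is used in the usage note as the IP
   protocol value.

```python
def handle_network_packet(protocol: int, supported: set[int],
                          ncp_opened: dict[int, bool]) -> str:
    """Decide what to do with a network-layer packet while LCP is Opened.

    supported  -- Protocol values this implementation understands
    ncp_opened -- whether the NCP for each supported protocol is Opened
    """
    if protocol not in supported:
        return "protocol-reject"      # unsupported: MUST be rejected
    if not ncp_opened.get(protocol, False):
        return "silently-discard"     # supported, but its NCP is not Opened
    return "deliver"
```

   For example, an IP packet (0x0021) is delivered only once its NCP
   is Opened, while a packet for a protocol the implementation lacks
   always draws a Protocol-Reject.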

   During this phase, link traffic consists of any possible combination
   of LCP, NCP, and network-layer protocol packets.

3.7.  Link Termination Phase

   PPP can terminate the link at any time.  This might happen because of
   the loss of carrier, authentication failure, link quality failure,
   the expiration of an idle-period timer, or the administrative closing
   of the link.

   LCP is used to close the link through an exchange of Terminate
   packets.  When the link is closing, PPP informs the network-layer
   protocols so that they may take appropriate action.

   After the exchange of Terminate packets, the implementation SHOULD
   signal the physical-layer to disconnect in order to enforce the
   termination of the link, particularly in the case of an
   authentication failure.  The sender of the Terminate-Request SHOULD
   disconnect after receiving a Terminate-Ack, or after the Restart
   counter expires.  The receiver of a Terminate-Request SHOULD wait for
   the peer to disconnect, and MUST NOT disconnect until at least one
   Restart time has passed after sending a Terminate-Ack.  PPP SHOULD
   proceed to the Link Dead phase.

   Any non-LCP packets received during this phase MUST be silently
   discarded.

   Implementation Note:

      The closing of the link by LCP is sufficient.  There is no need
      for each NCP to send a flurry of Terminate packets.  Conversely,
      the fact that one NCP has Closed is not sufficient reason to cause
      the termination of the PPP link, even if that NCP was the only NCP
      currently in the Opened state.
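   The per-phase filtering rules above can be collected into one
   table.  The following Python sketch uses the Protocol values from
   the reserved list in Section 2; the phase names and the function
   are hypothetical.  It admits both PAP and CHAP during
   Authentication, whereas an implementation would admit only the
   protocol it negotiated.  The Network-Layer Protocol phase is not
   covered because its rules depend on NCP state.

```python
LCP = 0xC021    # Link Control Protocol
PAP = 0xC023    # Password Authentication Protocol
LQR = 0xC025    # Link Quality Report
CHAP = 0xC223   # Challenge Handshake Authentication Protocol

ALLOWED = {
    "establish": {LCP},
    "authenticate": {LCP, PAP, CHAP, LQR},
    "terminate": {LCP},
}


def accept(phase: str, protocol: int) -> bool:
    """Return False when the packet MUST be silently discarded."""
    allowed = ALLOWED.get(phase)
    if allowed is None:
        raise ValueError(f"no filtering rule sketched for phase {phase!r}")
    return protocol in allowed
```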

4.  The Option Negotiation Automaton

   The finite-state automaton is defined by events, actions and state
   transitions.  Events include reception of external commands such as
   Open and Close, expiration of the Restart timer, and reception of
   packets from a peer.  Actions include the starting of the Restart
   timer and transmission of packets to the peer.

   Some types of packets -- Configure-Naks and Configure-Rejects, or
   Code-Rejects and Protocol-Rejects, or Echo-Requests, Echo-Replies and
   Discard-Requests -- are not differentiated in the automaton
   descriptions.  As will be described later, these packets do indeed
   serve different functions.  However, they always cause the same
   transitions.

   Events                                   Actions

   Up   = lower layer is Up                 tlu = This-Layer-Up
   Down = lower layer is Down               tld = This-Layer-Down
   Open = administrative Open               tls = This-Layer-Started
   Close= administrative Close              tlf = This-Layer-Finished

   TO+  = Timeout with counter > 0          irc = Initialize-Restart-Count
   TO-  = Timeout with counter expired      zrc = Zero-Restart-Count

   RCR+ = Receive-Configure-Request (Good)  scr = Send-Configure-Request
   RCR- = Receive-Configure-Request (Bad)
   RCA  = Receive-Configure-Ack             sca = Send-Configure-Ack
   RCN  = Receive-Configure-Nak/Rej         scn = Send-Configure-Nak/Rej

   RTR  = Receive-Terminate-Request         str = Send-Terminate-Request
   RTA  = Receive-Terminate-Ack             sta = Send-Terminate-Ack

   RUC  = Receive-Unknown-Code              scj = Send-Code-Reject
   RXJ+ = Receive-Code-Reject (permitted)
       or Receive-Protocol-Reject
   RXJ- = Receive-Code-Reject (catastrophic)
       or Receive-Protocol-Reject
   RXR  = Receive-Echo-Request              ser = Send-Echo-Reply
       or Receive-Echo-Reply
       or Receive-Discard-Request

4.1.  State Transition Table

   The complete state transition table follows.  States are indicated
   horizontally, and events are read vertically.  State transitions and
   actions are represented in the form action/new-state.  Multiple
   actions are separated by commas, and may continue on succeeding lines
   as space requires; multiple actions may be implemented in any
   convenient order.  The state may be followed by a letter, which
   indicates an explanatory footnote.  The dash ('-') indicates an
   illegal transition.

      | State
      |    0         1         2         3         4         5
Events| Initial   Starting  Closed    Stopped   Closing   Stopping
------+-----------------------------------------------------------
 Up   |    2     irc,scr/6     -         -         -         -
 Down |    -         -         0       tls/1       0         1
 Open |  tls/1       1     irc,scr/6     3r        5r        5r
 Close|    0       tlf/0       2         2         4         4
      |
  TO+ |    -         -         -         -       str/4     str/5
  TO- |    -         -         -         -       tlf/2     tlf/3
      |
 RCR+ |    -         -       sta/2 irc,scr,sca/8   4         5
 RCR- |    -         -       sta/2 irc,scr,scn/6   4         5
 RCA  |    -         -       sta/2     sta/3       4         5
 RCN  |    -         -       sta/2     sta/3       4         5
      |
 RTR  |    -         -       sta/2     sta/3     sta/4     sta/5
 RTA  |    -         -         2         3       tlf/2     tlf/3
      |
 RUC  |    -         -       scj/2     scj/3     scj/4     scj/5
 RXJ+ |    -         -         2         3         4         5
 RXJ- |    -         -       tlf/2     tlf/3     tlf/2     tlf/3
      |
 RXR  |    -         -         2         3         4         5

      | State
      |    6         7         8           9
Events| Req-Sent  Ack-Rcvd  Ack-Sent    Opened
------+-----------------------------------------
 Up   |    -         -         -           -
 Down |    1         1         1         tld/1
 Open |    6         7         8           9r
 Close|irc,str/4 irc,str/4 irc,str/4 tld,irc,str/4
      |
  TO+ |  scr/6     scr/6     scr/8         -
  TO- |  tlf/3p    tlf/3p    tlf/3p        -
      |
 RCR+ |  sca/8   sca,tlu/9   sca/8   tld,scr,sca/8
 RCR- |  scn/6     scn/7     scn/6   tld,scr,scn/6
 RCA  |  irc/7     scr/6x  irc,tlu/9   tld,scr/6x
 RCN  |irc,scr/6   scr/6x  irc,scr/8   tld,scr/6x
      |
 RTR  |  sta/6     sta/6     sta/6   tld,zrc,sta/5
 RTA  |    6         6         8       tld,scr/6
      |
 RUC  |  scj/6     scj/7     scj/8       scj/9
 RXJ+ |    6         6         8           9
 RXJ- |  tlf/3     tlf/3     tlf/3   tld,irc,str/5
      |
 RXR  |    6         7         8         ser/9

   The states in which the Restart timer is running are identifiable by
   the presence of TO events.  Only the Send-Configure-Request, Send-
   Terminate-Request and Zero-Restart-Count actions start or re-start
   the Restart timer.  The Restart timer is stopped when transitioning
   from any state where the timer is running to a state where the timer
   is not running.

   The events and actions are defined according to a message passing
   architecture, rather than a signalling architecture.  If an action is
   desired to control specific signals (such as DTR), additional actions
   are likely to be required.

   [p]   Passive option; see Stopped state discussion.

   [r]   Restart option; see Open event discussion.

   [x]   Crossed connection; see RCA event discussion.
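   Because every cell has the form action/new-state, the table can be
   driven directly by code.  The following Python sketch copies the
   two tables above into one row per event (states 0 through 9) and
   parses a cell on demand.  The names are hypothetical, and it only
   reports actions; it does not perform them.  It also derives the
   states in which the Restart timer runs from the TO events.

```python
STATES = ("Initial", "Starting", "Closed", "Stopped", "Closing",
          "Stopping", "Req-Sent", "Ack-Rcvd", "Ack-Sent", "Opened")

# One row per event; cells are for states 0..9 and "-" is illegal.
ROWS = {
    "Up":   "2 irc,scr/6 - - - - - - - -",
    "Down": "- - 0 tls/1 0 1 1 1 1 tld/1",
    "Open": "tls/1 1 irc,scr/6 3r 5r 5r 6 7 8 9r",
    "Close": "0 tlf/0 2 2 4 4 irc,str/4 irc,str/4 irc,str/4 tld,irc,str/4",
    "TO+":  "- - - - str/4 str/5 scr/6 scr/6 scr/8 -",
    "TO-":  "- - - - tlf/2 tlf/3 tlf/3p tlf/3p tlf/3p -",
    "RCR+": "- - sta/2 irc,scr,sca/8 4 5 sca/8 sca,tlu/9 sca/8 tld,scr,sca/8",
    "RCR-": "- - sta/2 irc,scr,scn/6 4 5 scn/6 scn/7 scn/6 tld,scr,scn/6",
    "RCA":  "- - sta/2 sta/3 4 5 irc/7 scr/6x irc,tlu/9 tld,scr/6x",
    "RCN":  "- - sta/2 sta/3 4 5 irc,scr/6 scr/6x irc,scr/8 tld,scr/6x",
    "RTR":  "- - sta/2 sta/3 sta/4 sta/5 sta/6 sta/6 sta/6 tld,zrc,sta/5",
    "RTA":  "- - 2 3 tlf/2 tlf/3 6 6 8 tld,scr/6",
    "RUC":  "- - scj/2 scj/3 scj/4 scj/5 scj/6 scj/7 scj/8 scj/9",
    "RXJ+": "- - 2 3 4 5 6 6 8 9",
    "RXJ-": "- - tlf/2 tlf/3 tlf/2 tlf/3 tlf/3 tlf/3 tlf/3 tld,irc,str/5",
    "RXR":  "- - 2 3 4 5 6 7 8 ser/9",
}

for _event, _row in ROWS.items():
    if len(_row.split()) != len(STATES):
        raise ValueError(f"row {_event} needs one cell per state")


def step(state: int, event: str) -> tuple[list[str], int, str]:
    """Return (actions, new_state, footnote) for one event.

    Footnotes: p = passive option, r = restart option,
    x = crossed connection.  Illegal transitions raise ValueError.
    """
    cell = ROWS[event].split()[state]
    if cell == "-":
        raise ValueError(f"{event} is illegal in state {STATES[state]}")
    actions, _, target = cell.rpartition("/")
    return (actions.split(",") if actions else []), int(target[0]), target[1:]


def restart_timer_running(state: int) -> bool:
    """The Restart timer runs exactly in states that have TO events."""
    return ROWS["TO+"].split()[state] != "-"
```

   For example, step(1, "Up") gives (["irc", "scr"], 6, ""): from
   Starting, the Restart counter is initialized, a Configure-Request is
   sent, and Req-Sent is entered.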

4.2.  States

   Following is a more detailed description of each automaton state.

   Initial

      In the Initial state, the lower layer is unavailable (Down), and
      no Open has occurred.  The Restart timer is not running in the
      Initial state.

   Starting

      The Starting state is the Open counterpart to the Initial state.
      An administrative Open has been initiated, but the lower layer is
      still unavailable (Down).  The Restart timer is not running in the
      Starting state.

      When the lower layer becomes available (Up), a Configure-Request
      is sent.

   Closed

      In the Closed state, the link is available (Up), but no Open has
      occurred.  The Restart timer is not running in the Closed state.

      Upon reception of Configure-Request packets, a Terminate-Ack is
      sent.  Terminate-Acks are silently discarded to avoid creating a
      loop.

   Stopped

      The Stopped state is the Open counterpart to the Closed state.  It
      is entered when the automaton is waiting for a Down event after
      the This-Layer-Finished action, or after sending a Terminate-Ack.
      The Restart timer is not running in the Stopped state.

      Upon reception of Configure-Request packets, an appropriate
      response is sent.  Upon reception of other packets, a Terminate-
      Ack is sent.  Terminate-Acks are silently discarded to avoid
      creating a loop.

      Rationale:

         The Stopped state is a junction state for link termination,
         link configuration failure, and other automaton failure modes.
         These potentially separate states have been combined.

         There is a race condition between the Down event response (from
         the This-Layer-Finished action) and the Receive-Configure-
         Request event.  When a Configure-Request arrives before the
         Down event, the Down event will supersede by returning the
         automaton to the Starting state.  This prevents attack by
         repetition.

      Implementation Option:

         After the peer fails to respond to Configure-Requests, an
         implementation MAY wait passively for the peer to send
         Configure-Requests.  In this case, the This-Layer-Finished
         action is not used for the TO- event in states Req-Sent, Ack-
         Rcvd and Ack-Sent.

         This option is useful for dedicated circuits, or circuits which
         have no status signals available, but SHOULD NOT be used for
         switched circuits.

   Closing

      In the Closing state, an attempt is made to terminate the
      connection.  A Terminate-Request has been sent and the Restart
      timer is running, but a Terminate-Ack has not yet been received.

      Upon reception of a Terminate-Ack, the Closed state is entered.
      Upon the expiration of the Restart timer, a new Terminate-Request
      is transmitted, and the Restart timer is restarted.  After the
      Restart timer has expired Max-Terminate times, the Closed state is
      entered.
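      The Closing behaviour can be simulated in a few lines of Python.
      In this sketch (hypothetical function and parameter names)
      max_terminate stands for Max-Terminate, whose default is given in
      the Counters and Timers section, and ack_after_timeouts says how
      many Restart-timer expirations occur before a Terminate-Ack
      arrives (None means the peer never answers).

```python
from typing import Optional


def run_closing(max_terminate: int,
                ack_after_timeouts: Optional[int]) -> tuple[int, str]:
    """Return (Terminate-Requests sent, reason Closed was entered)."""
    if max_terminate < 1:
        raise ValueError("max_terminate must be at least 1")
    sent = 1            # Closing is entered with one Terminate-Request out
    expirations = 0
    while True:
        if ack_after_timeouts is not None and expirations == ack_after_timeouts:
            return sent, "Terminate-Ack received"
        expirations += 1                     # the Restart timer expires
        if expirations >= max_terminate:
            return sent, "Max-Terminate timeouts"
        sent += 1                            # retransmit and restart timer
```

      With max_terminate=2 and a silent peer, two Terminate-Requests
      are sent before the Closed state is entered.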

   Stopping

      The Stopping state is the Open counterpart to the Closing state.
      A Terminate-Request has been sent and the Restart timer is
      running, but a Terminate-Ack has not yet been received.

      Rationale:

         The Stopping state provides a well defined opportunity to
         terminate a link before allowing new traffic.  After the link
         has terminated, a new configuration may occur via the Stopped
         or Starting states.

   Request-Sent

      In the Request-Sent state an attempt is made to configure the
      connection.  A Configure-Request has been sent and the Restart
      timer is running, but a Configure-Ack has not yet been received
      nor has one been sent.

   Ack-Received

      In the Ack-Received state, a Configure-Request has been sent and a
      Configure-Ack has been received.  The Restart timer is still
      running, since a Configure-Ack has not yet been sent.

   Ack-Sent

      In the Ack-Sent state, a Configure-Request and a Configure-Ack
      have both been sent, but a Configure-Ack has not yet been
      received.  The Restart timer is running, since a Configure-Ack has
      not yet been received.

   Opened

      In the Opened state, a Configure-Ack has been both sent and
      received.  The Restart timer is not running.

      When entering the Opened state, the implementation SHOULD signal
      the upper layers that it is now Up.  Conversely, when leaving the
      Opened state, the implementation SHOULD signal the upper layers
      that it is now Down.

4.3.  Events

   Transitions and actions in the automaton are caused by events.

   Up

      This event occurs when a lower layer indicates that it is ready to
      carry packets.

      Typically, this event is used by a modem handling or calling
      process, or by some other coupling of the PPP link to the physical
      media, to signal LCP that the link is entering Link Establishment
      phase.

      It also can be used by LCP to signal each NCP that the link is
      entering Network-Layer Protocol phase.  That is, the This-Layer-Up
      action from LCP triggers the Up event in the NCP.

   Down

      This event occurs when a lower layer indicates that it is no
      longer ready to carry packets.

      Typically, this event is used by a modem handling or calling
      process, or by some other coupling of the PPP link to the physical
      media, to signal LCP that the link is entering Link Dead phase.

      It also can be used by LCP to signal each NCP that the link is
      leaving Network-Layer Protocol phase.  That is, the This-Layer-
      Down action from LCP triggers the Down event in the NCP.

   Open

      This event indicates that the link is administratively available
      for traffic; that is, the network administrator (human or program)
      has indicated that the link is allowed to be Opened.  When this
      event occurs, and the link is not in the Opened state, the
      automaton attempts to send configuration packets to the peer.

      If the automaton is not able to begin configuration (the lower
      layer is Down, or a previous Close event has not completed), the
      establishment of the link is automatically delayed.

      When a Terminate-Request is received, or other events occur which
      cause the link to become unavailable, the automaton will progress
      to a state where the link is ready to re-open.  No additional
      administrative intervention is necessary.

      Implementation Option:

         Experience has shown that users will execute an additional Open
         command when they want to renegotiate the link.  This might
         indicate that new values are to be negotiated.

         Since this is not the meaning of the Open event, it is
         suggested that when an Open user command is executed in the
         Opened, Closing, Stopping, or Stopped states, the
         implementation issue a Down event, immediately followed by an
         Up event.  Care must be taken that an intervening Down event
         cannot occur from another source.

         The Down followed by an Up will cause an orderly renegotiation
         of the link, by progressing through the Starting to the
         Request-Sent state.  This will cause the renegotiation of the
         link, without any harmful side effects.

   Close

      This event indicates that the link is not available for traffic;
      that is, the network administrator (human or program) has
      indicated that the link is not allowed to be Opened.  When this
      event occurs, and the link is not in the Closed state, the
      automaton attempts to terminate the connection.  Further attempts
      to re-configure the link are denied until a new Open event occurs.

      Implementation Note:

         When authentication fails, the link SHOULD be terminated, to
         prevent attack by repetition and denial of service to other
         users.  Since the link is administratively available (by
         definition), this can be accomplished by simulating a Close
         event to the LCP, immediately followed by an Open event.  Care
         must be taken that an intervening Close event cannot occur from
         another source.

         The Close followed by an Open will cause an orderly termination
         of the link, by progressing through the Closing to the Stopping
         state, and the This-Layer-Finished action can disconnect the
         link.  The automaton waits in the Stopped or Starting states
         for the next connection attempt.
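      The "care must be taken" requirements in the two notes above amount
      to injecting a pair of events atomically.  A minimal Python sketch,
      assuming the automaton consumes events one at a time from a single
      queue: posting both events while holding a lock keeps them
      contiguous, so no other source can slip a Down or Close in between.
      The EventQueue class and the user_open() and
      authentication_failed() helpers are hypothetical.

```python
import threading
from typing import List

# States in which a repeated Open command becomes Down followed by Up.
RENEGOTIATE_STATES = frozenset({"Opened", "Closing", "Stopping", "Stopped"})


class EventQueue:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: List[str] = []

    def post(self, *events: str) -> None:
        # Appending under the lock keeps a multi-event sequence contiguous.
        with self._lock:
            self._events.extend(events)

    def drain(self) -> List[str]:
        with self._lock:
            events, self._events = self._events, []
        return events


def user_open(queue: EventQueue, state: str) -> None:
    """Map an administrator's Open command to automaton events."""
    if state in RENEGOTIATE_STATES:
        queue.post("Down", "Up")  # orderly renegotiation via Starting
    else:
        queue.post("Open")


def authentication_failed(queue: EventQueue) -> None:
    # Close immediately followed by Open: terminate, then wait for a new attempt.
    queue.post("Close", "Open")
```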

   Timeout (TO+,TO-)

      This event indicates the expiration of the Restart timer.  The
      Restart timer is used to time responses to Configure-Request and
      Terminate-Request packets.

      The TO+ event indicates that the Restart counter continues to be
      greater than zero, which triggers the corresponding Configure-
      Request or Terminate-Request packet to be retransmitted.

      The TO- event indicates that the Restart counter is not greater
      than zero, and no more packets need to be retransmitted.

   Receive-Configure-Request (RCR+,RCR-)

      This event occurs when a Configure-Request packet is received from
      the peer.  The Configure-Request packet indicates the desire to
      open a connection and may specify Configuration Options.  The
      Configure-Request packet is more fully described in a later
      section.

      The RCR+ event indicates that the Configure-Request was
      acceptable, and triggers the transmission of a corresponding
      Configure-Ack.

      The RCR- event indicates that the Configure-Request was
      unacceptable, and triggers the transmission of a corresponding
      Configure-Nak or Configure-Reject.

      Implementation Note:

         These events may occur on a connection which is already in the
         Opened state.  The implementation MUST be prepared to
         immediately renegotiate the Configuration Options.
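      As a rough sketch, the choice between RCR+ and RCR- can be combined
      with the Ack/Nak/Reject rules from the LCP Packet Formats chapter:
      unrecognized options are rejected first, since a Configure-Nak is
      only used when every option is recognized.  Options are represented
      here as hypothetical (Type, Data) tuples, and the suggest() callback
      is an assumed policy hook.  Boolean options (which MUST be rejected
      rather than Nak'd) and options allowed multiple instances are not
      handled by this sketch.

```python
from typing import Callable, List, Optional, Set, Tuple

Option = Tuple[int, bytes]  # (Type, Data): hypothetical simplified representation


def classify_configure_request(
    options: List[Option],
    recognized: Set[int],
    suggest: Callable[[int, bytes], Optional[bytes]],
) -> Tuple[str, str, List[Option]]:
    """Return (event, reply code name, reply options) for a Configure-Request.

    suggest(type, data) returns None when the value is acceptable, or an
    acceptable replacement value otherwise.
    """
    # Unrecognized options take precedence and are returned unmodified, in order.
    rejected = [opt for opt in options if opt[0] not in recognized]
    if rejected:
        return "RCR-", "Configure-Reject", rejected
    # All options are recognized: Nak only the unacceptable ones, with new values.
    naked = []
    for opt_type, data in options:
        replacement = suggest(opt_type, data)
        if replacement is not None:
            naked.append((opt_type, replacement))
    if naked:
        return "RCR-", "Configure-Nak", naked
    return "RCR+", "Configure-Ack", list(options)
```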

   Receive-Configure-Ack (RCA)

      This event occurs when a valid Configure-Ack packet is received
      from the peer.  The Configure-Ack packet is a positive response to
      a Configure-Request packet.  An out of sequence or otherwise
      invalid packet is silently discarded.

      Implementation Note:

         Since the correct packet has already been received before
         reaching the Ack-Rcvd or Opened states, it is extremely
         unlikely that another such packet will arrive.  As specified,
         all invalid Ack/Nak/Rej packets are silently discarded, and do
         not affect the transitions of the automaton.

         However, it is not impossible that a correctly formed packet
         will arrive through a coincidentally-timed cross-connection.
         It is more likely to be the result of an implementation error.
         At the very least, this occurrence SHOULD be logged.

   Receive-Configure-Nak/Rej (RCN)

      This event occurs when a valid Configure-Nak or Configure-Reject
      packet is received from the peer.  The Configure-Nak and
      Configure-Reject packets are negative responses to a Configure-
      Request packet.  An out of sequence or otherwise invalid packet is
      silently discarded.

      Implementation Note:

         Although the Configure-Nak and Configure-Reject cause the same
         state transition in the automaton, these packets have
         significantly different effects on the Configuration Options
         sent in the resulting Configure-Request packet.

   Receive-Terminate-Request (RTR)

      This event occurs when a Terminate-Request packet is received.
      The Terminate-Request packet indicates the desire of the peer to
      close the connection.

      Implementation Note:

         This event is not identical to the Close event (see above), and
         does not override the Open commands of the local network
         administrator.  The implementation MUST be prepared to receive
         a new Configure-Request without network administrator
         intervention.

   Receive-Terminate-Ack (RTA)

      This event occurs when a Terminate-Ack packet is received from the
      peer.  The Terminate-Ack packet is usually a response to a
      Terminate-Request packet.  The Terminate-Ack packet may also
      indicate that the peer is in Closed or Stopped states, and serves
      to re-synchronize the link configuration.

   Receive-Unknown-Code (RUC)

      This event occurs when an un-interpretable packet is received from
      the peer.  A Code-Reject packet is sent in response.

   Receive-Code-Reject, Receive-Protocol-Reject (RXJ+,RXJ-)

      This event occurs when a Code-Reject or a Protocol-Reject packet
      is received from the peer.

      The RXJ+ event arises when the rejected value is acceptable, such
      as a Code-Reject of an extended code, or a Protocol-Reject of a
      NCP.  These are within the scope of normal operation.  The
      implementation MUST stop sending the offending packet type.

      The RXJ- event arises when the rejected value is catastrophic,
      such as a Code-Reject of Configure-Request, or a Protocol-Reject
      of LCP!  This event communicates an unrecoverable error that
      terminates the connection.
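      A minimal Python sketch of the RXJ+/RXJ- split follows.  Which
      rejected codes count as "catastrophic" is a local decision; this
      sketch assumes that codes 1 through 7 (the Link Configuration, Link
      Termination and Code-Reject packets that every LCP must understand)
      are fundamental.  A Protocol-Reject is treated as catastrophic only
      when it rejects LCP itself (hex c021).  The function names are
      hypothetical.

```python
LCP_PROTOCOL = 0xC021

# Assumption: codes 1-7 are fundamental, so rejecting any of them is fatal.
FUNDAMENTAL_CODES = frozenset(range(1, 8))


def classify_code_reject(rejected_code: int) -> str:
    """RXJ- for a fundamental code, RXJ+ for an extended (optional) code."""
    return "RXJ-" if rejected_code in FUNDAMENTAL_CODES else "RXJ+"


def classify_protocol_reject(rejected_protocol: int) -> str:
    # A Protocol-Reject of an NCP is normal operation; one of LCP itself is not.
    return "RXJ-" if rejected_protocol == LCP_PROTOCOL else "RXJ+"
```

      On RXJ+ the implementation stops sending the offending packet type;
      on RXJ- the connection is terminated.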

   Receive-Echo-Request, Receive-Echo-Reply, Receive-Discard-Request
   (RXR)

      This event occurs when an Echo-Request, Echo-Reply or Discard-
      Request packet is received from the peer.  The Echo-Reply packet
      is a response to an Echo-Request packet.  There is no reply to an
      Echo-Reply or Discard-Request packet.

4.4.  Actions

   Actions in the automaton are caused by events and typically indicate
   the transmission of packets and/or the starting or stopping of the
   Restart timer.

   Illegal-Event (-)

      This indicates an event that cannot occur in a properly
      implemented automaton.  The implementation has an internal error,
      which should be reported and logged.  No transition is taken, and
      the implementation SHOULD NOT reset or freeze.

   This-Layer-Up (tlu)

      This action indicates to the upper layers that the automaton is
      entering the Opened state.

      Typically, this action is used by the LCP to signal the Up event
      to a NCP, Authentication Protocol, or Link Quality Protocol, or
      MAY be used by a NCP to indicate that the link is available for
      its network layer traffic.

   This-Layer-Down (tld)

      This action indicates to the upper layers that the automaton is
      leaving the Opened state.

      Typically, this action is used by the LCP to signal the Down event
      to a NCP, Authentication Protocol, or Link Quality Protocol, or
      MAY be used by a NCP to indicate that the link is no longer
      available for its network layer traffic.

   This-Layer-Started (tls)

      This action indicates to the lower layers that the automaton is
      entering the Starting state, and the lower layer is needed for the
      link.  The lower layer SHOULD respond with an Up event when the
      lower layer is available.

      The results of this action are highly implementation dependent.

   This-Layer-Finished (tlf)

      This action indicates to the lower layers that the automaton is
      entering the Initial, Closed or Stopped states, and the lower
      layer is no longer needed for the link.  The lower layer SHOULD
      respond with a Down event when the lower layer has terminated.

      Typically, this action MAY be used by the LCP to advance to the
      Link Dead phase, or MAY be used by a NCP to indicate to the LCP
      that the link may terminate when there are no other NCPs open.

      The results of this action are highly implementation dependent.

   Initialize-Restart-Count (irc)

      This action sets the Restart counter to the appropriate value
      (Max-Terminate or Max-Configure).  The counter is decremented for
      each transmission, including the first.

      Implementation Note:

         In addition to setting the Restart counter, the implementation
         MUST set the timeout period to the initial value when Restart
         timer backoff is used.

   Zero-Restart-Count (zrc)

      This action sets the Restart counter to zero.

      Implementation Note:

         This action enables the FSA to pause before proceeding to the
         desired final state, allowing traffic to be processed by the
         peer.  In addition to zeroing the Restart counter, the
         implementation MUST set the timeout period to an appropriate
         value.
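      The interaction between the Restart counter, the irc and zrc
      actions, and the TO+/TO- events can be sketched in Python as
      follows.  The class and method names are hypothetical.  Because the
      counter is decremented for every transmission including the first,
      a counter initialized to Max-Terminate = 2 yields exactly two
      Terminate-Request transmissions before TO- occurs.

```python
from dataclasses import dataclass


@dataclass
class RestartTimerState:
    counter: int = 0
    timeout_s: float = 0.0

    def initialize_restart_count(self, max_count: int, initial_timeout_s: float) -> None:
        # irc: load Max-Configure or Max-Terminate and reset any backoff.
        self.counter = max_count
        self.timeout_s = initial_timeout_s

    def zero_restart_count(self, pause_s: float) -> None:
        # zrc: no retransmissions remain; the timer only provides a pause.
        self.counter = 0
        self.timeout_s = pause_s

    def transmitted(self) -> None:
        # Every Configure-Request or Terminate-Request, including the first, counts.
        self.counter -= 1

    def timeout_event(self) -> str:
        # TO+ means retransmit; TO- means give up.
        return "TO+" if self.counter > 0 else "TO-"
```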

   Send-Configure-Request (scr)

      A Configure-Request packet is transmitted.  This indicates the
      desire to open a connection with a specified set of Configuration
      Options.  The Restart timer is started when the Configure-Request
      packet is transmitted, to guard against packet loss.  The Restart
      counter is decremented each time a Configure-Request is sent.

   Send-Configure-Ack (sca)

      A Configure-Ack packet is transmitted.  This acknowledges the
      reception of a Configure-Request packet with an acceptable set of
      Configuration Options.

   Send-Configure-Nak (scn)

      A Configure-Nak or Configure-Reject packet is transmitted, as
      appropriate.  This negative response reports the reception of a
      Configure-Request packet with an unacceptable set of Configuration
      Options.

      Configure-Nak packets are used to refuse a Configuration Option
      value, and to suggest a new, acceptable value.  Configure-Reject
      packets are used to refuse all negotiation about a Configuration
      Option, typically because it is not recognized or implemented.
      The use of Configure-Nak versus Configure-Reject is more fully
      described in the chapter on LCP Packet Formats.

   Send-Terminate-Request (str)

      A Terminate-Request packet is transmitted.  This indicates the
      desire to close a connection.  The Restart timer is started when
      the Terminate-Request packet is transmitted, to guard against
      packet loss.  The Restart counter is decremented each time a
      Terminate-Request is sent.

   Send-Terminate-Ack (sta)

      A Terminate-Ack packet is transmitted.  This acknowledges the
      reception of a Terminate-Request packet or otherwise serves to
      synchronize the automatons.

   Send-Code-Reject (scj)

      A Code-Reject packet is transmitted.  This indicates the reception
      of an unknown type of packet.

   Send-Echo-Reply (ser)

      An Echo-Reply packet is transmitted.  This acknowledges the
      reception of an Echo-Request packet.

4.5.  Loop Avoidance

   The protocol makes a reasonable attempt at avoiding Configuration
   Option negotiation loops.  However, the protocol does NOT guarantee
   that loops will not happen.  As with any negotiation, it is possible
   to configure two PPP implementations with conflicting policies that
   will never converge.  It is also possible to configure policies which
   do converge, but which take significant time to do so.  Implementors
   should keep this in mind and SHOULD implement loop detection
   mechanisms or higher level timeouts.

4.6.  Counters and Timers

   Restart Timer

      There is one special timer used by the automaton.  The Restart
      timer is used to time transmissions of Configure-Request and
      Terminate-Request packets.  Expiration of the Restart timer causes
      a Timeout event, and retransmission of the corresponding
      Configure-Request or Terminate-Request packet.  The Restart timer
      MUST be configurable, but SHOULD default to three (3) seconds.

      Implementation Note:

         The Restart timer SHOULD be based on the speed of the link.
         The default value is designed for low speed (2,400 to 9,600
         bps), high switching latency links (typical telephone lines).
         Higher speed links, or links with low switching latency, SHOULD
         have correspondingly faster retransmission times.

         Instead of a constant value, the Restart timer MAY begin at an
         initial small value and increase to the configured final value.
         Each successive value less than the final value SHOULD be at
         least twice the previous value.  The initial value SHOULD be
         large enough to account for the size of the packets, twice the
         round trip time for transmission at the link speed, and at
         least an additional 100 milliseconds to allow the peer to
         process the packets before responding.  Some circuits add
         another 200 milliseconds of satellite delay.  Round trip times
         for modems operating at 14,400 bps have been measured in the
         range of 160 to more than 600 milliseconds.
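      A Python sketch of the optional backoff follows.  It doubles the
      timeout after each expiry (the minimum growth the note allows) and
      caps it at the configured final value.  The initial_timeout()
      helper is one possible reading of the sizing guidance: serialization
      time for the packet at 8 bits per octet (an assumption that ignores
      any framing overhead), plus twice the round trip time, plus 100
      milliseconds of processing time, plus any satellite delay.  Both
      function names are hypothetical.

```python
from typing import List


def initial_timeout(packet_octets: int, link_bps: float, rtt_s: float,
                    satellite_delay_s: float = 0.0) -> float:
    """Rough lower bound for the first Restart timer value, in seconds."""
    serialization_s = packet_octets * 8 / link_bps
    return serialization_s + 2 * rtt_s + 0.100 + satellite_delay_s


def backoff_schedule(initial_s: float, final_s: float, expiries: int) -> List[float]:
    """Timeout values for successive transmissions, doubling up to final_s."""
    values = []
    current = initial_s
    for _ in range(expiries):
        values.append(min(current, final_s))
        current *= 2
    return values
```

      For example, with an initial value of 0.5 seconds and the default
      final value of 3 seconds, the schedule for five transmissions is
      0.5, 1, 2, 3 and 3 seconds.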

   Max-Terminate

      There is one required restart counter for Terminate-Requests.
      Max-Terminate indicates the number of Terminate-Request packets
      sent without receiving a Terminate-Ack before assuming that the
      peer is unable to respond.  Max-Terminate MUST be configurable,
      but SHOULD default to two (2) transmissions.

   Max-Configure

      A similar counter is recommended for Configure-Requests.  Max-
      Configure indicates the number of Configure-Request packets sent
      without receiving a valid Configure-Ack, Configure-Nak or
      Configure-Reject before assuming that the peer is unable to
      respond.  Max-Configure MUST be configurable, but SHOULD default
      to ten (10) transmissions.

   Max-Failure

      A related counter is recommended for Configure-Nak.  Max-Failure
      indicates the number of Configure-Nak packets sent without sending
      a Configure-Ack before assuming that configuration is not
      converging.  Any further Configure-Nak packets for peer requested
      options are converted to Configure-Reject packets, and locally
      desired options are no longer appended.  Max-Failure MUST be
      configurable, but SHOULD default to five (5) transmissions.
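      The Max-Failure rule can be sketched in Python as a small counter.
      The class name is hypothetical, and resetting the count whenever a
      Configure-Ack is sent is an assumption drawn from the phrase
      "without sending a Configure-Ack".

```python
class MaxFailureTracker:
    """Hypothetical helper applying the Max-Failure rule."""

    def __init__(self, max_failure: int = 5) -> None:
        self.max_failure = max_failure
        self.naks_without_ack = 0

    def ack_sent(self) -> None:
        # Assumed: sending a Configure-Ack restarts the count.
        self.naks_without_ack = 0

    def converging(self) -> bool:
        return self.naks_without_ack < self.max_failure

    def reply_for_unacceptable_options(self) -> str:
        """Reply type for a request whose option values are unacceptable."""
        if not self.converging():
            return "Configure-Reject"  # further Naks are converted to Rejects
        self.naks_without_ack += 1
        return "Configure-Nak"

    def may_append_desired_options(self) -> bool:
        # Locally desired options are no longer appended once the limit is hit.
        return self.converging()
```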

5.  LCP Packet Formats

   There are three classes of LCP packets:

      1. Link Configuration packets used to establish and configure a
         link (Configure-Request, Configure-Ack, Configure-Nak and
         Configure-Reject).

      2. Link Termination packets used to terminate a link (Terminate-
         Request and Terminate-Ack).

      3. Link Maintenance packets used to manage and debug a link
         (Code-Reject, Protocol-Reject, Echo-Request, Echo-Reply, and
         Discard-Request).

   In the interest of simplicity, there is no version field in the LCP
   packet.  A correctly functioning LCP implementation will always
   respond to unknown Protocols and Codes with an easily recognizable
   LCP packet, thus providing a deterministic fallback mechanism for
   implementations of other versions.

   Regardless of which Configuration Options are enabled, all LCP Link
   Configuration, Link Termination, and Code-Reject packets (codes 1
   through 7) are always sent as if no Configuration Options were
   negotiated.  In particular, each Configuration Option specifies a
   default value.  This ensures that such LCP packets are always
   recognizable, even when one end of the link mistakenly believes the
   link to be open.

   Exactly one LCP packet is encapsulated in the PPP Information field,
   where the PPP Protocol field indicates type hex c021 (Link Control
   Protocol).

   A summary of the Link Control Protocol packet format is shown below.
   The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Data ...
   +-+-+-+-+

   Code

      The Code field is one octet, and identifies the kind of LCP
      packet.  When a packet is received with an unknown Code field, a
      Code-Reject packet is transmitted.

      Up-to-date values of the LCP Code field are specified in the most
      recent "Assigned Numbers" RFC [2].  This document concerns the
      following values:

         1       Configure-Request
         2       Configure-Ack
         3       Configure-Nak
         4       Configure-Reject
         5       Terminate-Request
         6       Terminate-Ack
         7       Code-Reject
         8       Protocol-Reject
         9       Echo-Request
         10      Echo-Reply
         11      Discard-Request
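      The Code values above map directly onto a Python IntEnum.  The
      enum and helper names below are hypothetical; any Code outside this
      set is one this implementation does not know, and per the rule above
      it is answered with a Code-Reject.

```python
from enum import IntEnum


class LcpCode(IntEnum):
    CONFIGURE_REQUEST = 1
    CONFIGURE_ACK = 2
    CONFIGURE_NAK = 3
    CONFIGURE_REJECT = 4
    TERMINATE_REQUEST = 5
    TERMINATE_ACK = 6
    CODE_REJECT = 7
    PROTOCOL_REJECT = 8
    ECHO_REQUEST = 9
    ECHO_REPLY = 10
    DISCARD_REQUEST = 11


KNOWN_CODES = frozenset(code.value for code in LcpCode)


def needs_code_reject(code: int) -> bool:
    """True when a received Code is unknown and must be Code-Rejected."""
    return code not in KNOWN_CODES
```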

   Identifier

      The Identifier field is one octet, and aids in matching requests
      and replies.  When a packet is received with an invalid Identifier
      field, the packet is silently discarded without affecting the
      automaton.

   Length

      The Length field is two octets, and indicates the length of the
      LCP packet, including the Code, Identifier, Length and Data
      fields.  The Length MUST NOT exceed the MRU of the link.

      Octets outside the range of the Length field are treated as
      padding and are ignored on reception.  When a packet is received
      with an invalid Length field, the packet is silently discarded
      without affecting the automaton.

   Data

      The Data field is zero or more octets, as indicated by the Length
      field.  The format of the Data field is determined by the Code
      field.
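      A minimal Python sketch of parsing and building the common header
      follows, assuming the input is the PPP Information field of a frame
      whose Protocol is hex c021.  It applies the rules above: an invalid
      Length causes a silent discard (None), and octets beyond Length are
      treated as padding.  The 1500-octet default for the MRU argument is
      the PPP default MRU; the function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

HEADER = struct.Struct("!BBH")  # Code, Identifier, Length in network byte order


class LcpPacket(NamedTuple):
    code: int
    identifier: int
    data: bytes


def parse_lcp(info: bytes) -> Optional[LcpPacket]:
    """Return the parsed packet, or None if it must be silently discarded."""
    if len(info) < HEADER.size:
        return None
    code, identifier, length = HEADER.unpack_from(info)
    if length < HEADER.size or length > len(info):
        return None  # invalid Length field
    # Octets beyond Length are padding and are ignored.
    return LcpPacket(code, identifier, info[HEADER.size:length])


def build_lcp(code: int, identifier: int, data: bytes, mru: int = 1500) -> bytes:
    length = HEADER.size + len(data)
    if length > mru:
        raise ValueError("LCP packet would exceed the link MRU")
    return HEADER.pack(code, identifier, length) + data
```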

5.1.  Configure-Request

   Description

      An implementation wishing to open a connection MUST transmit a
      Configure-Request.  The Options field is filled with any desired
      changes to the link defaults.  Configuration Options SHOULD NOT be
      included with default values.

      Upon reception of a Configure-Request, an appropriate reply MUST
      be transmitted.

   A summary of the Configure-Request packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Options ...
   +-+-+-+-+

   Code

      1 for Configure-Request.

   Identifier

      The Identifier field MUST be changed whenever the contents of the
      Options field changes, and whenever a valid reply has been
      received for a previous request.  For retransmissions, the
      Identifier MAY remain unchanged.
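      The Identifier rule can be sketched in Python as follows.  The
      class is hypothetical; it keeps the Identifier for plain
      retransmissions (which the rule permits but does not require) and
      advances it, modulo 256 since the field is one octet, whenever the
      Options change or a valid reply has been received.

```python
from typing import Optional


class RequestIdentifier:
    """Hypothetical tracker for the Configure-Request Identifier rule."""

    def __init__(self, start: int = 0) -> None:
        self.current = start & 0xFF
        self._last_options: Optional[bytes] = None
        self._reply_received = False

    def reply_received(self) -> None:
        # Call when a valid Ack, Nak or Reject for the current Identifier arrives.
        self._reply_received = True

    def for_request(self, options: bytes) -> int:
        """Return the Identifier to place in the next Configure-Request."""
        changed = self._last_options is not None and options != self._last_options
        if changed or self._reply_received:
            self.current = (self.current + 1) & 0xFF  # one octet, wraps at 256
        self._last_options = options
        self._reply_received = False
        return self.current
```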

   Options

      The options field is variable in length, and contains the list of
      zero or more Configuration Options that the sender desires to
      negotiate.  All Configuration Options are always negotiated
      simultaneously.  The format of Configuration Options is further
      described in a later chapter.

5.2.  Configure-Ack

   Description

      If every Configuration Option received in a Configure-Request is
      recognizable and all values are acceptable, then the
      implementation MUST transmit a Configure-Ack.  The acknowledged
      Configuration Options MUST NOT be reordered or modified in any
      way.

      On reception of a Configure-Ack, the Identifier field MUST match
      that of the last transmitted Configure-Request.  Additionally, the
      Configuration Options in a Configure-Ack MUST exactly match those
      of the last transmitted Configure-Request.  Invalid packets are
      silently discarded.
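      Both halves of the Configure-Ack rules are sketched below in
      Python: the Ack echoes the request's Identifier and Options
      verbatim, and a received Ack is accepted only if both match the last
      transmitted Configure-Request exactly.  The function names are
      hypothetical, and packets are handled as raw bytes starting at the
      LCP Code field.

```python
import struct

LCP_HEADER = struct.Struct("!BBH")
CONFIGURE_REQUEST, CONFIGURE_ACK = 1, 2


def build_configure_ack(request: bytes) -> bytes:
    """Echo a Configure-Request as a Configure-Ack, options copied verbatim."""
    code, identifier, length = LCP_HEADER.unpack_from(request)
    if code != CONFIGURE_REQUEST or not LCP_HEADER.size <= length <= len(request):
        raise ValueError("not a well-formed Configure-Request")
    return LCP_HEADER.pack(CONFIGURE_ACK, identifier, length) + request[LCP_HEADER.size:length]


def is_valid_ack(ack: bytes, last_request: bytes) -> bool:
    """Valid only if the Identifier and Options match the last request exactly."""
    if len(ack) < LCP_HEADER.size or len(last_request) < LCP_HEADER.size:
        return False
    code, ident, length = LCP_HEADER.unpack_from(ack)
    _, req_ident, req_length = LCP_HEADER.unpack_from(last_request)
    if code != CONFIGURE_ACK or not LCP_HEADER.size <= length <= len(ack):
        return False
    return (ident == req_ident
            and ack[LCP_HEADER.size:length] == last_request[LCP_HEADER.size:req_length])
```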

   A summary of the Configure-Ack packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Options ...
   +-+-+-+-+

   Code

      2 for Configure-Ack.

   Identifier

      The Identifier field is a copy of the Identifier field of the
      Configure-Request which caused this Configure-Ack.

   Options

      The Options field is variable in length, and contains the list of
      zero or more Configuration Options that the sender is
      acknowledging.  All Configuration Options are always acknowledged
      simultaneously.

5.3.  Configure-Nak

   Description

      If every instance of the received Configuration Options is
      recognizable, but some values are not acceptable, then the
      implementation MUST transmit a Configure-Nak.  The Options field
      is filled with only the unacceptable Configuration Options from
      the Configure-Request.  All acceptable Configuration Options are
      filtered out of the Configure-Nak, but otherwise the Configuration
      Options from the Configure-Request MUST NOT be reordered.

      Options which have no value fields (boolean options) MUST use the
      Configure-Reject reply instead.

      Each Configuration Option which is allowed only a single instance
      MUST be modified to a value acceptable to the Configure-Nak
      sender.  The default value MAY be used, when this differs from the
      requested value.

      When a particular type of Configuration Option can be listed more
      than once with different values, the Configure-Nak MUST include a
      list of all values for that option which are acceptable to the
      Configure-Nak sender.  This includes acceptable values that were
      present in the Configure-Request.

      Finally, an implementation may be configured to request the
      negotiation of a specific Configuration Option.  If that option is
      not listed, then that option MAY be appended to the list of Nak'd
      Configuration Options, in order to prompt the peer to include that
      option in its next Configure-Request packet.  Any value fields for
      the option MUST indicate values acceptable to the Configure-Nak
      sender.

      On reception of a Configure-Nak, the Identifier field MUST match
      that of the last transmitted Configure-Request.  Invalid packets
      are silently discarded.

      Reception of a valid Configure-Nak indicates that when a new
      Configure-Request is sent, the Configuration Options MAY be
      modified as specified in the Configure-Nak.  When multiple
      instances of a Configuration Option are present, the peer SHOULD
      select a single value to include in its next Configure-Request
      packet.

      Some Configuration Options have a variable length.  Since the
      Nak'd Option has been modified by the peer, the implementation
      MUST be able to handle an Option length which is different from
      the original Configure-Request.

   A summary of the Configure-Nak packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Options ...
   +-+-+-+-+

   Code

      3 for Configure-Nak.

   Identifier

      The Identifier field is a copy of the Identifier field of the
      Configure-Request which caused this Configure-Nak.

   Options

      The Options field is variable in length, and contains the list of
      zero or more Configuration Options that the sender is Nak'ing.
      All Configuration Options are always Nak'd simultaneously.

5.4.  Configure-Reject

   Description

      If some Configuration Options received in a Configure-Request are
      not recognizable or are not acceptable for negotiation (as
      configured by a network administrator), then the implementation
      MUST transmit a Configure-Reject.  The Options field is filled
      with only the unacceptable Configuration Options from the
      Configure-Request.  All recognizable and negotiable Configuration
      Options are filtered out of the Configure-Reject, but otherwise
      the Configuration Options MUST NOT be reordered or modified in any
      way.

      On reception of a Configure-Reject, the Identifier field MUST
      match that of the last transmitted Configure-Request.
      Additionally, the Configuration Options in a Configure-Reject MUST
      be a proper subset of those in the last transmitted Configure-
      Request.  Invalid packets are silently discarded.

      Reception of a valid Configure-Reject indicates that when a new
      Configure-Request is sent, it MUST NOT include any of the
      Configuration Options listed in the Configure-Reject.
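      A Python sketch of the receive side follows.  It assumes the
      Type/Length/Data layout that the later Configuration Options chapter
      defines (one-octet Type, one-octet Length covering the whole option).
      Each rejected option must appear unmodified, and in the original
      order, in the last request.  The next request then drops every
      option whose Type was rejected.  All function names are
      hypothetical.

```python
from typing import List


def split_options(raw: bytes) -> List[bytes]:
    """Split an Options field into whole Type/Length/Data options."""
    options, i = [], 0
    while i < len(raw):
        if i + 2 > len(raw):
            raise ValueError("truncated option header")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append(raw[i:i + length])
        i += length
    return options


def valid_configure_reject(rej_id: int, rej_options: bytes,
                           last_id: int, last_options: bytes) -> bool:
    if rej_id != last_id:
        return False
    try:
        rejected, sent = split_options(rej_options), split_options(last_options)
    except ValueError:
        return False
    # Each rejected option must match a sent option exactly, in the same order.
    remaining = iter(sent)
    return all(any(opt == s for s in remaining) for opt in rejected)


def next_request_options(last_options: bytes, rej_options: bytes) -> bytes:
    rejected_types = {opt[0] for opt in split_options(rej_options)}
    return b"".join(o for o in split_options(last_options) if o[0] not in rejected_types)
```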

   A summary of the Configure-Reject packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Options ...
   +-+-+-+-+

   Code

      4 for Configure-Reject.

   Identifier

      The Identifier field is a copy of the Identifier field of the
      Configure-Request which caused this Configure-Reject.

   Options

      The Options field is variable in length, and contains the list of
      zero or more Configuration Options that the sender is rejecting.
      All Configuration Options are always rejected simultaneously.

5.5.  Terminate-Request and Terminate-Ack

   Description

      LCP includes Terminate-Request and Terminate-Ack Codes in order to
      provide a mechanism for closing a connection.

      An implementation wishing to close a connection SHOULD transmit a
      Terminate-Request.  Terminate-Request packets SHOULD continue to
      be sent until Terminate-Ack is received, the lower layer indicates
      that it has gone down, or a sufficiently large number have been
      transmitted such that the peer is down with reasonable certainty.

      Upon reception of a Terminate-Request, a Terminate-Ack MUST be
      transmitted.

      Reception of an unelicited Terminate-Ack indicates that the peer
      is in the Closed or Stopped states, or is otherwise in need of
      re-negotiation.

   A summary of the Terminate-Request and Terminate-Ack packet formats
   is shown below.  The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Data ...
   +-+-+-+-+

   Code

      5 for Terminate-Request;

      6 for Terminate-Ack.

   Identifier

      On transmission, the Identifier field MUST be changed whenever the
      content of the Data field changes, and whenever a valid reply has
      been received for a previous request.  For retransmissions, the
      Identifier MAY remain unchanged.

      On reception, the Identifier field of the Terminate-Request is
      copied into the Identifier field of the Terminate-Ack packet.

   Data

      The Data field is zero or more octets, and contains uninterpreted
      data for use by the sender.  The data may consist of any binary
      value.  The end of the field is indicated by the Length.
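      Answering a Terminate-Request is mostly a matter of copying the
      Identifier, as this Python sketch shows.  The function name is
      hypothetical, and sending an empty Data field in the Terminate-Ack
      is a choice of this sketch; the Data field is uninterpreted and
      belongs to the sender.

```python
import struct
from typing import Optional

LCP_HEADER = struct.Struct("!BBH")
TERMINATE_REQUEST, TERMINATE_ACK = 5, 6


def terminate_ack_for(request: bytes) -> Optional[bytes]:
    """Build the Terminate-Ack for a received Terminate-Request.

    Returns None if the input is not a well-formed Terminate-Request.
    """
    if len(request) < LCP_HEADER.size:
        return None
    code, identifier, length = LCP_HEADER.unpack_from(request)
    if code != TERMINATE_REQUEST or not LCP_HEADER.size <= length <= len(request):
        return None
    # Identifier copied from the request; no Data is sent in this sketch.
    return LCP_HEADER.pack(TERMINATE_ACK, identifier, LCP_HEADER.size)
```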

5.6.  Code-Reject

   Description

      Reception of a LCP packet with an unknown Code indicates that the
      peer is operating with a different version.  This MUST be reported
      back to the sender of the unknown Code by transmitting a Code-
      Reject.

      Upon reception of the Code-Reject of a code which is fundamental
      to this version of the protocol, the implementation SHOULD report
      the problem and drop the connection, since it is unlikely that the
      situation can be rectified automatically.

   A summary of the Code-Reject packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Rejected-Packet ...
   +-+-+-+-+-+-+-+-+

   Code

      7 for Code-Reject.

   Identifier

      The Identifier field MUST be changed for each Code-Reject sent.

   Rejected-Packet

      The Rejected-Packet field contains a copy of the LCP packet which
      is being rejected.  It begins with the Information field, and does
      not include any Data Link Layer headers nor an FCS.  The
      Rejected-Packet MUST be truncated to comply with the peer's
      established MRU.
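   A minimal sketch of building a Code-Reject, assuming the truncation
   rule means the whole Code-Reject (its 4-octet header plus the copied
   packet) must fit within the peer's MRU. The function name and the
   1500-octet default are illustrative; choosing a fresh Identifier for
   each Code-Reject is left to the caller.

```python
import struct

CODE_REJECT = 7


def build_code_reject(identifier: int, rejected: bytes, peer_mru: int = 1500) -> bytes:
    """Wrap an LCP packet carrying an unknown Code in a Code-Reject.

    `rejected` starts at the Information field (no link headers, no FCS).
    The copied packet is truncated to peer_mru - 4 octets so that the
    Code-Reject, header included, stays within the peer's MRU.
    """
    body = rejected[: max(peer_mru - 4, 0)]
    return struct.pack("!BBH", CODE_REJECT, identifier & 0xFF, 4 + len(body)) + body
```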

5.7.  Protocol-Reject

   Description

      Reception of a PPP packet with an unknown Protocol field indicates
      that the peer is attempting to use a protocol which is
      unsupported.  This usually occurs when the peer attempts to
      configure a new protocol.  If the LCP automaton is in the Opened
      state, then this MUST be reported back to the peer by transmitting
      a Protocol-Reject.

      Upon reception of a Protocol-Reject, the implementation MUST stop
      sending packets of the indicated protocol at the earliest
      opportunity.

      Protocol-Reject packets can only be sent in the LCP Opened state.
      Protocol-Reject packets received in any state other than the LCP
      Opened state SHOULD be silently discarded.

   A summary of the Protocol-Reject packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Rejected-Protocol       |      Rejected-Information ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Code

      8 for Protocol-Reject.

   Identifier

      The Identifier field MUST be changed for each Protocol-Reject
      sent.

   Rejected-Protocol

      The Rejected-Protocol field is two octets, and contains the PPP
      Protocol field of the packet which is being rejected.

   Rejected-Information

      The Rejected-Information field contains a copy of the packet which
      is being rejected.  It begins with the Information field, and does
      not include any Data Link Layer headers nor an FCS.  The
      Rejected-Information MUST be truncated to comply with the peer's
      established MRU.
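   The sketch below assembles a Protocol-Reject under the same MRU
   assumption as Code-Reject, with two extra octets reserved for the
   Rejected-Protocol field. It also enforces the rule that the packet is
   only sent in the LCP Opened state. The function name is hypothetical,
   and 0x8021 in the usage check is only an example Protocol value.

```python
import struct
from typing import Optional

PROTOCOL_REJECT = 8


def build_protocol_reject(identifier: int, rejected_protocol: int,
                          rejected_info: bytes, lcp_opened: bool,
                          peer_mru: int = 1500) -> Optional[bytes]:
    """Build a Protocol-Reject, or return None when LCP is not Opened."""
    if not lcp_opened:
        return None  # Protocol-Reject may only be sent in the Opened state
    # The 4-octet LCP header and 2-octet Rejected-Protocol precede the copy.
    info = rejected_info[: max(peer_mru - 6, 0)]
    data = struct.pack("!H", rejected_protocol) + info
    return struct.pack("!BBH", PROTOCOL_REJECT, identifier & 0xFF, 4 + len(data)) + data
```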

5.8.  Echo-Request and Echo-Reply

   Description

      LCP includes Echo-Request and Echo-Reply Codes in order to provide
      a Data Link Layer loopback mechanism for use in exercising both
      directions of the link.  This is useful as an aid in debugging,
      link quality determination, performance testing, and for numerous
      other functions.

      Upon reception of an Echo-Request in the LCP Opened state, an
      Echo-Reply MUST be transmitted.

      Echo-Request and Echo-Reply packets MUST only be sent in the LCP
      Opened state.  Echo-Request and Echo-Reply packets received in any
      state other than the LCP Opened state SHOULD be silently
      discarded.

   A summary of the Echo-Request and Echo-Reply packet formats is shown
   below.  The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Magic-Number                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Data ...
   +-+-+-+-+

   Code

      9 for Echo-Request;

      10 for Echo-Reply.

   Identifier

      On transmission, the Identifier field MUST be changed whenever the
      content of the Data field changes, and whenever a valid reply has
      been received for a previous request.  For retransmissions, the
      Identifier MAY remain unchanged.

      On reception, the Identifier field of the Echo-Request is copied
      into the Identifier field of the Echo-Reply packet.

   Magic-Number

      The Magic-Number field is four octets, and aids in detecting links
      which are in the looped-back condition.  Until the Magic-Number
      Configuration Option has been successfully negotiated, the Magic-
      Number MUST be transmitted as zero.  See the Magic-Number
      Configuration Option for further explanation.

   Data

      The Data field is zero or more octets, and contains uninterpreted
      data for use by the sender.  The data may consist of any binary
      value.  The end of the field is indicated by the Length.
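   An Echo-Reply could be produced as sketched here. The Identifier is
   copied, the Magic-Number field carries the local value (zero until
   Magic-Number is negotiated), and requests outside the Opened state
   are dropped. Echoing the request's Data back is an assumption of this
   sketch; the text above only says Data is for use by the sender.

```python
import struct
from typing import Optional

ECHO_REQUEST = 9
ECHO_REPLY = 10


def echo_reply_for(request: bytes, local_magic: int, lcp_opened: bool) -> Optional[bytes]:
    """Answer an Echo-Request; returns None when the packet is to be discarded."""
    if not lcp_opened:
        return None  # silently discarded outside the Opened state
    if len(request) < 8:
        raise ValueError("Echo-Request shorter than header + Magic-Number")
    code, identifier, length, _peer_magic = struct.unpack("!BBHI", request[:8])
    if code != ECHO_REQUEST or length < 8 or length > len(request):
        raise ValueError("malformed Echo-Request")
    data = request[8:length]
    # Identifier is copied; the Magic-Number is our own.
    # Assumption: the request's Data is echoed back unchanged.
    return struct.pack("!BBHI", ECHO_REPLY, identifier, 8 + len(data), local_magic) + data
```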

5.9.  Discard-Request

   Description

      LCP includes a Discard-Request Code in order to provide a Data
      Link Layer sink mechanism for use in exercising the local to
      remote direction of the link.  This is useful as an aid in
      debugging, performance testing, and for numerous other functions.

      Discard-Request packets MUST only be sent in the LCP Opened state.
      On reception, the receiver MUST silently discard any Discard-
      Request that it receives.

   A summary of the Discard-Request packet format is shown below.  The
   fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |  Identifier   |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Magic-Number                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Data ...
   +-+-+-+-+

   Code

      11 for Discard-Request.

   Identifier

      The Identifier field MUST be changed for each Discard-Request
      sent.

   Magic-Number

      The Magic-Number field is four octets, and aids in detecting links
      which are in the looped-back condition.  Until the Magic-Number
      Configuration Option has been successfully negotiated, the Magic-
      Number MUST be transmitted as zero.  See the Magic-Number
      Configuration Option for further explanation.

   Data

      The Data field is zero or more octets, and contains uninterpreted
      data for use by the sender.  The data may consist of any binary
      value.  The end of the field is indicated by the Length.
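   On the sending side, the main bookkeeping is changing the Identifier
   for every Discard-Request. The hypothetical helper below uses a simple
   counter for that, which is one possible choice among many; the
   receiver's side needs no code beyond dropping the packet.

```python
import itertools
import struct

DISCARD_REQUEST = 11


class DiscardRequestSender:
    """Hypothetical helper that emits Discard-Requests with fresh Identifiers."""

    def __init__(self, magic_number: int = 0):
        # Stays zero until the Magic-Number option has been negotiated.
        self.magic_number = magic_number
        self._ids = itertools.count(1)

    def build(self, data: bytes = b"") -> bytes:
        identifier = next(self._ids) & 0xFF  # a new Identifier for every request
        header = struct.pack("!BBHI", DISCARD_REQUEST, identifier,
                             8 + len(data), self.magic_number)
        return header + data
```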

6.  LCP Configuration Options

   LCP Configuration Options allow negotiation of modifications to the
   default characteristics of a point-to-point link.  If a Configuration
   Option is not included in a Configure-Request packet, the default
   value for that Configuration Option is assumed.

   Some Configuration Options MAY be listed more than once.  The effect
   of this is Configuration Option specific, and is specified by each
   such Configuration Option description.  (None of the Configuration
   Options in this specification can be listed more than once.)

   The end of the list of Configuration Options is indicated by the
   Length field of the LCP packet.

   Unless otherwise specified, all Configuration Options apply in a
   half-duplex fashion; typically, in the receive direction of the link
   from the point of view of the Configure-Request sender.

   Design Philosophy

      The options indicate additional capabilities or requirements of
      the implementation that is requesting the option.  An
      implementation which does not understand any option SHOULD
      interoperate with one which implements every option.

      A default is specified for each option which allows the link to
      correctly function without negotiation of the option, although
      perhaps with less than optimal performance.

      Except where explicitly specified, acknowledgement of an option
      does not require the peer to take any additional action other than
      the default.

      It is not necessary to send the default values for the options in
      a Configure-Request.

   A summary of the Configuration Option format is shown below.  The
   fields are transmitted from left to right.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |    Data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type

      The Type field is one octet, and indicates the type of
      Configuration Option.  Up-to-date values of the LCP Option Type
      field are specified in the most recent "Assigned Numbers" RFC [2].
      This document concerns the following values:

         0       RESERVED
         1       Maximum-Receive-Unit
         3       Authentication-Protocol
         4       Quality-Protocol
         5       Magic-Number
         7       Protocol-Field-Compression
         8       Address-and-Control-Field-Compression

   Length

      The Length field is one octet, and indicates the length of this
      Configuration Option including the Type, Length and Data fields.

      If a negotiable Configuration Option is received in a Configure-
      Request, but with an invalid or unrecognized Length, a Configure-
      Nak SHOULD be transmitted which includes the desired Configuration
      Option with an appropriate Length and Data.

   Data

      The Data field is zero or more octets, and contains information
      specific to the Configuration Option.  The format and length of
      the Data field is determined by the Type and Length fields.

      When the Data field is indicated by the Length to extend beyond
      the end of the Information field, the entire packet is silently
      discarded without affecting the automaton.
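   The Type/Length/Data list can be walked as sketched below. Returning
   None models the "silently discarded" case when an option's Length
   runs past the end of the list. Treating a Length below 2 the same way
   is an assumption of this sketch, since such a value cannot even cover
   the option's own Type and Length octets.

```python
from typing import List, Optional, Tuple


def parse_options(options: bytes) -> Optional[List[Tuple[int, bytes]]]:
    """Split a Configure-* Data field into (Type, Data) pairs.

    Returns None when the whole packet should be silently discarded.
    """
    parsed = []
    i = 0
    while i < len(options):
        if len(options) - i < 2:
            return None  # no room left for Type and Length
        opt_type, opt_len = options[i], options[i + 1]
        if opt_len < 2 or i + opt_len > len(options):
            # Assumption: a Length below 2 is handled like an overrun.
            return None
        parsed.append((opt_type, options[i + 2:i + opt_len]))
        i += opt_len
    return parsed
```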

6.1.  Maximum-Receive-Unit (MRU)

   Description

      This Configuration Option may be sent to inform the peer that the
      implementation can receive larger packets, or to request that the
      peer send smaller packets.

      The default value is 1500 octets.  If smaller packets are
      requested, an implementation MUST still be able to receive the
      full 1500 octet information field in case link synchronization is
      lost.

      Implementation Note:

         This option is used to indicate an implementation capability.
         The peer is not required to maximize the use of the capacity.
         For example, when an MRU is indicated which is 2048 octets, the
         peer is not required to send any packet with 2048 octets.  The
         peer need not Configure-Nak to indicate that it will only send
         smaller packets, since the implementation will always require
         support for at least 1500 octets.

   A summary of the Maximum-Receive-Unit Configuration Option format is
   shown below.  The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |      Maximum-Receive-Unit     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type

      1

   Length

      4

   Maximum-Receive-Unit

      The Maximum-Receive-Unit field is two octets, and specifies the
      maximum number of octets in the Information and Padding fields.
      It does not include the framing, Protocol field, FCS, nor any
      transparency bits or bytes.
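   A sketch of encoding and decoding the MRU option, plus the receive-side
   rule that a full 1500-octet Information field must still be accepted
   even after smaller packets were requested. The function names are
   illustrative only.

```python
import struct

MRU_TYPE = 1
DEFAULT_MRU = 1500


def encode_mru(mru: int) -> bytes:
    """Type 1, Length 4, then the two-octet Maximum-Receive-Unit."""
    return struct.pack("!BBH", MRU_TYPE, 4, mru)


def decode_mru(option: bytes) -> int:
    if len(option) != 4:
        raise ValueError("MRU option must be 4 octets")
    opt_type, length, mru = struct.unpack("!BBH", option)
    if opt_type != MRU_TYPE or length != 4:
        raise ValueError("not a well-formed MRU option")
    return mru


def receive_buffer_size(requested_mru: int) -> int:
    """Even after asking for smaller packets, keep room for 1500 octets."""
    return max(requested_mru, DEFAULT_MRU)
```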

6.2.  Authentication-Protocol

   Description

      On some links it may be desirable to require a peer to
      authenticate itself before allowing network-layer protocol packets
      to be exchanged.

      This Configuration Option provides a method to negotiate the use
      of a specific protocol for authentication.  By default,
      authentication is not required.

      An implementation MUST NOT include multiple Authentication-
      Protocol Configuration Options in its Configure-Request packets.
      Instead, it SHOULD attempt to configure the most desirable
      protocol first.  If that protocol is Configure-Nak'd, then the
      implementation SHOULD attempt the next most desirable protocol in
      the next Configure-Request.

      The implementation sending the Configure-Request is indicating
      that it expects authentication from its peer.  If an
      implementation sends a Configure-Ack, then it is agreeing to
      authenticate with the specified protocol.  An implementation
      receiving a Configure-Ack SHOULD expect the peer to authenticate
      with the acknowledged protocol.

      There is no requirement that authentication be full-duplex or that
      the same protocol be used in both directions.  It is perfectly
      acceptable for different protocols to be used in each direction.
      This will, of course, depend on the specific protocols negotiated.

   A summary of the Authentication-Protocol Configuration Option format
   is shown below.  The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |     Authentication-Protocol   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Data ...
   +-+-+-+-+

   Type

      3

   Length

      >= 4

   Authentication-Protocol

      The Authentication-Protocol field is two octets, and indicates the
      authentication protocol desired.  Values for this field are always
      the same as the PPP Protocol field values for that same
      authentication protocol.

      Up-to-date values of the Authentication-Protocol field are
      specified in the most recent "Assigned Numbers" RFC [2].  Current
      values are assigned as follows:

      Value (in hex)  Protocol

      c023            Password Authentication Protocol
      c223            Challenge Handshake Authentication Protocol

   Data

      The Data field is zero or more octets, and contains additional
      data as determined by the particular protocol.
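   The "one option at a time, most desirable first, next one after a
   Configure-Nak" behaviour can be sketched as follows. The preference
   list is hypothetical; the single Data octet 0x05 for CHAP is the MD5
   algorithm value from the CHAP specification (RFC 1994), not from this
   document.

```python
import struct
from typing import Iterable, Optional, Set, Tuple

AUTH_TYPE = 3
PAP = 0xC023
CHAP = 0xC223


def encode_auth_option(protocol: int, data: bytes = b"") -> bytes:
    """Type 3, Length 4 + len(data), two-octet protocol, then protocol Data."""
    return struct.pack("!BBH", AUTH_TYPE, 4 + len(data), protocol) + data


def next_auth_option(preferences: Iterable[Tuple[int, bytes]],
                     nakd: Set[int]) -> Optional[bytes]:
    """Offer exactly one protocol: the most desirable one not yet Nak'd."""
    for protocol, data in preferences:
        if protocol not in nakd:
            return encode_auth_option(protocol, data)
    return None  # nothing left to offer
```

   With preferences [(CHAP, b"\x05"), (PAP, b"")], the first request
   offers CHAP, and after CHAP is Nak'd the next request offers PAP.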

6.3.  Quality-Protocol

   Description

      On some links it may be desirable to determine when, and how
      often, the link is dropping data.  This process is called link
      quality monitoring.

      This Configuration Option provides a method to negotiate the use
      of a specific protocol for link quality monitoring.  By default,
      link quality monitoring is disabled.

      The implementation sending the Configure-Request is indicating
      that it expects to receive monitoring information from its peer.
      If an implementation sends a Configure-Ack, then it is agreeing to
      send the specified protocol.  An implementation receiving a
      Configure-Ack SHOULD expect the peer to send the acknowledged
      protocol.

      There is no requirement that quality monitoring be full-duplex or
      that the same protocol be used in both directions.  It is
      perfectly acceptable for different protocols to be used in each
      direction.  This will, of course, depend on the specific protocols
      negotiated.

   A summary of the Quality-Protocol Configuration Option format is
   shown below.  The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |        Quality-Protocol       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Data ...
   +-+-+-+-+

   Type

      4

   Length

      >= 4

   Quality-Protocol

      The Quality-Protocol field is two octets, and indicates the link
      quality monitoring protocol desired.  Values for this field are
      always the same as the PPP Protocol field values for that same
      monitoring protocol.

      Up-to-date values of the Quality-Protocol field are specified in
      the most recent "Assigned Numbers" RFC [2].  Current values are
      assigned as follows:

      Value (in hex)  Protocol

      c025            Link Quality Report

   Data

      The Data field is zero or more octets, and contains additional
      data as determined by the particular protocol.
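   On the receiving side, acknowledging this option commits the
   implementation to sending the requested monitoring protocol. The
   sketch below shows one hypothetical decision policy: Ack a protocol it
   can send, Nak when it can send some other protocol, and Reject when it
   can send none. The policy and function name are assumptions; 0xC0FF
   in the checks is an arbitrary unsupported value.

```python
import struct
from typing import FrozenSet

QUALITY_TYPE = 4
LQR = 0xC025


def quality_decision(option: bytes,
                     supported: FrozenSet[int] = frozenset({LQR})) -> str:
    """Return 'ack', 'nak' or 'reject' for a received Quality-Protocol option."""
    if len(option) < 4:
        raise ValueError("Quality-Protocol option needs at least 4 octets")
    opt_type, length, protocol = struct.unpack("!BBH", option[:4])
    if opt_type != QUALITY_TYPE or length != len(option):
        raise ValueError("not a well-formed Quality-Protocol option")
    if protocol in supported:
        return "ack"  # we agree to send this monitoring protocol
    if supported:
        return "nak"  # hypothetical policy: suggest one we can send
    return "reject"
```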

6.4.  Magic-Number

   Description

      This Configuration Option provides a method to detect looped-back
      links and other Data Link Layer anomalies.  This Configuration
      Option MAY be required by some other Configuration Options such as
      the Quality-Protocol Configuration Option.  By default, the
      Magic-Number is not negotiated, and zero is inserted where a
      Magic-Number might otherwise be used.

      Before this Configuration Option is requested, an implementation
      MUST choose its Magic-Number.  It is recommended that the Magic-
      Number be chosen in the most random manner possible in order to
      guarantee with very high probability that an implementation will
      arrive at a unique number.  A good way to choose a unique random
      number is to start with a unique seed.  Suggested sources of
      uniqueness include machine serial numbers, other network hardware
      addresses, time-of-day clocks, etc.  Particularly good random
      number seeds are precise measurements of the inter-arrival time of
      physical events such as packet reception on other connected
      networks, server response time, or the typing rate of a human
      user.  It is also suggested that as many sources as possible be
      used simultaneously.

      When a Configure-Request is received with a Magic-Number
      Configuration Option, the received Magic-Number is compared with
      the Magic-Number of the last Configure-Request sent to the peer.
      If the two Magic-Numbers are different, then the link is not
      looped-back, and the Magic-Number SHOULD be acknowledged.  If the
      two Magic-Numbers are equal, then it is possible, but not certain,
      that the link is looped-back and that this Configure-Request is
      actually the one last sent.  To determine this, a Configure-Nak
      MUST be sent specifying a different Magic-Number value.  A new
      Configure-Request SHOULD NOT be sent to the peer until normal
      processing would cause it to be sent (that is, until a Configure-
      Nak is received or the Restart timer runs out).
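   The comparison rule above can be sketched as follows. As a swapped-in
   technique, the sketch draws Magic-Numbers from the operating system's
   cryptographic random source (Python's secrets module) rather than
   building a seed from the sources listed earlier. Because zero is
   illegal, a received zero is also Nak'd. The function names are
   hypothetical.

```python
import secrets
from typing import Tuple


def choose_magic(avoid: int = 0) -> int:
    """Pick a nonzero 32-bit Magic-Number that differs from `avoid`."""
    while True:
        candidate = secrets.randbits(32)  # OS CSPRNG instead of a hand-built seed
        if candidate != 0 and candidate != avoid:
            return candidate


def on_configure_request_magic(received: int, last_sent: int) -> Tuple[str, int]:
    """Return ('ack', value) or ('nak', suggested different value)."""
    if received != 0 and received != last_sent:
        return ("ack", received)  # numbers differ: not looped back
    # Equal (possible loopback) or the illegal zero: Nak with another value.
    return ("nak", choose_magic(avoid=last_sent))
```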

      Reception of a Configure-Nak with a Magic-Number different from
      that of the last Configure-Nak sent to the peer proves that a link
      is not looped-back, and indicates a unique Magic-Number.  If the
      Magic-Number is equal to the one sent in the last Configure-Nak,
      the possibility of a looped-back link is increased, and a new
      Magic-Number MUST be chosen.  In either case, a new Configure-
      Request SHOULD be sent with the new Magic-Number.

      If the link is indeed looped-back, this sequence (transmit
      Configure-Request, receive Configure-Request, transmit Configure-
      Nak, receive Configure-Nak) will repeat over and over again.  If
      the link is not looped-back, this sequence might occur a few
      times, but it is extremely unlikely to occur repeatedly.  More
      likely, the Magic-Numbers chosen at either end will quickly
      diverge, terminating the sequence.  The following table shows the
      probability of collisions assuming that both ends of the link
      select Magic-Numbers with a perfectly uniform distribution:

         Number of Collisions        Probability
         --------------------   ---------------------
                 1              1/2**32    = 2.3 E-10
                 2              1/2**64    = 5.4 E-20
                 3              1/2**96    = 1.3 E-29
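   The table's probabilities follow from each collision being an
   independent 1-in-2^32 event, so n consecutive collisions have
   probability (2^-32)^n. A quick check, assuming perfectly uniform
   32-bit choices as the text does:

```python
def collision_probability(collisions: int) -> float:
    """Chance that `collisions` successive uniform 32-bit choices all collide."""
    return (2.0 ** -32) ** collisions


for n in (1, 2, 3):
    print(n, f"{collision_probability(n):.1e}")
# prints:
# 1 2.3e-10
# 2 5.4e-20
# 3 1.3e-29
```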

      Good sources of uniqueness or randomness are required for this
      divergence to occur.  If a good source of uniqueness cannot be
      found, it is recommended that this Configuration Option not be
      enabled; Configure-Requests with the option SHOULD NOT be
      transmitted and any Magic-Number Configuration Options which the
      peer sends SHOULD be either acknowledged or rejected.  In this
      case, looped-back links cannot be reliably detected by the
      implementation, although they may still be detectable by the peer.

      If an implementation does transmit a Configure-Request with a
      Magic-Number Configuration Option, then it MUST NOT respond with a
      Configure-Reject when it receives a Configure-Request with a
      Magic-Number Configuration Option.  That is, if an implementation
      desires to use Magic Numbers, then it MUST also allow its peer to
      do so.  If an implementation does receive a Configure-Reject in
      response to a Configure-Request, it can only mean that the link is
      not looped-back, and that its peer will not be using Magic-
      Numbers.  In this case, an implementation SHOULD act as if the
      negotiation had been successful (as if it had instead received a
      Configure-Ack).

      The Magic-Number also may be used to detect looped-back links
      during normal operation, as well as during Configuration Option
      negotiation.  All LCP Echo-Request, Echo-Reply, and Discard-
      Request packets have a Magic-Number field.  If Magic-Number has
      been successfully negotiated, an implementation MUST transmit
      these packets with the Magic-Number field set to its negotiated
      Magic-Number.

      The Magic-Number field of these packets SHOULD be inspected on
      reception.  All received Magic-Number fields MUST be equal to
      either zero or the peer's unique Magic-Number, depending on
      whether or not the peer negotiated a Magic-Number.

      Reception of a Magic-Number field equal to the negotiated local
      Magic-Number indicates a looped-back link.  Reception of a Magic-
      Number other than the negotiated local Magic-Number, the peer's
      negotiated Magic-Number, or zero if the peer didn't negotiate one,
      indicates a link which has been (mis)configured for communications
      with a different peer.
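   The three outcomes for a Magic-Number received during normal
   operation can be sketched as a small classifier. This assumes zero
   stands for "not negotiated" on either side, matching the text's use of
   zero as the default; the function name is illustrative.

```python
def classify_magic(received: int, local: int, peer: int) -> str:
    """Check the Magic-Number of an Echo-Request, Echo-Reply or Discard-Request.

    `local` and `peer` are the negotiated values, or 0 where none was negotiated.
    """
    if local != 0 and received == local:
        return "looped-back"
    if received == peer:
        return "ok"  # the peer's number, or zero when the peer negotiated none
    return "misconfigured"  # the link reaches a different peer
```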

      Procedures for recovery from either case are unspecified, and may
      vary from implementation to implementation.  A somewhat
      pessimistic procedure is to assume an LCP Down event.  A further
      Open event will begin the process of re-establishing the link,
      which can't complete until the looped-back condition is
      terminated, and Magic-Numbers are successfully negotiated.  A more
      optimistic procedure (in the case of a looped-back link) is to
      begin transmitting LCP Echo-Request packets until an appropriate
      Echo-Reply is received, indicating a termination of the looped-
      back condition.

   A summary of the Magic-Number Configuration Option format is shown
   below.  The fields are transmitted from left to right.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |          Magic-Number
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         Magic-Number (cont)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type

      5

   Length

      6

   Magic-Number


      The Magic-Number field is four octets, and indicates a number
      which is very likely to be unique to one end of the link.  A
      Magic-Number of zero is illegal and MUST always be Nak'd, if it is
      not Rejected outright.
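   A sketch of the 6-octet option encoding, refusing to emit the illegal
   zero value. The decoder returns zero unchanged so that the caller can
   Nak or Reject it as required. The names are illustrative.

```python
import struct

MAGIC_TYPE = 5


def encode_magic_option(magic: int) -> bytes:
    """Type 5, Length 6, four-octet Magic-Number (zero is illegal)."""
    if not 0 < magic <= 0xFFFFFFFF:
        raise ValueError("Magic-Number must be a nonzero 32-bit value")
    return struct.pack("!BBI", MAGIC_TYPE, 6, magic)


def decode_magic_option(option: bytes) -> int:
    if len(option) != 6:
        raise ValueError("Magic-Number option must be 6 octets")
    opt_type, length, magic = struct.unpack("!BBI", option)
    if opt_type != MAGIC_TYPE or length != 6:
        raise ValueError("not a Magic-Number option")
    return magic  # the caller must Nak (or Reject) a value of zero
```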

6.5.  Protocol-Field-Compression (PFC)

   Description

      This Configuration Option provides a method to negotiate the
      compression of the PPP Protocol field.  By default, all
      implementations MUST transmit packets with two octet PPP Protocol
      fields.

      PPP Protocol field numbers are chosen such that some values may be
      compressed into a single octet form which is clearly
      distinguishable from the two octet form.  This Configuration
      Option is sent to inform the peer that the implementation can
      receive such single octet Protocol fields.

      As previously mentioned, the Protocol field uses an extension
      mechanism consistent with the ISO 3309 extension mechanism for the
      Address field; the Least Significant Bit (LSB) of each octet is
      used to indicate extension of the Protocol field.  A binary "0" as
      the LSB indicates that the Protocol field continues with the
      following octet.  The presence of a binary "1" as the LSB marks
      the last octet of the Protocol field.  Notice that any number of
      "0" octets may be prepended to the field, and will still indicate
      the same value (consider the two binary representations for 3,
      00000011 and 00000000 00000011).
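   The extension rule can be illustrated with a short decoder: octets are
   accumulated until one with a "1" LSB is seen. It reproduces the text's
   example that 00000011 and 00000000 00000011 both decode to 3. The
   function name is illustrative.

```python
from typing import Tuple


def decode_protocol(frame: bytes) -> Tuple[int, int]:
    """Read a Protocol field using the ISO 3309-style extension bit.

    Returns (protocol value, number of octets consumed).
    """
    value = 0
    for index, octet in enumerate(frame):
        value = (value << 8) | octet
        if octet & 0x01:  # an LSB of 1 marks the last octet
            return value, index + 1
    raise ValueError("Protocol field never terminated")
```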

      When using low speed links, it is desirable to conserve bandwidth
      by sending as little redundant data as possible.  The Protocol-
      Field-Compression Configuration Option allows a trade-off between
      implementation simplicity and bandwidth efficiency.  If
      successfully negotiated, the ISO 3309 extension mechanism may be
      used to compress the Protocol field to one octet instead of two.
      The large majority of packets are compressible since data
      protocols are typically assigned with Protocol field values less
      than 256.

      Compressed Protocol fields MUST NOT be transmitted unless this
      Configuration Option has been negotiated.  When negotiated, PPP
      implementations MUST accept PPP packets with either double-octet
      or single-octet Protocol fields, and MUST NOT distinguish between
      them.

      The Protocol field is never compressed when sending any LCP
      packet.  This rule guarantees unambiguous recognition of LCP
      packets.
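   On the sending side the rules reduce to a small check, sketched below.
   It compresses only when the option was negotiated, the value fits in
   one octet, and the packet is not LCP. 0xC021 is the LCP Protocol value
   defined earlier in this specification; because it cannot fit in one
   octet anyway, the explicit LCP test mainly documents the rule.

```python
LCP_PROTOCOL = 0xC021


def encode_protocol(protocol: int, pfc_negotiated: bool) -> bytes:
    """Emit the Protocol field, compressing it to one octet when allowed."""
    compressible = protocol < 0x100 and protocol != LCP_PROTOCOL
    if pfc_negotiated and compressible:
        return bytes([protocol])
    return protocol.to_bytes(2, "big")  # default two-octet form
```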

      When a Protocol field is compressed, the Data Link Layer FCS field
      is calculated on the compressed frame, not the original
      uncompressed frame.

   A summary of the Protocol-Field-Compression Configuration Option
   format is shown below.  The fields are transmitted from left to
   right.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type

      7

   Length

      2

6.6.  Address-and-Control-Field-Compression (ACFC)

   Description

      This Configuration Option provides a method to negotiate the
      compression of the Data Link Layer Address and Control fields.  By
      default, all implementations MUST transmit frames with Address and
      Control fields appropriate to the link framing.

      Since these fields usually have constant values for point-to-point
      links, they are easily compressed.  This Configuration Option is
      sent to inform the peer that the implementation can receive
      compressed Address and Control fields.

      If a compressed frame is received when Address-and-Control-Field-
      Compression has not been negotiated, the implementation MAY
      silently discard the frame.

      The Address and Control fields MUST NOT be compressed when sending
      any LCP packet.  This rule guarantees unambiguous recognition of
      LCP packets.

      When the Address and Control fields are compressed, the Data Link
      Layer FCS field is calculated on the compressed frame, not the
      original uncompressed frame.
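   A sketch of both directions, assuming HDLC-like framing in which the
   Address and Control fields are 0xFF and 0x03 (values from the PPP
   HDLC-like framing specification, not this document). The receiver
   chooses the MAY-discard branch for compressed frames when ACFC was not
   negotiated. FCS handling is outside this sketch.

```python
from typing import Optional

# Assumption: HDLC-like framing, where Address is 0xFF and Control is 0x03.
ADDRESS_CONTROL = bytes([0xFF, 0x03])


def add_address_control(payload: bytes, acfc_negotiated: bool, is_lcp: bool) -> bytes:
    """Prepend Address/Control unless ACFC was negotiated and this is not LCP."""
    if acfc_negotiated and not is_lcp:
        return payload
    return ADDRESS_CONTROL + payload


def strip_address_control(frame: bytes, acfc_negotiated: bool) -> Optional[bytes]:
    """Return the frame without Address/Control, or None to discard it."""
    if frame.startswith(ADDRESS_CONTROL):
        return frame[2:]
    if acfc_negotiated:
        return frame  # the peer compressed the fields away
    return None  # compressed without ACFC: this sketch discards (a MAY)
```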

   A summary of the Address-and-Control-Field-Compression configuration
   option format is shown below.  The fields are transmitted from left
   to right.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type

      8

   Length

      2

Security Considerations

   Security issues are briefly discussed in sections concerning the
   Authentication Phase, the Close event, and the Authentication-
   Protocol Configuration Option.

References

   [1]   Perkins, D., "Requirements for an Internet Standard Point-to-
         Point Protocol", RFC 1547, Carnegie Mellon University,
         December 1993.

   [2]   Reynolds, J., and Postel, J., "Assigned Numbers", STD 2, RFC
         1340, USC/Information Sciences Institute, July 1992.

Acknowledgements

   This document is the product of the Point-to-Point Protocol Working
   Group of the Internet Engineering Task Force (IETF).  Comments should
   be submitted to the ietf-ppp@merit.edu mailing list.

   Much of the text in this document is taken from the working group
   requirements [1]; and RFCs 1171 & 1172, by Drew Perkins while at
   Carnegie Mellon University, and by Russ Hobby of the University of
   California at Davis.

   William Simpson was principally responsible for introducing
   consistent terminology and philosophy, and the re-design of the phase
   and negotiation state machines.

   Many people spent significant time helping to develop the Point-to-
   Point Protocol.  The complete list of people is too numerous to list,
   but the following people deserve special thanks: Rick Adams, Ken
   Adelman, Fred Baker, Mike Ballard, Craig Fox, Karl Fox, Phill Gross,
   Kory Hamzeh, former WG chair Russ Hobby, David Kaufman, former WG
   chair Steve Knowles, Mark Lewis, former WG chair Brian Lloyd, John
   LoVerso, Bill Melohn, Mike Patton, former WG chair Drew Perkins, Greg
   Satz, John Shriver, Vernon Schryver, and Asher Waldfogel.

   Special thanks to Morning Star Technologies for providing computing
   resources and network access support for writing this specification.

Chair's Address

   The working group can be contacted via the current chair:

      Fred Baker
      Advanced Computer Communications
      315 Bollay Drive
      Santa Barbara, California  93117

      fbaker@acc.com

Editor's Address

   Questions about this memo can also be directed to:

      William Allen Simpson
      Daydreamer
      Computer Systems Consulting Services
      1384 Fontaine
      Madison Heights, Michigan  48071

      Bill.Simpson@um.cc.umich.edu
          bsimpson@MorningStar.com


The post RFC 1661 – The Point-to-Point Protocol (PPP) appeared first on IPv6.net.

RFC 2406 – IP Encapsulating Security Payload (ESP)

IPv6 & IoT editor — Sat, 01 Aug 2009 17:49:43 +0000

Network Working Group                                            S. Kent
Request for Comments: 2406                                      BBN Corp
Obsoletes: 1827                                              R. Atkinson
Category: Standards Track                                  @Home Network
                                                           November 1998

IP Encapsulating Security Payload (ESP)

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction
2. Encapsulating Security Payload Packet Format
   2.1 Security Parameters Index
   2.2 Sequence Number
   2.3 Payload Data
   2.4 Padding (for Encryption)
   2.5 Pad Length
   2.6 Next Header
   2.7 Authentication Data
3. Encapsulating Security Protocol Processing
   3.1 ESP Header Location
   3.2 Algorithms
      3.2.1 Encryption Algorithms
      3.2.2 Authentication Algorithms
   3.3 Outbound Packet Processing
      3.3.1 Security Association Lookup
      3.3.2 Packet Encryption
      3.3.3 Sequence Number Generation
      3.3.4 Integrity Check Value Calculation
      3.3.5 Fragmentation
   3.4 Inbound Packet Processing
      3.4.1 Reassembly
      3.4.2 Security Association Lookup
      3.4.3 Sequence Number Verification
      3.4.4 Integrity Check Value Verification
      3.4.5 Packet Decryption
4. Auditing
5. Conformance Requirements
6. Security Considerations
7. Differences from RFC 1827
Acknowledgements
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

The Encapsulating Security Payload (ESP) header is designed to
provide a mix of security services in IPv4 and IPv6. ESP may be
applied alone, in combination with the IP Authentication Header (AH)
[KA97b], or in a nested fashion, e.g., through the use of tunnel mode
(see "Security Architecture for the Internet Protocol" [KA97a],
hereafter referred to as the Security Architecture document).
Security services can be provided between a pair of communicating
hosts, between a pair of communicating security gateways, or between
a security gateway and a host. For more details on how to use ESP
and AH in various network environments, see the Security Architecture
document [KA97a].

The ESP header is inserted after the IP header and before the upper
layer protocol header (transport mode) or before an encapsulated IP
header (tunnel mode). These modes are described in more detail
below.

ESP is used to provide confidentiality, data origin authentication,
connectionless integrity, an anti-replay service (a form of partial
sequence integrity), and limited traffic flow confidentiality. The
set of services provided depends on options selected at the time of
Security Association establishment and on the placement of the
implementation. Confidentiality may be selected independent of all
other services. However, use of confidentiality without
integrity/authentication (either in ESP or separately in AH) may
subject traffic to certain forms of active attacks that could
undermine the confidentiality service (see [Bel96]). Data origin
authentication and connectionless integrity are joint services
(hereafter referred to jointly as "authentication") and are offered as
an option in conjunction with (optional) confidentiality. The
anti-replay service may be selected only if data origin authentication
is selected, and its election is solely at the discretion of the
receiver. (Although the default calls for the sender to increment
the Sequence Number used for anti-replay, the service is effective
only if the receiver checks the Sequence Number.) Traffic flow
confidentiality requires selection of tunnel mode, and is most
effective if implemented at a security gateway, where traffic
aggregation may be able to mask true source-destination patterns.
Note that although both confidentiality and authentication are
optional, at least one of them MUST be selected.

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by ESP and AH, the concept of Security Associations, the ways
in which ESP can be used in conjunction with the Authentication
Header (AH), and the different key management options available for
ESP and AH. (With regard to the last topic, the current key
management options required for both AH and ESP are manual keying and
automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Encapsulating Security Payload Packet Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the ESP header will contain the value 50 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ----
|               Security Parameters Index (SPI)                 | ^Auth.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
|                      Sequence Number                          | |erage
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ----
|                    Payload Data* (variable)                   | |   ^
~                                                               ~ |   |
|                                                               | |Conf.
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Cov-
|               |     Padding (0-255 bytes)                     | |erage*
+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |   |
|                               |  Pad Length   | Next Header   | v   v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------
|         Authentication Data (variable)                        |
~                                                               ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

* If included in the Payload field, cryptographic
synchronization data, e.g., an Initialization Vector (IV, see
Section 2.3), usually is not encrypted per se, although it
often is referred to as being part of the ciphertext.

The following subsections define the fields in the header format.
"Optional" means that the field is omitted if the option is not
selected, i.e., it is present in neither the packet as transmitted
nor as formatted for computation of an Integrity Check Value (ICV,
see Section 2.7). Whether or not an option is selected is defined as
part of Security Association (SA) establishment. Thus the format of
ESP packets for a given SA is fixed, for the duration of the SA. In
contrast, "mandatory" fields are always present in the ESP packet
format, for all SAs.

2.1 Security Parameters Index

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (ESP), uniquely
identifies the Security Association for this datagram. The set of
SPI values in the range 1 through 255 is reserved by the Internet
Assigned Numbers Authority (IANA) for future use; a reserved SPI
value will not normally be assigned by IANA unless the use of the
assigned SPI value is specified in an RFC. It is ordinarily selected
by the destination system upon establishment of an SA (see the
Security Architecture document for more details). The SPI field is
mandatory.

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.
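The SPI ranges described in this section can be summarised as a small classification function. This is an illustrative sketch only; `classify_spi` and its return labels are made-up names, not part of any IPsec API.

```python
def classify_spi(spi: int) -> str:
    """Hypothetical helper mapping a 32-bit SPI value to its Section 2.1 category."""
    if not 0 <= spi <= 0xFFFFFFFF:
        raise ValueError("the SPI is an unsigned 32-bit value")
    if spi == 0:
        # Reserved for local, implementation-specific use; MUST NOT be sent.
        return "local-only"
    if spi <= 255:
        # 1 through 255: reserved by IANA for future use.
        return "iana-reserved"
    # Ordinarily selected by the destination system when the SA is set up.
    return "assignable"
```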

2.2 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).
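Here is a minimal Python sketch of reading the two mandatory fixed fields. The SPI and the Sequence Number occupy the first 8 bytes of the ESP header, each an unsigned 32-bit integer in network byte order. The names `EspHeader` and `parse_esp_header` are hypothetical. The remainder of the packet is left unparsed, because where the Payload Data ends depends on the SA's algorithms.

```python
import struct
from typing import NamedTuple


class EspHeader(NamedTuple):
    spi: int      # Security Parameters Index (32 bits)
    seq: int      # Sequence Number (32 bits)
    rest: bytes   # Payload Data, trailer and Authentication Data, unparsed


def parse_esp_header(packet: bytes) -> EspHeader:
    """Hypothetical helper: split off the fixed 8-byte ESP header."""
    if len(packet) < 8:
        raise ValueError("too short to contain SPI and Sequence Number")
    # "!II" = two unsigned 32-bit integers in network (big-endian) order
    spi, seq = struct.unpack("!II", packet[:8])
    return EspHeader(spi, seq, packet[8:])
```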

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.3 for more details
on how the Sequence Number is generated.) If anti-replay is enabled
(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.

2.3 Payload Data

Payload Data is a variable-length field containing data described by
the Next Header field. The Payload Data field is mandatory and is an
integral number of bytes in length. If the algorithm used to encrypt
the payload requires cryptographic synchronization data, e.g., an
Initialization Vector (IV), then this data MAY be carried explicitly
in the Payload field. Any encryption algorithm that requires such
explicit, per-packet synchronization data MUST indicate the length,
any structure for such data, and the location of this data as part of
an RFC specifying how the algorithm is used with ESP. If such
synchronization data is implicit, the algorithm for deriving the data
MUST be part of the RFC.

Note that with regard to ensuring the alignment of the (real)
ciphertext in the presence of an IV:

o For some IV-based modes of operation, the receiver treats
the IV as the start of the ciphertext, feeding it into the
algorithm directly. In these modes, alignment of the start
of the (real) ciphertext is not an issue at the receiver.
o In some cases, the receiver reads the IV in separately from
the ciphertext. In these cases, the algorithm
specification MUST address how alignment of the (real)
ciphertext is to be achieved.

2.4 Padding (for Encryption)

Several factors require or motivate use of the Padding field.

o If an encryption algorithm is employed that requires the
plaintext to be a multiple of some number of bytes, e.g.,
the block size of a block cipher, the Padding field is used
to fill the plaintext (consisting of the Payload Data, Pad
Length and Next Header fields, as well as the Padding) to
the size required by the algorithm.

o Padding also may be required, irrespective of encryption
algorithm requirements, to ensure that the resulting
ciphertext terminates on a 4-byte boundary. Specifically,
the Pad Length and Next Header fields must be right aligned
within a 4-byte word, as illustrated in the ESP packet
format figure above, to ensure that the Authentication Data
field (if present) is aligned on a 4-byte boundary.

o Padding beyond that required for the algorithm or alignment
reasons cited above may be used to conceal the actual
length of the payload, in support of (partial) traffic flow
confidentiality. However, inclusion of such additional
padding has adverse bandwidth implications and thus its use
should be undertaken with care.

The sender MAY add 0-255 bytes of padding. Inclusion of the Padding
field in an ESP packet is optional, but all implementations MUST
support generation and consumption of padding.

a. For the purpose of ensuring that the bits to be encrypted
are a multiple of the algorithm's blocksize (first bullet
above), the padding computation applies to the Payload
Data exclusive of the IV, the Pad Length, and Next Header
fields.

b. For the purposes of ensuring that the Authentication Data
is aligned on a 4-byte boundary (second bullet above), the
padding computation applies to the Payload Data inclusive
of the IV, the Pad Length, and Next Header fields.
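The two padding rules can be combined into a search for the smallest pad that satisfies both. This sketch makes three assumptions. First, it assumes a cipher whose block size applies to the Payload Data (excluding any IV) plus Pad Length and Next Header, per rule a. Second, the 4-byte alignment check includes the IV, per rule b. Third, a stream cipher is modelled as block size 1. The function name and parameters are illustrative only.

```python
def min_pad_length(payload_len: int, iv_len: int = 0, block_size: int = 1) -> int:
    """Smallest Padding length (0-255) meeting rules a and b (hypothetical helper)."""
    trailer = 2  # Pad Length (1 byte) + Next Header (1 byte)
    for pad in range(256):
        # Rule a: payload (without IV) + padding + trailer must be a
        # multiple of the cipher's block size.
        block_ok = (payload_len + pad + trailer) % block_size == 0
        # Rule b: IV + payload + padding + trailer must end on a 4-byte
        # boundary so that Authentication Data is 4-byte aligned.
        align_ok = (iv_len + payload_len + pad + trailer) % 4 == 0
        if block_ok and align_ok:
            return pad
    raise ValueError("no padding of 0-255 bytes satisfies both constraints")
```

For example, a 10-byte payload with an 8-byte IV and an 8-byte block cipher needs 4 bytes of padding, since 10 + 4 + 2 = 16.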

If Padding bytes are needed but the encryption algorithm does not
specify the padding contents, then the following default processing
MUST be used. The Padding bytes are initialized with a series of
(unsigned, 1-byte) integer values. The first padding byte appended
to the plaintext is numbered 1, with subsequent padding bytes making
up a monotonically increasing sequence: 1, 2, 3, ... When this
padding scheme is employed, the receiver SHOULD inspect the Padding
field. (This scheme was selected because of its relative simplicity,
ease of implementation in hardware, and because it offers limited
protection against certain forms of "cut and paste" attacks in the
absence of other integrity measures, if the receiver checks the
padding values upon decryption.)
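The default Padding contents and the optional receiver check might look like the sketch below. The function names are hypothetical. An algorithm RFC that defines its own Padding would replace both functions.

```python
def default_padding(n: int) -> bytes:
    """Default Padding contents: the unsigned byte values 1, 2, 3, ..., n."""
    if not 0 <= n <= 255:
        raise ValueError("Padding is 0-255 bytes")
    return bytes(range(1, n + 1))


def padding_is_default(padding: bytes) -> bool:
    """Receiver-side check (SHOULD) that decrypted Padding follows the default scheme."""
    return padding == default_padding(len(padding))
```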

Any encryption algorithm that requires Padding other than the default
described above MUST define the Padding contents (e.g., zeros or
random data) and any required receiver processing of these Padding
bytes in an RFC specifying how the algorithm is used with ESP. In
such circumstances, the content of the Padding field will be
determined by the encryption algorithm and mode selected and defined
in the corresponding algorithm RFC. The relevant algorithm RFC MAY
specify that a receiver MUST inspect the Padding field or that a
receiver MUST inform senders of how the receiver will handle the
Padding field.

2.5 Pad Length

The Pad Length field indicates the number of pad bytes immediately
preceding it. The range of valid values is 0-255, where a value of
zero indicates that no Padding bytes are present. The Pad Length
field is mandatory.

2.6 Next Header

The Next Header is an 8-bit field that identifies the type of data
contained in the Payload Data field, e.g., an extension header in
IPv6 or an upper layer protocol identifier. The value of this field
is chosen from the set of IP Protocol Numbers defined in the most
recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA). The Next Header field is mandatory.
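After decryption, the receiver can locate the trailer from the end of the plaintext, because Pad Length and Next Header are always the last two bytes. The sketch below relies on that layout. Its names are hypothetical, and it does not validate the Padding contents.

```python
from typing import NamedTuple


class EspTrailer(NamedTuple):
    payload: bytes     # Payload Data
    padding: bytes     # Padding (0-255 bytes)
    pad_length: int    # Pad Length field
    next_header: int   # IP protocol number, e.g. 6 for TCP


def split_trailer(plaintext: bytes) -> EspTrailer:
    """Split decrypted ESP plaintext into payload, Padding, Pad Length, Next Header."""
    if len(plaintext) < 2:
        raise ValueError("missing Pad Length / Next Header")
    pad_length = plaintext[-2]    # number of Padding bytes immediately preceding it
    next_header = plaintext[-1]
    end = len(plaintext) - 2 - pad_length
    if end < 0:
        raise ValueError("Pad Length exceeds the available data")
    return EspTrailer(plaintext[:end], plaintext[end:-2], pad_length, next_header)
```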

2.7 Authentication Data

The Authentication Data is a variable-length field containing an
Integrity Check Value (ICV) computed over the ESP packet minus the
Authentication Data. The length of the field is specified by the
authentication function selected. The Authentication Data field is
optional, and is included only if the authentication service has been
selected for the SA in question. The authentication algorithm
specification MUST specify the length of the ICV and the comparison
rules and processing steps for validation.

3. Encapsulating Security Protocol Processing

3.1 ESP Header Location

Like AH, ESP may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, but not the IP header.
(In this mode, note that for "bump-in-the-stack" or "bump-in-the-
wire" implementations, as defined in the Security Architecture
document, inbound and outbound IP fragments may require an IPsec
implementation to perform extra IP reassembly/fragmentation in order
to both conform to this specification and provide transparent IPsec
support. Special care is required to perform such operations within
these implementations when multiple interfaces are in use.)

In transport mode, ESP is inserted after the IP header and before an
upper layer protocol, e.g., TCP, UDP, ICMP, etc. or before any other
IPsec headers that have already been inserted. In the context of
IPv4, this translates to placing ESP after the IP header (and any
options that it contains), but before the upper layer protocol.
(Note that the term "transport" mode should not be misconstrued as
restricting its use to TCP and UDP. For example, an ICMP message MAY
be sent using either "transport" mode or "tunnel" mode.) The
following diagram illustrates ESP transport mode positioning for a
typical IPv4 packet, on a "before and after" basis. (The "ESP
trailer" encompasses any Padding, plus the Pad Length, and Next
Header fields.)

     BEFORE APPLYING ESP
      ----------------------------
IPv4  |orig IP hdr  |     |      |
      |(any options)| TCP | Data |
      ----------------------------

     AFTER APPLYING ESP
      -------------------------------------------------
IPv4  |orig IP hdr  | ESP |     |      |   ESP   | ESP|
      |(any options)| Hdr | TCP | Data | Trailer |Auth|
      -------------------------------------------------
                          |<----- encrypted ---->|
                    |<------ authenticated ----->|

In the IPv6 context, ESP is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the ESP header depending on the semantics
desired. However, since ESP protects only fields after the ESP
header, it generally may be desirable to place the destination
options header(s) after the ESP header. The following diagram
illustrates ESP transport mode positioning for a typical IPv6 packet.

     BEFORE APPLYING ESP
      ---------------------------------------
IPv6  |             | ext hdrs |     |      |
      | orig IP hdr |if present| TCP | Data |
      ---------------------------------------

     AFTER APPLYING ESP
      ---------------------------------------------------------
IPv6  | orig |hop-by-hop,dest*,|   |dest|   |    | ESP   | ESP|
      |IP hdr|routing,fragment.|ESP|opt*|TCP|Data|Trailer|Auth|
      ---------------------------------------------------------
                                   |<---- encrypted ---->|
                               |<---- authenticated ---->|

* = if present, could be before ESP, after ESP, or both

ESP and AH headers can be combined in a variety of modes. The IPsec
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode ESP may be employed in either hosts or security gateways.
When ESP is implemented in a security gateway (to protect subscriber
transit traffic), tunnel mode must be used. In tunnel mode, the
"inner" IP header carries the ultimate source and destination
addresses, while an "outer" IP header may contain distinct IP
addresses, e.g., addresses of security gateways. In tunnel mode, ESP
protects the entire inner IP packet, including the entire inner IP
header. The position of ESP in tunnel mode, relative to the outer IP
header, is the same as for ESP in transport mode. The following
diagram illustrates ESP tunnel mode positioning for typical IPv4 and
IPv6 packets.

      -----------------------------------------------------------
IPv4  | new IP hdr* |     | orig IP hdr*  |   |    | ESP   | ESP|
      |(any options)| ESP | (any options) |TCP|Data|Trailer|Auth|
      -----------------------------------------------------------
                          |<--------- encrypted ---------->|
                    |<----------- authenticated ---------->|

      ------------------------------------------------------------
IPv6  | new* |new ext |   | orig*|orig ext |   |    | ESP   | ESP|
      |IP hdr| hdrs*  |ESP|IP hdr| hdrs *  |TCP|Data|Trailer|Auth|
      ------------------------------------------------------------
                          |<--------- encrypted ----------->|
                      |<---------- authenticated ---------->|

* = if present, construction of outer IP hdr/extensions
and modification of inner IP hdr/extensions is
discussed below.

3.2 Algorithms

The mandatory-to-implement algorithms are described in Section 5,
"Conformance Requirements". Other algorithms MAY be supported. Note
that although both confidentiality and authentication are optional,
at least one of these services MUST be selected; hence, the two
algorithms MUST NOT both be NULL.
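The rule that the two algorithms cannot both be NULL could be enforced when an SA is configured. In this hypothetical check, algorithms are identified by plain strings, and "NULL" stands for the NULL algorithm. Real SA databases use their own identifiers.

```python
def check_sa_algorithms(encryption: str, authentication: str) -> None:
    """Reject an SA whose encryption and authentication are both NULL (hypothetical check)."""
    if encryption.upper() == "NULL" and authentication.upper() == "NULL":
        raise ValueError("at least one of confidentiality or authentication must be selected")
```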

3.2.1 Encryption Algorithms

The encryption algorithm employed is specified by the SA. ESP is
designed for use with symmetric encryption algorithms. Because IP
packets may arrive out of order, each packet must carry any data
required to allow the receiver to establish cryptographic
synchronization for decryption. This data may be carried explicitly
in the payload field, e.g., as an IV (as described above), or the
data may be derived from the packet header. Since ESP makes
provision for padding of the plaintext, encryption algorithms
employed with ESP may exhibit either block or stream mode
characteristics. Note that since encryption (confidentiality) is
optional, this algorithm may be "NULL".

3.2.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. Note that
since authentication is optional, this algorithm may be "NULL".

3.3 Outbound Packet Processing

In transport mode, the sender encapsulates the upper layer protocol
information in the ESP header/trailer, and retains the specified IP
header (and any IP extension headers in the IPv6 context). In tunnel
mode, the outer and inner IP header/extensions can be inter-related
in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document. If there is more than one IPsec
header/extension required by security policy, the order of the
application of the security headers MUST be defined by security
policy.

3.3.1 Security Association Lookup

ESP is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for ESP processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Packet Encryption

In this section, we speak in terms of encryption always being applied
because of the formatting implications. This is done with the
understanding that "no confidentiality" is offered by using the NULL
encryption algorithm. Accordingly, the sender:

1. encapsulates (into the ESP Payload field):
- for transport mode -- just the original upper layer
protocol information.
- for tunnel mode -- the entire original IP datagram.
2. adds any necessary padding.
3. encrypts the result (Payload Data, Padding, Pad Length, and
Next Header) using the key, encryption algorithm, algorithm
mode indicated by the SA and cryptographic synchronization
data (if any).
- If explicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is input to the encryption
algorithm per the algorithm specification and placed
in the Payload field.
- If implicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is constructed and input to
the encryption algorithm as per the algorithm
specification.
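The three outbound steps can be illustrated with the NULL encryption algorithm, so no cryptographic library is needed. This sketch makes four assumptions: default padding, a block size that is a multiple of 4 (so rule a also yields 4-byte alignment), no explicit IV, and no authentication. `build_esp` and `null_encrypt` are hypothetical names.

```python
import struct


def null_encrypt(plaintext: bytes) -> bytes:
    # NULL encryption ("no confidentiality"): the ciphertext equals the plaintext.
    return plaintext


def build_esp(spi: int, seq: int, payload: bytes, next_header: int,
              block_size: int = 4) -> bytes:
    """Steps 1-3 with NULL encryption and no Authentication Data (sketch)."""
    # Step 2: smallest default padding (1, 2, 3, ...) so that
    # payload + padding + 2-byte trailer is a multiple of block_size.
    pad_len = (-(len(payload) + 2)) % block_size
    padding = bytes(range(1, pad_len + 1))
    plaintext = payload + padding + bytes([pad_len, next_header])
    # Step 3: encrypt Payload Data, Padding, Pad Length and Next Header.
    ciphertext = null_encrypt(plaintext)
    # SPI and Sequence Number precede the ciphertext and are not encrypted.
    return struct.pack("!II", spi, seq) + ciphertext
```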

The exact steps for constructing the outer IP header depend on the
mode (transport or tunnel) and are described in the Security
Architecture document.

If authentication is selected, encryption is performed first, before
the authentication, and the encryption does not encompass the
Authentication Data field. This order of processing facilitates
rapid detection and rejection of replayed or bogus packets by the
receiver, prior to decrypting the packet, hence potentially reducing
the impact of denial of service attacks. It also allows for the
possibility of parallel processing of packets at the receiver, i.e.,
decryption can take place in parallel with authentication. Note that
since the Authentication Data is not protected by encryption, a keyed
authentication algorithm must be employed to compute the ICV.

3.3.3 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter and
when it reaches the maximum value, the counter rolls over back to
zero.
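The sender's counter logic can be sketched as a small class. `SequenceCounter` is a hypothetical name. Raising an exception stands in for whatever refuse-and-audit action an implementation takes, and setting up a new SA is left to the caller.

```python
class SequenceCounter:
    """Sender-side ESP Sequence Number generation (hypothetical class)."""

    MAX = 2**32 - 1  # largest value the 32-bit field can carry

    def __init__(self, anti_replay: bool = True) -> None:
        self.counter = 0               # initialized to 0 when the SA is established
        self.anti_replay = anti_replay

    def next(self) -> int:
        """Increment the counter and return the value to place in the packet."""
        if self.counter == self.MAX:
            if self.anti_replay:
                # Sending would make the Sequence Number cycle: refuse
                # (an auditable event); a new SA and key are required.
                raise OverflowError("sequence number would cycle; establish a new SA")
            self.counter = 0           # anti-replay disabled: roll over to zero
        else:
            self.counter += 1
        return self.counter
```

The first call returns 1, matching the rule that the first packet on an SA carries Sequence Number 1.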

3.3.4 Integrity Check Value Calculation

If authentication is selected for the SA, the sender computes the ICV
over the ESP packet minus the Authentication Data. Thus the SPI,
Sequence Number, Payload Data, Padding (if present), Pad Length, and
Next Header are all encompassed by the ICV computation. Note that
the last 4 fields will be in ciphertext form, since encryption is
performed prior to authentication.
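To illustrate what the ICV covers, the sketch below assumes HMAC-SHA-1-96, that is, HMAC-SHA-1 truncated to 96 bits as specified for ESP and AH in RFC 2404. It uses Python's standard `hmac` and `hashlib` modules. The input runs from the SPI through Next Header, with the last four fields already encrypted. The function names are hypothetical.

```python
import hashlib
import hmac


def compute_icv(esp_without_auth: bytes, key: bytes) -> bytes:
    """ICV over SPI, Sequence Number and the (already encrypted) rest of the packet.

    Assumes HMAC-SHA-1-96: the full HMAC-SHA-1 output truncated to 12 bytes.
    """
    return hmac.new(key, esp_without_auth, hashlib.sha1).digest()[:12]


def append_icv(esp_without_auth: bytes, key: bytes) -> bytes:
    # Authentication Data follows Next Header and is not itself encrypted.
    return esp_without_auth + compute_icv(esp_without_auth, key)
```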

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the length of this byte string does
not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the ESP packet (after the
Next Header field) prior to ICV computation. The padding octets MUST
have a value of zero. The blocksize (and hence the length of the
padding) is specified by the algorithm specification. This padding
is not transmitted with the packet. Note that MD5 and SHA-1 are
viewed as having a 1-byte blocksize because of their internal padding
conventions.
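Implicit ICV padding is easy to get wrong because it is computed but never sent. The following is a minimal sketch with a hypothetical function name. It assumes `blocksize` comes from the authentication algorithm's specification.

```python
def icv_input(esp_without_auth: bytes, blocksize: int) -> bytes:
    """Bytes fed to the ICV algorithm: the packet plus implicit zero padding.

    The zero octets follow Next Header for the computation only and are
    never transmitted. With blocksize 1 (as for MD5 or SHA-1) nothing is added.
    """
    shortfall = (-len(esp_without_auth)) % blocksize
    return esp_without_auth + bytes(shortfall)
```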

3.3.5 Fragmentation

If necessary, fragmentation is performed after ESP processing within
an IPsec implementation. Thus, transport mode ESP is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which ESP
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to ESP processing at a
receiver. In tunnel mode, ESP is applied to an IP packet, the
payload of which may be a fragmented IP packet. For example, a
security gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (as defined in the Security Architecture document) may
apply tunnel mode ESP to such fragments.

NOTE: For transport mode -- As mentioned at the beginning of Section
3.1, bump-in-the-stack and bump-in-the-wire implementations may have
to first reassemble a packet fragmented by the local IP layer, then
apply IPsec, and then fragment the resulting packet.

NOTE: For IPv6 -- For bump-in-the-stack and bump-in-the-wire
implementations, it will be necessary to walk through all the
extension headers to determine if there is a fragmentation header and
hence that the packet needs reassembling prior to IPsec processing.
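The IPv6 note can be sketched as a walk over the extension-header chain that follows the fixed IPv6 header. This sketch assumes the chain is supplied as raw bytes. It steps over only Hop-by-Hop Options (0), Routing (43) and Destination Options (60), whose Hdr Ext Len counts 8-octet units beyond the first 8 octets. It stops at anything else, including ESP (50) and AH (51). The function name is hypothetical.

```python
# IPv6 Next Header values for the extension headers walked here.
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60


def has_fragment_header(first_next_header: int, after_ipv6_header: bytes) -> bool:
    """Walk the extension headers following the fixed 40-byte IPv6 header."""
    nh, offset = first_next_header, 0
    while True:
        if nh == FRAGMENT:
            return True
        if nh not in (HOP_BY_HOP, ROUTING, DEST_OPTS):
            return False  # ESP, AH, an upper-layer protocol, or No Next Header
        if offset + 2 > len(after_ipv6_header):
            raise ValueError("truncated extension header")
        # These headers begin with Next Header and Hdr Ext Len; the latter
        # counts 8-octet units not including the first 8 octets.
        nh = after_ipv6_header[offset]
        offset += (after_ipv6_header[offset + 1] + 1) * 8
```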

3.4 Inbound Packet Processing

3.4.1 Reassembly

If required, reassembly is performed prior to ESP processing. If a
packet offered to ESP for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time received, Source Address, Destination Address, Sequence
Number, and (in IPv6) the Flow ID.

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zeroing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.
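The fragment test in Section 3.4.1 inspects the IPv4 flags/fragment-offset word, and the NOTE above explains why reassembly code must clean that word up. The sketch below uses hypothetical names. If reassembly code leaves MF set or the offset non-zero, this check treats the reassembled packet as a fragment and discards it.

```python
MORE_FRAGMENTS = 0x2000   # MF bit in the 16-bit IPv4 flags/fragment-offset word
OFFSET_MASK = 0x1FFF      # 13-bit Fragment Offset field


def looks_like_fragment(flags_and_offset: int) -> bool:
    """True if ESP must discard the packet as an apparent IPv4 fragment."""
    return bool(flags_and_offset & MORE_FRAGMENTS) or (flags_and_offset & OFFSET_MASK) != 0


def ipv4_flags_and_offset(header: bytes) -> int:
    # Bytes 6-7 of the IPv4 header hold the 3 flag bits and the 13-bit offset.
    return int.from_bytes(header[6:8], "big")
```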

3.4.2 Security Association Lookup

Upon receipt of a (reassembled) packet containing an ESP Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (ESP), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will
be checked, whether the Authentication Data field should be present,
and it will specify the algorithms and keys to be employed for
decryption and ICV computations (if applicable).

If no valid Security Association exists for this session (for
example, the receiver has no key), the receiver MUST discard the
packet; this is an auditable event. The audit log entry for this
event SHOULD include the SPI value, date/time received, Source
Address, Destination Address, Sequence Number, and (in IPv6) the
cleartext Flow ID.
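Inbound SA lookup keyed on the triple named above can be sketched with a dictionary. The `SecurityAssociation` fields, the string "ESP" as the protocol key, and the function name are illustrative assumptions. The Security Architecture document defines the real database.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class SecurityAssociation(NamedTuple):
    anti_replay: bool     # whether the Sequence Number is checked
    icv_len: int          # 0 if no Authentication Data is expected
    enc_algorithm: str    # e.g. "NULL"; names here are illustrative only


# Hypothetical inbound SA database keyed by (destination, protocol, SPI).
SadKey = Tuple[str, str, int]


def lookup_sa(sad: Dict[SadKey, SecurityAssociation],
              dst: str, spi: int) -> Optional[SecurityAssociation]:
    """Return the SA, or None, in which case the caller MUST discard and audit."""
    return sad.get((dst, "ESP", spi))
```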

3.4.3 Sequence Number Verification

All ESP implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
This service MUST NOT be enabled unless the authentication service
also is enabled for the SA, since otherwise the Sequence Number field
has not been integrity protected. (Note that there are no provisions
for managing transmitted Sequence Number values among multiple
senders directing traffic to a single SA (irrespective of whether the
destination address is unicast, broadcast, or multicast). Thus the
anti-replay service SHOULD NOT be used in a multi-sender environment
that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.3), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receive packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first ESP check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.
Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time received, Source Address, Destination Address, the
Sequence Number, and (in IPv6) the Flow ID. The receive window is
updated only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.
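The sliding window can be sketched with an integer used as a bit mask, in the spirit of the bit-mask approach described in the Security Architecture document. `ReplayWindow`, `check` and `update` are hypothetical names. `check` is meant to run before ICV verification, and `update` only after verification succeeds, matching the DISCUSSION above.

```python
class ReplayWindow:
    """Receiver-side anti-replay window (hypothetical class, 64 by default)."""

    def __init__(self, size: int = 64) -> None:
        if size < 32:
            raise ValueError("window size must be at least 32")
        self.size = size
        self.right = 0     # highest validated Sequence Number (right edge)
        self.bitmap = 0    # bit i set => Sequence Number (right - i) already received

    def check(self, seq: int) -> bool:
        """True if seq is new and not left of the window; does not change state."""
        if seq == 0:
            return False   # the first packet on an SA carries Sequence Number 1
        if seq > self.right:
            return True    # to the right of the window: new
        offset = self.right - seq
        if offset >= self.size:
            return False   # to the left of the window: reject
        return not ((self.bitmap >> offset) & 1)

    def update(self, seq: int) -> None:
        """Record seq; call only after check(seq) and ICV verification succeed."""
        if seq > self.right:
            shift = seq - self.right
            if shift < self.size:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            else:
                self.bitmap = 1
            self.right = seq
        else:
            self.bitmap |= 1 << (self.right - seq)
```

Splitting `check` from `update` means a forged packet carrying a large Sequence Number cannot advance the window, because `update` runs only for packets that passed ICV verification.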

3.4.4 Integrity Check Value Verification

If authentication has been selected, the receiver computes the ICV
over the ESP packet minus the Authentication Data using the specified
authentication algorithm and verifies that it is the same as the ICV
included in the Authentication Data field of the packet. Details of
the computation are provided below.

If the computed and received ICVs match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The log data SHOULD include the SPI value, date/time
received, Source Address, Destination Address, the Sequence Number,
and (in IPv6) the cleartext Flow ID.

DISCUSSION:

Begin by removing and saving the ICV value (Authentication Data
field). Next check the overall length of the ESP packet minus the
Authentication Data. If implicit padding is required, based on
the block size of the authentication algorithm, append zero-filled
bytes to the end of the ESP packet directly after the Next Header
field. Perform the ICV computation and compare the result with
the saved value, using the comparison rules defined by the
algorithm specification. (For example, if a digital signature and
one-way hash are used for the ICV computation, the matching
process is more complex.)
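As an illustration of the steps above, the sketch below assumes the SA uses HMAC-SHA-1-96 [MG97b], which truncates the 20-byte HMAC-SHA-1 output to 12 bytes. HMAC accepts input of any length, so no implicit padding is appended in this case. The function names are hypothetical, and the input is assumed to be the ESP packet from the SPI through the trailing Authentication Data.

```python
import hashlib
import hmac

ICV_LEN = 12  # HMAC-SHA-1-96: 96 bits of the 160-bit HMAC-SHA-1 output


def compute_icv(key: bytes, esp_without_icv: bytes) -> bytes:
    """ICV over the ESP packet minus the Authentication Data."""
    return hmac.new(key, esp_without_icv, hashlib.sha1).digest()[:ICV_LEN]


def verify_esp_icv(key: bytes, esp_packet: bytes) -> bool:
    """Remove and save the received ICV, recompute, and compare."""
    if len(esp_packet) < ICV_LEN:
        return False
    body, received = esp_packet[:-ICV_LEN], esp_packet[-ICV_LEN:]
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(compute_icv(key, body), received)
```

A failed comparison is where the receiver would discard the datagram and, if auditing is enabled, log the event.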

3.4.5 Packet Decryption

As in section 3.3.2, "Packet Encryption", we speak here in terms of
encryption always being applied because of the formatting
implications. This is done with the understanding that "no
confidentiality" is offered by using the NULL encryption algorithm.
Accordingly, the receiver:

1. decrypts the ESP Payload Data, Padding, Pad Length, and Next
Header using the key, encryption algorithm, algorithm mode,
and cryptographic synchronization data (if any), indicated by
the SA.
- If explicit cryptographic synchronization data, e.g.,
an IV, is indicated, it is taken from the Payload
field and input to the decryption algorithm as per the
algorithm specification.
- If implicit cryptographic synchronization data, e.g.,
an IV, is indicated, a local version of the IV is
constructed and input to the decryption algorithm as
per the algorithm specification.
2. processes any padding as specified in the encryption
algorithm specification. If the default padding scheme (see
Section 2.4) has been employed, the receiver SHOULD inspect
the Padding field before removing the padding prior to
passing the decrypted data to the next layer.
3. reconstructs the original IP datagram from:
- for transport mode -- original IP header plus the
original upper layer protocol information in the ESP
Payload field
- for tunnel mode -- tunnel IP header + the entire IP
datagram in the ESP Payload field.
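Step 2 can be sketched as follows, assuming the default padding scheme of Section 2.4, in which the padding bytes carry the values 1, 2, 3, ... and are followed by the Pad Length and Next Header bytes. The function name is hypothetical, and the input is assumed to be already-decrypted data covering the Payload Data through the Next Header field.

```python
from typing import Tuple


def strip_esp_trailer(plaintext: bytes) -> Tuple[bytes, int]:
    """Split decrypted ESP data into (payload, next_header).

    Inspects the Padding field before removing it, as the default
    padding scheme allows.
    """
    if len(plaintext) < 2:
        raise ValueError("too short for Pad Length and Next Header")
    pad_len, next_header = plaintext[-2], plaintext[-1]
    if pad_len > len(plaintext) - 2:
        raise ValueError("Pad Length exceeds the decrypted data")
    padding = plaintext[-2 - pad_len:-2]
    if padding != bytes(range(1, pad_len + 1)):
        raise ValueError("padding does not match the default scheme")
    return plaintext[:-2 - pad_len], next_header
```

A padding mismatch corresponds to failure case (b) below, which can be detected whether or not authentication is in use.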

The exact steps for reconstructing the original datagram depend on
the mode (transport or tunnel) and are described in the Security
Architecture document. At a minimum, in an IPv6 context, the
receiver SHOULD ensure that the decrypted data is 8-byte aligned, to
facilitate processing by the protocol identified in the Next Header
field.

If authentication has been selected, verification and decryption MAY
be performed serially or in parallel. If performed serially, then
ICV verification SHOULD be performed first. If performed in
parallel, verification MUST be completed before the decrypted packet
is passed on for further processing. This order of processing
facilitates rapid detection and rejection of replayed or bogus
packets by the receiver, prior to decrypting the packet, hence
potentially reducing the impact of denial of service attacks.

Note: If the receiver performs decryption in parallel with authentication,
care must be taken to avoid possible race conditions with regard to
packet access and reconstruction of the decrypted packet.

Note that there are several ways in which the decryption can "fail":

a. The selected SA may not be correct -- The SA may be
mis-selected due to tampering with the SPI, destination
address, or IPsec protocol type fields. Such errors, if they
map the packet to another extant SA, will be
indistinguishable from a corrupted packet (case c).
Tampering with the SPI can be detected by use of
authentication. However, an SA mismatch might still occur
due to tampering with the IP Destination Address or the IPsec
protocol type field.

b. The pad length or pad values could be erroneous -- Bad pad
lengths or pad values can be detected irrespective of the use
of authentication.

c. The encrypted ESP packet could be corrupted -- This can be
detected if authentication is selected for the SA.

In case (a) or (c), the erroneous result of the decryption operation
(an invalid IP datagram or transport-layer frame) will not
necessarily be detected by IPsec, and is the responsibility of later
protocol processing.

4. Auditing

Not all systems that implement ESP will implement auditing. However,
if ESP is incorporated into a system that supports auditing, then the
ESP implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for ESP. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional
events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.
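One way to model the minimum audit information listed for ICV failures (SPI, date/time received, Source Address, Destination Address, Sequence Number, and the IPv6 Flow ID) is sketched below. The record layout and the `EspAuditLog` class are hypothetical; they only illustrate an administrator-controlled on/off switch and purely local logging, with nothing sent back to the purported sender.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """Minimum fields suggested for an auditable ESP event."""
    event: str
    spi: int
    received_at: datetime
    src: str
    dst: str
    seq: int
    flow_id: Optional[int] = None  # IPv6 only


class EspAuditLog:
    """Hypothetical audit sink that an administrator can enable or disable."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self.entries: List[AuditRecord] = []

    def record(self, entry: AuditRecord) -> None:
        # Stored locally only; no message goes back to the sender.
        if self.enabled:
            self.entries.append(entry)
```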

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST implement the ESP syntax and processing described
here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant ESP implementation MUST
support the following mandatory-to-implement algorithms:

- DES in CBC mode [MD97]
- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]
- NULL Authentication algorithm
- NULL Encryption algorithm

Since ESP encryption and authentication are optional, support for the
2 "NULL" algorithms is required to maintain consistency with the way
these services are negotiated. NOTE that while authentication and
encryption can each be "NULL", they MUST NOT both be "NULL".
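The rule that encryption and authentication cannot both be NULL can be enforced when an SA is configured or negotiated. The sketch below is hypothetical: the string labels for algorithms are placeholders, not identifiers from any key management protocol.

```python
def validate_esp_sa_proposal(encryption: str, authentication: str) -> None:
    """Reject an ESP SA that would offer no security service at all.

    Either service may individually be "NULL", but not both.
    """
    if encryption == "NULL" and authentication == "NULL":
        raise ValueError(
            "ESP encryption and authentication MUST NOT both be NULL")
```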

6. Security Considerations

Security is central to the design of this protocol, and thus security
considerations permeate the specification. Additional security-
relevant aspects of using the IPsec protocol are discussed in the
Security Architecture document.

7. Differences from RFC 1827

This document differs from RFC 1827 [ATK95] in several significant
ways. The major difference is that this document attempts to
specify a complete framework and context for ESP, whereas RFC 1827
provided a "shell" that was completed through the definition of
transforms. The combinatorial growth of transforms motivated the
reformulation of the ESP specification as a more complete document,
with options for security services that may be offered in the context
of ESP. Thus, fields previously defined in transform documents are
now part of this base ESP specification. For example, the fields
necessary to support authentication (and anti-replay) are now defined
here, even though the provision of this service is an option. The
fields used to support padding for encryption, and for next protocol
identification, are now defined here as well. Packet processing
consistent with the definition of these fields also is included in
the document.

Acknowledgements

Many of the concepts embodied in this specification were derived from
or influenced by the US Government's SP3 security protocol, ISO/IEC's
NLSP, or from the proposed swIPe security protocol. [SDNS89, ISO92,
IB93].

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Phil Karn, Perry Metzger, David
Mihelcic, Hilarie Orman, Norman Shulman, William Simpson and Nina
Yuan.

References

[ATK95] Atkinson, R., "IP Encapsulating Security Payload (ESP)",
RFC 1827, August 1995.

[Bel96] Steven M. Bellovin, "Problem Areas for the IP Security
Protocols", Proceedings of the Sixth Usenix Unix Security
Symposium, July, 1996.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[IB93] John Ioannidis & Matt Blaze, "Architecture and
Implementation of Network-layer Security Under Unix",
Proceedings of the USENIX Security Symposium, Santa Clara,
CA, October 1993.

[ISO92] ISO/IEC JTC1/SC6, Network Layer Security Protocol, ISO-IEC
DIS 11577, International Standards Organisation, Geneva,
Switzerland, 29 November 1992.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Authentication Header", RFC
2402, November 1998.

[MD97] Madson, C., and N. Doraswamy, "The ESP DES-CBC Cipher
Algorithm With Explicit IV", RFC 2405, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

[SDNS89] SDNS Secure Data Network System, Security Protocol 3, SP3,
Document SDN.301, Revision 1.5, 15 May 1989, as published
in NIST Publication NIST-IR-90-4250, February 1990.

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

The post RFC 2406 – IP Encapsulating Security Payload (ESP) appeared first on IPv6.net.

RFC 2402 – IP Authentication Header

IPv6 & IoT editor — Sat, 01 Aug 2009 17:46:33 +0000

Network Working Group                                            S. Kent
Request for Comments: 2402                                      BBN Corp
Obsoletes: 1826                                              R. Atkinson
Category: Standards Track                                  @Home Network
                                                           November 1998

IP Authentication Header

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Table of Contents

1. Introduction
2. Authentication Header Format
2.1 Next Header
2.2 Payload Length
2.3 Reserved
2.4 Security Parameters Index (SPI)
2.5 Sequence Number
2.6 Authentication Data
3. Authentication Header Processing
3.1 Authentication Header Location
3.2 Authentication Algorithms
3.3 Outbound Packet Processing
3.3.1 Security Association Lookup
3.3.2 Sequence Number Generation
3.3.3 Integrity Check Value Calculation
3.3.3.1 Handling Mutable Fields
3.3.3.1.1 ICV Computation for IPv4
3.3.3.1.1.1 Base Header Fields
3.3.3.1.1.2 Options
3.3.3.1.2 ICV Computation for IPv6
3.3.3.1.2.1 Base Header Fields
3.3.3.1.2.2 Extension Headers Containing Options
3.3.3.1.2.3 Extension Headers Not Containing Options
3.3.3.2 Padding
3.3.3.2.1 Authentication Data Padding
3.3.3.2.2 Implicit Packet Padding
3.3.4 Fragmentation
3.4 Inbound Packet Processing
3.4.1 Reassembly
3.4.2 Security Association Lookup
3.4.3 Sequence Number Verification
3.4.4 Integrity Check Value Verification
4. Auditing
5. Conformance Requirements
6. Security Considerations
7. Differences from RFC 1826
Acknowledgements
Appendix A -- Mutability of IP Options/Extension Headers
A1. IPv4 Options
A2. IPv6 Extension Headers
References
Disclaimer
Author Information
Full Copyright Statement

1. Introduction

The IP Authentication Header (AH) is used to provide connectionless
integrity and data origin authentication for IP datagrams (hereafter
referred to as just "authentication"), and to provide protection
against replays. This latter, optional service may be selected by
the receiver when a Security Association is established. (Although
the default calls for the sender to increment the Sequence Number
used for anti-replay, the service is effective only if the receiver
checks the Sequence Number.) AH provides authentication for as much
of the IP header as possible, as well as for upper level protocol
data. However, some IP header fields may change in transit and the
value of these fields, when the packet arrives at the receiver, may
not be predictable by the sender. The values of such fields cannot
be protected by AH. Thus the protection provided to the IP header by
AH is somewhat piecemeal.

AH may be applied alone, in combination with the IP Encapsulating
Security Payload (ESP) [KA97b], or in a nested fashion through the
use of tunnel mode (see "Security Architecture for the Internet
Protocol" [KA97a], hereafter referred to as the Security Architecture
document). Security services can be provided between a pair of
communicating hosts, between a pair of communicating security
gateways, or between a security gateway and a host. ESP may be used
to provide the same security services, and it also provides a
confidentiality (encryption) service. The primary difference between
the authentication provided by ESP and AH is the extent of the
coverage. Specifically, ESP does not protect any IP header fields
unless those fields are encapsulated by ESP (tunnel mode). For more
details on how to use AH and ESP in various network environments, see
the Security Architecture document [KA97a].

It is assumed that the reader is familiar with the terms and concepts
described in the Security Architecture document. In particular, the
reader should be familiar with the definitions of security services
offered by AH and ESP, the concept of Security Associations, the ways
in which AH can be used in conjunction with ESP, and the different
key management options available for AH and ESP. (With regard to the
last topic, the current key management options required for both AH
and ESP are manual keying and automated keying via IKE [HC98].)

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in RFC 2119 [Bra97].

2. Authentication Header Format

The protocol header (IPv4, IPv6, or Extension) immediately preceding
the AH header will contain the value 51 in its Protocol (IPv4) or
Next Header (IPv6, Extension) field [STD-2].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Next Header   |  Payload Len  |          RESERVED             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Security Parameters Index (SPI)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Sequence Number Field                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                Authentication Data (variable)                 |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The following subsections define the fields that comprise the AH
format. All the fields described here are mandatory, i.e., they are
always present in the AH format and are included in the Integrity
Check Value (ICV) computation (see Sections 2.6 and 3.3.3).

2.1 Next Header

The Next Header is an 8-bit field that identifies the type of the
next payload after the Authentication Header. The value of this
field is chosen from the set of IP Protocol Numbers defined in the
most recent "Assigned Numbers" [STD-2] RFC from the Internet Assigned
Numbers Authority (IANA).

2.2 Payload Length

This 8-bit field specifies the length of AH in 32-bit words (4-byte
units), minus "2". (All IPv6 extension headers, as per RFC 1883,
encode the "Hdr Ext Len" field by first subtracting 1 (64-bit word)
from the header length (measured in 64-bit words). AH is an IPv6
extension header. However, since its length is measured in 32-bit
words, the "Payload Length" is calculated by subtracting 2 (32 bit
words).) In the "standard" case of a 96-bit authentication value
plus the 3 32-bit word fixed portion, this length field will be "4".
A "null" authentication algorithm may be used only for debugging
purposes. Its use would result in a "1" value for this field for
IPv4 or a "2" for IPv6, as there would be no corresponding
Authentication Data field (see Section 3.3.3.2.1 on "Authentication
Data Padding").
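The arithmetic above can be checked with a short sketch. It assumes the 12-byte fixed portion (Next Header, Payload Len, RESERVED, SPI, Sequence Number), pads the Authentication Data so the whole header is a multiple of 4 bytes (IPv4) or 8 bytes (IPv6), and then expresses the length in 32-bit words minus 2. The function name is hypothetical.

```python
AH_FIXED_LEN = 12  # Next Header, Payload Len, RESERVED, SPI, Sequence Number


def ah_payload_len(icv_len: int, ipv6: bool) -> int:
    """Payload Len value for an ICV of `icv_len` bytes.

    Explicit padding in the Authentication Data field aligns the AH
    header to 32 bits (IPv4) or 64 bits (IPv6).
    """
    align = 8 if ipv6 else 4
    total = AH_FIXED_LEN + icv_len
    total += -total % align  # bytes of explicit padding, if any
    return total // 4 - 2
```

With a 96-bit (12-byte) ICV, both IP versions give 4; with the null algorithm (no ICV), IPv4 gives 1 and IPv6 gives 2, because the 12-byte header must be padded to 16 bytes for 64-bit alignment.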

2.3 Reserved

This 16-bit field is reserved for future use. It MUST be set to
"zero." (Note that the value is included in the Authentication Data
calculation, but is otherwise ignored by the recipient.)

2.4 Security Parameters Index (SPI)

The SPI is an arbitrary 32-bit value that, in combination with the
destination IP address and security protocol (AH), uniquely
identifies the Security Association for this datagram. SPI values in
the range 1 through 255 are reserved by the Internet
Assigned Numbers Authority (IANA) for future use; a reserved SPI
value will not normally be assigned by IANA unless the use of the
assigned SPI value is specified in an RFC. It is ordinarily selected
by the destination system upon establishment of an SA (see the
Security Architecture document for more details).

The SPI value of zero (0) is reserved for local, implementation-
specific use and MUST NOT be sent on the wire. For example, a key
management implementation MAY use the zero SPI value to mean "No
Security Association Exists" during the period when the IPsec
implementation has requested that its key management entity establish
a new SA, but the SA has not yet been established.
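The SPI rules above can be expressed as a small check that a destination might apply when choosing an SPI for a new SA. This is a sketch: the helper and constant names are hypothetical, and treating 1 through 255 as unusable assumes no RFC has assigned them for the SA in question.

```python
NO_SA_SPI = 0  # local-only sentinel, e.g. "No Security Association Exists"


def is_assignable_spi(spi: int) -> bool:
    """True if `spi` may be selected for a new SA and sent on the wire."""
    if not 0 <= spi <= 0xFFFFFFFF:
        return False  # the SPI is a 32-bit field
    if spi == NO_SA_SPI:
        return False  # reserved for local use; MUST NOT be sent
    # 1-255 are IANA-reserved; assume none is RFC-assigned here.
    return spi > 255
```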

2.5 Sequence Number

This unsigned 32-bit field contains a monotonically increasing
counter value (sequence number). It is mandatory and is always
present even if the receiver does not elect to enable the anti-replay
service for a specific SA. Processing of the Sequence Number field
is at the discretion of the receiver, i.e., the sender MUST always
transmit this field, but the receiver need not act upon it (see the
discussion of Sequence Number Verification in the "Inbound Packet
Processing" section below).

The sender's counter and the receiver's counter are initialized to 0
when an SA is established. (The first packet sent using a given SA
will have a Sequence Number of 1; see Section 3.3.2 for more details
on how the Sequence Number is generated.) If anti-replay is enabled
(the default), the transmitted Sequence Number must never be allowed
to cycle. Thus, the sender's counter and the receiver's counter MUST
be reset (by establishing a new SA and thus a new key) prior to the
transmission of the 2^32nd packet on an SA.

2.6 Authentication Data

This is a variable-length field that contains the Integrity Check
Value (ICV) for this packet. The field must be an integral multiple
of 32 bits in length. The details of the ICV computation are
described in Section 3.3.3 below. This field may include explicit
padding. This padding is included to ensure that the length of the
AH header is an integral multiple of 32 bits (IPv4) or 64 bits
(IPv6). All implementations MUST support such padding. Details of
how to compute the required padding length are provided below. The
authentication algorithm specification MUST specify the length of the
ICV and the comparison rules and processing steps for validation.

3. Authentication Header Processing

3.1 Authentication Header Location

Like ESP, AH may be employed in two ways: transport mode or tunnel
mode. The former mode is applicable only to host implementations and
provides protection for upper layer protocols, in addition to
selected IP header fields. (In this mode, note that for "bump-in-
the-stack" or "bump-in-the-wire" implementations, as defined in the
Security Architecture document, inbound and outbound IP fragments may
require an IPsec implementation to perform extra IP
reassembly/fragmentation in order to both conform to this
specification and provide transparent IPsec support. Special care is
required to perform such operations within these implementations when
multiple interfaces are in use.)

In transport mode, AH is inserted after the IP header and before an
upper layer protocol, e.g., TCP, UDP, ICMP, etc. or before any other
IPsec headers that have already been inserted. In the context of
IPv4, this calls for placing AH after the IP header (and any options
that it contains), but before the upper layer protocol. (Note that
the term "transport" mode should not be misconstrued as restricting
its use to TCP and UDP. For example, an ICMP message MAY be sent
using either "transport" mode or "tunnel" mode.) The following
diagram illustrates AH transport mode positioning for a typical IPv4
packet, on a "before and after" basis.

                  BEFORE APPLYING AH
            ----------------------------
      IPv4  |orig IP hdr  |     |      |
            |(any options)| TCP | Data |
            ----------------------------

                  AFTER APPLYING AH
            ---------------------------------
      IPv4  |orig IP hdr  |    |     |      |
            |(any options)| AH | TCP | Data |
            ---------------------------------
            |<------- authenticated ------->|
                 except for mutable fields

In the IPv6 context, AH is viewed as an end-to-end payload, and thus
should appear after hop-by-hop, routing, and fragmentation extension
headers. The destination options extension header(s) could appear
either before or after the AH header depending on the semantics
desired. The following diagram illustrates AH transport mode
positioning for a typical IPv6 packet.

                       BEFORE APPLYING AH
            ---------------------------------------
      IPv6  |             | ext hdrs |     |      |
            | orig IP hdr |if present| TCP | Data |
            ---------------------------------------

                       AFTER APPLYING AH
            ------------------------------------------------------------
      IPv6  |             |hop-by-hop, dest*, |    | dest |     |      |
            |orig IP hdr  |routing, fragment. | AH | opt* | TCP | Data |
            ------------------------------------------------------------
            |<---- authenticated except for mutable fields ----------->|

                  * = if present, could be before AH, after AH, or both

ESP and AH headers can be combined in a variety of modes. The IPsec
Architecture document describes the combinations of security
associations that must be supported.

Tunnel mode AH may be employed in either hosts or security gateways
(or in so-called "bump-in-the-stack" or "bump-in-the-wire"
implementations, as defined in the Security Architecture document).
When AH is implemented in a security gateway (to protect transit
traffic), tunnel mode must be used. In tunnel mode, the "inner" IP
header carries the ultimate source and destination addresses, while
an "outer" IP header may contain distinct IP addresses, e.g.,
addresses of security gateways. In tunnel mode, AH protects the
entire inner IP packet, including the entire inner IP header. The
position of AH in tunnel mode, relative to the outer IP header, is
the same as for AH in transport mode. The following diagram
illustrates AH tunnel mode positioning for typical IPv4 and IPv6
packets.

            ------------------------------------------------
      IPv4  | new IP hdr* |    | orig IP hdr*  |    |      |
            |(any options)| AH | (any options) |TCP | Data |
            ------------------------------------------------
            |<- authenticated except for mutable fields -->|
            |           in the new IP hdr                  |

            --------------------------------------------------------------
      IPv6  |           | ext hdrs*|    |            | ext hdrs*|   |    |
            |new IP hdr*|if present| AH |orig IP hdr*|if present|TCP|Data|
            --------------------------------------------------------------
            |<-- authenticated except for mutable fields in new IP hdr ->|

             * = construction of outer IP hdr/extensions and modification
                 of inner IP hdr/extensions is discussed below.

3.2 Authentication Algorithms

The authentication algorithm employed for the ICV computation is
specified by the SA. For point-to-point communication, suitable
authentication algorithms include keyed Message Authentication Codes
(MACs) based on symmetric encryption algorithms (e.g., DES) or on
one-way hash functions (e.g., MD5 or SHA-1). For multicast
communication, one-way hash algorithms combined with asymmetric
signature algorithms are appropriate, though performance and space
considerations currently preclude use of such algorithms. The
mandatory-to-implement authentication algorithms are described in
Section 5 "Conformance Requirements". Other algorithms MAY be
supported.

3.3 Outbound Packet Processing

In transport mode, the sender inserts the AH header after the IP
header and before an upper layer protocol header, as described above.
In tunnel mode, the outer and inner IP header/extensions can be
inter-related in a variety of ways. The construction of the outer IP
header/extensions during the encapsulation process is described in
the Security Architecture document.

If there is more than one IPsec header/extension required, the order
of the application of the security headers MUST be defined by
security policy. For simplicity of processing, each IPsec header
SHOULD ignore the existence (i.e., not zero the contents or try to
predict the contents) of IPsec headers to be applied later. (While a
native IP or bump-in-the-stack implementation could predict the
contents of later IPsec headers that it applies itself, it won't be
possible for it to predict any IPsec headers added by a bump-in-the-
wire implementation between the host and the network.)

3.3.1 Security Association Lookup

AH is applied to an outbound packet only after an IPsec
implementation determines that the packet is associated with an SA
that calls for AH processing. The process of determining what, if
any, IPsec processing is applied to outbound traffic is described in
the Security Architecture document.

3.3.2 Sequence Number Generation

The sender's counter is initialized to 0 when an SA is established.
The sender increments the Sequence Number for this SA and inserts the
new value into the Sequence Number Field. Thus the first packet sent
using a given SA will have a Sequence Number of 1.

If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field. In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the Sequence Number to cycle.
An attempt to transmit a packet that would result in Sequence Number
overflow is an auditable event. (Note that this approach to Sequence
Number management does not require use of modular arithmetic.)

The sender assumes anti-replay is enabled as a default, unless
otherwise notified by the receiver (see 3.4.3). Thus, if the counter
has cycled, the sender will set up a new SA and key (unless the SA
was configured with manual key management).

If anti-replay is disabled, the sender does not need to monitor or
reset the counter, e.g., in the case of manual key management (see
Section 5). However, the sender still increments the counter and when
it reaches the maximum value, the counter rolls over back to zero.
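The sender-side rules in this section can be sketched as follows. The class and exception names are hypothetical. With anti-replay enabled, the last Sequence Number that may be sent is 2^32 - 1, and the next attempt raises an error (an auditable event) so that a new SA can be established; with anti-replay disabled, the counter simply rolls over to zero.

```python
MAX_SEQ = 2 ** 32 - 1  # largest value of the 32-bit Sequence Number field


class SequenceOverflow(Exception):
    """Sending would cycle the counter; a new SA (and key) is needed."""


class OutboundSA:
    """Hypothetical sender-side state for one SA."""

    def __init__(self, anti_replay: bool = True) -> None:
        self.anti_replay = anti_replay
        self.counter = 0  # initialized to 0 when the SA is established

    def next_sequence_number(self) -> int:
        if self.anti_replay and self.counter == MAX_SEQ:
            raise SequenceOverflow("establish a new SA before sending")
        # Without anti-replay the counter wraps from MAX_SEQ back to 0.
        self.counter = (self.counter + 1) % (MAX_SEQ + 1)
        return self.counter
```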

3.3.3 Integrity Check Value Calculation

The AH ICV is computed over:
o IP header fields that are either immutable in transit or
that are predictable in value upon arrival at the endpoint
for the AH SA
o the AH header (Next Header, Payload Len, Reserved, SPI,
Sequence Number, and the Authentication Data (which is set
to zero for this computation), and explicit padding bytes
(if any))
o the upper level protocol data, which is assumed to be
immutable in transit

3.3.3.1 Handling Mutable Fields

If a field may be modified during transit, the value of the field is
set to zero for purposes of the ICV computation. If a field is
mutable, but its value at the (IPsec) receiver is predictable, then
that value is inserted into the field for purposes of the ICV
calculation. The Authentication Data field is also set to zero in
preparation for this computation. Note that by replacing each
field's value with zero, rather than omitting the field, alignment is
preserved for the ICV calculation. Also, the zero-fill approach
ensures that the length of the fields that are so handled cannot be
changed during transit, even though their contents are not explicitly
covered by the ICV.

As a new extension header or IPv4 option is created, it will be
defined in its own RFC and SHOULD include (in the Security
Considerations section) directions for how it should be handled when
calculating the AH ICV. If the IP (v4 or v6) implementation
encounters an extension header that it does not recognize, it will
discard the packet and send an ICMP message. IPsec will never see
the packet. If the IPsec implementation encounters an IPv4 option
that it does not recognize, it should zero the whole option, using
the second byte of the option as the length. IPv6 options (in
Destination extension headers or Hop by Hop extension header) contain
a flag indicating mutability, which determines appropriate processing
for such options.

3.3.3.1.1 ICV Computation for IPv4

3.3.3.1.1.1 Base Header Fields

The IPv4 base header fields are classified as follows:

Immutable
Version
Internet Header Length
Total Length
Identification
Protocol (This should be the value for AH.)
Source Address
Destination Address (without loose or strict source routing)

Mutable but predictable
Destination Address (with loose or strict source routing)

Mutable (zeroed prior to ICV calculation)
Type of Service (TOS)
Flags
Fragment Offset
Time to Live (TTL)
Header Checksum

TOS -- This field is excluded because some routers are known to
change the value of this field, even though the IP
specification does not consider TOS to be a mutable header
field.

Flags -- This field is excluded since an intermediate router might
set the DF bit, even if the source did not select it.

Fragment Offset -- Since AH is applied only to non-fragmented IP
packets, the Offset Field must always be zero, and thus it
is excluded (even though it is predictable).

TTL -- This is changed en-route as a normal course of processing
by routers, and thus its value at the receiver is not
predictable by the sender.

Header Checksum -- This will change if any of these other fields
changes, and thus its value upon reception cannot be
predicted by the sender.
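The IPv4 base header classification above can be sketched as a byte-level transform. This minimal Python sketch assumes a raw IPv4 header as `bytes`; the function name is hypothetical. It zeroes only the mutable base fields; it does not substitute the predicted Destination Address for source-routed packets, and it leaves options untouched.

```python
def zero_ipv4_mutable_fields(header: bytes) -> bytes:
    """Return a copy of an IPv4 header with the mutable base fields zeroed."""
    if len(header) < 20:
        raise ValueError("IPv4 header is at least 20 bytes")
    buf = bytearray(header)
    buf[1] = 0                # Type of Service
    buf[6:8] = b"\x00\x00"    # Flags (3 bits) + Fragment Offset (13 bits)
    buf[8] = 0                # Time to Live
    buf[10:12] = b"\x00\x00"  # Header Checksum
    # Version/IHL, Total Length, Identification, Protocol and the two
    # addresses are left as-is: they are immutable (or predictable).
    return bytes(buf)
```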

3.3.3.1.1.2 Options

For IPv4 (unlike IPv6), there is no mechanism for tagging options as
mutable in transit. Hence the IPv4 options are explicitly listed in
Appendix A and classified as immutable, mutable but predictable, or
mutable. For IPv4, the entire option is viewed as a unit; so even
though the type and length fields within most options are immutable
in transit, if an option is classified as mutable, the entire option
is zeroed for ICV computation purposes.
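Treating each IPv4 option as a unit can be sketched as a walk over the options area. In this Python sketch the immutable set is taken from the Appendix A table (each type octet is copy << 7 | class << 5 | number); every other option, including unrecognized ones, is zeroed using its second byte as the length. Stopping at End of Options List and raising `ValueError` on malformed lengths are assumptions of the sketch, and the names are hypothetical.

```python
# Option type octets listed as IMMUTABLE in Appendix A.
IMMUTABLE_IPV4_OPTIONS = {
    0,    # End of Options List
    1,    # No Operation
    130,  # Security (copy 1, class 0, number 2)
    133,  # Extended Security
    134,  # Commercial Security
    148,  # Router Alert
    149,  # Sender Directed Multi-Destination Delivery
}

def zero_mutable_ipv4_options(options: bytes) -> bytes:
    """Zero each non-immutable IPv4 option as a unit (type, length, data)."""
    buf = bytearray(options)
    i = 0
    while i < len(buf):
        opt_type = buf[i]
        if opt_type == 0:      # End of Options List: stop parsing here
            break
        if opt_type == 1:      # No Operation is a single octet
            i += 1
            continue
        if i + 1 >= len(buf) or buf[i + 1] < 2 or i + buf[i + 1] > len(buf):
            raise ValueError(f"malformed option at offset {i}")
        length = buf[i + 1]    # the second byte of the option is its length
        if opt_type not in IMMUTABLE_IPV4_OPTIONS:
            buf[i:i + length] = bytes(length)
        i += length
    return bytes(buf)
```

For example, a No Operation followed by a Record Route option and a Router Alert option keeps the first and last but zeroes all seven octets of Record Route.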

3.3.3.1.2 ICV Computation for IPv6

3.3.3.1.2.1 Base Header Fields

The IPv6 base header fields are classified as follows:

Immutable
Version
Payload Length
Next Header (This should be the value for AH.)
Source Address
Destination Address (without Routing Extension Header)

Mutable but predictable
Destination Address (with Routing Extension Header)

Mutable (zeroed prior to ICV calculation)
Class
Flow Label
Hop Limit
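The same idea for the 40-byte IPv6 base header, as a minimal Python sketch with a hypothetical function name. It assumes the standard layout (4-bit Version, 8-bit Class, 20-bit Flow Label, then Payload Length, Next Header and Hop Limit) and, like the IPv4 sketch, does not handle the predicted Destination Address when a Routing header is present.

```python
def zero_ipv6_mutable_fields(header: bytes) -> bytes:
    """Return a copy of the IPv6 base header with Class, Flow Label, Hop Limit zeroed."""
    if len(header) < 40:
        raise ValueError("IPv6 base header is 40 bytes")
    buf = bytearray(header)
    buf[0] &= 0xF0            # keep Version, clear the high 4 bits of Class
    buf[1] = 0                # low 4 bits of Class + high 4 bits of Flow Label
    buf[2:4] = b"\x00\x00"    # remaining 16 bits of Flow Label
    buf[7] = 0                # Hop Limit
    return bytes(buf)
```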

3.3.3.1.2.2 Extension Headers Containing Options

IPv6 options in the Hop-by-Hop and Destination Extension Headers
contain a bit that indicates whether the option might change
(unpredictably) during transit. For any option for which contents
may change en-route, the entire "Option Data" field must be treated
as zero-valued octets when computing or verifying the ICV. The
Option Type and Opt Data Len are included in the ICV calculation.
All options for which the bit indicates immutability are included in
the ICV calculation. See the IPv6 specification [DH95] for more
information.
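The per-option rule can be sketched as a walk over the TLV-encoded options of one Hop-by-Hop or Destination Options header. This Python sketch assumes the IPv6 specification's layout, where Hdr Ext Len counts 8-octet units beyond the first and the "may change" flag is the third-highest-order bit (0x20) of Option Type; the function name is hypothetical.

```python
def zero_mutable_ipv6_options(ext_header: bytes) -> bytes:
    """Zero the Option Data of every option whose type has the 0x20 bit set."""
    total_len = (ext_header[1] + 1) * 8    # Hdr Ext Len is in 8-octet units
    if len(ext_header) < total_len:
        raise ValueError("truncated extension header")
    buf = bytearray(ext_header[:total_len])
    i = 2                                  # skip Next Header and Hdr Ext Len
    while i < total_len:
        opt_type = buf[i]
        if opt_type == 0:                  # Pad1: one octet, no length field
            i += 1
            continue
        if i + 1 >= total_len:
            raise ValueError(f"malformed option at offset {i}")
        data_len = buf[i + 1]
        end = i + 2 + data_len
        if end > total_len:
            raise ValueError(f"option overruns header at offset {i}")
        if opt_type & 0x20:                # option data may change en route
            buf[i + 2:end] = bytes(data_len)  # Option Type and Opt Data Len kept
        i = end
    return bytes(buf)
```

In the usage check, option type 0x23 is a hypothetical mutable option; the PadN option that follows it is immutable and left as-is.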

3.3.3.1.2.3 Extension Headers Not Containing Options

The IPv6 extension headers that do not contain options are explicitly
listed in Appendix A and classified as immutable, mutable but
predictable, or mutable.

3.3.3.2 Padding

3.3.3.2.1 Authentication Data Padding

As mentioned in section 2.6, the Authentication Data field explicitly
includes padding to ensure that the AH header is a multiple of 32
bits (IPv4) or 64 bits (IPv6). If padding is required, its length is
determined by two factors:

- the length of the ICV
- the IP protocol version (v4 or v6)

For example, if the output of the selected algorithm is 96-bits, no
padding is required for either IPv4 or for IPv6. However, if a
different length ICV is generated, due to use of a different
algorithm, then padding may be required depending on the length and
IP protocol version. The content of the padding field is arbitrarily
selected by the sender. (The padding is arbitrary, but need not be
random to achieve security.) These padding bytes are included in the
Authentication Data calculation, counted as part of the Payload
Length, and transmitted at the end of the Authentication Data field
to enable the receiver to perform the ICV calculation.
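The padding arithmetic can be worked through in a short Python sketch. It assumes the 12-octet fixed part of AH (Next Header, Payload Len, Reserved, SPI, Sequence Number) and the Payload Len rule defined earlier in this specification (AH length in 32-bit words, minus 2); the function names are hypothetical.

```python
AH_FIXED_LEN = 12  # Next Header, Payload Len, Reserved, SPI, Sequence Number

def auth_data_padding(icv_len: int, ip_version: int) -> int:
    """Octets of explicit padding needed after an ICV of icv_len octets."""
    alignment = {4: 4, 6: 8}[ip_version]   # 32 bits (IPv4) or 64 bits (IPv6)
    return -(AH_FIXED_LEN + icv_len) % alignment

def ah_payload_len(icv_len: int, ip_version: int) -> int:
    """Payload Len field value: AH length in 32-bit words, minus 2."""
    total = AH_FIXED_LEN + icv_len + auth_data_padding(icv_len, ip_version)
    return total // 4 - 2
```

With a 96-bit (12-octet) ICV the header is 24 octets, so no padding is needed for either version; a 128-bit ICV gives 28 octets, which needs 4 octets of padding only for IPv6.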

3.3.3.2.2 Implicit Packet Padding

For some authentication algorithms, the byte string over which the
ICV computation is performed must be a multiple of a blocksize
specified by the algorithm. If the IP packet length (including AH)
does not match the blocksize requirements for the algorithm, implicit
padding MUST be appended to the end of the packet, prior to ICV
computation. The padding octets MUST have a value of zero. The
blocksize (and hence the length of the padding) is specified by the
algorithm specification. This padding is not transmitted with the
packet. Note that MD5 and SHA-1 are viewed as having a 1-byte
blocksize because of their internal padding conventions.
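Implicit padding only affects the bytes fed to the ICV computation. A minimal Python sketch, with a hypothetical function name:

```python
def implicit_pad(data: bytes, blocksize: int) -> bytes:
    """Append zero octets so that len(data) is a multiple of blocksize.

    The padded copy is only used to compute or verify the ICV; it is
    never transmitted with the packet.
    """
    if blocksize < 1:
        raise ValueError("blocksize must be positive")
    return data + bytes(-len(data) % blocksize)
```

With a blocksize of 1, as for MD5 and SHA-1, the input is returned unchanged.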

3.3.4 Fragmentation

If required, IP fragmentation occurs after AH processing within an
IPsec implementation. Thus, transport mode AH is applied only to
whole IP datagrams (not to IP fragments). An IP packet to which AH
has been applied may itself be fragmented by routers en route, and
such fragments must be reassembled prior to AH processing at a
receiver. In tunnel mode, AH is applied to an IP packet, the payload
of which may be a fragmented IP packet. For example, a security
gateway or a "bump-in-the-stack" or "bump-in-the-wire" IPsec
implementation (see the Security Architecture document for details)
may apply tunnel mode AH to such fragments.

3.4 Inbound Packet Processing

If there is more than one IPsec header/extension present, the
processing for each one ignores (does not zero, does not use) any
IPsec headers applied subsequent to the header being processed.

3.4.1 Reassembly

If required, reassembly is performed prior to AH processing. If a
packet offered to AH for processing appears to be an IP fragment,
i.e., the OFFSET field is non-zero or the MORE FRAGMENTS flag is set,
the receiver MUST discard the packet; this is an auditable event. The
audit log entry for this event SHOULD include the SPI value,
date/time, Source Address, Destination Address, and (in IPv6) the
Flow ID.

NOTE: For packet reassembly, the current IPv4 spec does NOT require
either the zero'ing of the OFFSET field or the clearing of the MORE
FRAGMENTS flag. In order for a reassembled packet to be processed by
IPsec (as opposed to discarded as an apparent fragment), the IP code
must do these two things after it reassembles a packet.
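The receiver's fragment test can be sketched against the IPv4 header layout, where bytes 6 and 7 hold the three flag bits (reserved, DF, MF) followed by the 13-bit Fragment Offset. The function name is hypothetical.

```python
def looks_like_ipv4_fragment(header: bytes) -> bool:
    """True if MORE FRAGMENTS is set or the fragment OFFSET is non-zero."""
    flags_and_offset = int.from_bytes(header[6:8], "big")
    more_fragments = bool(flags_and_offset & 0x2000)  # MF, lowest of the 3 flag bits
    fragment_offset = flags_and_offset & 0x1FFF       # low 13 bits
    return more_fragments or fragment_offset != 0
```

A header with only DF set is not treated as a fragment; this is why the IP code must clear MF and OFFSET after reassembly.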

3.4.2 Security Association Lookup

Upon receipt of a packet containing an IP Authentication Header, the
receiver determines the appropriate (unidirectional) SA, based on the
destination IP address, security protocol (AH), and the SPI. (This
process is described in more detail in the Security Architecture
document.) The SA indicates whether the Sequence Number field will
be checked, specifies the algorithm(s) employed for ICV computation,
and indicates the key(s) required to validate the ICV.

If no valid Security Association exists for this session (e.g., the
receiver has no key), the receiver MUST discard the packet; this is
an auditable event. The audit log entry for this event SHOULD
include the SPI value, date/time, Source Address, Destination
Address, and (in IPv6) the Flow ID.
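The lookup and its audit requirement can be sketched with an in-memory table keyed on destination address, security protocol and SPI, as the text describes. All class names, field names and the string form of addresses are hypothetical; a real SA database (see the Security Architecture document) carries much more state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class SecurityAssociation:   # hypothetical, minimal SA record
    dst: str
    spi: int
    anti_replay: bool
    auth_algorithm: str
    key: bytes

@dataclass(frozen=True)
class AuditRecord:           # the minimum fields the text asks for
    spi: int
    timestamp: datetime
    src: str
    dst: str
    flow_id: Optional[int] = None  # IPv6 only

class SADatabase:
    def __init__(self) -> None:
        self._sas: Dict[Tuple[str, str, int], SecurityAssociation] = {}
        self.audit_log: List[AuditRecord] = []

    def add(self, sa: SecurityAssociation) -> None:
        self._sas[(sa.dst, "AH", sa.spi)] = sa

    def lookup(self, src: str, dst: str, spi: int,
               flow_id: Optional[int] = None) -> Optional[SecurityAssociation]:
        sa = self._sas.get((dst, "AH", spi))
        if sa is None:
            # No valid SA: the caller MUST discard; record the auditable event.
            self.audit_log.append(
                AuditRecord(spi, datetime.now(timezone.utc), src, dst, flow_id))
        return sa
```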

3.4.3 Sequence Number Verification

All AH implementations MUST support the anti-replay service, though
its use may be enabled or disabled by the receiver on a per-SA basis.
(Note that there are no provisions for managing transmitted Sequence
Number values among multiple senders directing traffic to a single SA
(irrespective of whether the destination address is unicast,
broadcast, or multicast). Thus the anti-replay service SHOULD NOT be
used in a multi-sender environment that employs a single SA.)

If the receiver does not enable anti-replay for an SA, no inbound
checks are performed on the Sequence Number. However, from the
perspective of the sender, the default is to assume that anti-replay
is enabled at the receiver. To avoid having the sender do
unnecessary sequence number monitoring and SA setup (see section
3.3.2), if an SA establishment protocol such as IKE is employed, the
receiver SHOULD notify the sender, during SA establishment, if the
receiver will not provide anti-replay protection.

If the receiver has enabled the anti-replay service for this SA, the
receiver packet counter for the SA MUST be initialized to zero when
the SA is established. For each received packet, the receiver MUST
verify that the packet contains a Sequence Number that does not
duplicate the Sequence Number of any other packets received during
the life of this SA. This SHOULD be the first AH check applied to a
packet after it has been matched to an SA, to speed rejection of
duplicate packets.

Duplicates are rejected through the use of a sliding receive window.
(How the window is implemented is a local matter, but the following
text describes the functionality that the implementation must
exhibit.) A MINIMUM window size of 32 MUST be supported; but a
window size of 64 is preferred and SHOULD be employed as the default.
Another window size (larger than the MINIMUM) MAY be chosen by the
receiver. (The receiver does NOT notify the sender of the window
size.)

The "right" edge of the window represents the highest, validated
Sequence Number value received on this SA. Packets that contain
Sequence Numbers lower than the "left" edge of the window are
rejected. Packets falling within the window are checked against a
list of received packets within the window. An efficient means for
performing this check, based on the use of a bit mask, is described
in the Security Architecture document.

If the received packet falls within the window and is new, or if the
packet is to the right of the window, then the receiver proceeds to
ICV verification. If the ICV validation fails, the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry for this event SHOULD include the SPI
value, date/time, Source Address, Destination Address, the Sequence
Number, and (in IPv6) the Flow ID. The receive window is updated
only if the ICV verification succeeds.

DISCUSSION:

Note that if the packet is either inside the window and new, or is
outside the window on the "right" side, the receiver MUST
authenticate the packet before updating the Sequence Number window
data.
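The sliding receive window can be sketched with a bit mask, in the spirit of the approach the Security Architecture document describes; this Python version is an illustration, not that document's code. The class and method names are hypothetical. `is_acceptable` is the pre-ICV duplicate check, and `mark_received` must be called only after the ICV has verified.

```python
class AntiReplayWindow:
    """Hypothetical receiver-side sliding window for one SA."""

    def __init__(self, size: int = 64) -> None:
        if size < 32:
            raise ValueError("window size must be at least 32")
        self.size = size
        self.right = 0    # highest validated Sequence Number (right edge)
        self.bitmap = 0   # bit k set means (right - k) has been received

    def is_acceptable(self, seq: int) -> bool:
        """Pre-ICV check: reject duplicates and packets left of the window."""
        if seq == 0:              # never sent: the first packet carries 1
            return False
        if seq > self.right:      # to the right of the window: new
            return True
        offset = self.right - seq
        if offset >= self.size:   # lower than the left edge
            return False
        return not (self.bitmap >> offset) & 1

    def mark_received(self, seq: int) -> None:
        """Update the window; call only after ICV verification succeeds."""
        if seq > self.right:
            shift = seq - self.right
            if shift >= self.size:
                self.bitmap = 1   # the whole old window slides out
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.right = seq
        else:
            self.bitmap |= 1 << (self.right - seq)
```

Keeping the check and the update separate mirrors the rule that the window is updated only once the packet has been authenticated.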

3.4.4 Integrity Check Value Verification

The receiver computes the ICV over the appropriate fields of the
packet, using the specified authentication algorithm, and verifies
that it is the same as the ICV included in the Authentication Data
field of the packet. Details of the computation are provided below.

If the computed and received ICVs match, then the datagram is valid,
and it is accepted. If the test fails, then the receiver MUST
discard the received IP datagram as invalid; this is an auditable
event. The audit log entry SHOULD include the SPI value, date/time
received, Source Address, Destination Address, and (in IPv6) the Flow
ID.

DISCUSSION:

Begin by saving the ICV value and replacing it (but not any
Authentication Data padding) with zero. Zero all other fields
that may have been modified during transit. (See section 3.3.3.1
for a discussion of which fields are zeroed before performing the
ICV calculation.) Check the overall length of the packet, and if
it requires implicit padding based on the requirements of the
authentication algorithm, append zero-filled bytes to the end of
the packet as required. Perform the ICV computation and compare
the result with the saved value, using the comparison rules
defined by the algorithm specification. (For example, if a
digital signature and one-way hash are used for the ICV
computation, the matching process is more complex.)
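The DISCUSSION steps can be sketched in Python for the two mandatory-to-implement algorithms, HMAC-MD5-96 and HMAC-SHA-1-96, which truncate the HMAC output to 96 bits. The function names and the `zero_mutable` callback (standing in for the field zeroing of section 3.3.3.1) are hypothetical, and the sketch assumes the ICV sits at a known offset and has no Authentication Data padding after it.

```python
import hashlib
import hmac
from typing import Callable

ICV_ALGORITHMS = {"HMAC-MD5-96": hashlib.md5, "HMAC-SHA-1-96": hashlib.sha1}
ICV_LEN = 12  # both algorithms truncate the HMAC output to 96 bits

def compute_icv(key: bytes, data: bytes, algorithm: str) -> bytes:
    return hmac.new(key, data, ICV_ALGORITHMS[algorithm]).digest()[:ICV_LEN]

def verify_icv(packet: bytes, icv_offset: int, key: bytes, algorithm: str,
               zero_mutable: Callable[[bytes], bytes] = lambda b: b,
               blocksize: int = 1) -> bool:
    received = packet[icv_offset:icv_offset + ICV_LEN]      # save the ICV
    work = bytearray(packet)
    work[icv_offset:icv_offset + ICV_LEN] = bytes(ICV_LEN)  # zero the ICV only
    work = bytearray(zero_mutable(bytes(work)))             # zero mutable fields
    work += bytes(-len(work) % blocksize)                   # implicit padding
    computed = compute_icv(key, bytes(work), algorithm)
    return hmac.compare_digest(computed, received)          # constant-time compare
```

The default `blocksize` of 1 reflects the note that MD5 and SHA-1 need no implicit padding. `hmac.compare_digest` is used here as a reasonable equality check for MACs, not as a rule taken from the algorithm specifications.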

4. Auditing

Not all systems that implement AH will implement auditing. However,
if AH is incorporated into a system that supports auditing, then the
AH implementation MUST also support auditing and MUST allow a system
administrator to enable or disable auditing for AH. For the most
part, the granularity of auditing is a local matter. However,
several auditable events are identified in this specification and for
each of these events a minimum set of information that SHOULD be
included in an audit log is defined. Additional information also MAY
be included in the audit log for each of these events, and additional
events, not explicitly called out in this specification, also MAY
result in audit log entries. There is no requirement for the
receiver to transmit any message to the purported sender in response
to the detection of an auditable event, because of the potential to
induce denial of service via such action.

5. Conformance Requirements

Implementations that claim conformance or compliance with this
specification MUST fully implement the AH syntax and processing
described here and MUST comply with all requirements of the Security
Architecture document. If the key used to compute an ICV is manually
distributed, correct provision of the anti-replay service would
require correct maintenance of the counter state at the sender, until
the key is replaced, and there likely would be no automated recovery
provision if counter overflow were imminent. Thus a compliant
implementation SHOULD NOT provide this service in conjunction with
SAs that are manually keyed. A compliant AH implementation MUST
support the following mandatory-to-implement algorithms:

- HMAC with MD5 [MG97a]
- HMAC with SHA-1 [MG97b]

6. Security Considerations

Security is central to the design of this protocol, and these
security considerations permeate the specification. Additional
security-relevant aspects of using the IPsec protocol are discussed
in the Security Architecture document.

7. Differences from RFC 1826

This specification of AH differs from RFC 1826 [ATK95] in several
important respects, but the fundamental features of AH remain intact.
One goal of the revision of RFC 1826 was to provide a complete
framework for AH, with ancillary RFCs required only for algorithm
specification. For example, the anti-replay service is now an
integral, mandatory part of AH, not a feature of a transform defined
in another RFC. Carriage of a sequence number to support this
service is now required at all times. The default algorithms
required for interoperability have been changed to HMAC with MD5 or
SHA-1 (vs. keyed MD5), for security reasons. The list of IPv4 header
fields excluded from the ICV computation has been expanded to include
the OFFSET and FLAGS fields.

Another motivation for revision was to provide additional detail and
clarification of subtle points. This specification provides
rationale for exclusion of selected IPv4 header fields from AH
coverage and provides examples on positioning of AH in both the IPv4
and v6 contexts. Auditing requirements have been clarified in this
version of the specification. Tunnel mode AH was mentioned only in
passing in RFC 1826, but now is a mandatory feature of AH.
Discussion of interactions with key management and with security
labels have been moved to the Security Architecture document.

Acknowledgements

For over 3 years, this document has evolved through multiple versions
and iterations. During this time, many people have contributed
significant ideas and energy to the process and the documents
themselves. The authors would like to thank Karen Seo for providing
extensive help in the review, editing, background research, and
coordination for this version of the specification. The authors
would also like to thank the members of the IPsec and IPng working
groups, with special mention of the efforts of (in alphabetic order):
Steve Bellovin, Steve Deering, Francis Dupont, Phil Karn, Frank
Kastenholz, Perry Metzger, David Mihelcic, Hilarie Orman, Norman
Shulman, William Simpson, and Nina Yuan.

Appendix A -- Mutability of IP Options/Extension Headers

A1. IPv4 Options

This table shows how the IPv4 options are classified with regard to
"mutability". Where two references are provided, the second one
supersedes the first. This table is based in part on information
provided in RFC1700, "ASSIGNED NUMBERS", (October 1994).

             Opt.
Copy  Class     #  Name                       Reference
----  -----  ----  -------------------------  ---------
IMMUTABLE -- included in ICV calculation
 0      0       0  End of Options List        [RFC791]
 0      0       1  No Operation               [RFC791]
 1      0       2  Security                   [RFC1108(historic but in use)]
 1      0       5  Extended Security          [RFC1108(historic but in use)]
 1      0       6  Commercial Security        [expired I-D, now US MIL STD]
 1      0      20  Router Alert               [RFC2113]
 1      0      21  Sender Directed Multi-     [RFC1770]
                   Destination Delivery
MUTABLE -- zeroed
 1      0       3  Loose Source Route         [RFC791]
 0      2       4  Time Stamp                 [RFC791]
 0      0       7  Record Route               [RFC791]
 1      0       9  Strict Source Route        [RFC791]
 0      2      18  Traceroute                 [RFC1393]

EXPERIMENTAL, SUPERSEDED -- zeroed
 1      0       8  Stream ID                  [RFC791, RFC1122 (Host Req)]
 0      0      11  MTU Probe                  [RFC1063, RFC1191 (PMTU)]
 0      0      12  MTU Reply                  [RFC1063, RFC1191 (PMTU)]
 1      0      17  Extended Internet Proto    [RFC1385, RFC1883 (IPv6)]
 0      0      10  Experimental Measurement   [ZSu]
 1      2      13  Experimental Flow Control  [Finn]
 1      0      14  Experimental Access Ctl    [Estrin]
 0      0      15  ???                        [VerSteeg]
 1      0      16  IMI Traffic Descriptor     [Lee]
 1      0      19  Address Extension          [Ullmann IPv7]

NOTE: Use of the Router Alert option is potentially incompatible with
use of IPsec. Although the option is immutable, its use implies that
each router along a packet's path will "process" the packet and
consequently might change the packet. This would happen on a hop by
hop basis as the packet goes from router to router. Prior to being
processed by the application to which the option contents are
directed, e.g., RSVP/IGMP, the packet should encounter AH processing.

However, AH processing would require that each router along the path
is a member of a multicast-SA defined by the SPI. This might pose
problems for packets that are not strictly source routed, and it
requires multicast support techniques not currently available.

NOTE: Addition or removal of any security labels (BSO, ESO, CIPSO) by
systems along a packet's path conflicts with the classification of
these IP Options as immutable and is incompatible with the use of
IPsec.

NOTE: End of Options List options SHOULD be repeated as necessary to
ensure that the IP header ends on a 4 byte boundary in order to
ensure that there are no unspecified bytes which could be used for a
covert channel.

A2. IPv6 Extension Headers

This table shows how the IPv6 Extension Headers are classified with
regard to "mutability".

Option/Extension Name                Reference
-----------------------------------  ---------
MUTABLE BUT PREDICTABLE -- included in ICV calculation
    Routing (Type 0)                 [RFC1883]

BIT INDICATES IF OPTION IS MUTABLE (CHANGES UNPREDICTABLY DURING TRANSIT)
    Hop by Hop options               [RFC1883]
    Destination options              [RFC1883]

NOT APPLICABLE
    Fragmentation                    [RFC1883]

Options -- IPv6 options in the Hop-by-Hop and Destination
Extension Headers contain a bit that indicates whether the
option might change (unpredictably) during transit. For
any option for which contents may change en-route, the
entire "Option Data" field must be treated as zero-valued
octets when computing or verifying the ICV. The Option
Type and Opt Data Len are included in the ICV calculation.
All options for which the bit indicates immutability are
included in the ICV calculation. See the IPv6
specification [DH95] for more information.

Routing (Type 0) -- The IPv6 Routing Header "Type 0" will
rearrange the address fields within the packet during
transit from source to destination. However, the contents
of the packet as it will appear at the receiver are known
to the sender and to all intermediate hops. Hence, the
IPv6 Routing Header "Type 0" is included in the
Authentication Data calculation as mutable but predictable.
The sender must order the field so that it appears as it
will at the receiver, prior to performing the ICV
computation.
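How the sender can predict the receiver's view of a Type 0 route can be sketched as follows. The sketch assumes the IPv6 specification's processing rule, where each listed hop decrements Segments Left and swaps the IPv6 Destination Address with the next address in the list, and it assumes Segments Left initially equals the number of addresses. The function name and string addresses are hypothetical.

```python
from typing import List, Tuple

def type0_receiver_view(first_hop: str,
                        addresses: List[str]) -> Tuple[str, List[str], int]:
    """Predict (Destination Address, Address[1..n], Segments Left) on arrival.

    first_hop is the Destination Address the sender places in the IPv6
    header; addresses are Address[1..n], ending with the final destination.
    """
    if not addresses:
        raise ValueError("a Type 0 routing header needs at least one address")
    # Each hop swaps the Destination Address with the next listed address,
    # so on arrival the final destination sits in the IPv6 header and the
    # list holds the hops actually visited, in order.
    return addresses[-1], [first_hop] + addresses[:-1], 0
```

The sender would write these predicted values into the header copy used for the ICV computation.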

Fragmentation -- Fragmentation occurs after outbound IPsec
processing (section 3.3) and reassembly occurs before
inbound IPsec processing (section 3.4). So the
Fragmentation Extension Header, if it exists, is not seen
by IPsec.

Note that on the receive side, the IP implementation could
leave a Fragmentation Extension Header in place when it
does re-assembly. If this happens, then when AH receives
the packet, before doing ICV processing, AH MUST "remove"
(or skip over) this header and change the previous header's
"Next Header" field to be the "Next Header" field in the
Fragmentation Extension Header.

Note that on the send side, the IP implementation could
give the IPsec code a packet with a Fragmentation Extension
Header with Offset of 0 (first fragment) and a More
Fragments Flag of 0 (last fragment). If this happens, then
before doing ICV processing, AH MUST first "remove" (or
skip over) this header and change the previous header's
"Next Header" field to be the "Next Header" field in the
Fragmentation Extension Header.
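The send-side case can be sketched as a check on the 8-octet Fragment Extension Header that returns the value to copy into the previous header's Next Header field. The sketch assumes the standard layout (Next Header, Reserved, a 16-bit word holding the 13-bit Fragment Offset, 2 reserved bits and the M flag, then a 32-bit Identification); the function name is hypothetical.

```python
def next_header_after_atomic_fragment(frag_header: bytes) -> int:
    """Return the Next Header value to copy into the previous header.

    Raises ValueError unless Offset is 0 and the More Fragments flag is 0,
    i.e. the header describes a whole datagram.
    """
    if len(frag_header) != 8:
        raise ValueError("IPv6 Fragment header is 8 octets")
    offset_and_flags = int.from_bytes(frag_header[2:4], "big")
    fragment_offset = offset_and_flags >> 3   # high 13 bits
    more_fragments = offset_and_flags & 0x1   # lowest bit is M
    if fragment_offset != 0 or more_fragments:
        raise ValueError("not a whole datagram: offset or M flag is non-zero")
    return frag_header[0]
```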

References

[ATK95] Atkinson, R., "The IP Authentication Header", RFC 1826,
August 1995.

[Bra97] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Level", BCP 14, RFC 2119, March 1997.

[DH95] Deering, S., and B. Hinden, "Internet Protocol version 6
(IPv6) Specification", RFC 1883, December 1995.

[HC98] Harkins, D., and D. Carrel, "The Internet Key Exchange
(IKE)", RFC 2409, November 1998.

[KA97a] Kent, S., and R. Atkinson, "Security Architecture for the
Internet Protocol", RFC 2401, November 1998.

[KA97b] Kent, S., and R. Atkinson, "IP Encapsulating Security
Payload (ESP)", RFC 2406, November 1998.

[MG97a] Madson, C., and R. Glenn, "The Use of HMAC-MD5-96 within
ESP and AH", RFC 2403, November 1998.

[MG97b] Madson, C., and R. Glenn, "The Use of HMAC-SHA-1-96 within
ESP and AH", RFC 2404, November 1998.

[STD-2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
1700, October 1994. See also:
http://www.iana.org/numbers.html

Disclaimer

The views and specification here are those of the authors and are not
necessarily those of their employers. The authors and their
employers specifically disclaim responsibility for any problems
arising from correct or incorrect implementation or use of this
specification.

Author Information

Stephen Kent
BBN Corporation
70 Fawcett Street
Cambridge, MA 02140
USA

Phone: +1 (617) 873-3988
EMail: kent@bbn.com

Randall Atkinson
@Home Network
425 Broadway,
Redwood City, CA 94063
USA

Phone: +1 (415) 569-5000
EMail: rja@corp.home.net

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

The post RFC 2402 – IP Authentication Header appeared first on IPv6.net.

RFC 791 – Internet Protocol Version 4 Specification

IPv6 & IoT editor — Sat, 01 Aug 2009 17:42:51 +0000

INTERNET PROTOCOL

DARPA INTERNET PROGRAM

PROTOCOL SPECIFICATION

September 1981

prepared for

Defense Advanced Research Projects Agency
Information Processing Techniques Office
1400 Wilson Boulevard
Arlington, Virginia 22209

by

Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, California 90291

September 1981
Internet Protocol

TABLE OF CONTENTS

PREFACE

1. INTRODUCTION

   1.1 Motivation
   1.2 Scope
   1.3 Interfaces
   1.4 Operation

2. OVERVIEW

   2.1 Relation to Other Protocols
   2.2 Model of Operation
   2.3 Function Description
   2.4 Gateways

3. SPECIFICATION

   3.1 Internet Header Format
   3.2 Discussion
   3.3 Interfaces

APPENDIX A: Examples & Scenarios
APPENDIX B: Data Transmission Order

GLOSSARY

REFERENCES

PREFACE

This document specifies the DoD Standard Internet Protocol. This
document is based on six earlier editions of the ARPA Internet Protocol
Specification, and the present text draws heavily from them. There have
been many contributors to this work both in terms of concepts and in
terms of text. This edition revises aspects of addressing, error
handling, option codes, and the security, precedence, compartments, and
handling restriction features of the internet protocol.

Jon Postel

Editor

September 1981

RFC: 791
Replaces: RFC 760
IENs 128, 123, 111,
80, 54, 44, 41, 28, 26

INTERNET PROTOCOL

DARPA INTERNET PROGRAM
PROTOCOL SPECIFICATION

1. INTRODUCTION

1.1. Motivation

The Internet Protocol is designed for use in interconnected systems of
packet-switched computer communication networks. Such a system has
been called a "catenet" [1]. The internet protocol provides for
transmitting blocks of data called datagrams from sources to
destinations, where sources and destinations are hosts identified by
fixed length addresses. The internet protocol also provides for
fragmentation and reassembly of long datagrams, if necessary, for
transmission through "small packet" networks.

1.2. Scope

The internet protocol is specifically limited in scope to provide the
functions necessary to deliver a package of bits (an internet
datagram) from a source to a destination over an interconnected system
of networks. There are no mechanisms to augment end-to-end data
reliability, flow control, sequencing, or other services commonly
found in host-to-host protocols. The internet protocol can capitalize
on the services of its supporting networks to provide various types
and qualities of service.

1.3. Interfaces

This protocol is called on by host-to-host protocols in an internet
environment. This protocol calls on local network protocols to carry
the internet datagram to the next gateway or destination host.

For example, a TCP module would call on the internet module to take a
TCP segment (including the TCP header and user data) as the data
portion of an internet datagram. The TCP module would provide the
addresses and other parameters in the internet header to the internet
module as arguments of the call. The internet module would then
create an internet datagram and call on the local network interface to
transmit the internet datagram.

In the ARPANET case, for example, the internet module would call on a
local net module which would add the 1822 leader [2] to the internet
datagram creating an ARPANET message to transmit to the IMP. The
ARPANET address would be derived from the internet address by the
local network interface and would be the address of some host in the
ARPANET, that host might be a gateway to other networks.

1.4. Operation

The internet protocol implements two basic functions: addressing and
fragmentation.

The internet modules use the addresses carried in the internet header
to transmit internet datagrams toward their destinations. The
selection of a path for transmission is called routing.

The internet modules use fields in the internet header to fragment and
reassemble internet datagrams when necessary for transmission through
"small packet" networks.

The model of operation is that an internet module resides in each host
engaged in internet communication and in each gateway that
interconnects networks. These modules share common rules for
interpreting address fields and for fragmenting and assembling
internet datagrams. In addition, these modules (especially in
gateways) have procedures for making routing decisions and other
functions.

The internet protocol treats each internet datagram as an independent
entity unrelated to any other internet datagram. There are no
connections or logical circuits (virtual or otherwise).

The internet protocol uses four key mechanisms in providing its
service: Type of Service, Time to Live, Options, and Header Checksum.

The Type of Service is used to indicate the quality of the service
desired. The type of service is an abstract or generalized set of
parameters which characterize the service choices provided in the
networks that make up the internet. This type of service indication
is to be used by gateways to select the actual transmission parameters
for a particular network, the network to be used for the next hop, or
the next gateway when routing an internet datagram.

The Time to Live is an indication of an upper bound on the lifetime of
an internet datagram. It is set by the sender of the datagram and
reduced at the points along the route where it is processed. If the
time to live reaches zero before the internet datagram reaches its
destination, the internet datagram is destroyed. The time to live can
be thought of as a self destruct time limit.

The Options provide for control functions needed or useful in some
situations but unnecessary for the most common communications. The
options include provisions for timestamps, security, and special
routing.

The Header Checksum provides a verification that the information used
in processing an internet datagram has been transmitted correctly. The
data may contain errors. If the header checksum fails, the internet
datagram is discarded at once by the entity which detects the error.

The internet protocol does not provide a reliable communication
facility. There are no acknowledgments either end-to-end or
hop-by-hop. There is no error control for data, only a header
checksum. There are no retransmissions. There is no flow control.

Errors detected may be reported via the Internet Control Message
Protocol (ICMP) [3] which is implemented in the internet protocol
module.

2. OVERVIEW

2.1. Relation to Other Protocols

The following diagram illustrates the place of the internet protocol
in the protocol hierarchy:

       +------+ +-----+ +-----+     +-----+
       |Telnet| | FTP | | TFTP| ... | ... |
       +------+ +-----+ +-----+     +-----+
             |   |         |           |
            +-----+     +-----+     +-----+
            | TCP |     | UDP | ... | ... |
            +-----+     +-----+     +-----+
               |           |           |
            +--------------------------+----+
            |    Internet Protocol & ICMP   |
            +--------------------------+----+
                           |
              +---------------------------+
              |   Local Network Protocol  |
              +---------------------------+

Protocol Relationships

Figure 1.

Internet protocol interfaces on one side to the higher level
host-to-host protocols and on the other side to the local network
protocol. In this context a "local network" may be a small network in
a building or a large network such as the ARPANET.

2.2. Model of Operation

The model of operation for transmitting a datagram from one
application program to another is illustrated by the following
scenario:

We suppose that this transmission will involve one intermediate
gateway.

The sending application program prepares its data and calls on its
local internet module to send that data as a datagram and passes the
destination address and other parameters as arguments of the call.

The internet module prepares a datagram header and attaches the data
to it. The internet module determines a local network address for
this internet address; in this case it is the address of a gateway.

It sends this datagram and the local network address to the local
network interface.

The local network interface creates a local network header, and
attaches the datagram to it, then sends the result via the local
network.

The datagram arrives at a gateway host wrapped in the local network
header; the local network interface strips off this header and
turns the datagram over to the internet module. The internet module
determines from the internet address that the datagram is to be
forwarded to another host in a second network. The internet module
determines a local net address for the destination host. It calls
on the local network interface for that network to send the
datagram.

This local network interface creates a local network header and
attaches the datagram, sending the result to the destination host.

At this destination host the datagram is stripped of the local net
header by the local network interface and handed to the internet
module.

The internet module determines that the datagram is for an
application program in this host. It passes the data to the
application program in response to a system call, passing the source
address and other parameters as results of the call.

   Application                                           Application
   Program                                                   Program
         \                                                   /
       Internet Module      Internet Module      Internet Module
             \                 /       \                /
             LNI-1          LNI-1      LNI-2         LNI-2
                \           /             \          /
               Local Network 1           Local Network 2

                            Transmission Path

                                Figure 2.

2.3. Function Description

The function or purpose of Internet Protocol is to move datagrams
through an interconnected set of networks. This is done by passing
the datagrams from one internet module to another until the
destination is reached. The internet modules reside in hosts and
gateways in the internet system. The datagrams are routed from one
internet module to another through individual networks based on the
interpretation of an internet address. Thus, one important mechanism
of the internet protocol is the internet address.

In the routing of messages from one internet module to another,
datagrams may need to traverse a network whose maximum packet size is
smaller than the size of the datagram. To overcome this difficulty, a
fragmentation mechanism is provided in the internet protocol.

Addressing

A distinction is made between names, addresses, and routes [4]. A
name indicates what we seek. An address indicates where it is. A
route indicates how to get there. The internet protocol deals
primarily with addresses. It is the task of higher level (i.e.,
host-to-host or application) protocols to make the mapping from
names to addresses. The internet module maps internet addresses to
local net addresses. It is the task of lower level (i.e., local net
or gateways) procedures to make the mapping from local net addresses
to routes.

Addresses are a fixed length of four octets (32 bits). An address
begins with a network number, followed by the local address (called the
"rest" field). There are three formats or classes of internet
addresses: in class a, the high order bit is zero, the next 7 bits
are the network, and the last 24 bits are the local address; in
class b, the high order two bits are one-zero, the next 14 bits are
the network and the last 16 bits are the local address; in class c,
the high order three bits are one-one-zero, the next 21 bits are the
network and the last 8 bits are the local address.

Care must be taken in mapping internet addresses to local net
addresses; a single physical host must be able to act as if it were
several distinct hosts to the extent of using several distinct
internet addresses. Some hosts will also have several physical
interfaces (multi-homing).

That is, provision must be made for a host to have several physical
interfaces to the network with each having several logical internet
addresses.

Examples of address mappings may be found in "Address Mappings" [5].

Fragmentation

Fragmentation of an internet datagram is necessary when it
originates in a local net that allows a large packet size and must
traverse a local net that limits packets to a smaller size to reach
its destination.

An internet datagram can be marked "don't fragment." Any internet
datagram so marked is not to be internet fragmented under any
circumstances. If an internet datagram marked don't fragment cannot be
delivered to its destination without fragmenting it, it is to be
discarded instead.

Fragmentation, transmission and reassembly across a local network
which is invisible to the internet protocol module is called
intranet fragmentation and may be used [6].

The internet fragmentation and reassembly procedure needs to be able
to break a datagram into an almost arbitrary number of pieces that
can be later reassembled. The receiver of the fragments uses the
identification field to ensure that fragments of different datagrams
are not mixed. The fragment offset field tells the receiver the
position of a fragment in the original datagram. The fragment
offset and length determine the portion of the original datagram
covered by this fragment. The more-fragments flag indicates (by
being reset) the last fragment. These fields provide sufficient
information to reassemble datagrams.

The identification field is used to distinguish the fragments of one
datagram from those of another. The originating protocol module of
an internet datagram sets the identification field to a value that
must be unique for that source-destination pair and protocol for the
time the datagram will be active in the internet system. The
originating protocol module of a complete datagram sets the
more-fragments flag to zero and the fragment offset to zero.

To fragment a long internet datagram, an internet protocol module
(for example, in a gateway) creates two new internet datagrams and
copies the contents of the internet header fields from the long
datagram into both new internet headers. The data of the long
datagram is divided into two portions on an 8 octet (64 bit) boundary
(the second portion might not be an integral multiple of 8 octets,
but the first must be). Call the number of 8 octet blocks in the
first portion NFB (for Number of Fragment Blocks). The first
portion of the data is placed in the first new internet datagram,
and the total length field is set to the length of the first
datagram. The more-fragments flag is set to one. The second
portion of the data is placed in the second new internet datagram,
and the total length field is set to the length of the second
datagram. The more-fragments flag carries the same value as the
long datagram. The fragment offset field of the second new internet
datagram is set to the value of that field in the long datagram plus
NFB.

This procedure can be generalized for an n-way split, rather than
the two-way split described.
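
The two-way split can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the hypothetical `Datagram` record models only the fields the procedure touches, the header is taken to be the minimal 20 octets with no options, and `split_in_two` is an illustrative name.

```python
from dataclasses import dataclass, replace

HEADER_LEN = 20  # assumption: minimal header, no options


@dataclass(frozen=True)
class Datagram:
    """Hypothetical model of only the fields the split procedure touches."""
    ident: int
    more_fragments: bool
    fragment_offset: int  # in 8-octet blocks
    data: bytes

    @property
    def total_length(self) -> int:
        return HEADER_LEN + len(self.data)


def split_in_two(dgram: Datagram, nfb: int) -> tuple[Datagram, Datagram]:
    """Split `dgram` so the first piece carries NFB 8-octet blocks of data."""
    first_len = nfb * 8  # the first portion must be a multiple of 8 octets
    if not 0 < first_len < len(dgram.data):
        raise ValueError("NFB must leave data in both fragments")
    # Both pieces start from a copy of the long datagram's header fields.
    first = replace(dgram, data=dgram.data[:first_len], more_fragments=True)
    second = replace(
        dgram,
        data=dgram.data[first_len:],
        more_fragments=dgram.more_fragments,  # inherits the long datagram's flag
        fragment_offset=dgram.fragment_offset + nfb,
    )
    return first, second


# Example: 1000 data octets split after 60 blocks (480 octets).
a, b = split_in_two(Datagram(7, False, 0, bytes(1000)), nfb=60)
print(a.total_length, a.more_fragments, a.fragment_offset)  # 500 True 0
print(b.total_length, b.more_fragments, b.fragment_offset)  # 540 False 60
```

An n-way split follows by applying the same function repeatedly to the second piece.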

To assemble the fragments of an internet datagram, an internet
protocol module (for example at a destination host) combines
internet datagrams that all have the same value for the four fields:
identification, source, destination, and protocol. The combination
is done by placing the data portion of each fragment in the relative
position indicated by the fragment offset in that fragment's
internet header. The first fragment will have the fragment offset
zero, and the last fragment will have the more-fragments flag reset
to zero.
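
A minimal reassembly sketch in Python, keyed on the four fields named above. The `Reassembler` class and its `add` method are hypothetical names; the sketch deliberately omits the reassembly timer, overlap handling, and resource limits a real module would need.

```python
from collections import defaultdict


class Reassembler:
    """Buffers fragments per (identification, source, destination, protocol)."""

    def __init__(self) -> None:
        self._pieces: dict[tuple, dict[int, bytes]] = defaultdict(dict)
        self._last_end: dict[tuple, int] = {}

    def add(self, ident, src, dst, proto, offset_blocks, more_fragments, data):
        """Store one fragment; return the whole data once every piece is present."""
        key = (ident, src, dst, proto)
        start = offset_blocks * 8  # fragment offset is in 8-octet units
        self._pieces[key][start] = data
        if not more_fragments:  # the last fragment fixes the total data length
            self._last_end[key] = start + len(data)
        return self._try_complete(key)

    def _try_complete(self, key):
        if key not in self._last_end:
            return None
        buf = bytearray()
        for start in sorted(self._pieces[key]):
            if start != len(buf):  # a hole (or an overlap) remains
                return None
            buf += self._pieces[key][start]
        if len(buf) != self._last_end[key]:
            return None
        del self._pieces[key], self._last_end[key]
        return bytes(buf)
```

An unfragmented datagram (offset zero, more-fragments clear) completes on its first `add` call.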

2.4. Gateways

Gateways implement internet protocol to forward datagrams between
networks. Gateways also implement the Gateway to Gateway Protocol
(GGP) [7] to coordinate routing and other internet control
information.

In a gateway the higher level protocols need not be implemented and
the GGP functions are added to the IP module.

   +-------------------------------+
   | Internet Protocol & ICMP & GGP|
   +-------------------------------+
          |               |
   +---------------+   +---------------+
   |   Local Net   |   |   Local Net   |
   +---------------+   +---------------+

Gateway Protocols

Figure 3.

3. SPECIFICATION

3.1. Internet Header Format

A summary of the contents of the internet header follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram Header

Figure 4.

Note that each tick mark represents one bit position.
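
As an illustration, the fixed 20 octets of Figure 4 can be unpacked with Python's `struct` module, reading multi-octet fields most significant octet first (network byte order). `parse_ipv4_header` is a hypothetical helper; options are returned as raw octets without interpretation.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed part of the internet header shown in Figure 4."""
    if len(raw) < 20:
        raise ValueError("need at least 20 octets")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    ihl = ver_ihl & 0x0F  # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "ihl": ihl,
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,               # high-order 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low-order 13 bits, 8-octet units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "options": raw[20:ihl * 4],  # empty when IHL is 5
    }
```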

Version: 4 bits

The Version field indicates the format of the internet header. This
document describes version 4.

IHL: 4 bits

Internet Header Length is the length of the internet header in 32
bit words, and thus points to the beginning of the data. Note that
the minimum value for a correct header is 5.

Type of Service: 8 bits

The Type of Service provides an indication of the abstract
parameters of the quality of service desired. These parameters are
to be used to guide the selection of the actual service parameters
when transmitting a datagram through a particular network. Several
networks offer service precedence, which somehow treats high
precedence traffic as more important than other traffic (generally
by accepting only traffic above a certain precedence at times of high
load). The major choice is a three way tradeoff between low-delay,
high-reliability, and high-throughput.

Bits 0-2: Precedence.
Bit 3: 0 = Normal Delay, 1 = Low Delay.
Bit 4: 0 = Normal Throughput, 1 = High Throughput.
Bit 5: 0 = Normal Reliability, 1 = High Reliability.
Bits 6-7: Reserved for Future Use.

   0     1     2     3     4     5     6     7
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 |     |     |     |     |     |
|   PRECEDENCE    |  D  |  T  |  R  |  0  |  0  |
|                 |     |     |     |     |     |
+-----+-----+-----+-----+-----+-----+-----+-----+

Precedence

111 - Network Control
110 - Internetwork Control
101 - CRITIC/ECP
100 - Flash Override
011 - Flash
010 - Immediate
001 - Priority
000 - Routine
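
The field layout above can be decoded with a few shifts and masks. Bit 0 in these diagrams is the most significant bit of the octet. A Python sketch, with `decode_tos` as a hypothetical helper name:

```python
PRECEDENCE_NAMES = {
    0b111: "Network Control",
    0b110: "Internetwork Control",
    0b101: "CRITIC/ECP",
    0b100: "Flash Override",
    0b011: "Flash",
    0b010: "Immediate",
    0b001: "Priority",
    0b000: "Routine",
}


def decode_tos(tos: int) -> dict:
    """Split a Type of Service octet; bit 0 is the most significant bit."""
    return {
        "precedence": PRECEDENCE_NAMES[(tos >> 5) & 0b111],  # bits 0-2
        "low_delay": bool(tos & 0b0001_0000),                # bit 3
        "high_throughput": bool(tos & 0b0000_1000),          # bit 4
        "high_reliability": bool(tos & 0b0000_0100),         # bit 5
        "reserved": tos & 0b0000_0011,                       # bits 6-7, should be 0
    }
```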

The use of the Delay, Throughput, and Reliability indications may
increase the cost (in some sense) of the service. In many networks
better performance for one of these parameters is coupled with worse
performance on another. Except for very unusual cases at most two
of these three indications should be set.

The type of service is used to specify the treatment of the datagram
during its transmission through the internet system. Example
mappings of the internet type of service to the actual service
provided on networks such as AUTODIN II, ARPANET, SATNET, and PRNET
are given in "Service Mappings" [8].

The Network Control precedence designation is intended to be used
within a network only. The actual use and control of that
designation is up to each network. The Internetwork Control
designation is intended for use by gateway control originators only.
If the actual use of these precedence designations is of concern to
a particular network, it is the responsibility of that network to
control the access to, and use of, those precedence designations.

Total Length: 16 bits

Total Length is the length of the datagram, measured in octets,
including internet header and data. This field allows the length of
a datagram to be up to 65,535 octets. Such long datagrams are
impractical for most hosts and networks. All hosts must be prepared
to accept datagrams of up to 576 octets (whether they arrive whole
or in fragments). It is recommended that hosts only send datagrams
larger than 576 octets if they have assurance that the destination
is prepared to accept the larger datagrams.

The number 576 is selected to allow a reasonable sized data block to
be transmitted in addition to the required header information. For
example, this size allows a data block of 512 octets plus 64 header
octets to fit in a datagram. The maximal internet header is 60
octets, and a typical internet header is 20 octets, allowing a
margin for headers of higher level protocols.

Identification: 16 bits

An identifying value assigned by the sender to aid in assembling the
fragments of a datagram.

Flags: 3 bits

Various Control Flags.

Bit 0: reserved, must be zero
Bit 1: (DF) 0 = May Fragment, 1 = Don't Fragment.
Bit 2: (MF) 0 = Last Fragment, 1 = More Fragments.

    0   1   2
  +---+---+---+
  |   | D | M |
  | 0 | F | F |
  +---+---+---+

Fragment Offset: 13 bits

This field indicates where in the datagram this fragment belongs.

The fragment offset is measured in units of 8 octets (64 bits). The
first fragment has offset zero.
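
Flags and Fragment Offset share one 16-bit word: the three flag bits occupy the high-order end and the offset the low-order 13 bits, counted in 8-octet units. A hypothetical pair of Python helpers that pack and unpack this word, taking the offset in octets:

```python
DF = 0b010  # bit 1: Don't Fragment
MF = 0b001  # bit 2: More Fragments


def pack_flags_offset(dont_fragment: bool, more_fragments: bool, offset_octets: int) -> int:
    """Build the 16-bit word holding Flags (3 bits) and Fragment Offset (13 bits)."""
    if offset_octets % 8:
        raise ValueError("fragment data must start on an 8-octet boundary")
    blocks = offset_octets // 8
    if blocks > 0x1FFF:
        raise ValueError("offset does not fit in 13 bits")
    flags = (DF if dont_fragment else 0) | (MF if more_fragments else 0)  # bit 0 stays 0
    return (flags << 13) | blocks


def unpack_flags_offset(word: int) -> tuple[bool, bool, int]:
    """Return (DF, MF, offset in octets) from the 16-bit word."""
    flags, blocks = word >> 13, word & 0x1FFF
    return bool(flags & DF), bool(flags & MF), blocks * 8
```

For example, a middle fragment whose data starts 1480 octets into the original datagram carries offset 185 with MF set.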

Time to Live: 8 bits

This field indicates the maximum time the datagram is allowed to
remain in the internet system. If this field contains the value
zero, then the datagram must be destroyed. This field is modified
in internet header processing. The time is measured in units of
seconds, but since every module that processes a datagram must
decrease the TTL by at least one even if it processes the datagram in
less than a second, the TTL must be thought of only as an upper
bound on the time a datagram may exist. The intention is to cause
undeliverable datagrams to be discarded, and to bound the maximum
datagram lifetime.
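
A sketch of the per-hop TTL rule in Python. `next_ttl` is a hypothetical helper; charging a hold of several seconds as the elapsed time rounded up is an assumption of this sketch, since the text only requires a decrease of at least one.

```python
import math
from typing import Optional


def next_ttl(ttl: int, seconds_spent: float = 0.0) -> Optional[int]:
    """Return the TTL to forward with, or None if the datagram must be destroyed.

    Every module subtracts at least one, even for sub-second processing;
    a longer hold subtracts the elapsed seconds rounded up (an assumption).
    """
    if not 0 <= ttl <= 255:
        raise ValueError("TTL is an 8-bit field")
    new_ttl = ttl - max(1, math.ceil(seconds_spent))
    return new_ttl if new_ttl > 0 else None  # zero means destroy
```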

Protocol: 8 bits

This field indicates the next level protocol used in the data
portion of the internet datagram. The values for various protocols
are specified in "Assigned Numbers" [9].

Header Checksum: 16 bits

A checksum on the header only. Since some header fields change
(e.g., time to live), this is recomputed and verified at each point
that the internet header is processed.

The checksum algorithm is:

The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the header. For purposes of
computing the checksum, the value of the checksum field is zero.

This is a simple to compute checksum and experimental evidence
indicates it is adequate, but it is provisional and may be replaced
by a CRC procedure, depending on further experience.
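
A minimal Python sketch of the checksum algorithm above; the helper names are illustrative. The checksum field occupies octets 10 and 11 of the header (see Figure 4), and `with_checksum` zeroes it before summing, as the algorithm requires. A receiver can check a header by running the same computation over all of its words, stored checksum included: a correct header yields zero.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # headers are whole 32-bit words; pad odd input defensively
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # most significant octet first
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return ~total & 0xFFFF


def with_checksum(header: bytes) -> bytes:
    """Return a copy of `header` with the checksum field (octets 10-11) filled in."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]  # field counts as zero
    csum = internet_checksum(zeroed)
    return zeroed[:10] + csum.to_bytes(2, "big") + zeroed[12:]


# Example: a 20-octet header with the checksum field zeroed.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(hdr)))                 # 0xb861
print(hex(internet_checksum(with_checksum(hdr))))  # 0x0 (header verifies)
```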

Source Address: 32 bits

The source address. See section 3.2.

Destination Address: 32 bits

The destination address. See section 3.2.

Options: variable

The options may appear or not in datagrams. They must be
implemented by all IP modules (host and gateways). What is optional
is their transmission in any particular datagram, not their
implementation.

In some environments the security option may be required in all
datagrams.

The option field is variable in length. There may be zero or more
options. There are two cases for the format of an option:

Case 1: A single octet of option-type.

Case 2: An option-type octet, an option-length octet, and the
actual option-data octets.

The option-length octet counts the option-type octet and the
option-length octet as well as the option-data octets.

The option-type octet is viewed as having 3 fields:

1 bit copied flag,
2 bits option class,
5 bits option number.

The copied flag indicates that this option is copied into all
fragments on fragmentation.

0 = not copied
1 = copied

The option classes are:

0 = control
1 = reserved for future use
2 = debugging and measurement
3 = reserved for future use
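
The option-type octet can be split into its three fields with shifts and masks. A Python sketch with a hypothetical helper name; for instance, the Type=130 (Security) value given in the option definitions decodes as copied, class 0, number 2.

```python
def decode_option_type(octet: int) -> tuple[bool, int, int]:
    """Split an option-type octet into (copied flag, option class, option number)."""
    copied = bool(octet & 0x80)         # 1 bit
    option_class = (octet >> 5) & 0b11  # 2 bits
    number = octet & 0b1_1111           # 5 bits
    return copied, option_class, number
```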

The following internet options are defined:

      CLASS NUMBER LENGTH DESCRIPTION
      ----- ------ ------ -----------
        0     0      -    End of Option list. This option occupies only
                          1 octet; it has no length octet.
        0     1      -    No Operation. This option occupies only 1
                          octet; it has no length octet.
        0     2     11    Security. Used to carry Security,
                          Compartmentation, User Group (TCC), and
                          Handling Restriction Codes compatible with DOD
                          requirements.
        0     3     var.  Loose Source Routing. Used to route the
                          internet datagram based on information
                          supplied by the source.
        0     9     var.  Strict Source Routing. Used to route the
                          internet datagram based on information
                          supplied by the source.
        0     7     var.  Record Route. Used to trace the route an
                          internet datagram takes.
        0     8      4    Stream ID. Used to carry the stream
                          identifier.
        2     4     var.  Internet Timestamp.

Specific Option Definitions

End of Option List

+--------+
|00000000|
+--------+
Type=0

This option indicates the end of the option list. This might
not coincide with the end of the internet header according to
the internet header length. This is used at the end of all
options, not the end of each option, and need only be used if
the end of the options would not otherwise coincide with the end
of the internet header.

May be copied, introduced, or deleted on fragmentation, or for
any other reason.

No Operation

+--------+
|00000001|
+--------+
Type=1

This option may be used between options, for example, to align
the beginning of a subsequent option on a 32 bit boundary.

May be copied, introduced, or deleted on fragmentation, or for
any other reason.
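
Because End of Option List and No Operation are single octets while every other option carries a length octet, a receiver can walk the options area as in this Python sketch. `walk_options` is a hypothetical helper; it returns each option's type together with the octets that follow its length octet (for the route options, the pointer and route data), and rejects lengths that run past the area.

```python
def walk_options(options: bytes) -> list[tuple[int, bytes]]:
    """Return (option-type, option-data) pairs from a header's options area."""
    result = []
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:  # End of Option List: the rest is padding
            break
        if opt_type == 1:  # No Operation: alignment filler
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("option truncated before its length octet")
        length = options[i + 1]  # counts the type and length octets too
        if length < 2 or i + length > len(options):
            raise ValueError(f"bad length {length} for option type {opt_type}")
        result.append((opt_type, options[i + 2:i + length]))
        i += length
    return result
```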

Security

This option provides a way for hosts to send security,
compartmentation, handling restrictions, and TCC (closed user
group) parameters. The format for this option is as follows:

    +--------+--------+---//---+---//---+---//---+---//---+
    |10000010|00001011|SSS  SSS|CCC  CCC|HHH  HHH|  TCC   |
    +--------+--------+---//---+---//---+---//---+---//---+
     Type=130 Length=11

Security (S field): 16 bits

Specifies one of 16 levels of security (eight of which are
reserved for future use).

00000000 00000000 - Unclassified
11110001 00110101 - Confidential
01111000 10011010 - EFTO
10111100 01001101 - MMMM
01011110 00100110 - PROG
10101111 00010011 - Restricted
11010111 10001000 - Secret
01101011 11000101 - Top Secret
00110101 11100010 - (Reserved for future use)
10011010 11110001 - (Reserved for future use)
01001101 01111000 - (Reserved for future use)
00100100 10111101 - (Reserved for future use)
00010011 01011110 - (Reserved for future use)
10001001 10101111 - (Reserved for future use)
11000100 11010110 - (Reserved for future use)
11100010 01101011 - (Reserved for future use)

Compartments (C field): 16 bits

An all zero value is used when the information transmitted is
not compartmented. Other values for the compartments field
may be obtained from the Defense Intelligence Agency.

Handling Restrictions (H field): 16 bits

The values for the control and release markings are
alphanumeric digraphs and are defined in the Defense
Intelligence Agency Manual DIAM 65-19, "Standard Security
Markings".

Transmission Control Code (TCC field): 24 bits

Provides a means to segregate traffic and define controlled
communities of interest among subscribers. The TCC values are
trigraphs, and are available from HQ DCA Code 530.

Must be copied on fragmentation. This option appears at most
once in a datagram.

Loose Source and Record Route

    +--------+--------+--------+---------//--------+
    |10000011| length | pointer|     route data    |
    +--------+--------+--------+---------//--------+
     Type=131

The loose source and record route (LSRR) option provides a means
for the source of an internet datagram to supply routing
information to be used by the gateways in forwarding the
datagram to the destination, and to record the route
information.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next source address to be
processed. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

The route data is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the source route is empty (and the
recorded route full) and the routing is to be based on the
destination address field.

If the address in the destination address field has been reached and
the pointer is not greater than the length, the next address in the
source route replaces the address in the destination address field,
the recorded route address replaces the source address just used,
and the pointer is increased by four.

The recorded route address is the internet module's own internet
address as known in the environment into which this datagram is
being forwarded.

This procedure of replacing the source route with the recorded
route (though it is in the reverse of the order it must be in to
be used as a source route) means the option (and the IP header
as a whole) remains a constant length as the datagram progresses
through the internet.
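
The pointer handling described above can be sketched in Python. `advance_source_route` is a hypothetical helper, assumed to be called only once the address in the destination field has been reached; it takes the whole option as a `bytearray` and updates it in place. Because the pointer counts from 1 at the option-type octet, the entry it designates starts at index `pointer - 1`. The same handling applies to the strict variant that follows; only the next-hop constraint differs.

```python
from typing import Optional


def advance_source_route(option: bytearray, my_addr: bytes) -> Optional[bytes]:
    """Process a loose or strict source route option in place.

    Returns the next destination address, or None when the source route is
    empty (routing then uses the destination address field as it stands).
    """
    if len(my_addr) != 4:
        raise ValueError("internet addresses are 4 octets")
    length, pointer = option[1], option[2]
    if pointer > length:  # source route empty, recorded route full
        return None
    start = pointer - 1  # pointer counts from 1 at the option-type octet
    if pointer < 4 or start + 4 > length:
        raise ValueError("malformed source route option")
    next_dst = bytes(option[start:start + 4])
    option[start:start + 4] = my_addr  # recorded address replaces the one just used
    option[2] = pointer + 4            # the option keeps its length
    return next_dst


# Example: an LSRR option (type 131) with two addresses, 3 + 8 = 11 octets long.
opt = bytearray([131, 11, 4, 10, 0, 0, 1, 10, 0, 0, 2])
print(list(advance_source_route(opt, bytes([192, 0, 2, 1]))))  # [10, 0, 0, 1]
print(opt[2])                                                  # 8
```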

This option is a loose source route because the gateway or host
IP is allowed to use any route of any number of other
intermediate gateways to reach the next address in the route.

Must be copied on fragmentation. Appears at most once in a
datagram.

Strict Source and Record Route

    +--------+--------+--------+---------//--------+
    |10001001| length | pointer|     route data    |
    +--------+--------+--------+---------//--------+
     Type=137

The strict source and record route (SSRR) option provides a
means for the source of an internet datagram to supply routing
information to be used by the gateways in forwarding the
datagram to the destination, and to record the route
information.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next source address to be
processed. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

The route data is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the source route is empty (and the
recorded route full) and the routing is to be based on the
destination address field.

If the address in the destination address field has been reached and
the pointer is not greater than the length, the next address in the
source route replaces the address in the destination address field,
the recorded route address replaces the source address just used,
and the pointer is increased by four.

The recorded route address is the internet module's own internet
address as known in the environment into which this datagram is
being forwarded.

This procedure of replacing the source route with the recorded
route (though it is in the reverse of the order it must be in to
be used as a source route) means the option (and the IP header
as a whole) remains a constant length as the datagram progresses
through the internet.

This option is a strict source route because the gateway or host
IP must send the datagram directly to the next address in the
source route through only the directly connected network
indicated in the next address to reach the next gateway or host
specified in the route.

Must be copied on fragmentation. Appears at most once in a
datagram.

Record Route

    +--------+--------+--------+---------//--------+
    |00000111| length | pointer|     route data    |
    +--------+--------+--------+---------//--------+
     Type=7

The record route option provides a means to record the route of
an internet datagram.

The option begins with the option type code. The second octet
is the option length which includes the option type code and the
length octet, the pointer octet, and length-3 octets of route
data. The third octet is the pointer into the route data
indicating the octet which begins the next area to store a route
address. The pointer is relative to this option, and the
smallest legal value for the pointer is 4.

A recorded route is composed of a series of internet addresses.
Each internet address is 32 bits or 4 octets. If the pointer is
greater than the length, the recorded route data area is full.
The originating host must compose this option with a large
enough route data area to hold all the addresses expected. The
size of the option does not change due to adding addresses. The
initial contents of the route data area must be zero.

When an internet module routes a datagram it checks to see if
the record route option is present. If it is, it inserts its
own internet address as known in the environment into which this
datagram is being forwarded into the recorded route beginning at
the octet indicated by the pointer, and increments the pointer
by four.

If the route data area is already full (the pointer exceeds the
length) the datagram is forwarded without inserting the address
into the recorded route. If there is some room but not enough
room for a full address to be inserted, the original datagram is
considered to be in error and is discarded. In either case an
ICMP parameter problem message may be sent to the source
host [3].
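
A Python sketch of these rules. `record_route` is a hypothetical helper that returns one of three illustrative status strings; whether to send the ICMP parameter problem message is left to the caller.

```python
def record_route(option: bytearray, my_addr: bytes) -> str:
    """Apply record route processing to the whole option, in place.

    Returns "recorded", "full" (forward without recording), or "error"
    (discard the datagram).
    """
    if len(my_addr) != 4:
        raise ValueError("internet addresses are 4 octets")
    length, pointer = option[1], option[2]
    if pointer > length:
        return "full"
    start = pointer - 1  # pointer counts from 1 at the option-type octet
    if start + 4 > length:
        return "error"  # some room, but not enough for a full address
    option[start:start + 4] = my_addr
    option[2] = pointer + 4
    return "recorded"
```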

Not copied on fragmentation, goes in first fragment only.
Appears at most once in a datagram.

Stream Identifier

    +--------+--------+--------+--------+
    |10001000|00000100|    Stream ID    |
    +--------+--------+--------+--------+
     Type=136 Length=4

This option provides a way for the 16-bit SATNET stream
identifier to be carried through networks that do not support
the stream concept.

Must be copied on fragmentation. Appears at most once in a
datagram.

Internet Timestamp

    +--------+--------+--------+--------+
    |01000100| length | pointer|oflw|flg|
    +--------+--------+--------+--------+
    |         internet address          |
    +--------+--------+--------+--------+
    |             timestamp             |
    +--------+--------+--------+--------+
    |                 .                 |
                      .
                      .
    Type = 68

The Option Length is the number of octets in the option counting
the type, length, pointer, and overflow/flag octets (maximum
length 40).

The Pointer is the number of octets from the beginning of this
option to the end of timestamps plus one (i.e., it points to the
octet beginning the space for next timestamp). The smallest
legal value is 5. The timestamp area is full when the pointer
is greater than the length.

The Overflow (oflw) [4 bits] is the number of IP modules that
cannot register timestamps due to lack of space.

The Flag (flg) [4 bits] values are

0 -- time stamps only, stored in consecutive 32-bit words,

1 -- each timestamp is preceded with internet address of the
registering entity,

3 -- the internet address fields are prespecified. An IP
module only registers its timestamp if it matches its own
address with the next specified internet address.

The Timestamp is a right-justified, 32-bit timestamp in
milliseconds since midnight UT. If the time is not available in
milliseconds or cannot be provided with respect to midnight UT
then any time may be inserted as a timestamp provided the high
order bit of the timestamp field is set to one to indicate the
use of a non-standard value.

The originating host must compose this option with a large
enough timestamp data area to hold all the timestamp information
expected. The size of the option does not change due to adding
timestamps. The initial contents of the timestamp data area
must be zero or internet address/zero pairs.

If the timestamp data area is already full (the pointer exceeds
the length) the datagram is forwarded without inserting the
timestamp, but the overflow count is incremented by one.

If there is some room but not enough room for a full timestamp
to be inserted, or the overflow count itself overflows, the
original datagram is considered to be in error and is discarded.
In either case an ICMP parameter problem message may be sent to
the source host [3].
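
A Python sketch of timestamp registration for flag values 0 and 1. `add_timestamp` is a hypothetical helper; flag 3 (prespecified addresses) is deliberately left out, and the caller is assumed to supply milliseconds since midnight UT.

```python
def add_timestamp(option: bytearray, my_addr: bytes, ms_since_midnight_ut: int) -> bool:
    """Register one timestamp in place; return False if the datagram must be discarded."""
    length, pointer = option[1], option[2]
    oflw, flg = option[3] >> 4, option[3] & 0x0F
    if flg not in (0, 1):
        raise NotImplementedError("only flags 0 and 1 are sketched")
    entry = 4 if flg == 0 else 8  # octets used per registration
    if pointer > length:  # area full: count this module in the overflow field
        if oflw == 0x0F:
            return False  # the overflow count itself overflows
        option[3] = ((oflw + 1) << 4) | flg
        return True
    start = pointer - 1  # pointer counts from 1 at the option-type octet
    if start + entry > length:
        return False  # some room, but not enough for a full entry
    stamp = ms_since_midnight_ut.to_bytes(4, "big")
    option[start:start + entry] = stamp if flg == 0 else bytes(my_addr) + stamp
    option[2] = pointer + entry
    return True
```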

The timestamp option is not copied upon fragmentation. It is
carried in the first fragment. Appears at most once in a
datagram.

Padding: variable

The internet header padding is used to ensure that the internet
header ends on a 32 bit boundary. The padding is zero.
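
Since IHL is a 4-bit count of 32-bit words, the header is at most 15 words (60 octets), leaving 40 octets for options. A small Python sketch, using a hypothetical helper `pad_options`, that zero-pads an options area and derives IHL:

```python
def pad_options(options: bytes) -> tuple[bytes, int]:
    """Zero-pad options to a 32-bit boundary; return them with the resulting IHL."""
    padded = options + b"\x00" * (-len(options) % 4)
    header_len = 20 + len(padded)
    if header_len > 60:
        raise ValueError("options exceed the 40 octets a 15-word header allows")
    return padded, header_len // 4
```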

3.2. Discussion

The implementation of a protocol must be robust. Each implementation
must expect to interoperate with others created by different
individuals. While the goal of this specification is to be explicit
about the protocol there is the possibility of differing
interpretations. In general, an implementation must be conservative
in its sending behavior, and liberal in its receiving behavior. That
is, it must be careful to send well-formed datagrams, but must accept
any datagram that it can interpret (e.g., not object to technical
errors where the meaning is still clear).

The basic internet service is datagram oriented and provides for the
fragmentation of datagrams at gateways, with reassembly taking place
at the destination internet protocol module in the destination host.
Of course, fragmentation and reassembly of datagrams within a network
or by private agreement between the gateways of a network is also
allowed since this is transparent to the internet protocols and the
higher-level protocols. This transparent type of fragmentation and
reassembly is termed "network-dependent" (or intranet) fragmentation
and is not discussed further here.

Internet addresses distinguish sources and destinations to the host
level and provide a protocol field as well. It is assumed that each
protocol will provide for whatever multiplexing is necessary within a
host.

Addressing

To provide for flexibility in assigning addresses to networks and
allow for the large number of small to intermediate sized networks,
the interpretation of the address field is coded to specify a small
number of networks with a large number of hosts, a moderate number of
networks with a moderate number of hosts, and a large number of
networks with a small number of hosts. In addition, there is an
escape code for extended addressing mode.

Address Formats:

High Order Bits   Format                              Class
---------------   ----------------------------------  -----
0                 7 bits of net, 24 bits of host      a
10                14 bits of net, 16 bits of host     b
110               21 bits of net, 8 bits of host      c
111               escape to extended addressing mode

A value of zero in the network field means this network. This is
only used in certain ICMP messages. The extended addressing mode
is undefined. Both of these features are reserved for future use.
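The table can be read as a prefix test on the high-order bits. The sketch below uses a hypothetical helper, `split_classful`, which does not come from this document. It relies on Python's standard `ipaddress` module only to turn a dotted quad into a 32-bit integer. It also strips the class-marker bits, so the network number has exactly the width the table gives.

```python
import ipaddress
from typing import Optional, Tuple


def split_classful(address: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (class, network, host) using the high-order-bit table above.

    The class-marker bits are not included in the network number.
    Addresses starting with 111 are reported as the escape code.
    """
    value = int(ipaddress.IPv4Address(address))
    if value >> 31 == 0b0:        # 0   + 7-bit net + 24-bit host
        return "a", (value >> 24) & 0x7F, value & 0xFFFFFF
    if value >> 30 == 0b10:       # 10  + 14-bit net + 16-bit host
        return "b", (value >> 16) & 0x3FFF, value & 0xFFFF
    if value >> 29 == 0b110:      # 110 + 21-bit net + 8-bit host
        return "c", (value >> 8) & 0x1FFFFF, value & 0xFF
    return "escape", None, None   # 111: extended addressing mode
```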

The actual values assigned for network addresses are given in
"Assigned Numbers" [9].

The local address, assigned by the local network, must allow for a
single physical host to act as several distinct internet hosts.
That is, there must be a mapping between internet host addresses and
network/host interfaces that allows several internet addresses to
correspond to one interface. It must also be allowed for a host to
have several physical interfaces and to treat the datagrams from
several of them as if they were all addressed to a single host.

Address mappings between internet addresses and addresses for
ARPANET, SATNET, PRNET, and other networks are described in "Address
Mappings" [5].

Fragmentation and Reassembly

The internet identification field (ID) is used together with the
source and destination address, and the protocol fields, to identify
datagram fragments for reassembly.

The More Fragments flag bit (MF) is set if the datagram is not the
last fragment. The Fragment Offset field identifies the fragment
location, relative to the beginning of the original unfragmented
datagram. Fragments are counted in units of 8 octets. The
fragmentation strategy is designed so that an unfragmented datagram
has all zero fragmentation information (MF = 0, fragment offset =
0). If an internet datagram is fragmented, its data portion must be
broken on 8 octet boundaries.

This format allows 2**13 = 8192 fragments of 8 octets each for a
total of 65,536 octets. Note that this is consistent with the
datagram total length field (of course, the header is counted in the
total length and not in the fragments).
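The sketch below checks this arithmetic. It also shows how the 13-bit Fragment Offset shares a 16-bit word with the three flag bits. The flag layout (bit 0 reserved, bit 1 DF, bit 2 MF) follows the header format defined earlier in this specification. `pack_flags_offset` and `unpack_flags_offset` are illustrative names. Offsets are passed in octets and converted to 8-octet units.

```python
from typing import Tuple

OFFSET_BITS = 13
MAX_UNITS = 1 << OFFSET_BITS   # 8192 distinct Fragment Offset values


def pack_flags_offset(df: bool, mf: bool, offset_octets: int) -> int:
    """Build the 16-bit flags + fragment-offset word (bit 0 reserved, then DF, MF)."""
    if offset_octets % 8:
        raise ValueError("fragment data must start on an 8-octet boundary")
    units = offset_octets // 8
    if not 0 <= units < MAX_UNITS:
        raise ValueError("offset does not fit in 13 bits")
    return (int(df) << 14) | (int(mf) << 13) | units


def unpack_flags_offset(word: int) -> Tuple[bool, bool, int]:
    """Return (DF, MF, offset in octets) from the 16-bit word."""
    return bool(word & 0x4000), bool(word & 0x2000), (word & 0x1FFF) * 8


assert MAX_UNITS * 8 == 65536   # the 2**13 x 8 arithmetic from the text
```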

When fragmentation occurs, some options are copied, but others
remain with the first fragment only.

Every internet module must be able to forward a datagram of 68
octets without further fragmentation. This is because an internet
header may be up to 60 octets, and the minimum fragment is 8 octets.

Every internet destination must be able to receive a datagram of 576
octets either in one piece or in fragments to be reassembled.

The fields which may be affected by fragmentation include:

(1) options field
(2) more fragments flag
(3) fragment offset
(4) internet header length field
(5) total length field
(6) header checksum

If the Don't Fragment flag (DF) bit is set, then internet
fragmentation of this datagram is NOT permitted, although it may be
discarded. This can be used to prohibit fragmentation in cases
where the receiving host does not have sufficient resources to
reassemble internet fragments.

One example of use of the Don't Fragment feature is to down-line
load a small host. A small host could have a bootstrap program
that accepts a datagram, stores it in memory, and then executes it.

The fragmentation and reassembly procedures are most easily
described by examples. The following procedures are example
implementations.

General notation in the following pseudo programs: "=<" means "less
than or equal", "#" means "not equal", "=" means "equal", "<-" means
"is set to". Also, "x to y" includes x and excludes y; for example,
"4 to 7" would include 4, 5, and 6 (but not 7).

An Example Fragmentation Procedure

The maximum sized datagram that can be transmitted through the
next network is called the maximum transmission unit (MTU).

If the total length is less than or equal to the maximum transmission
unit, then submit this datagram to the next step in datagram
processing; otherwise cut the datagram into two fragments, the
first fragment being the maximum size, and the second fragment
being the rest of the datagram. The first fragment is submitted
to the next step in datagram processing, while the second fragment
is submitted to this procedure in case it is still too large.

Notation:

FO - Fragment Offset
IHL - Internet Header Length
DF - Don't Fragment flag
MF - More Fragments flag
TL - Total Length
OFO - Old Fragment Offset
OIHL - Old Internet Header Length
OMF - Old More Fragments flag
OTL - Old Total Length
NFB - Number of Fragment Blocks
MTU - Maximum Transmission Unit

Procedure:

   IF TL =< MTU THEN Submit this datagram to the next step
        in datagram processing ELSE IF DF = 1 THEN discard the
   datagram ELSE
   To produce the first fragment:
   (1)  Copy the original internet header;
   (2)  OIHL <- IHL; OTL <- TL; OFO <- FO; OMF <- MF;
   (3)  NFB <- (MTU-IHL*4)/8;
   (4)  Attach the first NFB*8 data octets;
   (5)  Correct the header:
        MF <- 1;  TL <- (IHL*4)+(NFB*8);
        Recompute Checksum;
   (6)  Submit this fragment to the next step in
        datagram processing;
   To produce the second fragment:
   (7)  Selectively copy the internet header (some options
        are not copied, see option definitions);
   (8)  Append the remaining data;
   (9)  Correct the header:
        IHL <- (((OIHL*4)-(length of options not copied))+3)/4;
        TL <- OTL - NFB*8 - (OIHL-IHL)*4;
        FO <- OFO + NFB;  MF <- OMF;  Recompute Checksum;
   (10) Submit this fragment to the fragmentation test; DONE.
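The fragmentation procedure can be sketched in Python as follows. The sketch assumes a simplified `Datagram` record, a hypothetical type that is not part of this specification. Its `copied_options` field already holds the options that must be copied into later fragments. The checksum is left out. The sketch loops instead of recursing, which has the same effect as resubmitting the second fragment to the fragmentation test.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Datagram:
    options: bytes         # options in this header (assumed already padded to 4 octets)
    copied_options: bytes  # subset copied into later fragments (assumed precomputed)
    data: bytes
    fo: int = 0            # Fragment Offset, in 8-octet units
    mf: int = 0            # More Fragments flag
    df: int = 0            # Don't Fragment flag

    @property
    def ihl(self) -> int:  # header length in 32-bit words
        return (20 + len(self.options) + 3) // 4

    @property
    def tl(self) -> int:   # total length in octets
        return self.ihl * 4 + len(self.data)


def fragment(dg: Datagram, mtu: int) -> List[Datagram]:
    """Split dg into fragments of at most mtu octets; [] means 'discarded'."""
    if dg.tl > mtu and dg.df:
        return []                                          # DF = 1: discard
    out: List[Datagram] = []
    while dg.tl > mtu:
        nfb = (mtu - dg.ihl * 4) // 8                      # step (3)
        if nfb < 1:
            raise ValueError("MTU too small to carry any data")
        out.append(replace(dg, data=dg.data[: nfb * 8], mf=1))  # steps (1)-(6)
        dg = replace(
            dg,
            options=dg.copied_options,                     # step (7)
            data=dg.data[nfb * 8:],                        # step (8)
            fo=dg.fo + nfb,                                # step (9); MF stays OMF
        )
    out.append(dg)                                         # fits: step (10) passes
    return out
```

With a 452-octet payload and an MTU of 280 octets, this yields total lengths of 276 and 216 and fragment offsets of 0 and 32. These are the values shown in Figures 7 and 8 of Appendix A.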

In the above procedure each fragment (except the last) was made
the maximum allowable size. An alternative might produce less
than the maximum size datagrams. For example, one could implement
a fragmentation procedure that repeatedly divided large datagrams in
half until the resulting fragments were less than the maximum
transmission unit size.

An Example Reassembly Procedure

For each datagram the buffer identifier is computed as the
concatenation of the source, destination, protocol, and
identification fields. If this is a whole datagram (that is, both
the fragment offset and the more fragments fields are zero), then
any reassembly resources associated with this buffer identifier
are released and the datagram is forwarded to the next step in
datagram processing.

If no other fragment with this buffer identifier is on hand, then
reassembly resources are allocated. The reassembly resources
consist of a data buffer, a header buffer, a fragment block bit
table, a total data length field, and a timer. The data from the
fragment is placed in the data buffer according to its fragment
offset and length, and bits are set in the fragment block bit
table corresponding to the fragment blocks received.

If this is the first fragment (that is, the fragment offset is
zero), this header is placed in the header buffer. If this is the
last fragment (that is, the more fragments field is zero), the
total data length is computed. If this fragment completes the
datagram (tested by checking the bits set in the fragment block
table), then the datagram is sent to the next step in datagram
processing; otherwise the timer is set to the maximum of the
current timer value and the value of the time to live field from
this fragment; and the reassembly routine gives up control.

If the timer runs out, then all reassembly resources for this
buffer identifier are released. The initial setting of the timer
is a lower bound on the reassembly waiting time. This is because
the waiting time will be increased if the Time to Live in the
arriving fragment is greater than the current timer value but will
not be decreased if it is less. The maximum this timer value
could reach is the maximum time to live (approximately 4.25
minutes). The current recommendation for the initial timer
setting is 15 seconds. This may be changed as experience with
this protocol accumulates. Note that the choice of this parameter
value is related to the buffer capacity available and the data
rate of the transmission medium; that is, data rate times timer
value equals buffer size (e.g., 10 Kb/s x 15 s = 150 Kb).

Notation:

FO - Fragment Offset
IHL - Internet Header Length
MF - More Fragments flag
TTL - Time To Live
NFB - Number of Fragment Blocks
TL - Total Length
TDL - Total Data Length
BUFID - Buffer Identifier
RCVBT - Fragment Received Bit Table
TLB - Timer Lower Bound

Procedure:

   (1)  BUFID <- source|destination|protocol|identification;
   (2)  IF FO = 0 AND MF = 0
   (3)     THEN IF buffer with BUFID is allocated
   (4)             THEN flush all reassembly for this BUFID;
   (5)          Submit datagram to next step; DONE.
   (6)     ELSE IF no buffer with BUFID is allocated
   (7)             THEN allocate reassembly resources
                        with BUFID;
                        TIMER <- TLB; TDL <- 0;
   (8)          put data from fragment into data buffer with
                BUFID from octet FO*8 to octet (TL-(IHL*4))+FO*8;
   (9)          set RCVBT bits from FO to FO+((TL-(IHL*4)+7)/8);
   (10)         IF MF = 0 THEN TDL <- TL-(IHL*4)+(FO*8)
   (11)         IF FO = 0 THEN put header in header buffer
   (12)         IF TDL # 0
   (13)          AND all RCVBT bits from 0 to (TDL+7)/8 are set
   (14)            THEN TL <- TDL+(IHL*4)
   (15)                 Submit datagram to next step;
   (16)                 free all reassembly resources
                        for this BUFID; DONE.
   (17)         TIMER <- MAX(TIMER,TTL);
   (18)         give up until next fragment or timer expires;
   (19) timer expires: flush all reassembly with this BUFID; DONE.

In the case that two or more fragments contain the same data,
either identically or through a partial overlap, this procedure
will use the more recently arrived copy in the data buffer and in
the datagram delivered.
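Below is a compact Python rendering of the reassembly procedure, with step numbers in the comments. It is a sketch under several assumptions:

- `Reassembler`, `Buffer`, and `tick` are hypothetical names.
- The bit table is modeled as a set of block indices.
- Timers count whole seconds and only advance when `tick` is called.
- The returned header is the stored first-fragment header, without the TL update of step (14) or a recomputed checksum.

Later data is written over earlier data, so overlapping fragments resolve in favor of the more recently arrived copy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

TLB = 15  # initial timer in seconds, the value recommended in the text

BufId = Tuple[str, str, int, int]  # source, destination, protocol, identification


@dataclass
class Buffer:
    data: bytearray = field(default_factory=bytearray)
    rcvbt: Set[int] = field(default_factory=set)  # indices of 8-octet blocks received
    header: Optional[bytes] = None
    tdl: int = 0                                  # total data length; 0 = not yet known
    timer: int = TLB


class Reassembler:
    def __init__(self) -> None:
        self.buffers: Dict[BufId, Buffer] = {}

    def receive(self, bufid: BufId, header: bytes, fo: int, mf: int,
                ttl: int, payload: bytes) -> Optional[Tuple[bytes, bytes]]:
        """Return (header, data) once a datagram is complete, else None."""
        if fo == 0 and mf == 0:                           # steps (2)-(5)
            self.buffers.pop(bufid, None)
            return header, payload
        buf = self.buffers.setdefault(bufid, Buffer())    # steps (6)-(7)
        start, end = fo * 8, fo * 8 + len(payload)
        if len(buf.data) < end:
            buf.data.extend(bytes(end - len(buf.data)))
        buf.data[start:end] = payload                     # step (8): later copy wins
        buf.rcvbt.update(range(fo, fo + (len(payload) + 7) // 8))  # step (9)
        if mf == 0:
            buf.tdl = end                                 # step (10)
        if fo == 0:
            buf.header = header                           # step (11)
        # Block 0 can only come from the FO = 0 fragment, so the header is set here.
        if buf.tdl and all(b in buf.rcvbt for b in range((buf.tdl + 7) // 8)):
            del self.buffers[bufid]                       # steps (12)-(16)
            return buf.header, bytes(buf.data[: buf.tdl])
        buf.timer = max(buf.timer, ttl)                   # step (17)
        return None                                       # step (18)

    def tick(self, seconds: int) -> None:
        """Step (19): age every timer and flush buffers whose timer ran out."""
        for bufid in list(self.buffers):
            self.buffers[bufid].timer -= seconds
            if self.buffers[bufid].timer <= 0:
                del self.buffers[bufid]
```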

Identification

The choice of the Identifier for a datagram is based on the need to
provide a way to uniquely identify the fragments of a particular
datagram. The protocol module assembling fragments judges fragments
to belong to the same datagram if they have the same source,
destination, protocol, and Identifier. Thus, the sender must choose
the Identifier to be unique for this source, destination pair and
protocol for the time the datagram (or any fragment of it) could be
alive in the internet.

It seems then that a sending protocol module needs to keep a table
of Identifiers, one entry for each destination it has communicated
with in the last maximum packet lifetime for the internet.

However, since the Identifier field allows 65,536 different values,
some hosts may be able to simply use unique identifiers independent
of destination.
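One way to implement the table idea is to keep a counter per (source, destination, protocol) key. The hypothetical `IdentifierTable` below is only a sketch. It never ages entries out after the maximum packet lifetime, and it simply wraps at 65,536.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

Key = Tuple[str, str, int]  # (source, destination, protocol)


class IdentifierTable:
    """Hypothetical per-(source, destination, protocol) Identification counters.

    Each key has its own 16-bit counter, so consecutive datagrams with the
    same key get different Identifiers until the counter wraps around.
    """

    def __init__(self) -> None:
        self._next: DefaultDict[Key, int] = defaultdict(int)

    def next_id(self, src: str, dst: str, protocol: int) -> int:
        key = (src, dst, protocol)
        ident = self._next[key]
        self._next[key] = (ident + 1) % 65536   # the field holds 65,536 values
        return ident
```

A host that takes the simpler route described above would instead keep a single counter shared by all destinations.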

It is appropriate for some higher level protocols to choose the
identifier. For example, TCP protocol modules may retransmit an
identical TCP segment, and the probability for correct reception
would be enhanced if the retransmission carried the same identifier
as the original transmission since fragments of either datagram
could be used to construct a correct TCP segment.

Type of Service

The type of service (TOS) is for internet service quality selection.
The type of service is specified along the abstract parameters
precedence, delay, throughput, and reliability. These abstract
parameters are to be mapped into the actual service parameters of
the particular networks the datagram traverses.

Precedence. An independent measure of the importance of this
datagram.

Delay. Prompt delivery is important for datagrams with this
indication.

Throughput. High data rate is important for datagrams with this
indication.

Reliability. A higher level of effort to ensure delivery is
important for datagrams with this indication.

For example, the ARPANET has a priority bit, and a choice between
"standard" messages (type 0) and "uncontrolled" messages (type 3)
(the choice between single packet and multipacket messages can also
be considered a service parameter). The uncontrolled messages tend
to be less reliably delivered and suffer less delay. Suppose an
internet datagram is to be sent through the ARPANET. Let the
internet type of service be given as:

Precedence: 5
Delay: 0
Throughput: 1
Reliability: 1

In this example, the mapping of these parameters to those available
for the ARPANET would be to set the ARPANET priority bit on, since
the Internet precedence is in the upper half of its range, and to
select standard messages, since the throughput and reliability
requirements are indicated and delay is not. More details are given
on service mappings in "Service Mappings" [8].
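The example becomes concrete once the TOS octet is packed. The sketch assumes the layout from the header format section of this specification: bits 0 to 2 carry precedence, followed by the delay, throughput, and reliability bits, with bit 0 most significant. `arpanet_mapping` is a hypothetical rule written only to reproduce this one example. The actual mappings are in "Service Mappings" [8].

```python
from typing import Tuple


def pack_tos(precedence: int, low_delay: bool, high_throughput: bool,
             high_reliability: bool) -> int:
    """Pack the TOS octet: bits 0-2 precedence, bit 3 D, bit 4 T, bit 5 R.

    Bit 0 is the most significant bit; bits 6-7 are reserved and left zero.
    """
    if not 0 <= precedence <= 7:
        raise ValueError("precedence is a 3-bit field")
    return ((precedence << 5) | (low_delay << 4)
            | (high_throughput << 3) | (high_reliability << 2))


def arpanet_mapping(tos: int) -> Tuple[bool, str]:
    """Hypothetical rule matching only the worked example in the text."""
    precedence = tos >> 5
    delay, throughput, reliability = (tos >> 4) & 1, (tos >> 3) & 1, (tos >> 2) & 1
    priority = precedence >= 4                  # upper half of the 0-7 range
    wants_standard = bool((throughput or reliability) and not delay)
    return priority, "standard" if wants_standard else "uncontrolled"
```

Packing the example (precedence 5, delay 0, throughput 1, reliability 1) gives the octet 10101100 in binary, which the rule maps to the priority bit on and standard messages.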

Time to Live

The time to live is set by the sender to the maximum time the
datagram is allowed to be in the internet system. If the datagram
is in the internet system longer than the time to live, then the
datagram must be destroyed.

This field must be decreased at each point that the internet header
is processed to reflect the time spent processing the datagram.
Even if no local information is available on the time actually
spent, the field must be decremented by 1. The time is measured in
units of seconds (i.e. the value 1 means one second). Thus, the
maximum time to live is 255 seconds or 4.25 minutes. Since every
module that processes a datagram must decrease the TTL by at least
one even if it processes the datagram in less than a second, the TTL
must be thought of only as an upper bound on the time a datagram may
exist. The intention is to cause undeliverable datagrams to be
discarded, and to bound the maximum datagram lifetime.
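Here is a sketch of the per-module decrement. It assumes that time spent is rounded up to whole seconds, since the text does not say how to round. It also assumes that a datagram whose TTL reaches zero is destroyed. `decrement_ttl` is an illustrative name.

```python
import math
from typing import Optional


def decrement_ttl(ttl: int, seconds_spent: float) -> Optional[int]:
    """Return the new TTL, or None if the datagram must be destroyed.

    Time spent is rounded up to whole seconds (an assumption), and the
    field always drops by at least 1, even for sub-second processing.
    """
    new_ttl = ttl - max(1, math.ceil(seconds_spent))
    return new_ttl if new_ttl > 0 else None
```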

Some higher level reliable connection protocols are based on
assumptions that old duplicate datagrams will not arrive after a
certain time elapses. The TTL is a way for such protocols to have
an assurance that their assumption is met.

Options

The options are optional in each datagram, but required in
implementations. That is, the presence or absence of an option is
the choice of the sender, but each internet module must be able to
parse every option. There can be several options present in the
option field.

The options might not end on a 32-bit boundary. In that case, the
internet header must be filled out with octets of zeros. The first of
these would be interpreted as the end-of-options option, and the
remainder as internet header padding.
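Here is a minimal sketch of that padding rule, using the hypothetical helper names `pad_options` and `header_length_words`. It assumes the fixed part of the header is 20 octets, so the Internet Header Length is 5 words plus the padded options.

```python
EOL = 0  # end-of-options option type


def pad_options(options: bytes) -> bytes:
    """Append zero octets so the header ends on a 32-bit boundary.

    The first zero added is read as the end-of-options option and the rest
    as header padding. Nothing is added if the options already align.
    """
    return options + bytes([EOL]) * (-len(options) % 4)


def header_length_words(options: bytes) -> int:
    """IHL for a 20-octet fixed header plus the padded options."""
    return 5 + len(pad_options(options)) // 4
```

For instance, three No Operation octets (type 1) receive one trailing zero. Twelve octets of options give an IHL of 8.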

Every internet module must be able to act on every option. The
Security Option is required if classified, restricted, or
compartmented traffic is to be passed.

Checksum

The internet header checksum is recomputed if the internet header is
changed, for example, after a reduction of the time to live, after
additions or changes to internet options, or due to fragmentation.
This checksum at the internet level is intended to protect the
internet header fields from transmission errors.
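For reference, here is a sketch of the header checksum as defined in the header format section of this specification. It is the 16-bit one's complement of the one's complement sum of the header's 16-bit words, computed with the checksum field set to zero. `internet_checksum` is an illustrative name, and the test values use an arbitrary sample header.

```python
def internet_checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words.

    Compute it with the checksum field (octets 10-11) set to zero; a header
    that already contains a correct checksum sums to 0 when re-checked.
    """
    if len(header) % 2:
        header += b"\x00"                           # pad to whole 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # big-endian 16-bit words
    while total >> 16:                              # fold carries into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can recompute the checksum over the whole header, including the checksum field. The result is 0 when the header is intact.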

There are some applications where a few data bit errors are
acceptable while retransmission delays are not. If the internet
protocol enforced data correctness such applications could not be
supported.

Errors

Internet protocol errors may be reported via the ICMP messages [3].

3.3. Interfaces

The functional description of user interfaces to the IP is, at best,
fictional, since every operating system will have different
facilities. Consequently, we must warn readers that different IP
implementations may have different user interfaces. However, all IPs
must provide a certain minimum set of services to guarantee that all
IP implementations can support the same protocol hierarchy. This
section specifies the functional interfaces required of all IP
implementations.

Internet protocol interfaces on one side to the local network and on
the other side to either a higher level protocol or an application
program. In the following, the higher level protocol or application
program (or even a gateway program) will be called the "user" since it
is using the internet module. Since internet protocol is a datagram
protocol, there is minimal memory or state maintained between datagram
transmissions, and each call on the internet protocol module by the
user supplies all information necessary for the IP to perform the
service requested.

An Example Upper Level Interface

The following two example calls satisfy the requirements for the user
to internet protocol module communication ("=>" means returns):

SEND (src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt => result)

where:

src = source address
dst = destination address
prot = protocol
TOS = type of service
TTL = time to live
BufPTR = buffer pointer
len = length of buffer
Id = Identifier
DF = Don't Fragment
opt = option data
result = response
OK = datagram sent ok
Error = error in arguments or local network error

Note that the precedence is included in the TOS and the
security/compartment is passed as an option.

RECV (BufPTR, prot, => result, src, dst, TOS, len, opt)

where:

BufPTR = buffer pointer
prot = protocol
result = response
OK = datagram received ok
Error = error in arguments
len = length of buffer
src = source address
dst = destination address
TOS = type of service
opt = option data

When the user sends a datagram, it executes the SEND call supplying
all the arguments. The internet protocol module, on receiving this
call, checks the arguments and prepares and sends the message. If the
arguments are good and the datagram is accepted by the local network,
the call returns successfully. If either the arguments are bad, or
the datagram is not accepted by the local network, the call returns
unsuccessfully. On unsuccessful returns, a reasonable report must be
made as to the cause of the problem, but the details of such reports
are up to individual implementations.

When a datagram arrives at the internet protocol module from the local
network, either there is a pending RECV call from the user addressed
or there is not. In the first case, the pending call is satisfied by
passing the information from the datagram to the user. In the second
case, the user addressed is notified of a pending datagram. If the
user addressed does not exist, an ICMP error message is returned to
the sender, and the data is discarded.

The notification of a user may be via a pseudo interrupt or similar
mechanism, as appropriate in the particular operating system
environment of the implementation.

A user's RECV call may then either be immediately satisfied by a
pending datagram, or the call may be pending until a datagram arrives.

The source address is included in the send call in case the sending
host has several addresses (multiple physical connections or logical
addresses). The internet module must check to see that the source
address is one of the legal addresses for this host.

An implementation may also allow or require a call to the internet
module to indicate interest in or reserve exclusive use of a class of
datagrams (e.g., all those with a certain value in the protocol
field).
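The SEND and RECV calls can be mirrored by a toy in-memory module. Everything in this sketch is hypothetical:

- `ToyIPModule`, `Received`, and `deliver` are illustrative names.
- The local network is replaced by a list.
- `len` is implied by the length of the buffer.
- RECV returns `None` instead of waiting when nothing is pending.

It shows only the argument and source-address checks and the per-protocol queuing described above, not a real IP implementation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Set


@dataclass
class Received:
    src: str
    dst: str
    tos: int
    data: bytes
    opt: bytes = b""


class ToyIPModule:
    """Hypothetical module offering SEND/RECV-style calls; no real network I/O."""

    def __init__(self, local_addresses: Set[str]) -> None:
        self.local_addresses = local_addresses
        self.outbox: List[tuple] = []   # stands in for the local network
        self.pending: Dict[int, Deque[Received]] = defaultdict(deque)

    def send(self, src: str, dst: str, prot: int, tos: int, ttl: int,
             buf: bytes, ident: int, df: bool, opt: bytes = b"") -> str:
        # The source must be one of this host's legal addresses.
        if src not in self.local_addresses:
            return "Error"
        if not (0 <= prot <= 255 and 0 <= tos <= 255 and 0 < ttl <= 255
                and 0 <= ident <= 0xFFFF):
            return "Error"
        self.outbox.append((src, dst, prot, tos, ttl, bytes(buf), ident, df, opt))
        return "OK"

    def deliver(self, prot: int, datagram: Received) -> None:
        """Called when a datagram for protocol `prot` arrives from the network."""
        self.pending[prot].append(datagram)

    def recv(self, prot: int) -> Optional[Received]:
        """Non-blocking RECV: return a pending datagram, or None if none waits."""
        queue = self.pending.get(prot)
        return queue.popleft() if queue else None
```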

This section functionally characterizes a USER/IP interface. The
notation used is similar to most procedure or function calls in high
level languages, but this usage is not meant to rule out trap type
service calls (e.g., SVCs, UUOs, EMTs), or any other form of
interprocess communication.

APPENDIX A: Examples & Scenarios

Example 1:

This is an example of the minimal data carrying internet datagram:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 5 |Type of Service|        Total Length = 21      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 123  |  Protocol = 1 |        header checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     data      |
   +-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 5.

Note that each tick mark represents one bit position.

This is an internet datagram in version 4 of internet protocol; the
internet header consists of five 32 bit words, and the total length of
the datagram is 21 octets. This datagram is a complete datagram (not
a fragment).
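The header in Figure 5 can be packed in network byte order with Python's `struct` module. The figure leaves the addresses unspecified, so the sketch uses hypothetical ones (10.0.0.1 and 10.0.0.2). It leaves the checksum field zero rather than computing it.

```python
import socket
import struct


def example1_header(src: str, dst: str) -> bytes:
    """Pack the Figure 5 header; the checksum field is left zero here."""
    version, ihl = 4, 5
    return struct.pack(
        "!BBHHHBBH4s4s",
        (version << 4) | ihl,   # Ver = 4, IHL = 5 (five 32-bit words)
        0,                      # Type of Service
        21,                     # Total Length: 20 header octets + 1 data octet
        111,                    # Identification
        0,                      # flags = 0, fragment offset = 0
        123,                    # Time to Live
        1,                      # Protocol = 1
        0,                      # header checksum (not computed in this sketch)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


datagram = example1_header("10.0.0.1", "10.0.0.2") + b"\x00"  # one data octet
```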

Example 2:

In this example, we show first a moderate size internet datagram (452
data octets), then two internet fragments that might result from the
fragmentation of this datagram if the maximum sized transmission
allowed were 280 octets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 5 |Type of Service|       Total Length = 472      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 123  |  Protocol = 6 |        header checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   \                                                               \
   \                                                               \
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             data              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 6.

Now the first fragment that results from splitting the datagram after
256 data octets.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 5 |Type of Service|       Total Length = 276      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=1|   Fragment Offset = 0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 119  |  Protocol = 6 |        Header Checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   \                                                               \
   \                                                               \
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Fragment

Figure 7.

And the second fragment.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 5 |Type of Service|       Total Length = 216      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=0|   Fragment Offset = 32  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 119  |  Protocol = 6 |        Header Checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   \                                                               \
   \                                                               \
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             data              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Fragment

Figure 8.

Example 3:

Here, we show an example of a datagram containing options:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver= 4 |IHL= 8 |Type of Service|       Total Length = 576      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Identification = 111     |Flg=0|   Fragment Offset = 0   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Time = 123  |  Protocol = 6 |        Header Checksum        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         source address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      destination address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Opt. Code = x | Opt. Len. = 3 | option value  | Opt. Code = x |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Opt. Len. = 4 |           option value        | Opt. Code = 1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Opt. Code = y | Opt. Len. = 3 |  option value | Opt. Code = 0 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   \                                                               \
   \                                                               \
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example Internet Datagram

Figure 9.
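Here is a sketch of how a receiver might walk the options area in Figure 9. The figure uses placeholders for the option codes x and y, so the sketch picks arbitrary values (25 and 26) and arbitrary option contents. These are not assigned option types. The parser assumes the single-octet End of Option List (0) and No Operation (1) options, and a length octet that counts the type and length octets.

```python
from typing import List, Tuple

EOL, NOP = 0, 1   # single-octet options: end of option list, no operation


def parse_options(area: bytes) -> List[Tuple[int, bytes]]:
    """Walk the options area, returning (type, value) pairs.

    EOL stops the walk (anything after it is padding); NOP has no length
    octet; every other option is type, length (counting both), then value.
    """
    options: List[Tuple[int, bytes]] = []
    i = 0
    while i < len(area):
        kind = area[i]
        if kind == EOL:
            break
        if kind == NOP:
            options.append((NOP, b""))
            i += 1
            continue
        if i + 1 >= len(area) or area[i + 1] < 2 or i + area[i + 1] > len(area):
            raise ValueError(f"malformed option at offset {i}")
        length = area[i + 1]
        options.append((kind, area[i + 2 : i + length]))
        i += length
    return options


# Figure 9's options area with hypothetical codes x = 25 and y = 26.
X, Y = 25, 26
area = bytes([X, 3, 0xAA, X, 4, 0xBB, 0xCC, NOP, Y, 3, 0xDD, EOL])
```

The 12-octet area is exactly the space that an IHL of 8 leaves after the 20-octet fixed header.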

APPENDIX B: Data Transmission Order

The order of transmission of the header and data described in this
document is resolved to the octet level. Whenever a diagram shows a
group of octets, the order of transmission of those octets is the normal
order in which they are read in English. For example, in the following
diagram the octets are transmitted in the order they are numbered.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       1       |       2       |       3       |       4       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       5       |       6       |       7       |       8       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       9       |      10       |      11       |      12       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Transmission Order of Bytes

Figure 10.

Whenever an octet represents a numeric quantity the left most bit in the
diagram is the high order or most significant bit. That is, the bit
labeled 0 is the most significant bit. For example, the following
diagram represents the value 170 (decimal).

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |1 0 1 0 1 0 1 0|
   +-+-+-+-+-+-+-+-+

Significance of Bits

Figure 11.

Similarly, whenever a multi-octet field represents a numeric quantity
the left most bit of the whole field is the most significant bit. When
a multi-octet quantity is transmitted the most significant octet is
transmitted first.
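In Python, both the `"big"` byte order of `int.to_bytes` and the `!` prefix in `struct` produce this most-significant-octet-first order. Here is a tiny sketch, where `to_network_order` is an illustrative name:

```python
import struct


def to_network_order(value: int, octets: int) -> bytes:
    """Most significant octet first, as this appendix requires."""
    return value.to_bytes(octets, "big")


assert 0b10101010 == 170                                    # Figure 11: bit 0 is the MSB
assert to_network_order(472, 2) == b"\x01\xd8"              # Total Length from Figure 6
assert struct.pack("!H", 472) == to_network_order(472, 2)   # "!" = network order
```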

GLOSSARY

1822
BBN Report 1822, "The Specification of the Interconnection of
a Host and an IMP". The specification of the interface between a
host and the ARPANET.

ARPANET leader
The control information on an ARPANET message at the host-IMP
interface.

ARPANET message
The unit of transmission between a host and an IMP in the
ARPANET. The maximum size is about 1012 octets (8096 bits).

ARPANET packet
A unit of transmission used internally in the ARPANET between
IMPs. The maximum size is about 126 octets (1008 bits).

Destination
The destination address, an internet header field.

DF
The Don't Fragment bit carried in the flags field.

Flags
An internet header field carrying various control flags.

Fragment Offset
This internet header field indicates where in the internet
datagram a fragment belongs.

GGP
Gateway to Gateway Protocol, the protocol used primarily
between gateways to control routing and other gateway
functions.

header
Control information at the beginning of a message, segment,
datagram, packet or block of data.

ICMP
Internet Control Message Protocol, implemented in the internet
module, the ICMP is used from gateways to hosts and between
hosts to report errors and make routing suggestions.

Identification
An internet header field carrying the identifying value
assigned by the sender to aid in assembling the fragments of a
datagram.

IHL
The internet header field Internet Header Length is the length
of the internet header measured in 32 bit words.

IMP
The Interface Message Processor, the packet switch of the
ARPANET.

Internet Address
A four octet (32 bit) source or destination address consisting
of a Network field and a Local Address field.

internet datagram
The unit of data exchanged between a pair of internet modules
(includes the internet header).

internet fragment
A portion of the data of an internet datagram with an internet
header.

Local Address
The address of a host within a network. The actual mapping of
an internet local address on to the host addresses in a
network is quite general, allowing for many to one mappings.

MF
The More-Fragments Flag carried in the internet header flags
field.

module
An implementation, usually in software, of a protocol or other
procedure.

more-fragments flag
A flag indicating whether or not this internet datagram
contains the end of an internet datagram, carried in the
internet header Flags field.

NFB
The Number of Fragment Blocks in the data portion of an
internet fragment. That is, the length of a portion of data
measured in 8 octet units.

octet
An eight bit byte.

Options
The internet header Options field may contain several options,
and each option may be several octets in length.

Padding
The internet header Padding field is used to ensure that the
data begins on a 32 bit word boundary. The padding is zero.

Protocol
In this document, the next higher level protocol identifier,
an internet header field.

Rest
The local address portion of an Internet Address.

Source
The source address, an internet header field.

TCP
Transmission Control Protocol: A host-to-host protocol for
reliable communication in internet environments.

TCP Segment
The unit of data exchanged between TCP modules (including the
TCP header).

TFTP
Trivial File Transfer Protocol: A simple file transfer
protocol built on UDP.

Time to Live
An internet header field which indicates the upper bound on
how long this internet datagram may exist.

TOS
Type of Service

Total Length
The internet header field Total Length is the length of the
datagram in octets including internet header and data.

TTL
Time to Live

Type of Service
An internet header field which indicates the type (or quality)
of service for this internet datagram.

UDP
User Datagram Protocol: A user level protocol for transaction
oriented applications.

User
The user of the internet protocol. This may be a higher level
protocol module, an application program, or a gateway program.

Version
The Version field indicates the format of the internet header.

REFERENCES

[1] Cerf, V., "The Catenet Model for Internetworking," Information
Processing Techniques Office, Defense Advanced Research Projects
Agency, IEN 48, July 1978.

[2] Bolt Beranek and Newman, "Specification for the Interconnection of
a Host and an IMP," BBN Technical Report 1822, Revised May 1978.

[3] Postel, J., "Internet Control Message Protocol - DARPA Internet
Program Protocol Specification," RFC 792, USC/Information Sciences
Institute, September 1981.

[4] Shoch, J., "Inter-Network Naming, Addressing, and Routing,"
COMPCON, IEEE Computer Society, Fall 1978.

[5] Postel, J., "Address Mappings," RFC 796, USC/Information Sciences
Institute, September 1981.

[6] Shoch, J., "Packet Fragmentation in Inter-Network Protocols,"
Computer Networks, v. 3, n. 1, February 1979.

[7] Strazisar, V., "How to Build a Gateway", IEN 109, Bolt Beranek and
Newman, August 1979.

[8] Postel, J., "Service Mappings," RFC 795, USC/Information Sciences
Institute, September 1981.

[9] Postel, J., "Assigned Numbers," RFC 790, USC/Information Sciences
Institute, September 1981.

The post RFC 791 – Internet Protocol Version 4 Specification appeared first on IPv6.net.

RFC 1819 – Internet Stream Protocol Version 2 (ST2) Protocol Specification

IPv6 & IoT editor — Tue, 28 Jul 2009 07:54:47 +0000

Network Working Group                                  ST2 Working Group
Request for Comments: 1819           L. Delgrossi and L. Berger, Editors
Obsoletes: 1190, IEN 119                                     August 1995
Category: Experimental

                Internet Stream Protocol Version 2 (ST2)
                 Protocol Specification - Version ST2+

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community. This memo does not specify an Internet standard of any
   kind. Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.

IESG NOTE

   This document is a revision of RFC1190. The charter of this effort
   was clarifying, simplifying and removing errors from RFC1190 to
   ensure interoperability of implementations.

   NOTE WELL: Neither the version of the protocol described in this
   document nor the previous version is an Internet Standard or under
   consideration for that status.

   Since the publication of the original version of the protocol, there
   have been significant developments in the state of the art. Readers
   should note that standards and technology addressing alternative
   approaches to the resource reservation problem are currently under
   development within the IETF.

Abstract

   This memo contains a revised specification of the Internet STream
   Protocol Version 2 (ST2). ST2 is an experimental resource reservation
   protocol intended to provide end-to-end real-time guarantees over an
   internet. It allows applications to build multi-destination simplex
   data streams with a desired quality of service. The revised version
   of ST2 specified in this memo is called ST2+.

   This specification is a product of the STream Protocol Working Group
   of the Internet Engineering Task Force.

Table of Contents

   1 Introduction
      1.1 What is ST2?
      1.2 ST2 and IP
      1.3 Protocol History
         1.3.1 RFC1190 ST and ST2+ Major Differences
      1.4 Supporting Modules for ST2
         1.4.1 Data Transfer Protocol
         1.4.2 Setup Protocol
         1.4.3 Flow Specification
         1.4.4 Routing Function
         1.4.5 Local Resource Manager
      1.5 ST2 Basic Concepts
         1.5.1 Streams
         1.5.2 Data Transmission
         1.5.3 Flow Specification
      1.6 Outline of This Document
   2 ST2 User Service Description
      2.1 Stream Operations and Primitive Functions
      2.2 State Diagrams
      2.3 State Transition Tables
   3 The ST2 Data Transfer Protocol
      3.1 Data Transfer with ST
      3.2 ST Protocol Functions
         3.2.1 Stream Identification
         3.2.2 Packet Discarding based on Data Priority
   4 SCMP Functional Description
      4.1 Types of Streams
         4.1.1 Stream Building
         4.1.2 Knowledge of Receivers
      4.2 Control PDUs
      4.3 SCMP Reliability
      4.4 Stream Options
         4.4.1 No Recovery
         4.4.2 Join Authorization Level
         4.4.3 Record Route
         4.4.4 User Data
      4.5 Stream Setup
         4.5.1 Information from the Application
         4.5.2 Initial Setup at the Origin
            4.5.2.1 Invoking the Routing Function
            4.5.2.2 Reserving Resources
         4.5.3 Sending CONNECT Messages
            4.5.3.1 Empty Target List
         4.5.4 CONNECT Processing by an Intermediate ST agent
         4.5.5 CONNECT Processing at the Targets
         4.5.6 ACCEPT Processing by an Intermediate ST agent
         4.5.7 ACCEPT Processing by the Origin
         4.5.8 REFUSE Processing by the Intermediate ST agent
         4.5.9 REFUSE Processing by the Origin
         4.5.10 Other Functions during Stream Setup
      4.6 Modifying an Existing Stream
         4.6.1 The Origin Adding New Targets
         4.6.2 The Origin Removing a Target
         4.6.3 A Target Joining a Stream
            4.6.3.1 Intermediate Agent (Router) as Origin
         4.6.4 A Target Deleting Itself
         4.6.5 Changing a Stream's FlowSpec
      4.7 Stream Tear Down
   5 Exceptional Cases
      5.1 Long ST Messages
         5.1.1 Handling of Long Data Packets
         5.1.2 Handling of Long Control Packets
      5.2 Timeout Failures
         5.2.1 Failure due to ACCEPT Acknowledgment Timeout
         5.2.2 Failure due to CHANGE Acknowledgment Timeout
         5.2.3 Failure due to CHANGE Response Timeout
         5.2.4 Failure due to CONNECT Acknowledgment Timeout
         5.2.5 Failure due to CONNECT Response Timeout
         5.2.6 Failure due to DISCONNECT Acknowledgment Timeout
         5.2.7 Failure due to JOIN Acknowledgment Timeout
         5.2.8 Failure due to JOIN Response Timeout
         5.2.9 Failure due to JOIN-REJECT Acknowledgment Timeout
         5.2.10 Failure due to NOTIFY Acknowledgment Timeout
         5.2.11 Failure due to REFUSE Acknowledgment Timeout
         5.2.12 Failure due to STATUS Response Timeout
      5.3 Setup Failures due to Routing Failures
         5.3.1 Path Convergence
         5.3.2 Other Cases
      5.4 Problems due to Routing Inconsistency
      5.5 Problems in Reserving Resources
         5.5.1 Mismatched FlowSpecs
         5.5.2 Unknown FlowSpec Version
         5.5.3 LRM Unable to Process FlowSpec
         5.5.4 Insufficient Resources
      5.6 Problems Caused by CHANGE Messages
      5.7 Unknown Targets in DISCONNECT and CHANGE
   6 Failure Detection and Recovery
      6.1 Failure Detection
         6.1.1 Network Failures
         6.1.2 Detecting ST Agents Failures
      6.2 Failure Recovery
         6.2.1 Problems in Stream Recovery
      6.3 Stream Preemption
   7 A Group of Streams
      7.1 Basic Group Relationships
         7.1.1 Bandwidth Sharing
         7.1.2 Fate Sharing
         7.1.3 Route Sharing
         7.1.4 Subnet Resources Sharing
      7.2 Relationships Orthogonality
   8 Ancillary Functions
      8.1 Stream ID Generation
      8.2 Group Name Generator
      8.3 Checksum Computation
      8.4 Neighbor ST Agent Identification and Information Collection
      8.5 Round Trip Time Estimation
      8.6 Network MTU Discovery
      8.7 IP Encapsulation of ST
      8.8 IP Multicasting
   9 The ST2+ Flow Specification
      9.1 FlowSpec Version #0 - (Null FlowSpec)
      9.2 FlowSpec Version #7 - ST2+ FlowSpec
         9.2.1 QoS Classes
         9.2.2 Precedence
         9.2.3 Maximum Data Size
         9.2.4 Message Rate
         9.2.5 Delay and Delay Jitter
         9.2.6 ST2+ FlowSpec Format
   10 ST2 Protocol Data Units Specification
      10.1 Data PDU
         10.1.1 ST Data Packets
      10.2 Control PDUs
      10.3 Common SCMP Elements
         10.3.1 FlowSpec
         10.3.2 Group
         10.3.3 MulticastAddress
         10.3.4 Origin
         10.3.5 RecordRoute
         10.3.6 Target and TargetList
         10.3.7 UserData
         10.3.8 Handling of Undefined Parameters
      10.4 ST Control Message PDUs
         10.4.1 ACCEPT
         10.4.2 ACK
         10.4.3 CHANGE
         10.4.4 CONNECT
         10.4.5 DISCONNECT
         10.4.6 ERROR
         10.4.7 HELLO
         10.4.8 JOIN
         10.4.9 JOIN-REJECT
         10.4.10 NOTIFY
         10.4.11 REFUSE
         10.4.12 STATUS
         10.4.13 STATUS-RESPONSE
      10.5 Suggested Protocol Constants
         10.5.1 SCMP Messages
         10.5.2 SCMP Parameters
         10.5.3 ReasonCode
         10.5.4 Timeouts and Other Constants
      10.6 Data Notations
   11 References
   12 Security Considerations
   13 Acknowledgments and Authors' Addresses

1. Introduction

1.1 What is ST2?

   The Internet Stream Protocol, Version 2 (ST2) is an experimental
   connection-oriented internetworking protocol that operates at the
   same layer as connectionless IP. It has been developed to support the
   efficient delivery of data streams to single or multiple destinations
   in applications that require guaranteed quality of service. ST2 is
   part of the IP protocol family and serves as an adjunct to, not a
   replacement for, IP. The main application areas of the protocol are
   the real-time transport of multimedia data, e.g., digital audio and
   video packet streams, and distributed simulation/gaming, across
   internets.

   ST2 can be used to reserve bandwidth for real-time streams across
   network routes. This reservation, together with appropriate network
   access and packet scheduling mechanisms in all nodes running the
   protocol, guarantees a well-defined Quality of Service (QoS) to ST2
   applications. It ensures that real-time packets are delivered within
   their deadlines, that is, by the time they need to be presented.
   This facilitates the smooth delivery of data that is essential for
   time-critical applications but typically cannot be provided by
   best-effort IP communication.

                      DATA PATH                         CONTROL PATH
                      =========                         ============
       Upper     +------------------+                     +---------+
       Layer     | Application data |                     | Control |
                 +------------------+                     +---------+
                          |                                    |
                          |                                    V
                          |                     +-------------------+
       SCMP               |                     |   SCMP |          |
                          |                     +-------------------+
                          |                             |
                          V                             V
            +-----------------------+      +------------------------+
       ST   | ST |                  |      | ST |         |         |
            +-----------------------+      +------------------------+
            D-bit=1                       D-bit=0

                   Figure 1: ST2 Data and Control Path

   Just like IP, ST2 actually consists of two protocols: ST for the data
   transport and SCMP, the Stream Control Message Protocol, for all
   control functions. ST is simple and contains only a single PDU format
   that is designed for fast and efficient data forwarding in order to
   achieve low communication delays. SCMP, however, is more complex than
   IP's ICMP. As with ICMP and IP, SCMP packets are transferred within
   ST packets as shown in Figure 1.
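Figure 1 shows that data and SCMP control messages share the single ST packet format and are told apart by the D-bit (1 for data, 0 for control). The Python sketch below dispatches a received packet on that bit. This excerpt does not give the bit's position, so `D_BIT_OCTET` and `D_BIT_MASK` (most significant bit of the second octet) are assumptions to check against the ST header layout in Section 10; the two handler callbacks are likewise hypothetical.

```python
from typing import Callable

# ASSUMPTION: the D-bit is taken to be the most significant bit of the
# second header octet. Verify against the ST header format in Section 10.
D_BIT_OCTET = 1
D_BIT_MASK = 0x80


def is_data_packet(packet: bytes) -> bool:
    """True for ST data (D-bit = 1), False for SCMP control (D-bit = 0)."""
    if len(packet) <= D_BIT_OCTET:
        raise ValueError("packet too short to contain the D-bit")
    return bool(packet[D_BIT_OCTET] & D_BIT_MASK)


def dispatch(packet: bytes,
             deliver_data: Callable[[bytes], None],
             handle_scmp: Callable[[bytes], None]) -> None:
    """Send data packets up the data path and control packets to SCMP."""
    if is_data_packet(packet):
        deliver_data(packet)
    else:
        handle_scmp(packet)
```

Keeping the test to one bit reflects the design goal stated above: the data path stays cheap, and only control packets pay for SCMP's more complex processing.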

    +--------------------+
    | Conference Control |
    +--------------------+
   +-------+ +-------+ |
   | Video | | Voice | | +-----+ +------+ +-----+     +-----+ Application
   | Appl  | | Appl  | | | SNMP| |Telnet| | FTP | ... |     |    Layer
   +-------+ +-------+ | +-----+ +------+ +-----+     +-----+
       |        |      |     |        |     |            |
       V        V      |     |        |     |            |   ------------
    +-----+ +-----+   |     |        |     |            |
    | PVP | | NVP |   |     |        |     |            |
    +-----+ +-----+   +     |        |     |            |
     |   \      | \     \    |        |     |            |
     |    +-----|--+-----+   |        |     |            |
     |     Appl.|control V V        V     V            V
     | ST data |         +-----+    +-------+        +-----+
     | & control|         | UDP |    |  TCP  |    ... | RTP | Transport
     |          |         +-----+    +-------+        +-----+   Layer
     |         /|          / | \       / / |          / /|
     |\       / | +------+--|--\-----+-/--|--- ... -+ / |
     | \     / | |         |   \     /   |          / |
     | \   /   | |         |    \   +----|--- ... -+   |   -----------
     |   \ /    | |         |     \ /     |             |
     |    V     | |         |      V      |             |
     | +------+ | |         |   +------+ |   +------+ |
     | | SCMP | | |         |   | ICMP | |   | IGMP | |    Internet
     | +------+ | |         |   +------+ |   +------+ |     Layer
     |    |     | |         |      |      |      |      |
     V    V     V V         V      V      V      V      V
   +-----------------+  +-----------------------------------+
   | STream protocol |->|      Internet     Protocol        |
   +-----------------+  +-----------------------------------+
                  | \   / |
                  |  \ /  |
                  |   X   |                                  ------------
                  |  / \  |
                  | /   \ |
                  VV     VV
   +----------------+   +----------------+
   | (Sub-) Network |...| (Sub-) Network |                  (Sub-)Network
   |    Protocol    |   |    Protocol    |                     Layer
   +----------------+   +----------------+

                   Figure 2. Protocol Relationships

1.2 ST2 and IP

   ST2 is designed to coexist with IP on each node. A typical
   distributed multimedia application would use both protocols: IP for
   the transfer of traditional data and control information, and ST2 for
   the transfer of real-time data. Whereas IP typically will be accessed
   from TCP or UDP, ST2 will be accessed via new end-to-end real-time
   protocols. The position of ST2 with respect to the other protocols of
   the Internet family is represented in Figure 2.

   Both ST2 and IP apply the same addressing schemes to identify
   different hosts. ST2 and IP packets differ in the first four bits,
   which contain the internetwork protocol version number: number 5 is
   reserved for ST2 (IP itself has version number 4). As a network layer
   protocol, like IP, ST2 operates independently of its underlying
   subnets. Existing implementations use ARP for address resolution, and
   use the same Layer 2 SAPs as IP.
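Because both protocols share the Layer 2 SAP, a receiving node has to separate IP from ST2 using the version number alone. A minimal Python sketch of that demultiplexing step follows; the function name and the string labels are illustrative. Any other version (including 6, later assigned to IPv6) falls through to "unknown".

```python
IP_VERSION = 4   # version number used by IP
ST2_VERSION = 5  # version number reserved for ST2


def network_protocol(packet: bytes) -> str:
    """Classify a packet by the version number in its first four bits."""
    if not packet:
        raise ValueError("empty packet")
    version = packet[0] >> 4  # high-order nibble of the first octet
    if version == IP_VERSION:
        return "IP"
    if version == ST2_VERSION:
        return "ST2"
    return "unknown"
```

For instance, a first octet of 0x45 (version 4, 5-word header) is classified as IP, while 0x53 is classified as ST2.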

   As a special function, ST2 messages can be encapsulated in IP
   packets. This is represented in Figure 2 as a link between ST2 and
   IP. This link allows ST2 messages to pass through routers which do
   not run ST2. Resource management is typically not available for
   these IP route segments. IP encapsulation is, therefore, suggested
   only for portions of the network which do not constitute a system
   bottleneck.

   In Figure 2, the RTP protocol is shown as an example of a transport
   layer on top of ST2. Other examples include the Packet Video Protocol
   (PVP) [Cole81], the Network Voice Protocol (NVP) [Cohe81], and the
   Heidelberg Transport Protocol (HeiTP) [DHHS92].

1.3 Protocol History

   The first version of ST was published in the late 1970's and was used
   throughout the 1980's for experimental transmission of voice, video,
   and distributed simulation. The experience gained in these
   applications led to the development of the revised protocol version
   ST2. The revision extends the original protocol to make it more
   complete and more applicable to emerging multimedia environments. The
   specification of this protocol version is contained in Internet RFC
   1190, which was published in October 1990 [RFC1190].

   With more and more commercial distributed multimedia applications
   under development, and with growing dissatisfaction with the
   transmission quality for audio and video over IP in the MBONE,
   interest in ST2 has grown in recent years. Commercial products
   incorporating the protocol are available. The BERKOM MMTS
   project of the German PTT [DeAl92] uses ST2 as its core protocol for
   the provision of multimedia teleservices such as conferencing and
   mailing. In addition, implementations of ST2 for Digital Equipment,
   IBM, NeXT, Macintosh, PC, Silicon Graphics, and Sun platforms are
   available.

   In 1993, the IETF started a new working group on ST2 as part of
   ongoing efforts to develop protocols that address resource
   reservation issues. The group's mission was to clean up the existing
   protocol specification to ensure better interoperability between the
   existing and emerging implementations. A further goal was to
   produce an updated experimental protocol specification that reflected
   the experiences gained with the existing ST2 implementations and
   applications. This work led to the specification of the ST2+ protocol
   contained in this document.

1.3.1 RFC1190 ST and ST2+ Major Differences

   The protocol changes from RFC1190 were motivated by protocol
   simplification and clarification, and codification of extensions in
   existing implementations. This section provides a list of major
   differences, and is probably of interest only to those who have
   knowledge of RFC1190. The major differences between the versions are:

o   Elimination of "Hop IDentifiers" or HIDs. HIDs added much complexity
    to the protocol and were found to be a major impediment to
    interoperability. HIDs have been replaced by globally unique
    identifiers called "Stream IDentifiers" or SIDs.

o   Elimination of a number of stream options. A number of options were
    found not to be used by any implementation, or were thought to add
    more complexity than value. These options were removed. Removed
    options include: point-to-point, full-duplex, reverse charge, and
    source route.

o   Elimination of the concept of "subset" implementations. RFC1190
    permitted subset implementations, to allow for easy implementation
    and experimentation. This led to interoperability problems. Agents
    implementing the protocol specified in this document MUST implement
    the full protocol. A number of the protocol functions are
    best-effort. It is expected that some implementations will make more
    effort than others in satisfying particular protocol requests.

o   Clarification of the capability of targets to request to join a
    stream. RFC1190 can be interpreted to support target requests, but
    most implementors did not understand this and did not add support
    for this capability. The lack of this capability was found to be a
    significant limitation in the ability to scale the number of
    participants in a single ST stream. This clarification is based on
    work done by IBM Heidelberg.

o   Separation of functions between ST and supporting modules. An effort
    was made to improve the separation of functions provided by ST and
    those provided by other modules. This is reflected in reorganization
    of some text and some PDU formats. ST was also made FlowSpec
    independent, although it does define a FlowSpec for testing and
    interoperability purposes.

o   General reorganization and re-write of the specification. This
    document has been organized with the goal of improved readability
    and clarity. Some sections have been added, and an effort was made
    to improve the introduction of concepts.

1.4 Supporting Modules for ST2

   ST2 is one piece of a larger mosaic. This section presents the
   overall communication architecture and clarifies the role of ST2 with
   respect to its supporting modules.

   ST2 proposes a two-step communication model. In the first step, the
   real-time channels for the subsequent data transfer are built. This
   is called stream setup. It includes selecting the routes to the
   destinations and reserving the corresponding resources. In the second
   step, the data is transmitted over the previously established
   streams. This is called data transfer. While stream setup does not
   have to be completed in real time, data transfer has stringent
   real-time requirements. The architecture used to describe the ST2
   communication model includes:

o   a data transfer protocol for the transmission of real-time data
    over the established streams,

o   a setup protocol to establish real-time streams based on the flow
    specification,

o   a flow specification to express user real-time requirements,

o   a routing function to select routes in the Internet,

o   a local resource manager to appropriately handle resources involved
    in the communication.

   This document defines a data protocol (ST), a setup protocol (SCMP),
   and a flow specification (ST2+ FlowSpec). It does not define a
   routing function or a local resource manager. However, ST2 assumes
   their existence.

   Alternative architectures are possible; see [RFC1633] for an example
   alternative architecture that could be used when implementing ST2.

1.4.1 Data Transfer Protocol

   The data transfer protocol defines the format of the data packets
   belonging to the stream. Data packets are delivered to the targets
   along the stream paths previously established by the setup protocol.
   Data packets are delivered with the quality of service associated
   with the stream.

   Data packets contain a globally unique stream identifier that
   indicates which stream they belong to. The stream identifier is also
   known by the setup protocol, which uses it during stream
   establishment. The data transfer protocol for ST2, known simply as
   ST, is completely defined by this document.
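The stream identifier ties the two protocols together: the setup protocol installs per-stream state keyed by it, and ST forwarding then needs only a lookup. In the Python sketch below, modelling the SID as an (origin address, number) pair and the `ForwardingTable` class itself are assumptions for illustration; Section 8.1 of the RFC covers how SIDs are actually generated.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamID:
    """ASSUMED shape: origin address plus a number chosen by the origin."""
    origin: str
    number: int


@dataclass
class ForwardingTable:
    """Hypothetical per-agent table mapping stream ID -> next hops."""
    next_hops: dict = field(default_factory=dict)

    def install(self, sid: StreamID, hops: list) -> None:
        # Called by the setup protocol once the stream is established.
        self.next_hops[sid] = list(hops)

    def remove(self, sid: StreamID) -> None:
        # Called on stream tear-down; unknown IDs are ignored.
        self.next_hops.pop(sid, None)

    def forward(self, sid: StreamID, payload: bytes) -> list:
        """Return (next_hop, payload) pairs; unknown streams are dropped."""
        return [(hop, payload) for hop in self.next_hops.get(sid, [])]
```

A frozen dataclass is used so that stream IDs are hashable and can serve directly as dictionary keys.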

1.4.2 Setup Protocol

   The setup protocol is responsible for establishing, maintaining, and
   releasing real-time streams. It relies on the routing function to
   select the paths from the source to the destinations. At each
   host/router on these paths, it presents the flow specification
   associated with the stream to the local resource manager. This causes
   the resource managers to reserve appropriate resources for the
   stream. The setup protocol for ST2 is called Stream Control Message
   Protocol, or SCMP, and is completely defined by this document.
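Hop-by-hop setup can be pictured as presenting the same request to each local resource manager on the path in turn. The Python sketch below assumes a single resource (bandwidth) and a `SimpleLRM` class invented for illustration; releasing earlier reservations when a later hop refuses is a simplification of the REFUSE handling described in Section 4.5.

```python
class SimpleLRM:
    """Hypothetical LRM managing a single resource: link bandwidth."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.reserved: dict = {}  # stream ID -> reserved rate

    def reserve(self, sid: str, rate: float) -> bool:
        if sum(self.reserved.values()) + rate > self.capacity:
            return False  # not enough bandwidth left
        self.reserved[sid] = rate
        return True

    def release(self, sid: str) -> None:
        self.reserved.pop(sid, None)


def setup_stream(sid: str, path: list, rate: float) -> bool:
    """Present the requested rate to each LRM along the path in order.
    On the first refusal, undo the reservations already made."""
    reserved_so_far = []
    for lrm in path:
        if not lrm.reserve(sid, rate):
            for prev in reserved_so_far:
                prev.release(sid)
            return False
        reserved_so_far.append(lrm)
    return True
```

With hops of capacity 10 and 5, a first stream at rate 4 succeeds everywhere; a second stream at rate 4 is refused at the second hop, and its reservation at the first hop is released.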

1.4.3 Flow Specification

   The flow specification is a data structure including the ST2
   applications' QoS requirements. At each host/router, it is used by
   the local resource manager to appropriately handle resources so that
   such requirements are met. Distributing the flow specification to all
   resource managers along the communication paths is the task of the
   setup protocol. However, the contents of the flow specification are
   transparent to the setup protocol, which simply carries the flow
   specification. Any operations on the flow specification, including
   updating internal fields and comparing flow specifications, are
   performed by the resource managers.

   This document defines a specific flow specification format that
   allows for interoperability among ST2 implementations. This flow
   specification is intended to support a flow with a single
   transmission rate for all destinations in the stream. Implementations
   may support more than one flow specification format, and the means
   are provided to add new formats as they are defined in the future.
   However, the flow specification format has to be consistent
   throughout the stream, i.e., it is not possible to use different flow
   specification formats for different parts of the same stream.

1.4.4 Routing Function

   The routing function is an external unicast route generation
   capability. It provides the setup protocol with the path to reach
   each of the desired destinations. The routing function is called on a
   hop-by-hop basis and provides next-hop information. Once a route is
   selected by the routing function, it persists for the whole stream
   lifetime. The routing function may try to optimize based on the
   number of targets, the requested resources, or use of local network
   multicast or bandwidth capabilities. Alternatively, the routing
   function may even be based on simple connectivity information.

   The setup protocol is not necessarily aware of the criteria used by
   the routing function to select routes. It works with any routing
   function algorithm. The algorithm adopted is a local matter at each
   host/router and different hosts/routers may use different algorithms.
   The interface between setup protocol and routing function is also a
   local matter and therefore it is not specified by this document.
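Because the routing function is consulted hop by hop and its interface is a local matter, a setup implementation can treat it as an opaque callable that maps a target to a next hop. The Python sketch below (its names and the dictionary routing table are hypothetical) groups a stream's targets by next hop, so that one setup message can be sent to each neighbor for the targets reachable through it; check Section 4.5.3 for how CONNECT messages are actually built.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def group_by_next_hop(targets: Iterable[str],
                      next_hop: Callable[[str], Hashable]) -> Dict[Hashable, List[str]]:
    """Partition a stream's targets by the neighbor the routing function
    selects for each, preserving target order within each group."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for target in targets:
        groups[next_hop(target)].append(target)
    return dict(groups)
```

Any routing algorithm fits behind `next_hop`, e.g. `routes.__getitem__` for a static table, which matches the statement that the setup protocol works with any routing function.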

   This version of ST does not support source routing. It does support
   route recording. It does include provisions that allow identification
   of ST capable neighbors. Identification of remote ST hosts/routers is
   not specifically addressed.

1.4.5 Local Resource Manager

   At each host/router traversed by a stream, the Local Resource Manager

   (LRM) is responsible for handling local resources. The LRM knows

   which resources are on the system and what capacity they can provide.

   Resources include:

o   CPUs on end systems and routers to execute the application and

    protocol software,

o   main memory space for this software (as in all real-time systems,

    code should be pinned in main memory, as swapping it out would have

    detrimental effects on system performance),

o   buffer space to store the data, e.g., communication packets, passing

    through the nodes,

o   network adapters, and

o   transmission networks between the nodes. Networks may be as simple

    as point-to-point links or as complex as switched networks such as

    Frame Relay and ATM networks.

   During stream setup and modification, the LRM is presented by the

   setup protocol with the flow specification associated to the stream.

   For each resource it handles, the LRM is expected to perform the

   following functions:

o   Stream Admission Control: it checks whether, given the flow

    specification, there are sufficient resources left to handle the new

    data stream. If the available resources are insufficient, the new

    data stream must be rejected.

o   QoS Computation: it calculates the best possible performance the

    resource can provide for the new data stream under the current

    traffic conditions, e.g., throughput and delay values are computed.

o   Resource Reservation: it reserves the resource capacities required

    to meet the desired QoS.
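   The three setup-time functions can be illustrated with a toy LRM for
   a single resource. This is a minimal sketch. It assumes the FlowSpec
   reduces to a requested amount and a minimum acceptable amount of one
   resource, for example link bandwidth. The names
   `LocalResourceManager`, `admit`, `requested`, and `minimum` are
   hypothetical and are not part of ST2, whose FlowSpec is defined in
   Section 9.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocalResourceManager:
    """Hypothetical LRM managing one resource, e.g. link bandwidth in kbit/s."""

    capacity: float
    reservations: Dict[int, float] = field(default_factory=dict)  # stream id -> amount

    def available(self) -> float:
        return self.capacity - sum(self.reservations.values())

    def admit(self, sid: int, requested: float, minimum: float) -> Optional[float]:
        """Return the amount reserved for the stream, or None if it is rejected."""
        free = self.available()
        # Stream Admission Control: reject if even the minimum cannot be met.
        if free < minimum:
            return None
        # QoS Computation: the best this resource can offer under current load.
        granted = min(requested, free)
        # Resource Reservation: set the capacity aside for this stream.
        self.reservations[sid] = granted
        return granted

    def release(self, sid: int) -> None:
        self.reservations.pop(sid, None)


lrm = LocalResourceManager(capacity=1000.0)
print(lrm.admit(1, requested=600.0, minimum=400.0))  # 600.0
print(lrm.admit(2, requested=600.0, minimum=300.0))  # 400.0, all that is left
print(lrm.admit(3, requested=100.0, minimum=50.0))   # None, stream rejected
```

   The second stream is admitted with less than it asked for because
   its minimum can still be met. The third stream is rejected outright.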

   During data transfer, the LRM is responsible for:

o   QoS Enforcement: it enforces the QoS requirements by appropriate

    scheduling of resource access. For example, data packets from an

    application with a short guaranteed delay must be served prior to

    data from an application with a less strict delay bound.
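   One way an LRM could achieve this ordering is earliest-deadline-first
   scheduling. In that scheme, a packet's deadline is its arrival time
   plus the delay bound of its stream. The sketch below assumes that
   policy, since ST2 leaves the scheduling discipline to the LRM. The
   class and method names are illustrative.

```python
import heapq
import itertools


class DeadlineScheduler:
    """Serve packets in earliest-deadline-first order (one possible LRM policy)."""

    def __init__(self) -> None:
        self._heap = []                # entries: (deadline, sequence number, packet)
        self._seq = itertools.count()  # breaks ties in arrival order

    def enqueue(self, packet: str, arrival: float, delay_bound: float) -> None:
        heapq.heappush(self._heap, (arrival + delay_bound, next(self._seq), packet))

    def dequeue(self) -> str:
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)


sched = DeadlineScheduler()
sched.enqueue("video frame", arrival=0.0, delay_bound=50.0)   # deadline 50 ms
sched.enqueue("audio sample", arrival=0.0, delay_bound=10.0)  # deadline 10 ms
sched.enqueue("bulk block", arrival=1.0, delay_bound=100.0)   # deadline 101 ms
print([sched.dequeue() for _ in range(len(sched))])
# ['audio sample', 'video frame', 'bulk block']
```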

   The LRM may also provide the following additional functions:

o   Data Regulation: to smooth a stream's data traffic, e.g., as with the

    leaky bucket algorithm.

o   Policing: to prevent applications from exceeding their negotiated

    QoS, e.g., by sending data at a higher rate than indicated in the

    flow specification.

o   Stream Preemption: to free up resources for other streams with

    higher priority or importance.
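   Data regulation and policing can both be built on the leaky bucket
   mentioned above. The following minimal sketch uses a leaky-bucket
   meter for policing. It assumes the negotiated rate and bucket depth
   are known in bytes per second and in bytes. A regulator would delay
   a nonconforming packet instead of dropping it. The class and
   parameter names are illustrative.

```python
class LeakyBucketPolicer:
    """Leaky-bucket meter: the bucket drains at `rate` bytes/s and holds `depth` bytes."""

    def __init__(self, rate: float, depth: float) -> None:
        self.rate = rate
        self.depth = depth
        self.level = 0.0  # bytes currently in the bucket
        self.last = 0.0   # time of the previous update, in seconds

    def conforms(self, size: int, now: float) -> bool:
        # Drain what has leaked out since the previous packet.
        self.level = max(0.0, self.level - self.rate * (now - self.last))
        self.last = now
        if self.level + size <= self.depth:
            self.level += size
            return True
        # Exceeds the negotiated rate: a policer drops it, a regulator would delay it.
        return False


policer = LeakyBucketPolicer(rate=1000.0, depth=1500.0)
print(policer.conforms(1000, now=0.0))  # True
print(policer.conforms(1000, now=0.1))  # False: only 100 bytes drained so far
print(policer.conforms(1000, now=1.0))  # True: the bucket has emptied
```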

   The strategies adopted by the LRMs to handle resources are resource-

   dependent and may vary at every host/router. However, it is necessary

   that all LRMs have the same understanding of the flow specification.

   The interface between setup protocol and LRM is a local matter at

   every host and therefore it is not specified by this document. An

   example of an LRM is the Heidelberg Resource Administration Technique

   (HeiRAT) [VoHN93].

   It is also assumed that the LRM provides functions to compare flow

   specifications, i.e., to decide whether a flow specification requires

   a greater, equal, or smaller amount of resource capacities to be

   reserved.

1.5 ST2 Basic Concepts

   The following sections present at an introductory level some of the

   fundamental ST2 concepts including streams, data transfer, and flow

   specification.

            Hosts Connections...                  :      ...and Streams
            ====================                  :      ==============
        data      Origin                          :          Origin
       packets +-----------+                      :          +----+
          +----|Application|                      :          |    |
          |    |-----------|                      :          +----+
          +--->|  ST Agent |                      :           | |
               +-----------+                      :           | |
                     |                            :           | |
                     V                            :           | |
              +-------------+                     :           | |
              |             |                     :           | |
+-------------|  Network A  |                     :   +-------+ +--+
|             |             |                     :   |            |
|             +-------------+                     :   |    Target 2|
|                    |    Target 2                :   |    & Router|
|    Target 1        |   and Router               :   |            |
|  +-----------+     |  +-----------+             :   V            V
|  |Application|<-+  |  |Application|<-+          : +----+       +----+
|  |-----------|  |  |  |-----------|  |          : |    |       |    |
+->|  ST Agent |--+  +->|  ST Agent |--+          : +----+       +----+
   +-----------+        +-----------+             :Target 1       | |
                              |                   :               | |
                              V                   :               | |
                       +-------------+            :               | |
                       |             |            :               | |
         +-------------|  Network B  |            :        +------+ |
         |             |             |            :        |        |
         |             +-------------+            :        |        |
         |    Target 3        |    Target 4       :        |        |
         |  +-----------+     |  +-----------+    :        V        V
         |  |Application|<-+  |  |Application|<-+ :      +----+   +----+
         |  |-----------|  |  |  |-----------|  | :      |    |   |    |
         +->|  ST Agent |--+  +->|  ST Agent |--+ :      +----+   +----+
            +-----------+        +-----------+    :     Target 3 Target 4
                                                  :

                         Figure 3: The Stream Concept

1.5.1 Streams

   Streams are the core concept of ST2. They are established between a

   sending origin and one or more receiving targets in the form of a

   routing tree. Streams are uni-directional from the origin to the

   targets. Nodes in the tree represent so-called ST agents, entities

   executing the ST2 protocol; links in the tree are called hops. Any

   node in the middle of the tree is called an intermediate agent, or

   router. An agent may have any combination of origin, target, or

   intermediate capabilities.

   Figure 3 illustrates a stream from an origin to four targets, where

   the ST agent on Target 2 also functions as an intermediate agent. Let

   us use this Target 2/Router node to explain some basic ST2

   terminology: the direction of the stream from this node to Targets 3

   and 4 is called downstream, and the direction towards the Origin node

   upstream. ST agents that are one hop away from a given node are

   called previous-hops in the upstream, and next-hops in the downstream

   direction.

   Streams are maintained using SCMP messages. Typical SCMP messages are

   CONNECT and ACCEPT to build a stream, DISCONNECT and REFUSE to close

   a stream, CHANGE to modify the quality of service associated with a

   stream, and JOIN to request to be added to a stream.

   Each ST agent maintains state information describing the streams

   flowing through it. It can actively gather and distribute such

   information. It can recognize failed neighbor ST agents through the

   use of periodic HELLO message exchanges. It can ask other ST agents

   about a particular stream via a STATUS message. These ST agents then

   send back a STATUS-RESPONSE message. NOTIFY messages can be used to

   inform other ST agents of significant events.
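   The HELLO-based failure detection described above can be pictured as
   one timer per neighbor. This sketch assumes a neighbor is declared
   failed when no HELLO has arrived for longer than a fixed timeout. The
   timeout value, the neighbor addresses, and the class name are
   hypothetical. SCMP, not this sketch, defines the actual HELLO message
   and its timers.

```python
from typing import Dict, List


class NeighborMonitor:
    """Track HELLO arrivals and report neighbors that have gone silent."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                  # seconds without HELLO before failure
        self.last_hello: Dict[str, float] = {}  # neighbor -> time of last HELLO

    def hello_received(self, neighbor: str, now: float) -> None:
        self.last_hello[neighbor] = now

    def failed_neighbors(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_hello.items() if now - t > self.timeout)


monitor = NeighborMonitor(timeout=3.0)
monitor.hello_received("192.0.2.1", now=0.0)
monitor.hello_received("192.0.2.2", now=2.0)
print(monitor.failed_neighbors(now=4.0))  # ['192.0.2.1']
```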

   ST2 offers a wealth of functionalities for stream management. Streams

   can be grouped together to minimize allocated resources or to process

   them in the same way in case of failures. During audio conferences,

   for example, only a limited set of participants may talk at once.

   Using the group mechanism, resources for only a portion of the audio

   streams of the group need to be reserved. Using the same concept, an

   entire group of related audio and video streams can be dropped if one

   of them is preempted.
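   A little arithmetic makes the saving from grouping concrete. Assume,
   purely for illustration, ten conference participants whose audio
   streams each need 64 kbit/s, and a floor-control policy that lets at
   most two of them talk at once. A shared group reservation then needs
   only two streams' worth of bandwidth instead of ten. The function
   name and the figures are hypothetical.

```python
def group_reservation(per_stream: float, members: int, max_active: int) -> float:
    """Bandwidth to reserve when at most `max_active` of `members` streams send at once."""
    return per_stream * min(members, max_active)


individual = 64.0 * 10                   # every stream reserved separately
shared = group_reservation(64.0, 10, 2)  # only two speakers at a time
print(individual, shared)                # 640.0 128.0
```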

1.5.2 Data Transmission

   Data transfer in ST2 is simplex in the downstream direction. Data

   transport through streams is very simple. ST2 puts only a small

   header in front of the user data. The header contains a protocol

   identification that distinguishes ST2 from IP packets, an ST2 version

   number, a priority field (specifying a relative importance of streams

   in cases of conflict), a length counter, a stream identification, and

   a checksum. These elements form a 12-byte header.
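   The 12-byte header can be sketched with Python's `struct` module. The
   bit layout below is an assumption, chosen only so that the listed
   fields add up to 12 bytes: a 4-bit protocol identification, a 4-bit
   version, a 3-bit priority, a 16-bit length, a 16-bit checksum, and a
   48-bit stream identifier. Section 10.1 gives the normative format of
   the data PDU. The Internet-style ones'-complement checksum is also
   assumed, and the values passed in the example are placeholders.

```python
import struct

# Assumed layout (network byte order), 12 bytes in total:
#   byte 0     protocol id (high nibble) | version (low nibble)
#   byte 1     priority in the top 3 bits, remaining bits zero
#   bytes 2-3  total length in bytes
#   bytes 4-5  header checksum
#   bytes 6-11 stream identifier (48 bits)
HEADER = struct.Struct("!BBHH6s")


def ones_complement_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum, as used by IP (assumed for ST here)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pack_header(proto: int, version: int, priority: int, length: int, sid: int) -> bytes:
    first = ((proto & 0xF) << 4) | (version & 0xF)
    second = (priority & 0x7) << 5
    sid_bytes = sid.to_bytes(6, "big")
    # Compute the checksum with the checksum field zeroed, then fill it in.
    checksum = ones_complement_checksum(HEADER.pack(first, second, length, 0, sid_bytes))
    return HEADER.pack(first, second, length, checksum, sid_bytes)


def verify_header(header: bytes) -> bool:
    # A header carrying its correct checksum sums to 0xFFFF, whose complement is 0.
    return ones_complement_checksum(header) == 0


hdr = pack_header(proto=5, version=3, priority=6, length=1024, sid=0x0001C0000201)
print(len(hdr), verify_header(hdr))  # 12 True
```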

   Efficiency is also achieved by avoiding fragmentation and reassembly

   on all agents. Stream establishment yields a maximum message size for

   data packets on a stream. This maximum message size is communicated

   to the upper layers, so that they provide data packets of suitable

   size to ST2.

   Communication with multiple next-hops can be made even more efficient

   using MAC Layer multicast when it is available. If a subnet supports

   multicast, a single multicast packet is sufficient to reach all

   next-hops connected to this subnet. This leads to a significant

   reduction of the bandwidth requirements of a stream. If multicast is

   not provided, separate packets need to be sent to each next-hop.
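   The effect of MAC-layer multicast on the number of copies an agent
   sends can be counted directly. In this sketch, next-hops are grouped
   by subnet. A multicast-capable subnet costs one packet however many
   next-hops it holds, while every other subnet costs one unicast copy
   per next-hop. The subnet names and the function name are
   hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def copies_per_packet(next_hops: Dict[str, str], multicast_subnets: Set[str]) -> int:
    """Number of packets an ST agent emits to forward one data packet.

    next_hops maps each next-hop address to the subnet it is attached to.
    """
    by_subnet = defaultdict(list)
    for hop, subnet in next_hops.items():
        by_subnet[subnet].append(hop)
    return sum(1 if subnet in multicast_subnets else len(hops)
               for subnet, hops in by_subnet.items())


hops = {"B": "lan1", "C": "lan1", "D": "lan1", "E": "lan2", "F": "lan2"}
print(copies_per_packet(hops, multicast_subnets={"lan1"}))  # 3 (1 multicast + 2 unicast)
print(copies_per_packet(hops, multicast_subnets=set()))     # 5
```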

   As ST2 relies on reservation, it does not contain error correction

   mechanisms for data exchange such as those found in TCP. It

   is assumed that real-time data, such as digital audio and video,

   require partially correct delivery only. In many cases, retransmitted

   packets would arrive too late to meet their real-time delivery

   requirements. Also, depending on the data encoding and the particular

   application, a small number of errors in stream data are acceptable.

   In any case, reliability can be provided by layers on top of ST2 when

   needed.

1.5.3 Flow Specification

   As part of establishing a connection, SCMP handles the negotiation of

   quality-of-service parameters for a stream. In ST2 terminology, these

   parameters form a flow specification (FlowSpec) which is associated

   with the stream. Several versions of FlowSpecs exist (see

   [RFC1190], [DHHS92], and [RFC1363]); they are distinguished by a

   version number. Typically, they contain parameters such as average

   and maximum throughput, end-to-end delay, and delay variance of a

   stream. SCMP itself only provides the mechanism for relaying the

   quality-of-service parameters.

   Three kinds of entities participate in the quality-of-service

   negotiation: application entities on the origin and target sites as

   the service users, ST agents, and local resource managers (LRM). The

   origin application supplies the initial FlowSpec requesting a

   particular service quality. Each ST agent that obtains the FlowSpec

   as part of a connection establishment message presents it to the

   local resource manager. ST2 does not determine how resource

   managers make reservations and how resources are scheduled according

   to these reservations; ST2, however, assumes these mechanisms as its

   basis.

   An example of the FlowSpec negotiation procedure is illustrated in

   Figure 4. Depending on the success of its local reservations, the LRM

   updates the FlowSpec fields and returns the FlowSpec to the ST agent,

   which passes it downstream as part of the connection message.

   Eventually, the FlowSpec is communicated to the application at the

   target which may base its accept/reject decision for establishing the

   connection on it and may finally also modify the FlowSpec. If a

   target accepts the connection, the (possibly modified) FlowSpec is

   propagated back to the origin which can then calculate an overall

   service quality for all targets. The application entity at the origin

   may later request a CHANGE to adjust reservations.

                 Origin                 Router               Target 1
                +------+      1a       +------+      1b      +------+
                |      |-------------->|      |------------->|      |
                +------+               +------+              +------+
                 ^ | ^                                          |
                 | | |                    2                     |
                 | | +------------------------------------------+
                 + +
 +-------------+  \ \               +-------------+       +-------------+
 |Max Delay: 12|   \ \              |Max Delay: 12|       |Max Delay: 12|
 |-------------|    \ \             |-------------|       |-------------|
 |Min Delay:  2|     \ \            |Min Delay:  5|       |Min Delay:  9|
 |-------------|      \ \           |-------------|       |-------------|
 |Max Size:4096|       + +          |Max Size:2048|       |Max Size:2048|
 +-------------+       | |          +-------------+       +-------------+
    FlowSpec           | | 1
                       | +-----------------+
                       |                   |
                       |                   V
                     2 |                +------+
                       +----------------|      |
                                        +------+
                                        Target 2
                                     +-------------+
                                     |Max Delay: 12|
                                     |-------------|
                                     |Min Delay:  4|
                                     |-------------|
                                     |Max Size:4096|
                                     +-------------+

        Figure 4: Quality-of-Service Negotiation with FlowSpecs
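   The negotiation in Figure 4 can be replayed with a simplified
   three-field FlowSpec. This sketch assumes that each LRM adds its own
   best-case delay to Min Delay and lowers Max Size to what its network
   supports. It also assumes the stream is refused once Min Delay
   exceeds Max Delay. The per-hop delays (3, 4, and 2) are invented so
   that the results match the figure. The class and function names are
   hypothetical and do not reflect the FlowSpec of Section 9.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FlowSpec:
    max_delay: int  # bound requested by the origin application
    min_delay: int  # best-case delay accumulated so far
    max_size: int   # largest data packet usable on the path so far


def lrm_update(spec: FlowSpec, hop_delay: int, hop_max_size: int) -> Optional[FlowSpec]:
    """Apply one LRM's reservation result; None means the stream is refused."""
    updated = replace(spec,
                      min_delay=spec.min_delay + hop_delay,
                      max_size=min(spec.max_size, hop_max_size))
    return None if updated.min_delay > updated.max_delay else updated


origin = FlowSpec(max_delay=12, min_delay=2, max_size=4096)  # after the origin's LRM
router = lrm_update(origin, hop_delay=3, hop_max_size=2048)   # Min Delay 5, Max Size 2048
target1 = lrm_update(router, hop_delay=4, hop_max_size=2048)  # Min Delay 9, Max Size 2048
target2 = lrm_update(origin, hop_delay=2, hop_max_size=4096)  # Min Delay 4, Max Size 4096

# The origin combines the replies into an overall service quality for all targets.
worst_delay = max(t.min_delay for t in (target1, target2))
usable_size = min(t.max_size for t in (target1, target2))
print(worst_delay, usable_size)  # 9 2048
```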

1.6 Outline of This Document

   This document contains the specification of the ST2+ version of the

   ST2 protocol. In the rest of the document, whenever the terms "ST" or

   "ST2" are used, they refer to the ST2+ version of ST2.

   The document is organized as follows:

o   Section 2 describes the ST2 user service from an application point

    of view.

o   Section 3 illustrates the ST2 data transfer protocol, ST.

o   Section 4 through Section 8 specify the ST2 setup protocol, SCMP.

o   Section 9 presents the ST2 flow specification.

o   Section 10 defines the formats of protocol elements and PDUs.

2. ST2 User Service Description

   This section describes the ST user service from the high-level point

   of view of an application. It defines the ST stream operations and

   primitive functions. It specifies which operations on streams can be

   invoked by the applications built on top of ST and when the ST

   primitive functions can be legally executed. Note that the presented

   ST primitives do not specify an API. They are used here only to

   illustrate the service model for ST.

2.1 Stream Operations and Primitive Functions

   An ST application at the origin may create, expand, reduce, change,

   send data to, and delete a stream. When a stream is expanded, new

   targets are added to the stream; when a stream is reduced, some of

   the current targets are dropped from it. When a stream is changed,

   the associated quality of service is modified.

   An ST application at the target may join, receive data from, and

   leave a stream. This translates into the following stream operations:

o   OPEN: create new stream [origin], CLOSE: delete stream [origin],

o   ADD: expand stream, i.e., add new targets to it [origin],

o   DROP: reduce stream, i.e., drop targets from it [origin],

o   JOIN: join a stream [target], LEAVE: leave a stream [target],

o   DATA: send data through stream [origin],

o   CHG: change a stream's QoS [origin].

   Each stream operation may require the execution of several primitive

   functions to be completed. For instance, to open a new stream, a

   request is first issued by the sender and an indication is generated

   at one or more receivers; then, the receivers may each accept or

   refuse the request and the corresponding indications are generated at

   the sender. A single receiver case is shown in Figure 5 below.

                Sender             Network             Receiver
                  |                   |                   |
     OPEN.req     |                   |                   |
                  |-----------------> |                   |
                  |                   |-----------------> |
                  |                   |                   | OPEN.ind
                  |                   |                   | OPEN.accept
                  |                   |<----------------- |
                  |<----------------- |                   |
  OPEN.accept-ind |                   |                   |
                  |                   |                   |

           Figure 5: Primitives for the OPEN Stream Operation

   Table 1 defines the ST service primitive functions associated with each

   stream operation. The column labelled "O/T" indicates whether the

   primitive is executed at the origin or at the target.

           +===================================================+
           |Primitive      | Description                   |O/T|
           |===================================================|
           |OPEN.req       | open a stream                 | O |
           |OPEN.ind       | connection request indication | T |
           |OPEN.accept    | accept stream                 | T |
           |OPEN.refuse    | refuse stream                 | T |
           |OPEN.accept-ind| connection accept indication  | O |
           |OPEN.refuse-ind| connection refuse indication  | O |
           |ADD.req        | add targets to stream         | O |
           |ADD.ind        | add request indication        | T |
           |ADD.accept     | accept stream                 | T |
           |ADD.refuse     | refuse stream                 | T |
           |ADD.accept-ind | add accept indication         | O |
           |ADD.refuse-ind | add refuse indication         | O |
           |JOIN.req       | join a stream                 | T |
           |JOIN.ind       | join request indication       | O |
           |JOIN.reject    | reject a join                 | O |
           |JOIN.reject-ind| join reject indication        | T |
           |DATA.req       | send data                     | O |
           |DATA.ind       | receive data indication       | T |
           |CHG.req        | change stream QoS             | O |
           |CHG.ind        | change request indication     | T |
           |CHG.accept     | accept change                 | T |
           |CHG.refuse     | refuse change                 | T |
           |CHG.accept-ind | change accept indication      | O |
           |CHG.refuse-ind | change refuse indication      | O |
           |DROP.req       | drop targets                  | O |
           |DROP.ind       | disconnect indication         | T |
           |LEAVE.req      | leave stream                  | T |
           |LEAVE.ind      | leave stream indication       | O |
           |CLOSE.req      | close stream                  | O |
           |CLOSE.ind      | close stream indication       | T |
           +---------------------------------------------------+

                              Table 1: ST Primitives

2.2 State Diagrams

   It is not sufficient to define the set of ST stream operations. It is

   also necessary to specify when the operations can be legally

   executed. For this reason, a set of states is now introduced and the

   transitions from one state to the others are specified. States are

   defined with respect to a single stream. The previously defined

   stream operations can be legally executed only from an appropriate

   state.

   An ST agent may, with respect to an ST stream, be in one of the

   following states:

o   IDLE: the stream has not been created yet.

o   PENDING: the stream is in the process of being established.

o   ACTIVE: the stream is established and active.

o   ADDING: the stream is established. A stream expansion is underway.

o   CHGING: the stream is established. A stream change is underway.

   Previous experience with ST has led to limits on stream operations

   that can be executed simultaneously. These restrictions are:

   1. A single ADD or CHG operation can be processed at one time. If

       an ADD or CHG is already underway, further requests are queued

       by the ST agent and handled only after the previous operation

       has been completed. This also applies to two subsequent

       requests of the same kind, e.g., two ADD or two CHG operations.

       The second operation is not executed until the first one has

       been completed.

   2. Deleting a stream, leaving a stream, or dropping targets from a

       stream is possible only after stream establishment has been

       completed. A stream is considered to be established when all

       the next-hops of the origin have either accepted or refused the

       stream. Note that stream refuse is automatically forced after

       timeout if no reply comes from a next-hop.

   3. An ST agent forwards data only along already established paths

       to the targets, see also Section 3.1. A path is considered to

       be established when the next-hop on the path has explicitly

       accepted the stream. This implies that the target and all other

       intermediate ST agents are ready to handle the incoming data

        packets. In no case will an ST agent forward data to a

       next-hop ST agent that has not explicitly accepted the stream.

       To be sure that all targets receive the data, an application

       should send the data only after all paths have been

       established, i.e., the stream is established.

   4. It is allowed to send data from the CHGING and ADDING states.

       While sending data from the CHGING state, the quality of

       service to the targets affected by the change should be assumed

       to be the more restrictive quality of service. When sending

       data from the ADDING state, the targets that receive the data

       include at least all the targets that were already part of the

       stream at the time the ADD operation was invoked.

   The rules introduced above require ST agents to queue incoming

   requests when the current state does not allow them to be processed

   immediately. In order to preserve the semantics, ST agents have to

   maintain the order of the requests, i.e., implement FIFO queuing.

   Exceptionally, the CLOSE request at the origin and the LEAVE request

   at the target may be immediately processed: in these cases, the queue

   is deleted and it is possible that requests in the queue are not

   processed.
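   The queuing rules can be sketched as a per-stream FIFO. This minimal
   model uses hypothetical names and tracks only ordering. ADD and CHG
   requests wait while another request is in progress. CLOSE or LEAVE
   is handled at once and discards whatever is still queued. The state
   transitions themselves are not modeled.

```python
from collections import deque
from typing import Deque, Optional


class StreamRequestQueue:
    """Ordering of stream requests at one ST agent (a simplified model)."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()  # waiting requests, oldest first
        self.current: Optional[str] = None  # request being processed, if any

    def submit(self, request: str) -> Optional[str]:
        """Return the request if it starts now, or None if it had to wait."""
        operation = request.split()[0]
        if operation in ("CLOSE", "LEAVE"):
            # Processed immediately; anything still queued is discarded.
            self.pending.clear()
            self.current = request
            return request
        if self.current is None:
            self.current = request
            return request
        self.pending.append(request)  # only one ADD or CHG at a time
        return None

    def complete(self) -> Optional[str]:
        """Finish the current request and start the oldest queued one."""
        self.current = self.pending.popleft() if self.pending else None
        return self.current


queue = StreamRequestQueue()
print(queue.submit("ADD E"))      # ADD E: starts at once
print(queue.submit("CHG qos-2"))  # None: waits behind the ADD
print(queue.submit("ADD F"))      # None: waits behind the CHG
print(queue.complete())           # CHG qos-2: FIFO order is preserved
```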

   The following state diagrams define the ST service. Separate diagrams

   are presented for the origin and the targets.

   The symbol (a/r)* indicates that all targets in the target list have

   explicitly accepted or refused the stream, or refuse has been forced

   after timeout. If the target list is empty, i.e., it contains no

   targets, the (a/r)* condition is immediately satisfied, so the empty

   stream is created and state ESTBL is entered.

   The separate OPEN and ADD primitives at the target are for conceptual

   purposes only. The target is actually unable to distinguish between

   an OPEN and an ADD. This is reflected in Figure 7 and Table 3 through

   the notation OPEN/ADD.

                        +------------+
                        |            |<-------------------+
            +---------->|    IDLE    |-------------+      |
            |           |            |    OPEN.req |      |
            |           +------------+             |      |
 CLOSE.req  |     CLOSE.req ^   ^ CLOSE.req        V      | CLOSE.req
            |               |   |             +---------+ |
            |               |   |             | PENDING |-|-+ JOIN.reject
            |               |   +-------------|         |<|-+
            |   JOIN.reject |                 +---------+ |
            |   DROP.req  +----------+             |      |
            |       +-----|          |             |      |
            |       |     |  ESTBL   | OPEN.(a/r)* |      |
            |       +---->|          |<------------+      |
            |             +----------+                    |
            |               | ^  | ^                      |
            |               | |  | |                      |
       +----------+ CHG.req | |  | | ADD.(a/r)*      +----------+
       |          |<--------+ |  | +-----------------|          |
       |  CHGING  |           |  |                   |  ADDING  |
       |          |-----------+  +------------------>|          |
       +----------+ CHG.(a/r)*        JOIN.ind       +----------+
          |   ^                       ADD.req           |   ^
          |   |                                         |   |
          +---+                                         +---+
        DROP.req                                      DROP.req
        JOIN.reject                                   JOIN.reject

                  Figure 6: ST Service at the Origin

                 +--------+
                 |        |-----------------------+
                 |  IDLE  |                       |
                 |        |<---+                  | OPEN/ADD.ind
                 +--------+    | CLOSE.ind        | JOIN.req
                     ^         | OPEN/ADD.refuse  |
                     |         | JOIN.reject-ind  |
         CLOSE.ind   |         |                  V
         DROP.ind    |         |             +---------+
         LEAVE.req   |         +-------------|         |
                     |                       | PENDING |
                 +-------+                   |         |
                 |       |                   +---------+
                 | ESTBL |    OPEN/ADD.accept     |
                 |       |<-----------------------+
                 +-------+

                     Figure 7: ST Service at the Target

2.3 State Transition Tables

   Table 2 and Table 3 define which primitives can be processed from

   which states and the possible state transitions.

+======================================================================+
|Primitive      |IDLE|    PENDING     |  ESTBL   |  CHGING  |  ADDING  |
|======================================================================|
|OPEN.req       | ok |       -        |    -     |    -     |    -     |
|OPEN.accept-ind| -  |if(a/r)*->ESTBL |    -     |    -     |    -     |
|OPEN.refuse-ind| -  |if(a/r)*->ESTBL |    -     |    -     |    -     |
|ADD.req        | -  |     queued     | ->ADDING |  queued  |  queued  |
|ADD.accept-ind | -  |       -        |    -     |    -     | if(a/r)* |
|               |    |                |          |          | ->ESTBL  |
|ADD.refuse-ind | -  |       -        |    -     |    -     | if(a/r)* |
|               |    |                |          |          | ->ESTBL  |
|JOIN.ind       | -  |     queued     | ->ADDING |  queued  |  queued  |
|JOIN.reject    | -  |       ok       |    ok    |    ok    |    ok    |
|DATA.req       | -  |       -        |    ok    |    ok    |    ok    |
|CHG.req        | -  |     queued     | ->CHGING |  queued  |  queued  |
|CHG.accept-ind | -  |       -        |    -     | if(a/r)* |    -     |
|               |    |                |          | ->ESTBL  |          |
|CHG.refuse-ind | -  |       -        |    -     | if(a/r)* |    -     |
|               |    |                |          | ->ESTBL  |          |
|DROP.req       | -  |       -        |    ok    |    ok    |    ok    |
|LEAVE.ind      | -  |       ok       |    ok    |    ok    |    ok    |
|CLOSE.req      | -  |       ok       |    ok    |    ok    |    ok    |
+----------------------------------------------------------------------+

                Table 2: Primitives and States at the Origin
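   Table 2 can be read as a transition function. The sketch below
   encodes it and assumes the caller passes `outstanding`, the number of
   targets that have not yet accepted or refused, so that 0 stands for
   the (a/r)* condition. Following Figure 6, CLOSE.req returns the
   stream to IDLE. An OPEN.req with an empty target list enters ESTBL
   directly. The function name and string encoding are illustrative
   only.

```python
ESTABLISHED = ("ESTBL", "CHGING", "ADDING")
REPLY = {  # reply indications and the state in which they arrive
    "OPEN.accept-ind": "PENDING", "OPEN.refuse-ind": "PENDING",
    "ADD.accept-ind": "ADDING", "ADD.refuse-ind": "ADDING",
    "CHG.accept-ind": "CHGING", "CHG.refuse-ind": "CHGING",
}


def next_state(state: str, primitive: str, outstanding: int = 0) -> str:
    """Origin-side state after `primitive`; "queued" means wait in the FIFO.

    `outstanding` counts targets that have neither accepted nor refused yet.
    Raises ValueError for a primitive marked "-" in Table 2.
    """
    if primitive == "OPEN.req" and state == "IDLE":
        return "PENDING" if outstanding else "ESTBL"  # empty list: (a/r)* at once
    if REPLY.get(primitive) == state:
        return state if outstanding else "ESTBL"      # wait for (a/r)*
    if primitive in ("ADD.req", "JOIN.ind", "CHG.req"):
        if state == "ESTBL":
            return "CHGING" if primitive == "CHG.req" else "ADDING"
        if state in ("PENDING", "CHGING", "ADDING"):
            return "queued"
    if primitive == "CLOSE.req" and state != "IDLE":
        return "IDLE"
    if primitive in ("JOIN.reject", "LEAVE.ind") and state != "IDLE":
        return state
    if primitive in ("DATA.req", "DROP.req") and state in ESTABLISHED:
        return state
    raise ValueError(f"{primitive} is not allowed in state {state}")


print(next_state("IDLE", "OPEN.req", outstanding=3))  # PENDING
print(next_state("ESTBL", "CHG.req"))                 # CHGING
print(next_state("CHGING", "ADD.req"))                # queued
```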

             +=====================================================+
             | Primitive       |   IDLE    |  PENDING  |   ESTBL   |
             |=====================================================|
             | OPEN/ADD.ind    | ->PENDING |     -     |     -     |
             | OPEN/ADD.accept |     -     |  ->ESTBL  |     -     |
             | OPEN/ADD.refuse |     -     |  ->IDLE   |     -     |
             | JOIN.req        | ->PENDING |     -     |     -     |
             | JOIN.reject-ind |     -     |  ->IDLE   |     -     |
             | DATA.ind        |     -     |     -     |    ok     |
             | CHG.ind         |     -     |     -     |    ok     |
             | CHG.accept      |     -     |     -     |    ok     |
             | DROP.ind        |     -     |    ok     |    ok     |
             | LEAVE.req       |     -     |    ok     |    ok     |
             | CLOSE.ind       |     -     |    ok     |    ok     |
             +-----------------------------------------------------+

                Table 3: Primitives and States at the Target

3. The ST2 Data Transfer Protocol

   This section presents the ST2 data transfer protocol, ST. First, data

   transfer is described in Section 3.1; then the data transfer

   protocol functions are illustrated in Section 3.2.

3.1 Data Transfer with ST

   Data transmission with ST is unreliable. An application is not

   guaranteed that the data reaches its destinations and ST makes no

   attempts to recover from packet loss, e.g., due to the underlying

   network. However, if the data reaches its destination, it should do

   so according to the quality of service associated with the stream.

   Additionally, ST may deliver data corrupted in transmission. Many

   types of real-time data, such as digital audio and video, require

   partially correct delivery only. In many cases, retransmitted packets

   would arrive too late to meet their real-time delivery requirements.

   On the other hand, depending on the data encoding and the particular

   application, a small number of errors in stream data are acceptable.

   In any case, reliability can be provided by layers on top of ST2 if

   needed.

   Also, no data fragmentation is supported during the data transfer

   phase. The application is expected to segment its data PDUs according

   to the minimum MTU over all paths in the stream. The application

   receives information on the MTUs relative to the paths to the targets

   as part of the ACCEPT message, see Section 8.6. The minimum MTU over

   all paths can be calculated from the MTUs relative to the single

   paths. ST agents silently discard data packets that are too long; see also

   Section 5.1.1.
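   The segmentation rule can be sketched as follows. The sketch assumes
   that each path MTU reported in the ACCEPT messages bounds the whole
   ST data packet, so the 12-byte ST header is subtracted before the
   application data is cut. Both that accounting and the function names
   are assumptions for illustration.

```python
from typing import List


def stream_mtu(path_mtus: List[int]) -> int:
    """The stream-wide limit is the smallest MTU over all paths to the targets."""
    return min(path_mtus)


def segment(data: bytes, mtu: int, header_len: int = 12) -> List[bytes]:
    """Cut application data into payloads that each fit in one ST data packet."""
    payload = mtu - header_len
    if payload <= 0:
        raise ValueError("MTU too small for the ST header")
    return [data[i:i + payload] for i in range(0, len(data), payload)]


mtu = stream_mtu([1500, 1006, 4352])  # MTUs reported for three paths
chunks = segment(bytes(3000), mtu)
print(mtu, [len(c) for c in chunks])  # 1006 [994, 994, 994, 18]
```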

   An ST agent forwards the data only along already established paths to

   targets. A path is considered to be established once the next-hop ST

   agent on the path sends an ACCEPT message, see Section 2.2. This

   implies that the target and all other intermediate ST agents on the

   path to the target are ready to handle the incoming data packets. In

   no cases will an ST agent forward data to a next-hop ST agent that

   has not explicitly accepted the stream.

   To be reasonably sure that all targets receive the data with the

   desired quality of service, an application should send the data only

   after the whole stream has been established. Depending on the local

   API, an application may not be prevented from sending data before the

   completion of stream setup, but it should be aware that the data

   could be lost or not reach all intended targets. This behavior may

   actually be desirable to applications, such as those applications that

   have multiple targets which can each process data as soon as it is

   available (e.g., a lecture or distributed gaming).

   It is desirable for implementations to take advantage of networks

   that support multicast. If a network does not support multicast, or

   for the case where the next-hops are on different networks, multiple

   copies of the data packet must be sent.

3.2 ST Protocol Functions

   The ST protocol provides two functions:

   o   stream identification

   o   data priority

3.2.1 Stream Identification

   ST data packets are encapsulated by an ST header containing the

   Stream IDentifier (SID). This SID is selected at the origin so that

   it is globally unique over the Internet. The SID must be known by the

   setup protocol as well. At stream establishment time, the setup

   protocol builds, at each agent traversed by the stream, an entry into

   its local database containing stream information. The SID can be used

   as a reference into this database to quickly obtain the necessary

   replication and forwarding information.

   Stream IDentifiers are intended to be used to make the packet

   forwarding task most efficient. The time-critical operation is an

   intermediate ST agent receiving a packet from the previous-hop ST

   agent and forwarding it to the next-hop ST agents.

   The format of data PDUs including the SID is defined in Section 10.1.

   Stream IDentifier generation is discussed in Section 8.1.

3.2.2 Packet Discarding based on Data Priority

   ST provides a well defined quality of service to its applications.

   However, there may be cases where the network is temporarily

   congested and the ST agents have to discard certain packets to

   minimize the overall impact to other streams. The ST protocol

   provides a mechanism to discard data packets based on the Priority

   field in the data PDU, see Section 10.1. The application assigns each

   data packet a discard-priority level, carried in the Priority

   field. ST agents will attempt to discard lower priority packets first

   during periods of network congestion. Applications may choose to send

   data at multiple priority levels so that less important data may be

   discarded first.
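   A minimal sketch of the discard decision follows, assuming a bounded output queue and assuming that a larger Priority value means a more important packet (the actual encoding of the Priority field is given in Section 10.1 and is not reproduced here). The class name PriorityDropQueue is hypothetical; it models only which packet is discarded, not a full scheduler.

```python
import heapq
import itertools


class PriorityDropQueue:
    """Bounded queue that discards the lowest-priority packet first.

    Assumption of this sketch: larger priority value = more important.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Min-heap of (priority, arrival sequence number, payload).
        self._heap: list[tuple[int, int, bytes]] = []
        self._seq = itertools.count()
        self.dropped: list[bytes] = []

    def enqueue(self, priority: int, payload: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), payload))
        if len(self._heap) > self.capacity:
            # Congestion: evict the lowest-priority packet (oldest on ties).
            _, _, victim = heapq.heappop(self._heap)
            self.dropped.append(victim)

    def contents(self) -> list[bytes]:
        # Packets still queued, in arrival order.
        return [p for _, _, p in sorted(self._heap, key=lambda t: t[1])]
```

   Note that a newly arriving packet can itself be the one discarded if its priority is the lowest in the queue.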

4. SCMP Functional Description

   ST agents create and manage streams using the ST Control Message

   Protocol (SCMP). Conceptually, SCMP resides immediately above ST (as

   does ICMP above IP). SCMP follows a request-response model. SCMP

   messages are made reliable through the use of retransmission after

   timeout.

   This section contains a functional description of stream management

   with SCMP. To help clarify the SCMP exchanges used to set up and

   maintain ST streams, we include an example of a simple network

   topology, represented in Figure 8. Using the SCMP messages described

   in this section it will be possible for an ST application to:

   o   Create a stream from A to the peers at B, C and D,

   o   Add a peer at E,

   o   Drop peers B and C,

   o   Let F join the stream, and

   o   Delete the stream.

                                               +---------+    +---+

                                               |         |----| B |

               +---------+      +----------+   |         |    +---+

               |         |------| Router 1 |---| Subnet2 |

               |         |      +----------+   |         |

               |         |                     |         |

               |         |                     +---------+

               |         |                         |

               | Subnet1 |                         |

               |         |                     +----------+

               |         |                     | Router 3 |

       +---+   |         |                     +----------+

       | A |---|         |    +----------+           |

       +---+   |         |----| Router 2 |           |

               |         |    +----------+           |

               +---------+         |                 |

                                   |                 |

                                   |          +----------+    +---+

                                   +----------|          |----| C |

                                              |          |    +---+

                         +---------+          | Subnet3  |

                 +---+   |         |   +---+  |          |    +---+

                 | F |---| Subnet4 |---| E |--|          |----| D |

                 +---+   |         |   +---+  +----------+    +---+

                         +---------+

                Figure 8: Sample Topology for an ST Stream

   We first describe the possible types of streams in Section 4.1;

   Section 4.2 introduces SCMP control message types; SCMP reliability

   is discussed in Section 4.3; stream options are covered in Section

   4.4; stream setup is presented in Section 4.5; Section 4.6

   illustrates stream modification, including stream expansion,

   reduction, and changes to the quality of service associated with a

   stream. Finally, stream deletion is handled in Section 4.7.

4.1 Types of Streams

   SCMP allows for the setup and management of different types of

   streams. Streams differ in the way they are built and the information

   maintained on connected targets.

4.1.1 Stream Building

   Streams may be built in a sender-oriented fashion, receiver-oriented

   fashion, or with a mixed approach:

   o   in the sender-oriented fashion, the application at the origin

       provides the ST agent with the list of receivers for the stream.

       New targets, if any, are also added from the origin.

   o   in the receiver-oriented approach, the application at the origin

       creates an empty stream that contains no targets. Each target then

       joins the stream autonomously.

   o   in the mixed approach, the application at the origin creates a

       stream that contains some targets and other targets join the stream

       autonomously.

   ST2 provides stream options to support sender-oriented and mixed

   approach streams. Receiver-oriented streams can be emulated through

   the use of mixed streams. The fashion by which targets may be added

   to a particular stream is controlled via join authorization levels.

   Join authorization levels are described in Section 4.4.2.

4.1.2 Knowledge of Receivers

   When streams are built in the sender-oriented fashion, each ST agent

   has full information on all targets downstream of it. In this case,

   target information is relayed downstream from agent to agent during

   stream setup.

   When targets add themselves to mixed approach streams, upstream ST

   agents may or may not be informed. Propagation of information on

   targets that "join" a stream is also controlled via join

   authorization levels. As previously mentioned, join authorization

   levels are described in Section 4.4.2.

   This leads to two types of streams:

   o   full target information is propagated in a full-state stream. For

       such streams, all agents are aware of all downstream targets

       connected to the stream. This results in target information being

       maintained at the origin and at intermediate agents. Operations on

       single targets are always possible, e.g., changing a certain target

       or dropping that target from the stream. It is also always possible

       for any ST agent to attempt recovery of all downstream targets.

   o   in light-weight streams, it is possible that the origin and other

       upstream agents have no knowledge about some targets. This results

       in less state to maintain and easier stream management, but it

       limits operations on specific targets. Special actions may be

       required to support change and drop operations on unknown targets,

       see Section 5.7. Also, stream recovery may not be possible. Of

       course, generic functions, such as deleting the whole stream, are

       still possible. It is expected that applications that will have a

       large number of targets will use light-weight streams in order to

       limit state in agents and the number of targets per control message.

   Full-state streams suit applications such as video conferencing or

   distributed gaming, where it is important to have knowledge of the

   connected receivers, e.g., to limit who participates. Light-weight

   streams may be exploited by applications such as remote lecturing or

   radio and TV broadcast playback, where the receivers do not need to

   be known by the sender. Section 4.4.2 defines join authorization

   levels, which support two types of full-state streams and one type of

   light-weight stream.

4.2 Control PDUs

   SCMP defines the following PDUs (the main purpose of each PDU is also

   indicated):

   1.  ACCEPT           to accept a new stream

   2.  ACK              to acknowledge an incoming message

   3.  CHANGE           to change the quality of service associated with

                        a stream

   4.  CONNECT          to establish a new stream or add new targets to

                        an existing stream

   5.  DISCONNECT       to remove some or all of the stream's targets

   6.  ERROR            to indicate an error contained in an incoming

                        message

   7.  HELLO            to detect failures of neighbor ST agents

   8.  JOIN             to request stream joining from a target

   9.  JOIN-REJECT      to reject a stream joining request from a target

   10. NOTIFY           to inform an ST agent of a significant event

   11. REFUSE           to refuse the establishment of a new stream

   12. STATUS           to query an ST agent on a specific stream

   13. STATUS-RESPONSE  to reply to queries on a specific stream

   SCMP follows a request-response model with all requests expecting

   responses. Retransmission after timeout is used to allow for lost or

   ignored messages. Control messages do not extend across packet

   boundaries; if a control message is too large for the MTU of a hop,

   its information is partitioned and a control message per partition is

   sent, as described in Section 5.1.2.

   CONNECT and CHANGE request messages are answered with ACCEPT messages

   which indicate success, and with REFUSE messages which indicate

   failure. JOIN messages are answered with either a CONNECT message

   indicating success, or with a JOIN-REJECT message indicating failure.

   Targets may be removed from a stream by either the origin or the

   target via the DISCONNECT and REFUSE messages, respectively.

   The ACCEPT, CHANGE, CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY

   and REFUSE messages must always be explicitly acknowledged:

   o   with an ACK message, if the message was received correctly and it

       was possible to parse and correctly extract and interpret its

       header, fields and parameters,

   o   with an ERROR message, if a syntax error was detected in the

       header, fields, or parameters included in the message. The errored

       PDU may optionally be returned as part of the ERROR message. An

       ERROR message indicates a syntax error only. If any other errors

       are detected, it is necessary to first acknowledge with ACK and

       then take appropriate actions. For instance, suppose a CHANGE

       message contains an unknown SID: first, an ACK message has to be

       sent, then a REFUSE message with ReasonCode (SIDUnknown) follows.

   If no ACK or ERROR message is received before the corresponding

   timer expires, a timeout failure occurs. The way an ST agent should

   handle timeout failures is described in Section 5.2.

   ACK, ERROR, and STATUS-RESPONSE messages are never acknowledged.

   HELLO messages are a special case. If they contain a syntax error, an

   ERROR message should be generated in response. Otherwise, no

   acknowledgment or response should be generated. Use of HELLO messages

   is discussed in Section 6.1.2.

   STATUS messages containing a syntax error should be answered with an

   ERROR message. Otherwise, a STATUS-RESPONSE message should be sent

   back in response. Use of STATUS and STATUS-RESPONSE is discussed in

   Section 8.4.
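   The immediate-reply rules above can be summarized in a small dispatch function. This is a sketch only: the Pdu enum and the immediate_reply function are hypothetical names. The text states only that ACK, ERROR, and STATUS-RESPONSE are never acknowledged; this sketch assumes they also receive no ERROR reply when malformed.

```python
from enum import Enum
from typing import Optional


class Pdu(Enum):
    ACCEPT = "ACCEPT"
    ACK = "ACK"
    CHANGE = "CHANGE"
    CONNECT = "CONNECT"
    DISCONNECT = "DISCONNECT"
    ERROR = "ERROR"
    HELLO = "HELLO"
    JOIN = "JOIN"
    JOIN_REJECT = "JOIN-REJECT"
    NOTIFY = "NOTIFY"
    REFUSE = "REFUSE"
    STATUS = "STATUS"
    STATUS_RESPONSE = "STATUS-RESPONSE"


# Messages that must always be explicitly acknowledged (ACK or ERROR).
MUST_ACK = {Pdu.ACCEPT, Pdu.CHANGE, Pdu.CONNECT, Pdu.DISCONNECT,
            Pdu.JOIN, Pdu.JOIN_REJECT, Pdu.NOTIFY, Pdu.REFUSE}
# Messages that are never acknowledged.
NEVER_ACK = {Pdu.ACK, Pdu.ERROR, Pdu.STATUS_RESPONSE}


def immediate_reply(pdu: Pdu, syntax_ok: bool) -> Optional[Pdu]:
    """Return the message sent straight back on receipt of `pdu`, if any."""
    if pdu in NEVER_ACK:
        return None  # assumption: no reply even on a syntax error
    if not syntax_ok:
        return Pdu.ERROR  # ERROR reports syntax errors only
    if pdu in MUST_ACK:
        return Pdu.ACK  # other errors: ACK first, then e.g. REFUSE
    if pdu is Pdu.STATUS:
        return Pdu.STATUS_RESPONSE
    return None  # well-formed HELLO: no acknowledgment or response
```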

4.3 SCMP Reliability

   SCMP is made reliable through the use of retransmission when a

   response is not received in a timely manner. The ACCEPT, CHANGE,

   CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY, and REFUSE messages

   all must be answered with an ACK message, see Section 4.2. In

   general, when sending an SCMP message which requires an ACK response,

   the sending ST agent needs to set the Toxxxx timer (where xxxx is the

   SCMP message type, e.g., ToConnect). If it does not receive an ACK

   before the Toxxxx timer expires, the ST agent should retransmit the

   SCMP message. If no ACK has been received within Nxxxx

   retransmissions, then an SCMP timeout condition occurs and the ST

   agent enters its SCMP timeout recovery state. The actions performed

   by the ST agent as the result of the SCMP timeout condition differ

   for different SCMP messages and are described in Section 5.2.
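   The Toxxxx/Nxxxx retransmission pattern can be sketched as follows. Because the actual timeout and retransmission algorithm is implementation dependent, this sketch abstracts the network behind two hypothetical callbacks: send() transmits the message, and ack_received_within(t) waits up to t seconds and reports whether an ACK arrived. The message is sent once and then retransmitted up to n_retries times.

```python
from typing import Callable


def send_reliably(send: Callable[[], None],
                  ack_received_within: Callable[[float], bool],
                  to_timer: float,
                  n_retries: int) -> bool:
    """Send an SCMP message, retransmitting until it is ACKed.

    `to_timer` plays the role of Toxxxx and `n_retries` that of Nxxxx.
    Returns True if an ACK arrived, or False on an SCMP timeout
    condition, after which the caller enters timeout recovery.
    """
    send()  # initial transmission
    for attempt in range(n_retries + 1):
        if ack_received_within(to_timer):
            return True
        if attempt < n_retries:
            send()  # Toxxxx expired without an ACK: retransmit
    return False
```

   Injecting the wait as a callback keeps the sketch deterministic and leaves the choice of timer values, e.g., derived from an RTT estimate, to the implementation.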

   For some SCMP messages (CONNECT, CHANGE, JOIN, and STATUS) the

   sending ST agent also expects a response back (ACCEPT/REFUSE,

   CONNECT/JOIN-REJECT) after ACK has been received. For these cases,

   the ST agent needs to set the ToxxxxResp timer after it receives the

   ACK. (As before, xxxx is the initiating SCMP message type, e.g.,

   ToConnectResp). If it does not receive the appropriate response back

   when ToxxxxResp expires, the ST agent updates its state and performs

   appropriate recovery action as described in Section 5.2. Suggested

   constants are given in Section 10.5.4.

   The timeout and retransmission algorithm is implementation dependent

   and outside the scope of this document. Most existing

   algorithms are based on an estimation of the Round Trip Time (RTT)

   between two agents. Therefore, SCMP contains a mechanism, see Section

   8.5, to estimate this RTT. Note that the timeout-related variable

   names described above are for reference purposes only; implementors

   may choose to combine certain variables.

4.4 Stream Options

   An application may select among some stream options. The desired

   options are indicated to the ST agent at the origin when a new stream

   is created. Options apply to single streams and are valid during the

   whole stream's lifetime. The options chosen by the application at the

   origin are included in the initial CONNECT message, see Section

   4.5.3. When a CONNECT message reaches a target, the application at

   the target is notified of the stream options that have been selected,

   see Section 4.5.5.

4.4.1 No Recovery

   When a stream failure is detected, an ST agent would normally attempt

   stream recovery, as described in Section 6.2. The NoRecovery option

   is used to indicate that ST agents should not attempt recovery for

   the stream. The protocol behavior in the case that the NoRecovery

   option has been selected is illustrated in Section 6.2. The

   NoRecovery option is specified by setting the S-bit in the CONNECT

   message, see Section 10.4.4. The S-bit can be set only by the origin

   and it is never modified by intermediate and target ST agents.

4.4.2 Join Authorization Level

   When a new stream is created, it is necessary to define the join

   authorization level associated with the stream. This level determines

   the protocol behavior in case of stream joining, see Section 4.1 and

   Section 4.6.3. The join authorization level for a stream is defined

   by the J-bit and N-bit in the CONNECT message header, see Section

   10.4.4. One of the following authorization levels has to be

   selected:

   o   Level 0 - Refuse Join (JN = 00): No targets are allowed to join this

       stream.

   o   Level 1 - OK, Notify Origin (JN = 01): Targets are allowed to join

       the stream. The origin is notified that the target has joined.

   o   Level 2 - OK (JN = 10): Targets are allowed to join the stream. No

       notification is sent to the stream origin.
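   The mapping from header bits to authorization levels can be sketched as below. The sketch assumes that the "JN" notation lists the J-bit first, so that JN = 01 means J = 0 and N = 1; the bit positions within the CONNECT header (Section 10.4.4) are not reproduced. The names JoinLevel, join_level, and join_outcome are hypothetical.

```python
from enum import IntEnum


class JoinLevel(IntEnum):
    REFUSE_JOIN = 0       # JN = 00
    OK_NOTIFY_ORIGIN = 1  # JN = 01
    OK = 2                # JN = 10


def join_level(j_bit: int, n_bit: int) -> JoinLevel:
    """Map the J-bit and N-bit of a CONNECT header to a join level."""
    jn = (j_bit << 1) | n_bit
    if jn == 0b11:
        raise ValueError("JN = 11 is not a defined authorization level")
    return JoinLevel(jn)


def join_outcome(level: JoinLevel) -> tuple[bool, bool]:
    """Return (join allowed, origin notified) for a join request."""
    return (level != JoinLevel.REFUSE_JOIN,
            level == JoinLevel.OK_NOTIFY_ORIGIN)
```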

   Some applications may choose to maintain tight control on their

   streams and will not permit any connections without the origin's

   permission. For such streams, target applications may request to be

   added by sending an out-of-band request, i.e., via regular IP, to the

   origin. The origin, if it so chooses, can then add the target

   following the process described in Section 4.6.1.

   The selected authorization level impacts stream handling and the

   state that is maintained for the stream, as described in Section 4.1.

4.4.3 Record Route

   The RecordRoute option can be used to request that the route between

   the origin and a target be recorded and delivered to the application.

   This option may be used while connecting, accepting, changing, or

   refusing a stream. The results of a RecordRoute option requested by

   the origin, i.e., as part of the CONNECT or CHANGE messages, are

   delivered to the target. The results of a RecordRoute option

   requested by the target, i.e., as part of the ACCEPT or REFUSE

   messages, are delivered to the origin.

   The RecordRoute option is specified by adding the RecordRoute

   parameter to the mentioned SCMP messages. The format of the

   RecordRoute parameter is shown in Section 10.3.5. When adding this

   parameter, the ST agent at the origin must determine the number of

   entries that may be recorded as explained in Section 10.3.5.

4.4.4 User Data

   The UserData option can be used by applications to transport

   application-specific data along with some SCMP control messages. This

   option can be included with ACCEPT, CHANGE, CONNECT, DISCONNECT, and

   REFUSE messages. The format of the UserData parameter is shown in

   Section 10.3.7. This option may be included by the origin, or the

   target, by adding the UserData parameter to the mentioned SCMP

   messages. This option may only be included once per SCMP message.

4.5 Stream Setup

   This section presents a description of stream setup. For simplicity,

   we assume that everything succeeds, e.g., any required resources are

   available, messages are properly delivered, and the routing is

   correct. Possible failures in the setup phase are handled in Section

   5.2.

4.5.1 Information from the Application

   Before stream setup can be started, the application has to collect

   the necessary information to determine the characteristics of the

   connection. This includes identifying the participants and selecting

   the QoS parameters of the data flow. Information passed to the ST

   agent by the application includes:

   o   the list of the stream's targets (Section 10.3.6). The list may be

       empty (Section 4.5.3.1),

   o   the flow specification containing the desired quality of service

       for the stream (Section 9),

   o   information on the groups in which the stream is a member, if any

       (Section 7),

   o   information on the options selected for the stream (Section 4.4).

4.5.2 Initial Setup at the Origin

   The ST agent at the origin then performs the following operations:

   o   allocates a stream ID (SID) for the stream (Section 8.1),

   o   invokes the routing function to determine the set of next-hops for

       the stream (Section 4.5.2.1),

   o   invokes the Local Resource Manager (LRM) to reserve resources

       (Section 4.5.2.2),

   o   creates local database entries to store information on the new

       stream,

   o   propagates the stream creation request to the next-hops determined

       by the routing function (Section 4.5.3).

4.5.2.1 Invoking the Routing Function

   An ST agent that is setting up a stream invokes the routing function

   to find the next-hop to reach each of the targets specified by the

   target list provided by the application. This is similar to the

   routing decision in IP. However, in this case the route is to a

   multitude of targets with QoS requirements rather than to a single

   destination.

   The result of the routing function is a set of next-hop ST agents.

   The set of next-hops selected by the routing function is not

   necessarily the same as the set of next-hops that IP would select

   given a number of independent IP datagrams to the same destinations.

   The routing algorithm may attempt to optimize parameters other than

   the number of hops that the packets will take, such as delay, local

   network bandwidth consumption, or total internet bandwidth

   consumption. Alternatively, the routing algorithm may use a simple

   route lookup for each target.

   Once a next-hop is selected by the routing function, it persists for

   the whole stream lifetime, unless a network failure occurs.

4.5.2.2 Reserving Resources

   The ST agent invokes the Local Resource Manager (LRM) to perform the

   appropriate reservations. The ST agent presents the LRM with

   information including:

   o   the flow specification with the desired quality of service for the

       stream (Section 9),

   o   the version number associated with the flow specification

       (Section 9),

   o   information on the groups of which the stream is a member, if any

       (Section 7).

   The flow specification contains information needed by the LRM to

   allocate resources. The LRM updates the contents of the flow

   specification before returning it to the ST agent. Section 9.2.3

   defines the fields of the flow specification to be updated by the

   LRM.

   The membership of a stream in a group may affect the amount of

   resources that have to be allocated by the LRM, see Section 7.

4.5.3 Sending CONNECT Messages

   The ST agent sends a CONNECT message to each of the next-hop ST

   agents identified by the routing function. Each CONNECT message

   contains the SID, the selected stream options, the FlowSpec, and a

   TargetList. The format of the CONNECT message is defined by Section

   10.4.4. In general, the FlowSpec and TargetList depend on both the

   next-hop and the intervening network. Each TargetList is a subset of

   the original TargetList, identifying the targets that are to be

   reached through the next-hop to which the CONNECT message is being

   sent.

   The TargetList may be empty, see Section 4.5.3.1; if the TargetList

   would produce a CONNECT message that is too long, the CONNECT

   message is partitioned as explained in Section 5.1.2. If multiple

   next-hops are to be reached through a network that supports network

   level multicast, a different CONNECT message must nevertheless be

   sent to each next-hop since each will have a different TargetList.
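   The per-next-hop TargetList construction can be sketched as follows, using the simple per-target route lookup mentioned in Section 4.5.2.1. The Connect class is a hypothetical, heavily simplified stand-in for the message format of Section 10.4.4, and the route table passed in is an assumed input. Partitioning of oversized messages (Section 5.1.2) is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Connect:
    """Hypothetical, simplified CONNECT message."""
    next_hop: str
    sid: int
    flowspec: dict
    targets: list[str]


def build_connects(sid: int, flowspec: dict, targets: list[str],
                   route: dict[str, str]) -> list[Connect]:
    """Split the TargetList by next-hop using a per-target route lookup."""
    per_hop: dict[str, list[str]] = defaultdict(list)
    for target in targets:
        per_hop[route[target]].append(target)
    # One CONNECT per next-hop, even on a multicast-capable network,
    # because each next-hop receives a different TargetList.
    return [Connect(hop, sid, dict(flowspec), hop_targets)
            for hop, hop_targets in sorted(per_hop.items())]
```

   For example, assuming that in Figure 8 the origin A routes B via Router 1 and C and D via Router 2, the call build_connects(1, {...}, ["B", "C", "D"], route) yields two CONNECT messages, one per router.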

4.5.3.1 Empty Target List

   An application at the origin may request the local ST agent to create

   an empty stream. It does so by passing an empty TargetList to the

   local ST agent during the initial stream setup. When the local ST

   agent receives a request to create an empty stream, it allocates the

   stream ID (SID), updates its local database entries to store

   information on the new stream and notifies the application that

   stream setup is complete. The local ST agent does not generate any

   CONNECT message for streams with an empty TargetList. Targets may

   later be added by the origin, see Section 4.6.1, or they may

   autonomously join the stream, see Section 4.6.3.

4.5.4 CONNECT Processing by an Intermediate ST agent

   An ST agent receiving a CONNECT message, assuming no errors, responds

   to the previous-hop with an ACK. The ACK message must identify the

   CONNECT message to which it corresponds by including the reference

   number indicated by the Reference field of the CONNECT message. The

   intermediate ST agent calls the routing function, invokes the LRM to

   reserve resources, and then propagates the CONNECT messages to its

   next-hops, as described in the previous sections.

4.5.5 CONNECT Processing at the Targets

   An ST agent that is the target of a CONNECT message, assuming no

   errors, responds to the previous-hop with an ACK. The ST agent

   invokes the LRM to reserve local resources and then asks the

   specified application process whether it is willing to accept

   the connection.

   The application is presented with parameters from the CONNECT message

   including the SID, the selected stream options, Origin, FlowSpec,

   TargetList, and Group, if any, to be used as a basis for its

   decision. The application is identified by a combination of the

   NextPcol field, from the Origin parameter, and the service access

   point, or SAP, field included in the corresponding (usually single

   remaining) Target of the TargetList. The contents of the SAP field

   may specify the port or other local identifier for use by the

   protocol layer above the host ST layer. Subsequently received data

   packets will carry the SID, which can be mapped into this information

   and be used for their delivery.

   Finally, based on the application's decision, the ST agent sends to

   the previous-hop from which the CONNECT message was received either

   an ACCEPT or REFUSE message. Since the ACCEPT (or REFUSE) message has

   to be acknowledged by the previous-hop, it is assigned a new

   Reference number that will be returned in the ACK. The CONNECT

   message to which ACCEPT (or REFUSE) is a reply is identified by

   placing the CONNECT's Reference number in the LnkReference field of

   ACCEPT (or REFUSE). The ACCEPT message contains the FlowSpec as

   accepted by the application at the target.
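   The Reference/LnkReference bookkeeping at the target can be sketched as follows. The Reply and TargetAgent classes are hypothetical, and the sequential counter used to allocate new Reference numbers is an assumption of this sketch, not a rule from this document.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """Simplified ACCEPT/REFUSE sent by a target to its previous-hop."""
    kind: str                 # "ACCEPT" or "REFUSE"
    reference: int            # new Reference, returned in the ACK
    lnk_reference: int        # Reference of the CONNECT being answered
    flowspec: Optional[dict]  # FlowSpec as accepted; None for REFUSE


class TargetAgent:
    def __init__(self) -> None:
        # Assumed allocation scheme: a simple per-agent counter.
        self._refs = itertools.count(1)

    def answer_connect(self, connect_ref: int, app_accepts: bool,
                       accepted_flowspec: dict) -> Reply:
        ref = next(self._refs)
        if app_accepts:
            return Reply("ACCEPT", ref, connect_ref, accepted_flowspec)
        return Reply("REFUSE", ref, connect_ref, None)
```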

4.5.6 ACCEPT Processing by an Intermediate ST agent

   When an intermediate ST agent receives an ACCEPT, it first verifies

   that the message is a response to an earlier CONNECT. If not, it

   responds to the next-hop ST agent with an ERROR message, with

   ReasonCode (LnkRefUnknown). Otherwise, it responds to the next-hop ST

   agent with an ACK, and propagates the individual ACCEPT message to

   the previous-hop along the same path traced by the CONNECT but in the

   reverse direction toward the origin.
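   A sketch of the LnkReference check performed by an intermediate agent follows. It assumes, for illustration only, that the agent keeps a table mapping the Reference of each CONNECT it sent downstream to the Reference of the CONNECT it received from upstream, and that the forwarded ACCEPT carries the upstream Reference in its LnkReference field. The LRM adjustment step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Action:
    message: str
    direction: str          # "next-hop" (downstream) or "previous-hop"
    lnk_reference: int = 0


def process_accept(sent_connects: dict[int, int],
                   lnk_reference: int) -> list[Action]:
    """Actions of an intermediate ST agent on receiving an ACCEPT.

    `sent_connects` is a hypothetical table: Reference of a CONNECT sent
    downstream -> Reference of the CONNECT received from upstream.
    """
    upstream_ref = sent_connects.get(lnk_reference)
    if upstream_ref is None:
        # The ACCEPT does not answer any earlier CONNECT.
        return [Action("ERROR(LnkRefUnknown)", "next-hop")]
    # The entry is kept: a CONNECT may carry several targets, and each
    # target answers with its own ACCEPT.
    return [Action("ACK", "next-hop"),
            Action("ACCEPT", "previous-hop", upstream_ref)]
```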

   The FlowSpec is included in the ACCEPT message so that the origin and

   intermediate ST agents can gain access to the information that was

   accumulated as the CONNECT traversed the internet. Note that the

   resources, as specified in the FlowSpec in the ACCEPT message, may

   differ from the resources that were reserved when the CONNECT was

   originally processed. Therefore, the ST agent presents the LRM with

   the FlowSpec included in the ACCEPT message. It is expected that each

   LRM adjusts local reservations, releasing any excess resources. The

   LRM may choose not to adjust local reservations when that adjustment

   may result in the loss of needed resources. It may also choose to

   wait to adjust allocated resources until all targets in transition

   have been accepted or refused.

   In the case where the intermediate ST agent is acting as the origin

   with respect to this target, see Section 4.6.3.1, the ACCEPT message

   is not propagated upstream.

4.5.7 ACCEPT Processing by the Origin

   The origin will eventually receive an ACCEPT (or REFUSE) message from

   each of the targets. As each ACCEPT is received, the application is

   notified of the target and the resources that were successfully

   allocated along the path to it, as specified in the FlowSpec

   contained in the ACCEPT message. The application may then use the

   information to either adopt or terminate the portion of the stream to

   each target.

   When an ACCEPT is received by the origin, the path to the target is

   considered to be established and the ST agent is allowed to forward

   the data along this path as explained in Section 2 and in Section

   3.1.

4.5.8 REFUSE Processing by the Intermediate ST agent

   If an application at a target does not wish to participate in the

   stream, it sends a REFUSE message back to the origin with ReasonCode

   (ApplDisconnect). An intermediate ST agent that receives a REFUSE

   message with ReasonCode (ApplDisconnect) acknowledges it by sending

   an ACK to the next-hop, invokes the LRM to adjust reservations as

   appropriate, deletes the target entry from the internal database, and

   propagates the REFUSE message back to the previous-hop ST agent.

   In the case where the intermediate ST agent is acting as the origin

   with respect to this target, see Section 4.6.3.1, the REFUSE message

   is only propagated upstream when there are no more downstream agents

   participating in the stream. In this case, the agent indicates that

   it is to be removed from the stream by propagating the REFUSE

   message with the G-bit set (1).
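   The propagation decision for a REFUSE carrying ReasonCode (ApplDisconnect) can be sketched as below. The function name on_refuse and the set of downstream participants are hypothetical. This document only specifies setting the G-bit when the agent itself is to be removed; leaving it clear in the other cases is an assumption of this sketch.

```python
def on_refuse(downstream: set[str], refused: str,
              acting_as_origin: bool) -> tuple[bool, int]:
    """Return (propagate REFUSE upstream?, G-bit value to use).

    In every case the agent has already sent an ACK to the next-hop,
    invoked the LRM, and removed the target entry (not modeled here).
    """
    downstream.discard(refused)
    if not acting_as_origin:
        return True, 0   # ordinary propagation; G-bit clear (assumed)
    if downstream:
        return False, 0  # other downstream participants remain
    return True, 1       # none left: ask for this agent's removal
```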

4.5.9 REFUSE Processing by the Origin

   When the REFUSE message reaches the origin, the ST agent at the

   origin sends an ACK and notifies the application that the target is

   no longer part of the stream and whether the stream has any remaining

   targets. If there are no remaining targets, the application may wish

   to terminate the stream, or keep the stream active to allow addition

   of targets or stream joining as described in Section 4.6.3.

4.5.10 Other Functions during Stream Setup

   Some other functions have to be accomplished by an ST agent as

   CONNECT messages travel downstream and ACCEPT (or REFUSE) messages

   travel upstream during the stream setup phase. They were not

   mentioned in the previous sections to keep the discussion as simple

   as possible. These functions include:

   o   computing the smallest Maximum Transmission Unit size over the path

       to the targets, as part of the MTU discovery mechanism presented in

       Section 8.6. This is done by updating the MaxMsgSize field of the

       CONNECT message, see Section 10.4.4. This value is carried back to

       the origin in the MaxMsgSize field of the ACCEPT message, see

       Section 10.4.1.

   o   counting the number of IP clouds to be traversed to reach the

       targets, if any. IP clouds are traversed when the IP encapsulation

       mechanism is used. This mechanism is described in Section 8.7.

       Encapsulating agents update the IPHops field of the CONNECT message,

       see Section 10.4.4. The resulting value is carried back to the origin

       in the IPHops field of the ACCEPT message, see Section 10.4.1.

   o   updating the RecoveryTimeout value for the stream based on what the

       agent can support. This is part of the stream recovery mechanism

       described in Section 6.2. This is done by updating the

       RecoveryTimeout field of the CONNECT message, see Section 10.4.4.

       This value is carried back to the origin in the RecoveryTimeout field of

       the ACCEPT message, see Section 10.4.1.
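   The per-hop updates of MaxMsgSize and IPHops can be sketched as a pure function applied as the CONNECT leaves each agent. The ConnectFields class and the hop_max_msg parameter (the largest ST message the outgoing hop can carry, assumed to be known locally) are hypothetical. RecoveryTimeout is omitted because its update rule is defined in Section 6.2 and is not reproduced here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConnectFields:
    max_msg_size: int  # MaxMsgSize: smallest limit seen so far
    ip_hops: int       # IPHops: IP clouds traversed so far


def update_at_agent(fields: ConnectFields, hop_max_msg: int,
                    encapsulates: bool) -> ConnectFields:
    """Update CONNECT fields as the message leaves one ST agent."""
    return replace(
        fields,
        max_msg_size=min(fields.max_msg_size, hop_max_msg),
        ip_hops=fields.ip_hops + (1 if encapsulates else 0),
    )
```

   Applying update_at_agent at every hop leaves the path minimum in max_msg_size and the number of encapsulating agents in ip_hops, which the target then returns to the origin in the ACCEPT.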

4.6 Modifying an Existing Stream

   Some applications may wish to modify a stream after it has been

   created. Possible changes include expanding a stream, reducing it,

   and changing its FlowSpec. The origin may add or remove targets as

   described in Section 4.6.1 and Section 4.6.2. Targets may request to

   join the stream as described in Section 4.6.3, or they may decide to

   leave a stream as described in Section 4.6.4. Section 4.6.5 explains

   how to change a stream's FlowSpec.

   As defined by Section 2, an ST agent can handle only one stream

   modification at a time. If a stream modification operation is already

   underway, further requests are queued and handled when the previous

   operation has been completed. This also applies to two subsequent

   requests of the same kind, e.g., two subsequent changes to the

   FlowSpec.
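   The rule that only one modification is handled at a time can be sketched as a per-stream FIFO queue. The StreamModifier class and its completed() hook are hypothetical; what counts as "completed" (e.g., all ACCEPT or REFUSE responses received) is left to the caller in this sketch.

```python
from collections import deque
from typing import Callable


class StreamModifier:
    """Serializes the modifications of a single stream."""

    def __init__(self) -> None:
        self._queue: deque[Callable[[], None]] = deque()
        self._busy = False

    def request(self, operation: Callable[[], None]) -> None:
        # Queue the request; start it at once only if nothing is underway.
        self._queue.append(operation)
        if not self._busy:
            self._start_next()

    def completed(self) -> None:
        # Called when the ongoing modification has finished.
        self._busy = False
        self._start_next()

    def _start_next(self) -> None:
        if self._queue:
            self._busy = True
            self._queue.popleft()()
```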

4.6.1 The Origin Adding New Targets

   It is possible for an application at the origin to add new targets to

   an existing stream any time after the stream has been established.

   Before new targets are added, the application has to collect the

   necessary information on the new targets. Such information is passed

   to the ST agent at the origin.

   The ST agent at the origin issues a CONNECT message that contains the

   SID, the FlowSpec, and the TargetList specifying the new targets.

   This is similar to sending a CONNECT message during stream

   establishment, with the following exceptions: the origin checks that

   a) the SID is valid, b) the targets are not already members of the

   stream, c) the LRM evaluates the FlowSpec of the new target to

   be the same as the FlowSpec of the existing stream, i.e., it requires

   an equal or smaller amount of resources to be allocated. If the

   FlowSpec of the new target does not match the FlowSpec of the

   existing stream, an error is generated with ReasonCode

   (FlowSpecMismatch). Functions to compare flow specifications are

   provided by the LRM, see Section 1.4.5.

   An intermediate ST agent that is already a participant in the stream

   looks at the SID and StreamCreationTime, and verifies that the stream

   is the same. It then checks if the intersection of the TargetList and

   the targets of the established stream is empty. If this is not the

   case, it responds with a REFUSE message with ReasonCode

   (TargetExists) that contains a TargetList of those targets that were

   duplicates. To indicate that the stream exists and includes the

   listed targets, the ST agent sets the E-bit of the REFUSE message to

   one (1), see Section 10.4.11. The agent then proceeds with processing

   each new target in the TargetList.
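   The duplicate-target handling at an intermediate agent can be sketched as follows. The Refuse record is a simplified stand-in for the message format of Section 10.4.11, and the SID/StreamCreationTime comparison is assumed to have been done by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Refuse:
    # Simplified REFUSE; field names are illustrative, not the wire format.
    reason: str
    targets: List[str]
    e_bit: int = 0


def process_add_connect(stream_targets: Set[str], target_list: List[str]
                        ) -> Tuple[Optional[Refuse], List[str]]:
    """Split a CONNECT's TargetList at an agent already on the stream.

    Returns the REFUSE for duplicate targets (or None) and the targets
    that still have to be processed like an ordinary CONNECT.
    """
    duplicates = [t for t in target_list if t in stream_targets]
    new = [t for t in target_list if t not in stream_targets]
    refuse = None
    if duplicates:
        # E-bit = 1: the stream exists and already includes these targets.
        refuse = Refuse("TargetExists", duplicates, e_bit=1)
    return refuse, new
```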

   For each new target in the TargetList, processing is much the same as

   for the original CONNECT. The CONNECT is acknowledged, propagated,

   and network resources are reserved. Intermediate or target ST agents

   that are not already participants in the stream behave as in the case

   of stream setup (see Section 4.5.4 and Section 4.5.5).

4.6.2 The Origin Removing a Target

   It is possible for an application at the origin to remove existing

   targets of a stream any time after the targets have accepted the

   stream. The application at the origin specifies the set of targets

   that are to be removed and informs the local ST agent. Based on this

   information, the ST agent sends DISCONNECT messages with the

   ReasonCode (ApplDisconnect) to the next-hops relative to the targets.

   An ST agent that receives a DISCONNECT message must acknowledge it by

   sending an ACK to the previous-hop. The ST agent updates its state

   and notifies the LRM of the target deletion so that the LRM can

   modify reservations as appropriate. When the DISCONNECT message

   reaches the target, the ST agent also notifies the application that

   the target is no longer part of the stream. When there are no

   remaining targets that can be reached through a particular next-hop,

   the ST agent informs the LRM and it deletes the next-hop from its

   next-hops set.
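   A sketch of the bookkeeping above, assuming (hypothetically) that an agent keeps a table mapping each next-hop to the set of targets reached through it. One function groups the removed targets into one DISCONNECT per next-hop; the other prunes next-hops that no longer reach any target.

```python
from typing import Dict, List, Set


def disconnects_by_next_hop(next_hops: Dict[str, Set[str]],
                            targets: List[str]) -> Dict[str, List[str]]:
    """Group targets into one DISCONNECT (ApplDisconnect) per next-hop."""
    wanted = set(targets)
    out: Dict[str, List[str]] = {}
    for hop, reached in next_hops.items():
        hit = sorted(reached & wanted)
        if hit:
            out[hop] = hit
    return out


def remove_targets(next_hops: Dict[str, Set[str]],
                   targets: List[str]) -> List[str]:
    """Drop targets from the table; return next-hops left with no targets.

    A real agent would also notify the LRM so it can adjust reservations.
    """
    emptied = []
    for hop in list(next_hops):
        next_hops[hop].difference_update(targets)
        if not next_hops[hop]:
            del next_hops[hop]
            emptied.append(hop)
    return emptied
```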

   SCMP also provides a flooding mechanism to delete targets that joined

   the stream without notifying the origin. The special case of target

   deletion via flooding is described in Section 5.7.

4.6.3 A Target Joining a Stream

   An application may request to join an existing stream. It has to

   collect information on the stream including the stream ID (SID) and

   the IP address of the stream's origin. This can be done out-of-band,

   e.g., via regular IP. The information is then passed to the local ST

   agent. The ST agent generates a JOIN message containing the

   application's request to join the stream and sends it toward the

   stream origin.

   An ST agent receiving a JOIN message, assuming no errors, responds

   with an ACK. The ACK message must identify the JOIN message to which

   it corresponds by including the Reference number indicated by the

   Reference field of the JOIN message. If the ST agent is not traversed

   by the stream that has to be joined, it propagates the JOIN message

   toward the stream's origin. Once a JOIN message has been

   acknowledged, ST agents do not retain any state information related

   to the JOIN message.

   Eventually, an ST agent traversed by the stream or the stream's

   origin itself is reached. This agent must respond to a received JOIN

   first with an ACK to the ST agent from which the message was

   received, and then issue either a CONNECT or a JOIN-REJECT message

   and send it toward the target. The response to the join request is

   based on the join authorization level associated with the stream, see

   Section 4.4.2:

o   If the stream has authorization level #0 (refuse join):

    The ST agent sends a JOIN-REJECT message toward the target with

    ReasonCode (JoinAuthFailure).

o   If the stream has authorization level #1 (ok, notify origin):

    The ST agent sends a CONNECT message toward the target with a

    TargetList including the target that requested to join the stream.

    This eventually results in adding the target to the stream. When

    the ST agent receives the ACCEPT message indicating that the new

    target has been added, it does not propagate the ACCEPT message

    backwards (Section 4.5.6). Instead, it issues a NOTIFY message

    with ReasonCode (TargetJoined) so that upstream agents, including

    the origin, may add the new target to maintained state

    information. The NOTIFY message includes all target specific

    information.

o   If the stream has authorization level #2 (ok):

    The ST agent sends a CONNECT message toward the target with a

    TargetList including the target that requested to join the stream.

    This eventually results in adding the target to the stream. When

    the ST agent receives the ACCEPT message indicating that the new

    target has been added, it does not propagate the ACCEPT message

    backwards (Section 4.5.6), nor does it notify the origin. A NOTIFY

    message is generated with ReasonCode (TargetJoined) if the target

    specific information needs to be propagated back to the origin. An

    example of such information is a change in MTU, see Section 8.6.
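   The JOIN handling above, including the agent that is not traversed by the stream, can be condensed into the decision functions below. Actions are returned as descriptive strings for illustration only; a real agent would build and send the corresponding SCMP messages.

```python
from typing import List


def handle_join(on_stream: bool, auth_level: int) -> List[str]:
    """Actions for a received JOIN (illustrative strings).

    on_stream: True if this agent is traversed by the stream or is the origin.
    auth_level: the stream's join authorization level (0, 1 or 2).
    """
    actions = ["ACK"]  # every agent first acknowledges the JOIN
    if not on_stream:
        # Forward toward the origin; no state is kept once ACKed.
        actions.append("JOIN -> origin")
        return actions
    if auth_level == 0:
        actions.append("JOIN-REJECT(JoinAuthFailure) -> target")
    elif auth_level in (1, 2):
        actions.append("CONNECT -> target")
    else:
        raise ValueError(f"unknown join authorization level {auth_level}")
    return actions


def on_join_accept(auth_level: int, target_info_changed: bool) -> List[str]:
    """Upstream messages when the joining target's ACCEPT arrives.

    The ACCEPT itself is never propagated upstream for a join.
    """
    if auth_level == 1 or (auth_level == 2 and target_info_changed):
        return ["NOTIFY(TargetJoined) -> upstream"]
    return []
```

For level #2, the sketch sends the NOTIFY only when target-specific information, such as an MTU change, has to reach the origin.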

4.6.3.1 Intermediate Agent (Router) as Origin

   When a stream has join authorization level #2, see Section 4.4.2, it

   is possible that the stream origin is unaware of some targets

   participating in the stream. In this case, the ST intermediate agent

   that first sent a CONNECT message to this target has to act as the

   stream origin for the given target. This includes:

o   if the whole stream is deleted, the intermediate agent must

    disconnect the target.

o   if the stream FlowSpec is changed, the intermediate agent must

    change the FlowSpec for the target as appropriate.

o   proper handling of ACCEPT and REFUSE messages, without propagation

    to upstream ST agents.

o   generation of NOTIFY messages when needed, as described above.

   The intermediate agent behaves normally for all other targets added

   to the stream as a consequence of a CONNECT message issued by the

   origin.
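   One way to implement these duties is a per-target flag recording which targets this agent connected on its own behalf. The field and function names below are hypothetical; the same list of locally originated targets serves both for stream deletion and for FlowSpec changes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TargetState:
    next_hop: str
    # True when this agent sent the CONNECT for a level-#2 join and so
    # acts as origin for this target (hypothetical field name).
    local_origin: bool = False


def forward_response_upstream(targets: Dict[str, TargetState],
                              target: str) -> bool:
    """ACCEPT/REFUSE for a target this agent originated stays here."""
    return not targets[target].local_origin


def locally_originated_targets(targets: Dict[str, TargetState]) -> List[str]:
    """Targets this agent must disconnect or change itself, because the
    origin does not know about them."""
    return sorted(t for t, s in targets.items() if s.local_origin)
```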

4.6.4 A Target Deleting Itself

   The application at the target may inform the local ST agent that it

   wants to be removed from the stream. The ST agent then forms a REFUSE

   message with the target itself as the only entry in the TargetList

   and with ReasonCode (ApplDisconnect). The REFUSE message is sent back

   to the origin via the previous-hop. If a stream has multiple targets

   and one target leaves the stream using this REFUSE mechanism, the

   stream to the other targets is not affected; the stream continues to

   exist.

   An ST agent that receives a REFUSE message acknowledges it by sending

   an ACK to the next-hop. The target is deleted and the LRM is notified

   so that it adjusts reservations as appropriate. The REFUSE message is

   also propagated back to the previous-hop ST agent except in the case

   where the agent is acting as the origin. In this case a NOTIFY may be

   propagated instead, see Section 4.6.3.

   When the REFUSE reaches the origin, the origin sends an ACK and

   notifies the application that the target is no longer part of the

   stream.
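   The two halves of this exchange are sketched below with plain dictionaries as messages (an illustrative layout, not the wire format). The real origin would notify the application instead of forwarding anything; that case is left out.

```python
from typing import List, Set


def leave_refuse(target_ip: str) -> dict:
    """REFUSE built at the target when its application leaves the stream."""
    return {"type": "REFUSE", "reason": "ApplDisconnect",
            "targets": [target_ip]}


def on_leave_refuse(stream_targets: Set[str], refuse: dict,
                    acting_as_origin: bool) -> List[str]:
    """Upstream agent handling a REFUSE(ApplDisconnect); returns actions."""
    actions = ["ACK -> next-hop"]
    stream_targets.difference_update(refuse["targets"])
    actions.append("LRM: adjust reservations")
    if acting_as_origin:
        # Agent acts as origin for this target (Section 4.6.3.1).
        actions.append("optional NOTIFY -> previous-hop")
    else:
        actions.append("REFUSE -> previous-hop")
    return actions
```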

4.6.5 Changing a Stream's FlowSpec

   The application at the origin may wish to change the FlowSpec of an

   established stream. Changing the FlowSpec is a critical operation and

   it may even lead in some cases to the deletion of the affected

   targets. Possible problems with FlowSpec changes are discussed in

   Section 5.6.

   To change the stream's FlowSpec, the application informs the ST agent

   at the origin of the new FlowSpec and of the list of targets relative

   to the change. The ST agent at the origin then issues one CHANGE

   message per next-hop including the new FlowSpec and sends it to the

   relevant next-hop ST agents. If the G-bit field of the CHANGE message

   is set (1), the change affects all targets in the stream.
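   The per-next-hop fan-out of CHANGE messages can be sketched as below, assuming a hypothetical table mapping each next-hop to the targets it reaches. Setting the G-bit when the affected set equals all targets of the stream is this sketch's reading of the text; for simplicity the sketch still lists the targets explicitly.

```python
from typing import Dict, List, Set


def build_change_messages(next_hops: Dict[str, Set[str]], targets: List[str],
                          flowspec: str, i_bit: int = 0) -> List[dict]:
    """One CHANGE per next-hop that reaches at least one affected target.

    Messages are plain dicts; the layout is illustrative only.
    """
    wanted = set(targets)
    all_targets = set().union(*next_hops.values())
    g_bit = 1 if wanted == all_targets else 0
    msgs = []
    for hop in sorted(next_hops):
        hit = sorted(next_hops[hop] & wanted)
        if hit:
            msgs.append({"next_hop": hop, "targets": hit,
                         "flowspec": flowspec, "G": g_bit, "I": i_bit})
    return msgs
```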

   The CHANGE message contains a bit called I-bit, see Section 10.4.3.

   By default, the I-bit is set to zero (0) to indicate that the LRM is

   expected to try to perform the requested FlowSpec change without

   risking tearing down the stream. Applications that desire a higher

   probability of success and are willing to take the risk of breaking

   the stream can indicate this by setting the I-bit to one (1).

   Applications that require the requested modification in order to

   continue operating are expected to set this bit.
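   To illustrate why the I-bit trades safety for a higher chance of success, here is a toy single-link LRM. It assumes, hypothetically, that a reservation cannot be resized in place. With the I-bit at zero, the new reservation must fit while the old one is still held; with the I-bit at one, the old reservation is released first, so less free capacity is needed, but a failure loses the stream (compare Section 5.6).

```python
from typing import Tuple


def change_reservation(capacity: int, used_by_others: int, old: int,
                       new: int, i_bit: int) -> Tuple[str, int]:
    """Toy LRM: returns (outcome, amount reserved afterwards).

    Assumes reservations cannot be resized in place: the LRM must set up
    a new reservation and release the old one.
    """
    free = capacity - used_by_others - old
    if i_bit == 0:
        # Make-before-break: the new reservation must fit while the old
        # one is still held, so the stream is never put at risk.
        if new <= free:
            return "changed", new
        return "refused, old FlowSpec kept", old
    # I-bit = 1: release the old reservation first, then reserve.
    free += old
    if new <= free:
        return "changed", new
    # Toy assumption: released resources may be taken by other streams
    # before they can be re-reserved, so the stream is torn down.
    return "torn down", 0
```

With capacity 100, 20 units used by other streams and an old reservation of 40, a change to 50 is refused with the I-bit at zero but succeeds with the I-bit at one.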

   An intermediate ST agent that receives a CHANGE message first sends

   an ACK to the previous-hop and then provides the FlowSpec to the LRM.

   If the LRM can perform the change, the ST agent propagates the CHANGE

   messages along the established paths.

   If the whole process succeeds, the CHANGE messages will eventually

   reach the targets. Targets respond with an ACCEPT (or REFUSE) message

   that is propagated back to the origin. In processing the ACCEPT

   message on the way back to the origin, excess resources may be

   released by the LRM as described in Section 4.5.6. The REFUSE message

   must have the ReasonCode (ApplRefused).

   SCMP also provides a flooding mechanism to change targets that joined

   the stream without notifying the origin. The special case of target

   change via flooding is described in Section 5.7.

4.7 Stream Tear Down

   A stream is usually terminated by the origin when it has no further

   data to send. A stream is also torn down if the application should

   terminate abnormally or if certain network failures are encountered.

   Processing in this case is identical to the previous descriptions

   except that the ReasonCode (ApplAbort, NetworkFailure, etc.) is

   different.

   When all targets have left a stream, the origin notifies the

   application of that fact, and the application is then responsible for

   terminating the stream. Note, however, that the application may

   decide to add targets to the stream instead of terminating it, or may

   just leave the stream open with no targets in order to permit stream

   joins.

5. Exceptional Cases

   The previous descriptions covered the simple cases where everything

   worked. We now discuss what happens when things do not succeed.

   Included are situations where messages exceed a network MTU or are

   lost, where the requested resources are not available, and where

   routing fails or is inconsistent.

5.1 Long ST Messages

   It is possible that an ST agent, or an application, will need to send

   a message that exceeds a network's Maximum Transmission Unit (MTU).

   This case must be handled but not via generic fragmentation, since

   ST2 does not support generic fragmentation of either data or control

   messages.

5.1.1 Handling of Long Data Packets

   ST agents discard data packets that exceed the MTU of the next-hop

   network. No error message is generated. Applications should avoid

   sending data packets larger than the minimum MTU supported by a given

   stream. The application, both at the origin and targets, can learn

   the stream minimum MTU through the MTU discovery mechanism described

   in Section 8.6.
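   A minimal sketch of both sides: the agent silently drops an oversize packet, and the application splits its payload to stay within the stream minimum MTU learned via Section 8.6. The per-packet header overhead is passed in as header_len, a hypothetical parameter, because this section does not state it.

```python
from typing import List, Optional


def forward_data(packet: bytes, next_hop_mtu: int) -> Optional[bytes]:
    """ST agent side: oversize data packets are dropped without an error."""
    if len(packet) > next_hop_mtu:
        return None
    return packet


def split_payload(payload: bytes, stream_min_mtu: int,
                  header_len: int) -> List[bytes]:
    """Application side: keep every packet within the stream minimum MTU."""
    max_payload = stream_min_mtu - header_len
    if max_payload <= 0:
        raise ValueError("MTU too small for the header")
    return [payload[i:i + max_payload]
            for i in range(0, len(payload), max_payload)]
```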

5.1.2 Handling of Long Control Packets

   Each ST agent knows the MTU of the networks to which it is connected,

   and those MTUs restrict the size of the SCMP message it can send. An

   SCMP message size can exceed the MTU of a given network for a number

   of reasons:

o   the TargetList parameter (Section 10.3.6) may be too long;

o   the RecordRoute parameter (Section 10.3.5) may be too long;

o   the UserData parameter (Section 10.3.7) may be too long;

o   the PDUInError field of the ERROR message (Section 10.4.6) may be

    too long.

   An ST agent receiving or generating an SCMP message that is too long should:

o   break the message into multiple messages, each carrying part of the

    TargetList. Any RecordRoute and UserData parameters are replicated

    in each message for delivery to all targets. Applications that

    support a large number of targets may avoid using long TargetList

    parameters, and are expected to do so, by exploiting the stream

    joining functions, see Section 4.6.3. One exception to this rule

    exists. In the case of a long TargetList parameter to be included in

    a STATUS-RESPONSE message, the TargetList parameter is just

    truncated to the point where the list can fit in a single message,

    see Section 8.4.

o   for downstream agents: if the TargetList parameter contains a

    single Target element and the message size is still too long, the ST

    agent should issue a REFUSE message with ReasonCode

    (RecordRouteSize) if the size of the RecordRoute parameter causes

    the SCMP message size to exceed the network MTU, or with ReasonCode

    (UserDataSize) if the size of the UserData parameter causes the SCMP

    message size to exceed the network MTU. If both RecordRoute and

    UserData parameters are present the ReasonCode (UserDataSize) should

    be sent. For messages generated at the target: the target ST agent

    must check for SCMP messages that may exceed the MTU on the complete

    target-to-origin path, and inform the application that an SCMP

    message that is too long has been generated. The format for the error reporting

    is a local implementation issue. The error codes are the same as

    previously stated.
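   The splitting rule for downstream agents can be sketched as below. It assumes every Target element has the same encoded length and lumps the header and any other parameters into fixed_len; the STATUS-RESPONSE truncation exception and target-side error reporting are not modeled.

```python
from typing import List, Optional, Tuple


def split_target_list(targets: List[str], fixed_len: int, target_len: int,
                      record_route_len: int, user_data_len: int,
                      mtu: int) -> Tuple[List[List[str]], Optional[str]]:
    """Split a TargetList so every resulting SCMP message fits the MTU.

    Lengths are in bytes. RecordRoute and UserData are replicated in each
    message. Returns (TargetList chunks, None) or ([], ReasonCode).
    """
    base = fixed_len + record_route_len + user_data_len
    if base + target_len > mtu:
        # Even a single Target element does not fit.
        if user_data_len > 0:
            return [], "UserDataSize"   # also used when both are present
        if record_route_len > 0:
            return [], "RecordRouteSize"
        raise ValueError("message too long for reasons not covered here")
    per_message = (mtu - base) // target_len
    chunks = [targets[i:i + per_message]
              for i in range(0, len(targets), per_message)]
    return chunks, None
```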

   ST agents generating ERROR messages that are too long simply truncate the

   PDUInError field to the point where the message is smaller than the

   network MTU.

5.2 Timeout Failures

   As described in Section 4.3, SCMP message delivery is made reliable

   through the use of acknowledgments, timeouts, and retransmission. The

   ACCEPT, CHANGE, CONNECT, DISCONNECT, JOIN, JOIN-REJECT, NOTIFY, and

   REFUSE messages must always be acknowledged, see Section 4.2. In

   addition, for some SCMP messages (CHANGE, CONNECT, JOIN) the sending

   ST agent also expects a response back (ACCEPT/REFUSE, CONNECT/JOIN-

   REJECT) after an ACK has been received. Also, the STATUS message must

   be answered with a STATUS-RESPONSE message.

   The following sections describe the handling of each of the possible

   failure cases due to timeout situations while waiting for an

   acknowledgment or a response. The timeout related variables, and

   their names, used in the next sections are for reference purposes

   only. They may be implementation specific. Different implementations

   are not required to share variable names, or even the mechanism by

   which the timeout and retransmission behavior is implemented.
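   Since the variable names are for reference only, here is a generic sketch of the acknowledgment and retransmission pattern shared by the following sections. It assumes that "N unsuccessful retries" means one initial transmission plus N retransmissions; send is a hypothetical callback that transmits once and reports whether the ACK arrived before the timeout (e.g., ToConnect) expired.

```python
from typing import Callable


def send_reliably(send: Callable[[], bool], retries: int) -> bool:
    """Transmit, then retransmit up to `retries` times (e.g. NConnect).

    Returns True as soon as an ACK is reported, False when every attempt
    timed out; the caller then applies the message-specific give-up rule.
    """
    for _ in range(1 + retries):
        if send():
            return True
    return False
```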

5.2.1 Failure due to ACCEPT Acknowledgment Timeout

   An ST agent that sends an ACCEPT message upstream expects an ACK from

   the previous-hop ST agent. If no ACK is received before the ToAccept

   timeout expires, the ST agent should retry and send the ACCEPT

   message again. After NAccept unsuccessful retries, the ST agent sends

   a REFUSE message toward the origin, and a DISCONNECT message toward

   the targets. Both REFUSE and DISCONNECT must identify the affected

   targets and specify the ReasonCode (RetransTimeout).

5.2.2 Failure due to CHANGE Acknowledgment Timeout

   An ST agent that sends a CHANGE message downstream expects an ACK

   from the next-hop ST agent. If no ACK is received before the ToChange

   timeout expires, the ST agent should retry and send the CHANGE

   message again. After NChange unsuccessful retries, the ST agent

   aborts the change attempt by sending a REFUSE message toward the

   origin, and a DISCONNECT message toward the targets. Both REFUSE and

   DISCONNECT must identify the affected targets and specify the

   ReasonCode (RetransTimeout).

5.2.3 Failure due to CHANGE Response Timeout

   Only the origin ST agent implements this timeout. After correctly

   receiving the ACK to a CHANGE message, an ST agent expects to receive

   an ACCEPT, or REFUSE message in response. If one of these messages is

   not received before the ToChangeResp timer expires, the ST agent at

   the origin aborts the change attempt, and behaves as if a REFUSE

   message with the E-bit set and with ReasonCode (ResponseTimeout) is

   received.

5.2.4 Failure due to CONNECT Acknowledgment Timeout

   An ST agent that sends a CONNECT message downstream expects an ACK

   from the next-hop ST agent. If no ACK is received before the

   ToConnect timeout expires, the ST agent should retry and send the

   CONNECT message again. After NConnect unsuccessful retries, the ST

   agent sends a REFUSE message toward the origin, and a DISCONNECT

   message toward the targets. Both REFUSE and DISCONNECT must identify

   the affected targets and specify the ReasonCode (RetransTimeout).

5.2.5 Failure due to CONNECT Response Timeout

   Only the origin ST agent implements this timeout. After correctly

   receiving the ACK to a CONNECT message, an ST agent expects to

   receive an ACCEPT or REFUSE message in response. If one of these

   messages is not received before the ToConnectResp timer expires, the

   origin ST agent aborts the connection setup attempt, acts as if a

   REFUSE message is received, and it sends a DISCONNECT message toward

   the targets. Both REFUSE and DISCONNECT must identify the affected

   targets and specify the ReasonCode (ResponseTimeout).

5.2.6 Failure due to DISCONNECT Acknowledgment Timeout

   An ST agent that sends a DISCONNECT message downstream expects an ACK

   from the next-hop ST agent. If no ACK is received before the

   ToDisconnect timeout expires, the ST agent should retry and send the

   DISCONNECT message again. After NDisconnect unsuccessful retries, the

   ST agent simply gives up and assumes that the next-hop ST agent is no

   longer part of the stream.

5.2.7 Failure due to JOIN Acknowledgment Timeout

   An ST agent that sends a JOIN message toward the origin expects an

   ACK from a neighbor ST agent. If no ACK is received before the ToJoin

   timeout expires, the ST agent should retry and send the JOIN message

   again. After NJoin unsuccessful retries, the ST agent sends a JOIN-

   REJECT message back in the direction of the target with ReasonCode

   (RetransTimeout).

5.2.8 Failure due to JOIN Response Timeout

   Only the target agent implements this timeout. After correctly

   receiving the ACK to a JOIN message, the ST agent at the target

   expects to receive a CONNECT or JOIN-REJECT message in response. If

   one of these messages is not received before the ToJoinResp timer

   expires, the ST agent aborts the stream join attempt and returns an

   error corresponding with ReasonCode (RetransTimeout) to the

   application.

   Note that, after correctly receiving the ACK to a JOIN message,

   intermediate ST agents do not maintain any state on the stream

   joining attempt. As a consequence, they do not set the ToJoinResp

   timer and do not wait for a CONNECT or JOIN-REJECT message. This is

   described in Section 4.6.3.

5.2.9 Failure due to JOIN-REJECT Acknowledgment Timeout

   An ST agent that sends a JOIN-REJECT message toward the target

   expects an ACK from a neighbor ST agent. If no ACK is received before

   the ToJoinReject timeout expires, the ST agent should retry and send

   the JOIN-REJECT message again. After NJoinReject unsuccessful

   retries, the ST agent simply gives up.

5.2.10 Failure due to NOTIFY Acknowledgment Timeout

   An ST agent that sends a NOTIFY message to a neighbor ST agent

   expects an ACK from that neighbor ST agent. If no ACK is received

   before the ToNotify timeout expires, the ST agent should retry and

   send the NOTIFY message again. After NNotify unsuccessful retries,

   the ST agent simply gives up and behaves as if the ACK message was

   received.

5.2.11 Failure due to REFUSE Acknowledgment Timeout

   An ST agent that sends a REFUSE message upstream expects an ACK from

   the previous-hop ST agent. If no ACK is received before the ToRefuse

   timeout expires, the ST agent should retry and send the REFUSE

   message again. After NRefuse unsuccessful retries, the ST agent gives

   up and assumes it is no longer part of the stream.

5.2.12 Failure due to STATUS Response Timeout

   After sending a STATUS message to a neighbor ST agent, an ST agent

   expects to receive a STATUS-RESPONSE message in response. If this

   message is not received before the ToStatusResp timer expires, the ST

   agent sends the STATUS message again. After NStatus unsuccessful

   retries, the ST agent gives up and assumes that the neighbor ST agent

   is not active.
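   For quick reference, the give-up behavior of Sections 5.2.1 to 5.2.12 can be collected in a lookup table. The keys are an illustrative encoding (message sent, reply awaited), and the actions paraphrase the text above.

```python
# (message sent, reply awaited) -> behavior once the retries or the
# response timer are exhausted; paraphrased from Sections 5.2.1-5.2.12.
GIVE_UP = {
    ("ACCEPT", "ACK"): "REFUSE to origin + DISCONNECT to targets (RetransTimeout)",
    ("CHANGE", "ACK"): "REFUSE to origin + DISCONNECT to targets (RetransTimeout)",
    ("CHANGE", "ACCEPT/REFUSE"): "origin: as if REFUSE (ResponseTimeout), E-bit set",
    ("CONNECT", "ACK"): "REFUSE to origin + DISCONNECT to targets (RetransTimeout)",
    ("CONNECT", "ACCEPT/REFUSE"): "origin: as if REFUSE + DISCONNECT (ResponseTimeout)",
    ("DISCONNECT", "ACK"): "give up; next-hop no longer in the stream",
    ("JOIN", "ACK"): "JOIN-REJECT (RetransTimeout) toward the target",
    ("JOIN", "CONNECT/JOIN-REJECT"): "target: report (RetransTimeout) to application",
    ("JOIN-REJECT", "ACK"): "give up",
    ("NOTIFY", "ACK"): "give up; behave as if the ACK was received",
    ("REFUSE", "ACK"): "give up; agent no longer part of the stream",
    ("STATUS", "STATUS-RESPONSE"): "give up; neighbor assumed not active",
}


def give_up_action(sent: str, awaited: str) -> str:
    """Look up the give-up behavior; raises KeyError for unlisted pairs."""
    return GIVE_UP[(sent, awaited)]
```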

5.3 Setup Failures due to Routing Failures

   It is possible for an ST agent to receive a CONNECT message that

   contains a known SID, but from an ST agent other than the previous-

   hop ST agent of the stream with that SID. This may be:

   1. the result of two branches of the tree forming the stream

       joining back together,

   2. the result of an attempted recovery of a partially failed

       stream, or

   3. a routing loop.

   The TargetList contained in the CONNECT is used to distinguish the

   different cases by comparing each newly received target with those of

   the previously existing stream:

o   if the IP address of the target(s) differ, it is case #1;

o   if the target matches a target in the existing stream, it may be

    case #2 or #3.

   Case #1 is handled in Section 5.3.1, while the other cases are

   handled in Section 5.3.2.
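   The comparison of the received TargetList with the existing stream amounts to a simple partition, sketched below with hypothetical names.

```python
from typing import Dict, List, Set


def classify_connect_targets(existing: Set[str],
                             received: List[str]) -> Dict[str, List[str]]:
    """Sort the targets of a CONNECT that carries a known SID but arrived
    from an ST agent other than the stream's previous-hop."""
    result: Dict[str, List[str]] = {"path_convergence": [],
                                    "recovery_or_loop": []}
    for target in received:
        if target in existing:
            result["recovery_or_loop"].append(target)  # case #2 or #3
        else:
            result["path_convergence"].append(target)  # case #1
    return result
```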

5.3.1 Path Convergence

   It is possible for an ST agent to receive a CONNECT message that

   contains a known SID, but from an ST agent other than the previous-

   hop ST agent of the stream with that SID. This might be the result of

   two branches of the tree forming the stream having joined back

   together. Detection of this case and of the other possible causes is

   discussed in Section 5.3.

   SCMP does not allow for streams which have converged paths, i.e.,

   streams are always tree-shaped and not graph-like. At the point of

   convergence, the ST agent which detects the condition generates a

   REFUSE message with ReasonCode (PathConvergence). Also, as a help to

   the upstream ST agent, the detecting agent places the IP address of

   one of the stream's connected targets in the ValidTargetIPAddress

   field of the REFUSE message. This IP address will be used by upstream

   ST agents to avoid splitting the stream.

   An upstream ST agent that receives the REFUSE with ReasonCode

   (PathConvergence) will check to see if the listed IP address is one

   of the known stream targets. If it is not, the REFUSE is propagated

   to the previous-hop agent. If the listed IP address is known by the

   upstream ST agent, this ST agent is the ST agent that caused the

   split in the stream. (This agent may even be the origin.) This agent

   then avoids splitting the stream by using the next-hop of that known

   target as the next-hop for the refused targets. It sends a CONNECT

   with the affected targets to the existing valid next-hop.

   The above process will proceed, hop by hop, until the

   ValidTargetIPAddress matches the IP address of a known target. The

   only case where this process will fail is when the known target is

   deleted prior to the REFUSE propagating to the origin. In this case

   the origin can just reissue the CONNECT and start the whole process

   over again.
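   The upstream handling of a REFUSE with ReasonCode (PathConvergence) can be sketched as follows, assuming the agent keeps a hypothetical map from each known target to its next-hop. The function only decides what to do; building and sending the messages is left out.

```python
from typing import Dict, List, Optional


def on_path_convergence(known_next_hop: Dict[str, str], valid_target: str,
                        refused: List[str]) -> Optional[Dict[str, List[str]]]:
    """Upstream handling of REFUSE(PathConvergence).

    Returns None when the REFUSE must be propagated to the previous-hop;
    otherwise the CONNECT to send, as {next_hop: refused targets}.
    """
    hop = known_next_hop.get(valid_target)
    if hop is None:
        return None  # ValidTargetIPAddress unknown: not the split point
    # This agent split the stream: reuse the known target's next-hop.
    return {hop: list(refused)}
```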

5.3.2 Other Cases

   The remaining cases, including a partially failed stream and a routing

   loop, are not easily distinguishable. In attempting recovery of a

   failed stream, an ST agent may issue new CONNECT messages to the

   affected targets. Such a CONNECT may reach an ST agent downstream of

   the failure before that ST agent has received a DISCONNECT from the

   neighborhood of the failure. Until that ST agent receives the

   DISCONNECT, it cannot distinguish between a failure recovery and an

   erroneous routing loop. That ST agent must therefore respond to the

   CONNECT with a REFUSE message with the affected targets specified in

   the TargetList and an appropriate ReasonCode (StreamExists).

   The ST agent immediately preceding that point, i.e., the latest ST

   agent to send the CONNECT message, will receive the REFUSE message.

   It must release any resources reserved exclusively for traffic to the

   listed targets. If this ST agent was not the one attempting the

   stream recovery, then it cannot distinguish between a failure

   recovery and an erroneous routing loop. It should repeat the CONNECT

   after a ToConnect timeout, see Section 5.2.4. If after NConnect

   retransmissions it continues to receive REFUSE messages, it should

   propagate the REFUSE message toward the origin, with the TargetList

   that specifies the affected targets, but with a different ReasonCode

   (RouteLoop).
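   The retry logic of the agent that receives REFUSE (StreamExists) can be sketched as below. It covers only an agent that is not itself attempting the recovery, and it assumes the count includes the first REFUSE, so a REFUSE that arrives after NConnect retransmissions triggers the (RouteLoop) REFUSE.

```python
from typing import List


def on_stream_exists_refuse(refusals: int, n_connect: int) -> List[str]:
    """Actions after a REFUSE(StreamExists) for a CONNECT this agent sent.

    refusals: REFUSE(StreamExists) messages received so far for these
    targets, the first one included.
    """
    actions = ["release resources reserved only for these targets"]
    if refusals <= n_connect:
        actions.append("wait ToConnect, then resend CONNECT")
    else:
        # Still refused after NConnect retransmissions.
        actions.append("REFUSE(RouteLoop) -> previous-hop")
    return actions
```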

   The REFUSE message with this ReasonCode (RouteLoop) is propagated by

   each ST agent without retransmitting any CONNECT messages. At each ST

   agent, it causes any resources reserved exclusively for the listed

   targets to be released. The REFUSE will be propagated to the origin

   in the case of an erroneous routing loop. In the case of stream

   recovery, it will be propagated to the ST agent that is attempting

   the recovery, which may be an intermediate ST agent or the origin

   itself. In the case of a stream recovery, the ST agent attempting the

   recovery may issue new CONNECT messages to the same or to different

   next-hops.

   If an ST agent receives both a REFUSE message and a DISCONNECT

   message with a target in common, then it can, for each target in

   common, release the relevant resources and propagate neither the

   REFUSE nor the DISCONNECT.

   If the origin receives such a REFUSE message, it should attempt to

   send a new CONNECT to all the affected targets. Since routing errors

   in an internet are assumed to be temporary, the new CONNECTs will

   eventually find acceptable routes to the targets, if one exists. If

   no further routes exist after NRetryRoute tries, the application

   should be informed so that it may take whatever action it deems

   necessary.

5.4 Problems due to Routing Inconsistency

   When an intermediate ST agent receives a CONNECT, it invokes the

   routing algorithm to select the next-hop ST agents based on the

   TargetList and the networks to which it is connected. If the

   resulting next-hop to any of the targets is across the same network

   from which it received the CONNECT (but not the previous-hop itself),

   there may be a routing problem. However, the routing algorithm at the

   previous-hop may be optimizing differently than the local algorithm

   would in the same situation. Since the local ST agent cannot

   distinguish the two cases, it should permit the setup but send back

   to the previous-hop ST agent an informative NOTIFY message with the

   appropriate ReasonCode (RouteBack), pertinent TargetList, and in the

   NextHopIPAddress element the address of the next-hop ST agent

   returned by its routing algorithm.

   The ST agent that receives such a NOTIFY should ACK it. If the ST

   agent is using an algorithm that would produce such behavior, no

   further action is taken; if not, the ST agent should send a

   DISCONNECT to the next-hop ST agent to correct the problem.

   Alternatively, if the next-hop returned by the routing function is in

   fact the previous-hop, a routing inconsistency has been detected. In

   this case, a REFUSE is sent back to the previous-hop ST agent

   containing an appropriate ReasonCode (RouteInconsist), pertinent

   TargetList, and in the NextHopIPAddress element the address of the

   previous-hop. When the previous-hop receives the REFUSE, it will

   recompute the next-hop for the affected targets. If there is a

   difference in the routing databases in the two ST agents, they may

   exchange CONNECT and REFUSE messages again. Since such routing errors

   in the internet are assumed to be temporary, the situation should

   eventually stabilize.
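   Both checks can be expressed as a small classification function. The network identifiers are hypothetical strings; how an agent determines the network used to reach a next-hop is implementation specific.

```python
def check_next_hop(prev_hop: str, in_network: str,
                   next_hop: str, next_hop_network: str) -> str:
    """Classify the next-hop chosen for a target of a received CONNECT.

    in_network is the network the CONNECT arrived on; next_hop_network is
    the network used to reach next_hop.
    """
    if next_hop == prev_hop:
        return "REFUSE(RouteInconsist) -> previous-hop"
    if next_hop_network == in_network:
        # Possibly just a different routing metric upstream: allow it.
        return "continue setup; NOTIFY(RouteBack) -> previous-hop"
    return "continue setup"
```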

5.5 Problems in Reserving Resources

   As mentioned in Section 1.4.5, resource reservation is handled by the

   LRM. The LRM may not be able to satisfy a particular request during

   stream setup or modification for a number of reasons, including a

   mismatched FlowSpec, an unknown FlowSpec version, an error in

   processing a FlowSpec, and an inability to allocate the requested

   resource. This section discusses these cases and specifies the

   ReasonCodes that should be used when these error cases are

   encountered.

5.5.1 Mismatched FlowSpecs

   In some cases the LRM may require a requested FlowSpec to match an

   existing FlowSpec, e.g., when adding new targets to an existing

   stream, see Section 4.6.1. In case of FlowSpec mismatch the LRM

   notifies the processing ST agent which should respond with ReasonCode

   (FlowSpecMismatch).

5.5.2 Unknown FlowSpec Version

   When the LRM is invoked, it is passed information including the

   version of the FlowSpec, see Section 4.5.2.2. If this version is not

   known by the LRM, the LRM notifies the ST agent. The ST agent should

   respond with a REFUSE message with ReasonCode (FlowVerUnknown).

5.5.3 LRM Unable to Process FlowSpec

   The LRM may encounter an LRM or FlowSpec specific error while

   attempting to satisfy a request. An example of such an error is given

   in Section 9.2.1. These errors are implementation specific and will

   not be enumerated with ST ReasonCodes. They are covered by a single,

   generic ReasonCode. When an LRM encounters such an error, it should

   notify the ST agent which should respond with the generic ReasonCode

   (FlowSpecError).

5.5.4 Insufficient Resources

   If the LRM cannot make the necessary reservations because sufficient

   resources are not available, an ST agent may:

o   try alternative paths to the targets: the ST agent calls the routing

    function to find a different path to the targets. If an alternative

    path is found, stream connection setup continues in the usual way,

    as described in Section 4.5.

o   refuse to establish the stream along this path: the origin ST agent

    informs the application of the stream setup failure; intermediate

    and target ST agents issue a REFUSE message (as described in Section

    4.5.8) with ReasonCode (CantGetResrc).

   It depends on the local implementation whether an ST agent tries

   alternative paths or refuses to establish the stream. In any case, if

   enough resources cannot be found over different paths, the ST agent

   has to explicitly refuse to establish the stream.
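   The mapping from LRM outcomes to ReasonCodes in Sections 5.5.1 to 5.5.4, plus the optional search over alternative paths, could be sketched as follows. The LrmResult enum and the try_reserve callback are hypothetical; only the ReasonCode names come from the text.

```python
from enum import Enum, auto
from typing import Callable, Iterable, Optional


class LrmResult(Enum):
    # Hypothetical LRM outcomes; only the ReasonCode names are from ST2+.
    MISMATCH = auto()         # 5.5.1
    UNKNOWN_VERSION = auto()  # 5.5.2
    SPECIFIC_ERROR = auto()   # 5.5.3, any implementation-specific error
    NO_RESOURCES = auto()     # 5.5.4


REASON_CODE = {
    LrmResult.MISMATCH: "FlowSpecMismatch",
    LrmResult.UNKNOWN_VERSION: "FlowVerUnknown",
    LrmResult.SPECIFIC_ERROR: "FlowSpecError",
    LrmResult.NO_RESOURCES: "CantGetResrc",
}


def reserve_on_some_path(paths: Iterable[str],
                         try_reserve: Callable[[str], bool]) -> Optional[str]:
    """Try alternative paths in order; None means refuse the stream."""
    for path in paths:
        if try_reserve(path):
            return path
    return None
```

When reserve_on_some_path returns None, an intermediate or target agent would send a REFUSE carrying REASON_CODE[LrmResult.NO_RESOURCES], while the origin reports the failure to the application.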

5.6 Problems Caused by CHANGE Messages

   A CHANGE might fail for several reasons, including:

o   insufficient resources: the request may be for a larger amount of

    network resources when those resources are not available, ReasonCode

    (CantGetResrc);

o   a target application not agreeing to the change, ReasonCode

    (ApplRefused);

   The affected stream can be left in one of two states as a result of

   change failures: a) the stream can revert to the state it was in

   prior to the CHANGE message being processed, or b) the stream may be

   torn down.

   The expected common case of failure will be when the requested change

   cannot be satisfied, but the pre-change resources remain allocated

   and available for use by the stream. In this case, the ST agent at

   the point where the failure occurred must inform upstream ST agents

   of the failure. (In the case where this ST agent is the target, there

   may not actually be a failure; the application may merely not have

   agreed to the change.) The ST agent informs upstream ST agents by

   sending a REFUSE message with ReasonCode (CantGetResrc or

   ApplRefused). To indicate that the pre-change FlowSpec is still

   available and that the stream still exists, the ST agent sets the E-

   bit of the REFUSE message to one (1), see Section 10.4.11. Upstream

   ST agents receiving the REFUSE message inform the LRM so that it can

   attempt to revert back to the pre-change FlowSpec. It is permissible,

   but not desirable, for excess resources to remain allocated.

   For the case when the attempt to change the stream results in the

   loss of previously reserved resources, the stream is torn down. This

   can happen, for instance, when the I-bit is set (Section 4.6.5) and

   the LRM releases pre-change stream resources before the new ones are

   reserved, and neither new nor former resources are available. In this

   case, the ST agent where the failure occurs must inform other ST

   agents of the break in the affected portion of the stream. This is

   done by the ST agent by sending a REFUSE message upstream and a

   DISCONNECT message downstream, both with the ReasonCode

   (CantGetResrc). To indicate that pre-change stream resources have

   been lost, the E-bit of the REFUSE message is set to zero (0).

   Note that a failure to change the resources requested for specific

   targets should not cause other targets in the stream to be deleted.
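   The two failure outcomes described above can be summarized as a
   small decision function. This is an illustrative sketch only: the
   ScmpMessage record and change_failure_messages() are hypothetical
   names, not part of the protocol, and the reason argument is assumed
   to be either CantGetResrc or ApplRefused.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScmpMessage:
    kind: str                    # "REFUSE" or "DISCONNECT"
    direction: str               # "upstream" or "downstream"
    reason: str                  # ReasonCode name, e.g. "CantGetResrc"
    e_bit: Optional[int] = None  # only set on REFUSE messages


def change_failure_messages(reason: str, pre_change_kept: bool) -> List[ScmpMessage]:
    """Messages sent by the ST agent where a CHANGE failed (sketch of Section 5.6)."""
    if pre_change_kept:
        # Old FlowSpec still available: report upstream only with E-bit = 1,
        # so upstream agents ask their LRMs to revert to the pre-change FlowSpec.
        return [ScmpMessage("REFUSE", "upstream", reason, e_bit=1)]
    # Pre-change resources lost: tear down the affected portion in both
    # directions, always with ReasonCode (CantGetResrc) and E-bit = 0.
    return [
        ScmpMessage("REFUSE", "upstream", "CantGetResrc", e_bit=0),
        ScmpMessage("DISCONNECT", "downstream", "CantGetResrc"),
    ]
```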

5.7 Unknown Targets in DISCONNECT and CHANGE

   The handling of unknown targets listed in a DISCONNECT or CHANGE

   message is dependent on a stream's join authorization level, see

   Section 4.4.2. For streams with join authorization levels #0 and #1,

   all targets must be known. In this case, when

   processing a CHANGE message, the agent should generate a REFUSE

   message with ReasonCode (TargetUnknown). When processing a DISCONNECT

   message, it is possible that the DISCONNECT is a duplicate of an old

   request so the agent should respond as if it has successfully

   disconnected the target. That is, it should respond with an ACK

   message.

   For streams with join authorization level #2, it is possible that the

   origin is not aware of some targets that participate in the stream.

   The origin may delete or change these targets via the following

   flooding mechanism.

   If no next-hop ST agent can be associated with a target, the CHANGE/

   DISCONNECT message including the target is replicated to all known

   next-hop ST agents. This has the effect of propagating the CHANGE/

   DISCONNECT message to all downstream ST agents. Eventually, the ST

   agent that acts as the origin for the target (Section 4.6.3.1) is

   reached and the target is deleted.

   Target deletion/change via flooding is not expected to be the normal

   case. It is included to present the applications with uniform

   capabilities for all stream types. Flooding only applies to streams

   with join authorization level #2.
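   The handling of a target with no associated next-hop can be sketched
   as follows. The function name, the action strings, and the list of
   next-hop identifiers are hypothetical conveniences for illustration;
   the behavior per join authorization level follows the text above.

```python
from typing import List, Tuple


def handle_unknown_target(msg_type: str, join_auth_level: int,
                          next_hops: List[str]) -> Tuple[str, List[str]]:
    """Decide what to do with a target that has no known next-hop ST agent.

    Returns (action, hops); hops lists the next-hop agents the message is
    replicated to and is non-empty only when flooding.
    """
    if join_auth_level in (0, 1):
        # All targets must be known for these levels.
        if msg_type == "CHANGE":
            return ("REFUSE(TargetUnknown)", [])
        if msg_type == "DISCONNECT":
            # Possibly a duplicate of an old request: act as if it succeeded.
            return ("ACK", [])
    elif join_auth_level == 2:
        # The origin may not know every target: flood to all known next hops.
        return ("FLOOD", list(next_hops))
    raise ValueError(f"unsupported combination: {msg_type}, level #{join_auth_level}")
```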

6. Failure Detection and Recovery

6.1 Failure Detection

   The SCMP failure detection mechanism is based on two assumptions:

1. If a neighbor of an ST agent is up, and has been up without a

    disruption, and has not notified the ST agent of a problem with

    streams that pass through both, then the ST agent can assume that

    there has not been any problem with those streams.

2. A network through which an ST agent has routed a stream will notify

    the ST agent if there is a problem that affects the stream data

    packets but does not affect the control packets.

   The purpose of the robustness protocol defined here is for ST agents

   to determine that the streams through a neighbor have been broken by

   the failure of the neighbor or the intervening network. This protocol

   should detect the overwhelming majority of failures that can occur.

   Once a failure is detected, the recovery procedures described in

   Section 6.2 are initiated by the ST agents.

6.1.1 Network Failures

   An ST agent can detect network failures by two mechanisms:

   o   the network can report a failure, or

   o   the ST agent can discover a failure by itself.

   They differ in the amount of information that an ST agent has

   available to it in order to make a recovery decision. For example, a

   network may be able to report that reserved bandwidth has been lost

   and the reason for the loss and may also report that connectivity to

   the neighboring ST agent remains intact. On the other hand, an ST

   agent may discover that communication with a neighboring ST agent has

   ceased because it has not received any traffic from that neighbor in

   some time period. If an ST agent detects a failure, it may not be

   able to determine if the failure was in the network while the

   neighbor remains available, or the neighbor has failed while the

   network remains intact.

6.1.2 Detecting ST Agent Failures

   Each ST agent periodically sends each neighbor with which it shares

   one or more streams a HELLO message. This message exchange is between

   ST agents, not entities representing streams or applications. That

   is, an ST agent need only send a single HELLO message to a neighbor

   regardless of the number of streams that flow between them. All ST

   agents (host as well as intermediate) must participate in this

   exchange. However, only ST agents that share active streams can

   participate in this exchange and it is an error to send a HELLO

   message to a neighbor ST agent with no streams in common, e.g., to

   check whether it is active. STATUS messages can be used to poll the

   status of neighbor ST agents, see Section 8.4.

   For the purpose of HELLO message exchange, stream existence is

   bounded by ACCEPT and DISCONNECT/REFUSE processing and is defined for

   both the upstream and downstream case. A stream to a previous-hop is

   defined to start once an ACCEPT message has been forwarded upstream.

   A stream to a next-hop is defined to start once the received ACCEPT

   message has been acknowledged. A stream is defined to terminate once

   an acknowledgment is sent for a received DISCONNECT or REFUSE

   message, and an acknowledgment for a sent DISCONNECT or REFUSE

   message has been received.

   The HELLO message has two fields:

   o   a HelloTimer field that is in units of milliseconds modulo the

       maximum for the field size, and

   o   a Restarted-bit specifying that the ST agent has been restarted

       recently.

   The HelloTimer must appear to be incremented every millisecond

   whether a HELLO message is sent or not. The HelloTimer wraps around

   to zero after reaching the maximum value. Whenever an ST agent

   suffers a catastrophic event that may result in it losing ST state

   information, it must reset its HelloTimer to zero and must set the

   Restarted-bit in all HELLO messages sent in the following

   HelloTimerHoldDown seconds.

   If an ST agent receives a HELLO message that contains the Restarted-

   bit set, it must assume that the sending ST agent has lost its state.

   If it shares streams with that neighbor, it must initiate stream

   recovery activity, see Section 6.2. If it does not share streams with

   that neighbor, it should not attempt to create one until that bit is

   no longer set. If an ST agent receives a CONNECT message from a

   neighbor whose Restarted-bit is still set, the agent must respond

   with an ERROR message with the appropriate ReasonCode

   (RestartRemote). If an agent receives a CONNECT message while the

   agent's own Restarted-bit is set, the agent must respond with an

   ERROR message with the appropriate ReasonCode (RestartLocal).
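   The Restarted-bit rules above can be sketched as two small checks.
   The function names are hypothetical, times are assumed to be in
   seconds, and checking the neighbor's bit before the local one when
   both are set is an assumption, since the text does not order them.

```python
from typing import Optional


def local_restarted_bit(now_s: float, restart_time_s: float, hold_down_s: float) -> bool:
    """Restarted-bit stays set in sent HELLOs for HelloTimerHoldDown seconds."""
    return 0 <= now_s - restart_time_s < hold_down_s


def connect_response(neighbor_restarted: bool, local_restarted: bool) -> Optional[str]:
    """Return the ERROR to send for an incoming CONNECT, or None to proceed."""
    if neighbor_restarted:
        # The sender's HELLOs still carry the Restarted-bit.
        return "ERROR(RestartRemote)"
    if local_restarted:
        # This agent restarted recently and may lack consistent state.
        return "ERROR(RestartLocal)"
    return None  # continue with normal CONNECT processing
```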

   Each ST stream has an associated RecoveryTimeout value. This value is

   assigned by the origin and carried in the CONNECT message, see

   Section 4.5.10. Each agent checks to see if it can support the

   requested value. If it cannot, it updates the value to the smallest

   timeout interval it can support. The RecoveryTimeout used by a

   particular stream is obtained from the ACCEPT message, see Section

   4.5.10, and is the smallest value seen across all ACCEPT messages

   from participating targets.

   An ST agent must send HELLO messages to its neighbor with a period

   shorter than the smallest RecoveryTimeout of all the active streams

   that pass between the two ST agents, regardless of direction. This

   period must be smaller by a factor, called HelloLossFactor, which is

   at least as large as the greatest number of consecutive HELLO

   messages that could credibly be lost while the communication between

   the two ST agents is still viable.
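   The timing constraints above reduce to a minimum and a division. The
   sketch below assumes all values are in milliseconds; the function
   names are hypothetical, and the result is an upper bound that an
   implementation may undercut.

```python
from typing import Iterable


def stream_recovery_timeout(accept_timeouts_ms: Iterable[int]) -> int:
    """RecoveryTimeout in use: the smallest value seen across all ACCEPTs."""
    return min(accept_timeouts_ms)


def hello_period_bound_ms(shared_timeouts_ms: Iterable[int], hello_loss_factor: int) -> float:
    """Upper bound on the HELLO period toward one neighbor.

    shared_timeouts_ms: RecoveryTimeouts of all active streams shared with
    the neighbor, in either direction.
    """
    if hello_loss_factor < 1:
        raise ValueError("HelloLossFactor must be at least 1")
    # Smallest shared RecoveryTimeout, reduced by HelloLossFactor so that
    # that many consecutive HELLO losses do not trigger recovery.
    return min(shared_timeouts_ms) / hello_loss_factor
```

   For example, with shared RecoveryTimeouts of 1500 and 4000 ms and a
   HelloLossFactor of 3, HELLOs would be sent at least every 500 ms.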

   An ST agent may send simultaneous HELLO messages to all its neighbors

   at the rate necessary to support the smallest RecoveryTimeout of any

   active stream. Alternatively, it may send HELLO messages to different

   neighbors independently at different rates corresponding to

   RecoveryTimeouts of individual streams.

   An ST agent must expect to receive at least one new HELLO message

   from each neighbor at least as frequently as the smallest

   RecoveryTimeout of any active stream in common with that neighbor.

   The agent can detect duplicate or delayed HELLO messages by comparing

   the HelloTimer field of the most recent valid HELLO message from that

   neighbor with the HelloTimer field of an incoming HELLO message.

   Valid incoming HELLO messages will have a HelloTimer field that is

   greater than the field contained in the previously received valid

   HELLO message by the time elapsed since the previous message was

   received. Actual evaluation of the elapsed time interval should take

   into account the maximum likely delay variance from that neighbor.
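   The validity test can be sketched with modular arithmetic so that
   wrap-around of the HelloTimer is handled. This assumes a 32-bit
   HelloTimer field (check Section 10 for the actual width) and a
   caller-supplied tolerance for delay variance; hello_is_valid() is a
   hypothetical name.

```python
HELLO_TIMER_MODULUS = 2 ** 32  # assumption: 32-bit HelloTimer field


def hello_is_valid(prev_timer: int, new_timer: int,
                   elapsed_ms: int, max_delay_variance_ms: int) -> bool:
    """Check an incoming HELLO against the last valid one from the neighbor."""
    # Forward distance between the two timer values, modulo the field size.
    advance = (new_timer - prev_timer) % HELLO_TIMER_MODULUS
    if advance == 0:
        return False  # duplicate HELLO
    # A delayed (older) HELLO shows up as a huge or too-small advance.
    return abs(advance - elapsed_ms) <= max_delay_variance_ms
```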

   If the ST agent does not receive a valid HELLO message within the

   RecoveryTimeout period of a stream, it must assume that the

   neighboring ST agent or the communication link between the two has

   failed and it must initiate stream recovery activity, as described

   below in Section 6.2.

6.2 Failure Recovery

   If an intermediate ST agent fails or a network or part of a network

   fails, the previous-hop ST agent and the various next-hop ST agents

   will discover the fact by the failure detection mechanism described

   in Section 6.1.

   The recovery of an ST stream is a relatively complex and

   time-consuming effort because it is designed in a general manner to

   operate across a large number of networks with diverse

   characteristics. Therefore, it may require information to be

   distributed widely, and may require relatively long timers. On the

   other hand, since a network is typically a homogeneous system,

   failure recovery in the network may be a relatively faster and

   simpler operation. Therefore an ST agent that detects a failure

   should attempt to fix the network failure before attempting recovery

   of the ST stream. If the stream that existed between two ST agents

   before the failure cannot be reconstructed by network recovery

   mechanisms alone, then the ST stream recovery mechanism must be

   invoked.

   If stream recovery is necessary, the different ST agents will need to

   perform different functions, depending on their relation to the

   failure:

o   An ST agent that is a next-hop from a failure should first verify

    that there was a failure. It can do this using STATUS messages to

    query its upstream neighbor. If it cannot communicate with that

    neighbor, then for each active stream from that neighbor it should

    first send a REFUSE message upstream with the appropriate ReasonCode

    (STAgentFailure). This REFUSE is sent to the neighbor to speed up the

    failure recovery in case the hop is unidirectional, i.e., the

    neighbor can hear the ST agent but the ST agent cannot hear the

    neighbor. The ST agent detecting the failure must then, for each

    active stream from that neighbor, send DISCONNECT messages with the

    same ReasonCode toward the targets. All downstream ST agents process

    this DISCONNECT message just like the DISCONNECT that tears down the

    stream. If recovery is successful, targets will receive new CONNECT

    messages.

o   An ST agent that is the previous-hop before the failed component

    first verifies that there was a failure by querying the downstream

    neighbor using STATUS messages. If the neighbor has lost its state

    but is available, then the ST agent may try to reconstruct

    (explained below) the affected streams, for those streams that do

    not have the NoRecovery option selected. If it cannot communicate

    with the next-hop, then the ST agent detecting the failure sends a

    DISCONNECT message, for each affected stream, with the appropriate

    ReasonCode (STAgentFailure) toward the affected targets. It does so

    to speed up failure recovery in case the communication may be

    unidirectional and this message might be delivered successfully.

   Based on the NoRecovery option, the ST agent that is the previous-hop

   before the failed component takes the following actions:

o   If the NoRecovery option is selected, then the ST agent sends, per

    affected stream, a REFUSE message with the appropriate ReasonCode

    (STAgentFailure) to the previous-hop. The TargetList in these

    messages contains all the targets that were reached through the

    broken branch. As discussed in Section 5.1.2, multiple REFUSE

    messages may be required if the PDU is too long for the MTU of the

    intervening network. The REFUSE message is propagated all the way to

    the origin. The application at the origin can attempt recovery of

    the stream by sending a new CONNECT to the affected targets. For

    established streams, the new CONNECT will be treated by intermediate

    ST agents as an addition of new targets into the established stream.

o   If the NoRecovery option is not selected, the ST agent can attempt

    recovery of the affected streams. It does so on a stream-by-stream

    basis by issuing a new CONNECT message to the affected targets. If

    the ST agent cannot find new routes to some targets, or if the only

    route to some targets is through the previous-hop, then it sends one

    or more REFUSE messages to the previous-hop with the appropriate

    ReasonCode (CantRecover) specifying the affected targets in the

    TargetList. The previous-hop can then attempt recovery of the stream

    by issuing a CONNECT to those targets. If it cannot find an

    appropriate route, it will propagate the REFUSE message toward the

    origin.

   Regardless of which ST agent attempts recovery of a damaged stream,

   it will issue one or more CONNECT messages to the affected targets.

   These CONNECT messages are treated by intermediate ST agents as

   additions of new targets into the established stream. The FlowSpecs

   of the new CONNECT messages are the same as the ones contained in the

   most recent CONNECT or CHANGE messages that the ST agent had sent

   toward the affected targets when the stream was operational.

   Upon receiving an ACCEPT during a stream recovery, the agent

   reconstructing the stream must ensure that the FlowSpec and other

   stream attributes (e.g., MaxMsgSize and RecoveryTimeout) of the

   re-established stream are equal to, or less restrictive than, those of the

   pre-failure stream. If they are more restrictive, the recovery

   attempt must be aborted. If they are equal, or are less restrictive,

   then the recovery attempt is successful. When the attempt is a

   success, failure recovery related ACCEPTs are not forwarded upstream

   by the recovering agent.

   Any ST agent that decides that enough recovery attempts have been

   made, or that recovery attempts have no chance of succeeding, may

   indicate that no further attempts at recovery should be made. This is

   done by setting the N-bit in the REFUSE message, see Section 10.4.11.

   This bit must be set by agents, including the target, that know that

   there is no chance of recovery succeeding. An ST agent that receives

   a REFUSE message with the N-bit set (1) will not attempt recovery,

   regardless of the NoRecovery option, and it will set the N-bit when

   propagating the REFUSE message upstream.
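   The choice made by the ST agent upstream of a failed component,
   including the N-bit override, can be sketched as below. The function
   and its action strings are hypothetical; has_alternate_route is
   assumed to mean that a new route exists that does not pass back
   through the previous-hop.

```python
def previous_hop_action(no_recovery: bool, n_bit_seen: bool,
                        has_alternate_route: bool) -> str:
    """Sketch of the recovery decision upstream of a failed component."""
    if n_bit_seen:
        # Recovery was declared hopeless: never retry, regardless of the
        # NoRecovery option, and keep the N-bit set when propagating.
        return "REFUSE(N=1)"
    if no_recovery:
        # Leave recovery to the application at the origin.
        return "REFUSE(STAgentFailure)"
    if has_alternate_route:
        # Re-add the affected targets to the established stream.
        return "CONNECT"
    # No usable route: let the previous-hop try.
    return "REFUSE(CantRecover)"
```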

6.2.1 Problems in Stream Recovery

   The reconstruction of a broken stream may not proceed smoothly. Since

   there may be some delay while the information concerning the failure

   is propagated throughout an internet, routing errors may occur for

   some time after a failure. As a result, the ST agent attempting the

   recovery may receive ERROR messages for the new CONNECTs that are

   caused by internet routing errors. The ST agent attempting the

   recovery should be prepared to resend CONNECTs before it succeeds in

   reconstructing the stream. If the failure partitions the internet and

   a new set of routes cannot be found to the targets, the REFUSE

   messages will eventually be propagated to the origin, which can then

   inform the application so it can decide whether to terminate or to

   continue to attempt recovery of the stream.

   The new CONNECT may at some point reach an ST agent downstream of the

   failure before the DISCONNECT does. In this case, the ST agent that

   receives the CONNECT is not yet aware that the stream has suffered a

   failure, and will interpret the new CONNECT as resulting from a

   routing failure. It will respond with an ERROR message with the

   appropriate ReasonCode (StreamExists). Since the timeouts used by the

   ST agents immediately preceding and immediately following the failure

   are approximately the same, it is very likely that the

   remnants of the broken stream will soon be torn down by a DISCONNECT

   message. Therefore, the ST agent that receives the ERROR message with

   ReasonCode (StreamExists) should retransmit the CONNECT message after

   the ToConnect timeout expires. If this fails again, the request will

   be retried up to NConnect times. Only if it still fails will the ST

   agent send a REFUSE message with the appropriate ReasonCode

   (RouteLoop) to its previous-hop. This message will be propagated back

   to the ST agent that is attempting recovery of the damaged stream.

   That ST agent can issue a new CONNECT message if it so chooses. The

   REFUSE is matched to a CONNECT message created by a recovery

   operation through the LnkReference field in the CONNECT.
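   The retry behavior on ReasonCode (StreamExists) can be sketched as a
   loop. Here send_connect() is a hypothetical callable that returns the
   ReasonCode of an ERROR reply, or None on success, and
   wait_to_connect() stands in for the ToConnect timer. Counting NConnect
   as retries after the first transmission is an interpretation of the
   text, not a confirmed reading.

```python
from typing import Callable, Optional


def connect_with_retries(send_connect: Callable[[], Optional[str]],
                         n_connect: int,
                         wait_to_connect: Callable[[], None] = lambda: None) -> str:
    reply = send_connect()
    retries = 0
    # Remnants of the broken stream are likely to be torn down soon,
    # so StreamExists is retried after each ToConnect expiry.
    while reply == "StreamExists" and retries < n_connect:
        wait_to_connect()
        reply = send_connect()
        retries += 1
    if reply == "StreamExists":
        return "REFUSE(RouteLoop)"  # sent to the previous-hop
    # Success, or some other ReasonCode handed back to the caller.
    return "OK" if reply is None else reply
```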

   ST agents that have propagated a CONNECT message and have received a

   REFUSE message should maintain this information for some period of

   time. If an ST agent receives a second CONNECT message for a target

   that recently resulted in a REFUSE, that ST agent may respond with a

   REFUSE immediately rather than attempting to propagate the CONNECT.

   This has the effect of pruning the tree that is formed by the

   propagation of CONNECT messages to a target that is not reachable by

   the routes that are selected first. The tree will pass through any

   given ST agent only once, and the stream setup phase will be

   completed faster.
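   Remembering recent REFUSEs can be sketched with a small time-bounded
   table. The class name, the hold time, and the use of a (SID, target)
   pair as the key are assumptions for illustration; the text leaves the
   retention period to the implementation.

```python
import time
from typing import Callable, Dict, Hashable


class RecentRefusals:
    """Remembers targets whose CONNECT recently drew a REFUSE."""

    def __init__(self, hold_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_s = hold_s
        self.clock = clock
        self._refused_at: Dict[Hashable, float] = {}

    def record_refuse(self, key: Hashable) -> None:
        self._refused_at[key] = self.clock()

    def refuse_immediately(self, key: Hashable) -> bool:
        """True if a new CONNECT for this key should be refused without forwarding."""
        t = self._refused_at.get(key)
        if t is None:
            return False
        if self.clock() - t > self.hold_s:
            del self._refused_at[key]  # entry expired
            return False
        return True
```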

   If a CONNECT message reaches a target, the target should as

   efficiently as possible use the state that it has saved from before

   the stream failed during recovery of the stream. It will then issue

   an ACCEPT message toward the origin. The ACCEPT message will be

   intercepted by the ST agent that is attempting recovery of the

   damaged stream, if not the origin. If the FlowSpec contained in the

   ACCEPT specifies the same selection of parameters as were in effect

   before the failure, then the ST agent that is attempting recovery

   will not propagate the ACCEPT. FlowSpec comparison is done by the

   LRM. If the selections of the parameters are different, then the ST

   agent that is attempting recovery will send the origin a NOTIFY

   message with the appropriate ReasonCode (FailureRecovery) that

   contains a FlowSpec that specifies the new parameter values. The

   origin may then have to change its data generation characteristics

   and the stream's parameters with a CHANGE message to use the newly

   recovered subtree.

6.3 Stream Preemption

   As mentioned in Section 1.4.5, it is possible that the LRM decides to

   break a stream intentionally. This is called stream preemption.

   Streams are expected to be preempted in order to free resources for a

   new stream which has a higher priority.

   If the LRM decides that it is necessary to preempt one or more of the

   streams traversing it, the decision on which streams have to be

   preempted has to be made. There are two ways for an application to

   influence such a decision:

   1. based on FlowSpec information. For instance, with the ST2+

       FlowSpec, streams can be assigned a precedence value from 0

       (least important) to 256 (most important). This value is

       carried in the FlowSpec when the stream is setup, see Section

       9.2, so that the LRM is informed about it.

   2. with the group mechanism. An application may specify that a set

       of streams are related to each other and that they are all

       candidates for preemption if one of them gets preempted. It can

       be done by using the fate-sharing relationship defined in

       Section 7.1.2. This helps the LRM make a good choice when

       more than one stream has to be preempted, because it leads to

       breaking a single application as opposed to as many

       applications as the number of preempted streams.

   If the LRM preempts a stream, it must notify the local ST agent. The

   following actions are performed by the ST agent:

o   The ST agent at the host where the stream was preempted sends

    DISCONNECT messages with the appropriate ReasonCode

    (StreamPreempted) toward the affected targets. It sends a REFUSE

    message with the appropriate ReasonCode (StreamPreempted) to the

    previous-hop.

o   A previous-hop ST agent of the preempted stream acts as in case of

    failure recovery, see Section 6.2.

o   A next-hop ST agent of the preempted stream acts as in case of

    failure recovery, see Section 6.2.

   Note that, as opposed to failure recovery, there is no need to

   verify that the failure actually occurred, because this is explicitly

   indicated by the ReasonCode (StreamPreempted).

7. A Group of Streams

   There may be a need to associate related streams. The group mechanism

   is simply an association technique that allows ST agents to identify

   the different streams that are to be associated.

   A group consists of a set of streams and a relationship. The set of

   streams may be empty. The relationship applies to all group members.

   Each group is identified by a group name. The group name must be

   globally unique.

   Streams belong to the same group if they have the same GroupName in

   the GroupName field of the Group parameter, see Section 10.3.2. The

   relationship is defined by the Relationship field. Group membership

   must be specified at stream creation time and persists for the whole

   stream lifetime. A single stream may belong to multiple groups.

   The ST agent that creates a new group is called the group initiator. Any

   ST agent can be a group initiator. The initiator allocates the

   GroupName and the Relationship among group members. The initiator may

   or may not be the origin of a stream belonging to the group.

   GroupName generation is described in Section 8.2.

7.1 Basic Group Relationships

   This version of ST defines four basic group relationships. An ST2+

   implementation must support all four basic relationships. Adherence

   to the specified relationships is usually best effort. The basic

   relationships are described in detail below in Sections 7.1.1

   through 7.1.4.

7.1.1 Bandwidth Sharing

   Streams associated with the same group share the same network

   bandwidth. The intent is to support applications such as audio

   conferences where, of all participants, only some are allowed to

   speak at one time. In such a scenario, global bandwidth utilization

   can be lowered by allocating only those resources that can be used at

   once, e.g., it is sufficient to reserve bandwidth for a small set of

   audio streams.

   The basic concept of a shared bandwidth group is that the LRM will

   allocate up to some specified multiplier of the most demanding stream

   that it knows about in the group. The LRM will allocate resources

   incrementally, as stream setup requests are received, until the total

   group requirements are satisfied. Subsequent setup requests will

   share the group's resources and will not need any additional

   resources allocated. The procedure will result in standard allocation

   where only one stream in a group traverses an agent, and shared

   allocations where multiple streams traverse an agent.

   To illustrate, let's call the multiplier mentioned above "N", and the

   most demanding stream that an agent knows about in a group Bmax. For

   an application that intends to allow three participants to speak at

   the same time, N has a value of three and each LRM will allocate for

   the group an amount of bandwidth up to 3*Bmax even when there are

   many more streams in the group. The LRM will reserve resources

   incrementally, per stream request, until N*Bmax resources are

   allocated. Each agent may be traversed by a different set and number

   of streams all belonging to the same group.

   An ST agent receiving a stream request presents the LRM with all

   necessary group information, see Section 4.5.2.2. If maximum

   bandwidth, N*Bmax, for the group has already been allocated and a new

   stream with a bandwidth demand less than Bmax is being established,

   the LRM won't allocate any further bandwidth.

   If fewer than N*Bmax resources are allocated, the LRM will expand

   the resources allocated to the group by the amount requested in the

   new FlowSpec, up to N*Bmax resources. The LRM will update the

   FlowSpec based on what resources are available to the stream, but not

   the total resources allocated for the group.
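   The incremental N*Bmax allocation can be sketched as per-agent
   bookkeeping. The class and method names are hypothetical, bandwidth
   is an abstract integer unit, and only the group total is tracked; the
   per-stream FlowSpec update mentioned above is not modeled.

```python
class SharedBandwidthGroup:
    """Per-agent LRM bookkeeping for one bandwidth-sharing group."""

    def __init__(self, n: int) -> None:
        self.n = n            # multiplier N chosen by the application
        self.bmax = 0         # most demanding stream seen so far at this agent
        self.allocated = 0    # bandwidth currently reserved for the group

    def admit(self, demand: int) -> int:
        """Return the additional bandwidth reserved for this stream request."""
        # A more demanding stream raises Bmax and therefore the N*Bmax ceiling.
        self.bmax = max(self.bmax, demand)
        ceiling = self.n * self.bmax
        # Grow by the requested amount, but never beyond N*Bmax.
        extra = min(demand, max(0, ceiling - self.allocated))
        self.allocated += extra
        return extra
```

   With N = 3, four 64-unit streams reserve 64, 64, 64 and then nothing;
   a later 128-unit stream raises the ceiling and adds 128 more. This is
   the stepwise growth the next paragraph advises avoiding by building
   the most demanding streams first.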

   It should be noted that ST agents and LRMs become aware of a group's

   requirements only when the streams belonging to the group are

   created. In case of the bandwidth sharing relationship, an

   application should attempt to establish the most demanding streams

   first to minimize stream setup efforts. If, on the contrary, the less

   demanding streams are built first, it will always be necessary to

   allocate additional bandwidth in consecutive steps as the most

   demanding streams are built. It is also up to the applications to

   coordinate their different FlowSpecs and decide upon an appropriate

   value for N.

7.1.2 Fate Sharing

   Streams belonging to this group share the same fate. If a stream is

   deleted, the other members of the group are also deleted. This is

   intended to support stream preemption by indicating which streams are

   mutually related. If preemption of multiple streams is necessary,

   this information can be used by the LRM to delete a set of related

   streams, e.g., with impact on a single application, instead of making

   a random choice with the possible effect of interrupting several

   different applications. This attribute does not apply to normal

   stream shutdown, i.e., ReasonCode (ApplDisconnect). On normal

   disconnect, other streams belonging to such groups remain active.

   This relationship provides a hint on which streams should be

   preempted. Still, the LRM responsible for the preemption is not

   forced to behave accordingly, and other streams could be preempted

   first based on different criteria.

7.1.3 Route Sharing

   Streams belonging to this group share the same paths as much as is

   possible. This can be desirable for several reasons, e.g., to exploit

   the same allocated resources or in the attempt to maintain the

   transmission order. An ST agent attempts to select the same path

   although the way this is implemented depends heavily on the routing

   algorithm which is used.

   If the routing algorithm is sophisticated enough, an ST agent can

   suggest that a stream is routed over an already established path.

   Otherwise, it can ask the routing algorithm for a set of legal routes

   to the destination and check whether the desired path is included in

   those feasible.

   Route sharing is a hint to the routing algorithm used by ST. Failing

   to route a stream through a shared path should not prevent the

   creation of a new stream or result in the deletion of an existing

   stream.

7.1.4 Subnet Resources Sharing

   This relationship provides a hint to the data link layer functions.

   Streams belonging to this group may share the same MAC layer

   resources. As an example, the same MAC layer multicast address may be

   used for all the streams in a given group. This mechanism allows for

   a better utilization of MAC layer multicast addresses and it is

   especially useful when used with network adapters that offer a very

   small number of MAC layer multicast addresses.

7.2 Relationships Orthogonality

   The four basic relationships, as they have been defined, are

   orthogonal. This means that any combination of the basic relationships

   is allowed. For instance, let's consider an application that

   requires full-duplex service for a stream with multiple targets.

   Also, let's suppose that only N targets are allowed to send data back

   to the origin at the same time. In this scenario, all the reverse

   streams could belong to the same group. They could be sharing both

   the paths and the bandwidth attributes. The Path&Bandwidth sharing

   relationship is obtained from the basic set of relationships. This

   example is important because it shows how full-duplex service can be

   efficiently obtained in ST.
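   Because the relationships are orthogonal, they combine naturally as
   bit flags. The bit values below are hypothetical and chosen only for
   illustration; the actual encoding is given by the Relationship field
   in Section 10.3.2.

```python
from enum import IntFlag
from typing import List


class Relationship(IntFlag):
    # Hypothetical bit values, for illustration only.
    BANDWIDTH = 0x1
    FATE = 0x2
    PATH = 0x4
    SUBNET = 0x8


# Path&Bandwidth sharing for the reverse streams in the example above.
PATH_AND_BANDWIDTH = Relationship.PATH | Relationship.BANDWIDTH


def describe(rel: Relationship) -> List[str]:
    """List the basic relationships contained in a combined value."""
    return [r.name for r in Relationship if r in rel]
```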

8. Ancillary Functions

   Certain functions are required by ST host and intermediate agent

   implementations. Such functions are described in this section.

8.1 Stream ID Generation

   The stream ID, or SID, is composed of a 16-bit unique identifier and

   the stream origin's 32-bit IP address. Stream IDs must be globally

   unique. The specific definition and format of the 16-bit field is

   left to the implementor. This field is expected to have only local

   significance.

   An ST implementation has to provide a stream ID generator facility,

   so that an application or higher layer protocol can obtain unique

   IDs from the ST layer. This is a mechanism for the application to

   request the allocation of a stream ID that is independent of the

   request to create a stream. The Stream ID is used by the application

   or higher layer protocol when creating the streams.

   For instance, the following two functions could be made available:

   o   AllocateStreamID() -> result, StreamID

   o   ReleaseStreamID(StreamID) -> result

   An implementation may also provide a StreamID deletion function.
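   A minimal stream ID generator along the lines of the two functions
   above might look like this. It is a sketch under stated assumptions:
   the class name is hypothetical, identifiers cycle sequentially, and
   identifier 0 is skipped as a conservative choice because an all-zero
   SID has a special meaning in STATUS messages (Section 8.4).

```python
import ipaddress
from typing import Optional, Set, Tuple

StreamID = Tuple[int, ipaddress.IPv4Address]  # (16-bit identifier, origin address)


class StreamIDAllocator:
    """Hands out SIDs for one origin; identifiers are unique while in use."""

    def __init__(self, origin_ip: str) -> None:
        self.origin = ipaddress.IPv4Address(origin_ip)
        self._in_use: Set[int] = set()
        self._next = 1  # identifier 0 is never handed out

    def allocate(self) -> Tuple[bool, Optional[StreamID]]:
        """AllocateStreamID() -> result, StreamID"""
        for _ in range(0xFFFF):  # at most 65535 candidates: 1..65535
            candidate = self._next
            self._next = self._next % 0xFFFF + 1  # cycle 1, 2, ..., 65535, 1, ...
            if candidate not in self._in_use:
                self._in_use.add(candidate)
                return True, (candidate, self.origin)
        return False, None  # every identifier is currently allocated

    def release(self, sid: StreamID) -> bool:
        """ReleaseStreamID(StreamID) -> result"""
        ident, origin = sid
        if origin != self.origin or ident not in self._in_use:
            return False
        self._in_use.remove(ident)
        return True
```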

8.2 Group Name Generator

   GroupName generation is similar to Stream ID generation. The

   GroupName includes a 16-bit unique identifier, a 32-bit creation

   timestamp, and a 32-bit IP address. Group names are globally unique.

   A GroupName includes the creator's IP address, so this reduces a

   global uniqueness problem to a simple local problem. The specific

   definitions and formats of the 16-bit field and the 32-bit creation

   timestamp are left to the implementor. These fields must be locally

   unique, and only have local significance.

   An ST implementation has to provide a group name generator facility,

   so that an application or higher layer protocol can obtain a unique

   GroupName from the ST layer. This is a mechanism for the application

   to request the allocation of a GroupName that is independent of the

   request to create a stream. The GroupName is used by the application

   or higher layer protocol when creating the streams that are to be

   part of the group.

   For instance, the following two functions could be made available:

   o   AllocateGroupName() -> result, GroupName

   o   ReleaseGroupName(GroupName) -> result

   An implementation may also provide a GroupName deletion function.
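   The C sketch below shows one way to fill the three GroupName fields.
   Using a wrapping counter for the 16-bit identifier and seconds since
   the epoch (from the standard time() function) for the 32-bit creation
   timestamp are assumptions; the specification leaves both to the
   implementor. One plausible reason for the timestamp is that a counter
   alone restarts after a reboot, whereas the pair (counter, creation
   time) is unlikely to repeat. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Illustrative GroupName: 16-bit identifier, 32-bit creation timestamp,
   and the creator's 32-bit IPv4 address (host byte order). */
typedef struct {
    uint16_t unique_id;
    uint32_t timestamp;
    uint32_t creator_ip;
} GroupName;

static uint16_t group_counter = 0;   /* wraps after 65535 */

GroupName AllocateGroupName(uint32_t local_ip) {
    GroupName g;
    g.unique_id = ++group_counter;
    g.timestamp = (uint32_t)time(NULL);   /* assumption: epoch seconds */
    g.creator_ip = local_ip;
    return g;
}

/* Two GroupNames denote the same group only if all three fields match. */
bool GroupNameEqual(GroupName a, GroupName b) {
    return a.unique_id == b.unique_id && a.timestamp == b.timestamp &&
           a.creator_ip == b.creator_ip;
}
```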

8.3 Checksum Computation

   The standard Internet checksum algorithm is used for ST: "The

   checksum field is the 16-bit one's complement of the one's complement

   sum of all 16-bit words in the header. For purposes of computing the

   checksum, the value of the checksum field is zero (0)." See

   [RFC1071], [RFC1141], and [RFC791] for suggestions for efficient

   checksum algorithms.
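   For reference, a straightforward C version of this algorithm,
   following the description in [RFC1071], is sketched below. It sums
   the buffer as big-endian 16-bit words, pads an odd trailing byte with
   zero, folds the carries back in, and returns the complement. It
   favors clarity over the optimizations discussed in the cited RFCs;
   the function name st_checksum is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over buf[0..len), 16-bit words in network byte order.
   The caller zeroes the checksum field before computing it. */
uint16_t st_checksum(const uint8_t *buf, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((buf[0] << 8) | buf[1]);
        buf += 2;
        len -= 2;
    }
    if (len == 1)                      /* pad an odd trailing byte with zero */
        sum += (uint32_t)(buf[0] << 8);
    while (sum >> 16)                  /* fold carries into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

   A receiver can verify a header or control message by running the
   same function over it with the received checksum left in place; an
   intact message yields zero.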

8.4 Neighbor ST Agent Identification and Information Collection

   The STATUS message can be used to collect information about neighbor

   ST agents, streams the neighbor supports, and specific targets of

   streams the neighbor supports. An agent receiving a STATUS message

   provides the requested information via a STATUS-RESPONSE message.

   The STATUS message can be used to collect different information from

   a neighbor. It can be used to:

o   identify ST capable neighbors. If an ST agent wishes to check if

    a neighbor is ST capable, it should generate a STATUS message with

    an SID which has all its fields set to zero. An agent receiving a

    STATUS message with such an SID should answer with a STATUS-RESPONSE

    containing the same SID, and no other stream information. The

    receiving ST agent must answer as soon as possible to aid in Round

    Trip Time estimation, see Section 8.5;

o   obtain information on a particular stream. If an ST agent wishes to

    check a neighbor's general information related to a specific

    stream, it should generate a STATUS message containing the stream's

    SID. An ST agent receiving such a message will first check to see

    if the stream is known. If not known, the receiving ST agent sends a

    STATUS-RESPONSE containing the same SID, and no other stream

    information. If the stream is known, the receiving ST agent sends a

    STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group

    membership (if any), and as many targets as can be included in a

    single message as limited by MTU, see Section 5.1.2. Note that all

    targets may not be included in a response to a request for general

    stream information. If information on a specific target in a stream

    is desired, the mechanism described next should be used.

o   obtain information on particular targets in a stream. If an ST agent

    wishes to check a neighbor's information related to one or more

    specific targets of a specific stream, it should generate a STATUS

    message containing the stream's SID and a TargetList parameter

    listing the relevant targets. An ST agent receiving such a message

    will first check to see if the stream and targets are known. If the

    stream is not known, the agent follows the process described above.

    If both the stream and targets are known, the agent responds with a

    STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group

    membership (if any), and the requested targets that are known. If

    the stream is known but the target is not, the agent responds with a

    STATUS-RESPONSE containing the stream's SID, IPHops, FlowSpec, group

    membership (if any), but no targets.
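   The decision logic of the three cases above can be summarized in a
   small C function. It only decides what a STATUS-RESPONSE should carry
   besides the echoed SID; stream lookup and message encoding are left
   out. The StatusReply structure and all parameter names are
   hypothetical, and the per_message cap stands in for the MTU limit of
   Section 5.1.2.

```c
#include <stdbool.h>

/* What a STATUS-RESPONSE should carry besides the echoed SID. */
typedef struct {
    bool stream_info;   /* IPHops, FlowSpec and group membership (if any) */
    int  targets;       /* number of targets to list */
} StatusReply;

/* zero_sid:        the STATUS carried an all-zero SID (capability/RTT probe)
   stream_known:    the SID names a stream known to this agent
   requested:       targets named in a TargetList parameter (0 = none sent)
   requested_known: how many of those requested targets are known
   known_targets:   all targets known for the stream
   per_message:     how many targets fit in one message (MTU-limited) */
StatusReply status_reply(bool zero_sid, bool stream_known, int requested,
                         int requested_known, int known_targets,
                         int per_message) {
    StatusReply r = { false, 0 };
    if (zero_sid || !stream_known)
        return r;                      /* same SID, no other stream information */
    r.stream_info = true;
    if (requested > 0)
        r.targets = requested_known;   /* only the known requested targets */
    else
        r.targets = known_targets < per_message ? known_targets : per_message;
    return r;
}
```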

   The specific formats for STATUS and STATUS-RESPONSE messages are

   defined in Section 10.4.12 and Section 10.4.13.

8.5 Round Trip Time Estimation

   SCMP is made reliable through use of retransmission when an expected

   acknowledgment is not received in a timely manner. Timeout and

   retransmission algorithms are implementation dependent and are

   outside the scope of this document. However, they must be reasonable

   enough not to cause excessive retransmission of SCMP messages while

   maintaining the robustness of the protocol. Algorithms on this

   subject are described in [WoHD95], [Jaco88], [KaPa87].

   Most existing algorithms are based on an estimation of the Round Trip

   Time (RTT) between two hosts. With SCMP, if an ST agent wishes to

   have an estimate of the RTT to and from a neighbor, it should

   generate a STATUS message with an SID which has all its fields set to

   zero. An ST agent receiving a STATUS message with such an SID should

   answer as rapidly as possible with a STATUS-RESPONSE message

   containing the same SID, and no other stream information. The time

   interval between the send and receive operations can be used as an

   estimate of the RTT to and from the neighbor.
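   Since the timeout algorithm is left open, the C sketch below shows
   one possible choice. It turns the send and receive timestamps of a
   zero-SID STATUS exchange into an RTT sample and smooths successive
   samples with the mean/deviation estimator of [Jaco88], using the
   gains later standardized for TCP (1/8 and 1/4) and a timeout of
   SRTT + 4 * RTTVAR. These constants and the RttEstimator type are
   borrowed assumptions, not ST2+ requirements; timestamps are assumed
   to be microseconds from a monotonic clock.

```c
#include <stdint.h>

/* One RTT sample in milliseconds from microsecond timestamps taken when the
   zero-SID STATUS was sent and when the STATUS-RESPONSE arrived. */
double rtt_sample_ms(uint64_t sent_us, uint64_t received_us) {
    return (double)(received_us - sent_us) / 1000.0;
}

typedef struct {
    double srtt;        /* smoothed RTT, ms */
    double rttvar;      /* smoothed mean deviation, ms */
    int    have_sample;
} RttEstimator;

void rtt_update(RttEstimator *e, double sample_ms) {
    if (!e->have_sample) {               /* first measurement */
        e->srtt = sample_ms;
        e->rttvar = sample_ms / 2.0;
        e->have_sample = 1;
        return;
    }
    double err = e->srtt - sample_ms;
    if (err < 0)
        err = -err;
    e->rttvar = 0.75 * e->rttvar + 0.25 * err;   /* uses the old srtt */
    e->srtt = 0.875 * e->srtt + 0.125 * sample_ms;
}

/* Retransmission timeout for SCMP requests sent to this neighbor. */
double rtt_timeout_ms(const RttEstimator *e) {
    return e->srtt + 4.0 * e->rttvar;
}
```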

8.6 Network MTU Discovery

   At connection setup, the application at the origin asks the local ST

   agent to create streams with certain QoS requirements. The local ST

   agent fills out its network MTU value in the MaxMsgSize parameter in

   the CONNECT message and forwards it to the next-hop ST agents. Each

   ST agent in the path checks to see if its network MTU is smaller

   than the one specified in the CONNECT message and, if it is, the ST

   agent updates the MaxMsgSize in the CONNECT message to its network

   MTU. If the target application decides to accept the stream, the ST

   agent at the target copies the MTU value in the CONNECT message to

   the MaxMsgSize field in the ACCEPT message and sends it back to the

   application at the origin. The MaxMsgSize field in the ACCEPT message

   is the minimum MTU of the intervening networks to that target. If the

   application has multiple targets then the minimum MTU of the stream

   is the smallest MaxMsgSize received from all the ACCEPT messages. It

   is the responsibility of the application to segment its PDUs

   according to the minimum MaxMsgSize of the stream since no data

   fragmentation is supported during the data transfer phase. If a

   particular target's MaxMsgSize is unacceptable to an application, it

   may disconnect the target from the stream and assume that the target

   cannot be supported. When evaluating a particular target's

   MaxMsgSize, the application or the application interface will need to

   take into account the size of the ST data header.
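   The origin-side arithmetic can be sketched in C as follows. The
   sketch takes the MaxMsgSize values from the ACCEPT messages of all
   targets, uses the smallest, and subtracts the 12-byte ST header of
   Section 10.1 to obtain the largest application payload per PDU.
   Treating that header as the only per-PDU overhead is an assumption,
   as are the function names.

```c
#include <stddef.h>
#include <stdint.h>

#define ST_HEADER_BYTES 12u   /* ST header, Section 10.1 */

/* Smallest MaxMsgSize over the ACCEPT messages of all targets
   (UINT16_MAX if no target has accepted yet). */
uint16_t stream_min_msg_size(const uint16_t *accept_max_msg, size_t n) {
    uint16_t min = UINT16_MAX;
    for (size_t i = 0; i < n; i++)
        if (accept_max_msg[i] < min)
            min = accept_max_msg[i];
    return min;
}

/* Application bytes that fit in one ST data PDU. */
uint16_t max_app_payload(uint16_t min_msg_size) {
    return min_msg_size > ST_HEADER_BYTES
               ? (uint16_t)(min_msg_size - ST_HEADER_BYTES)
               : 0;
}

/* PDUs needed to send len bytes, since ST does not fragment data. */
size_t pdus_needed(size_t len, uint16_t payload) {
    return payload ? (len + payload - 1) / payload : 0;
}
```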

8.7 IP Encapsulation of ST

   ST packets may be encapsulated in IP to allow them to pass through

   routers that don't support the ST Protocol. Of course, ST resource

   management is precluded over such a path, and packet overhead is

   increased by encapsulation, but if the performance is reasonably

   predictable this may be better than not communicating at all.

   IP-encapsulated ST packets begin with a normal IP header. Most fields

   of the IP header should be filled in according to the same rules that

   apply to any other IP packet. Three fields of special interest are:

o   Protocol is 5, see [RFC1700], to indicate an ST packet is enclosed,

    as opposed to TCP or UDP, for example.

o   Destination Address is that of the next-hop ST agent. This may or

    may not be the target of the ST stream. There may be an intermediate

    ST agent to which the packet should be routed to take advantage of

    service guarantees on the path past that agent. Such an intermediate

    agent would not be on a directly-connected network (or else IP

    encapsulation wouldn't be needed), so it would probably not be

    listed in the normal routing table. Additional routing mechanisms,

    not defined here, will be required to learn about such agents.

o   Type-of-Service may be set to an appropriate value for the service

    being requested, see [RFC1700]. This feature is not implemented

    uniformly in the Internet, so its use can't be precisely defined

    here.

   IP encapsulation adds little difficulty for the ST agent that

   receives the packet. However, when IP encapsulation is performed it

   must be done in both directions. To process the encapsulated IP

   message, the ST agents simply remove the IP header and proceed with

   the ST header as usual.

   The more difficult part is during setup, when the ST agent must

   decide whether or not to encapsulate. If the next-hop ST agent is on

   a remote network and the route to that network is through a router

   that supports IP but not ST, then encapsulation is required. The

   routing function provides ST agents with the route and capability

   information needed to support encapsulation.

   On forwarding, the (mostly constant) IP Header must be inserted and

   the IP checksum appropriately updated.
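   A C sketch of building that header follows: a 20-byte IPv4 header
   without options, with Protocol set to 5, the destination set to the
   next-hop ST agent, and the IP header checksum computed as in Section
   8.3. Leaving the flags and fragment offset at zero (so that IP
   fragmentation remains possible in the IP cloud), and the caller's
   choice of TTL and Identification, are assumptions of this sketch;
   the function and constant names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ST_IP_PROTOCOL 5u   /* IP Protocol number for ST, see [RFC1700] */

static uint16_t inet_csum(const uint8_t *p, size_t len) {
    uint32_t sum = 0;
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)((p[0] << 8) | p[1]);
    if (len)
        sum += (uint32_t)(p[0] << 8);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

static void put32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Fills h[0..19] with an IPv4 header for an encapsulated ST packet of
   st_len bytes. Addresses are host-order integers; h is in network order. */
void st_encap_ipv4_header(uint8_t h[20], uint16_t st_len, uint8_t tos,
                          uint16_t ident, uint8_t ttl,
                          uint32_t src, uint32_t next_hop_agent) {
    uint16_t total = (uint16_t)(20u + st_len);
    memset(h, 0, 20);
    h[0] = 0x45;                      /* version 4, header length 5 words */
    h[1] = tos;                       /* Type-of-Service, 0 if unused */
    h[2] = (uint8_t)(total >> 8);
    h[3] = (uint8_t)total;
    h[4] = (uint8_t)(ident >> 8);
    h[5] = (uint8_t)ident;
    /* h[6], h[7]: flags and fragment offset left zero (DF not set) */
    h[8] = ttl;
    h[9] = ST_IP_PROTOCOL;            /* an ST packet follows */
    put32(h + 12, src);
    put32(h + 16, next_hop_agent);    /* next-hop ST agent, not the target */
    uint16_t c = inet_csum(h, 20);    /* computed with h[10..11] == 0 */
    h[10] = (uint8_t)(c >> 8);
    h[11] = (uint8_t)c;
}
```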

   Applications are informed about the number of IP hops traversed on

   the path to each target. The IPHops field of the CONNECT message, see

   Section 10.4.4, carries the number of traversed IP hops to the target

   application. The field is incremented by each ST agent when IP

   encapsulation will be used to reach the next-hop ST agent. The number

   of IP hops traversed is returned to the origin in the IPHops field of

   the ACCEPT message, Section 10.4.1.

   When using IP Encapsulation, the MaxMsgSize field will not reflect

   the MTU of the IP encapsulated segments. This means that IP

   fragmentation and reassembly may be needed in the IP cloud to support

   a message of MaxMsgSize. IP fragmentation can only occur when the MTU

   of the IP cloud, less IP header length, is the smallest MTU in a

   stream's network path.

8.8 IP Multicasting

   If an ST agent must use IP encapsulation to reach multiple next-hops

   toward different targets, then either the packet must be replicated

   for transmission to each next-hop, or IP multicasting may be used if

   it is implemented in the next-hop ST agents and in the intervening IP

   routers.

   When the stream is established, the collection of next-hop ST agents

   must be set up as an IP multicast group. The ST agent must allocate

   an appropriate IP multicast address (see Section 10.3.3) and fill

   that address in the IPMulticastAddress field of the CONNECT message.

   The IP multicast address in the CONNECT message is used to inform the

   next-hop ST agents that they should join the multicast group to

   receive subsequent PDUs. Obviously, the CONNECT message itself must

   be sent using unicast. The next-hop ST agents must be able to receive

   on the specified multicast address in order to accept the connection.

   If the next-hop ST agent cannot receive on the specified multicast

   address, it sends a REFUSE message with ReasonCode (BadMcastAddress).

   Upon receiving the REFUSE, the upstream agent can choose to retry

   with a different multicast address. Alternatively, it can choose to

   lose the efficiency of multicast and use unicast delivery.

   The following permanent IP multicast addresses have been assigned to

   ST:

           224.0.0.7 All ST routers (intermediate agents)

           224.0.0.8 All ST hosts (agents)

   In addition, a block of transient IP multicast addresses, 224.1.0.0 -

   224.1.255.255, has been allocated for ST multicast groups. For

   instance, the following three functions could be made available:

   o   AllocateMcastAddr() -> result, McastAddr

   o   ListenMcastAddr(McastAddr) -> result

   o   ReleaseMcastAddr(McastAddr) -> result
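   A minimal C sketch of the allocation and release functions follows,
   assuming host-order IPv4 addresses and a per-agent record of which
   addresses in the transient block are in use. It does not coordinate
   with other agents: if a next hop answers with REFUSE
   (BadMcastAddr), the caller would allocate another address and retry,
   as described above. ListenMcastAddr is omitted because joining a
   group would normally go through the host's IP multicast interface
   (for example, the IP_ADD_MEMBERSHIP socket option). All names are
   illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define ST_ALL_ROUTERS  0xE0000007u   /* 224.0.0.7, all ST routers */
#define ST_ALL_HOSTS    0xE0000008u   /* 224.0.0.8, all ST hosts */
#define ST_MCAST_BLOCK  0xE0010000u   /* 224.1.0.0 - 224.1.255.255 */

static uint8_t  mcast_used[65536 / 8];  /* one bit per address in the block */
static uint32_t mcast_next;             /* next offset to try */

bool is_st_transient_mcast(uint32_t addr) {
    return (addr & 0xFFFF0000u) == ST_MCAST_BLOCK;
}

/* Picks an address from the transient block not currently used by this
   agent; returns 0 on success, -1 if the whole block is in use. */
int AllocateMcastAddr(uint32_t *addr) {
    for (uint32_t i = 0; i < 65536u; i++) {
        uint32_t off = (mcast_next + i) & 0xFFFFu;
        uint8_t mask = (uint8_t)(1u << (off & 7));
        if (!(mcast_used[off >> 3] & mask)) {
            mcast_used[off >> 3] |= mask;
            mcast_next = (off + 1) & 0xFFFFu;
            *addr = ST_MCAST_BLOCK | off;
            return 0;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if addr was not allocated from the block. */
int ReleaseMcastAddr(uint32_t addr) {
    if (!is_st_transient_mcast(addr))
        return -1;
    uint32_t off = addr & 0xFFFFu;
    uint8_t mask = (uint8_t)(1u << (off & 7));
    if (!(mcast_used[off >> 3] & mask))
        return -1;
    mcast_used[off >> 3] &= (uint8_t)~mask;
    return 0;
}
```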

9. The ST2+ Flow Specification

   This section defines the ST2+ flow specification. The flow

   specification contains the user application requirements in terms of

   quality of service. Its contents are LRM dependent and are

   transparent to the ST2 setup protocol. ST2 carries the flow

   specification as part of the FlowSpec parameter, which is described

   in Section 10.3.1. The required ST2+ flow specification is included

   in the protocol only to support interoperability. ST2+ also defines a

   "null" flow specification to be used only to support testing.

   ST2 is not dependent on a particular flow specification format and it

   is expected that other versions of the flow specification will be

   needed in the future. Different flow specification formats are

   distinguished by the value of the Version field of the FlowSpec

   parameter, see Section 10.3.1. A single stream is always associated

   with a single flow specification format, i.e., the Version field is

   consistent throughout the whole stream. The following Version field

   values are defined:

   0 - Null FlowSpec       /* must be supported */

   1 - ST Version 1

   2 - ST Version 1.5

   3 - RFC 1190 FlowSpec

   4 - HeiTS FlowSpec

   5 - BerKom FlowSpec

   6 - RFC 1363 FlowSpec

   7 - ST2+ FlowSpec       /* must be supported */

   FlowSpecs version #0 and #7 must be supported by ST2+

   implementations. Version numbers in the range 1-6 indicate flow

   specifications that are currently used in existing ST2 implementations.

   Values in the 128-255 range are reserved for private and experimental

   use.
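   The Version values above can be captured in a C enumeration,
   together with a helper that classifies an incoming Version byte.
   Treating values 8-127 as unassigned reflects only that this section
   does not define them; the enumerator and function names are
   illustrative.

```c
#include <stdint.h>

enum FlowSpecVersion {
    FS_NULL    = 0,   /* must be supported */
    FS_ST1     = 1,
    FS_ST1_5   = 2,
    FS_RFC1190 = 3,
    FS_HEITS   = 4,
    FS_BERKOM  = 5,
    FS_RFC1363 = 6,
    FS_ST2PLUS = 7    /* must be supported */
};

typedef enum {
    FSV_REQUIRED,     /* 0 and 7: mandatory for ST2+ implementations */
    FSV_EXISTING,     /* 1-6: used by existing ST2 implementations */
    FSV_PRIVATE,      /* 128-255: private and experimental use */
    FSV_UNASSIGNED    /* 8-127: not defined in this section */
} FsVersionKind;

FsVersionKind classify_flowspec_version(uint8_t v) {
    if (v == FS_NULL || v == FS_ST2PLUS)
        return FSV_REQUIRED;
    if (v >= FS_ST1 && v <= FS_RFC1363)
        return FSV_EXISTING;
    if (v >= 128)
        return FSV_PRIVATE;
    return FSV_UNASSIGNED;
}
```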

   In general, a flow specification may support sophisticated flow

   descriptions. For example, a flow specification could represent sub-

   flows of a particular stream. This could then be used by a

   cooperating application and LRM to forward designated packets to

   specific targets based on the different sub-flows. The reserved bits

   in the ST2 Data PDU, see Section 10.1, may be used with such a flow

   specification to designate packets associated with different sub-

   flows. The ST2+ FlowSpec is not so sophisticated, and is intended for

   use with applications that generate traffic at a single rate for

   uniform delivery to all targets.

9.1 FlowSpec Version #0 - (Null FlowSpec)

   The flow specification identified by a #0 value of the Version field

   is called the Null FlowSpec. This flow specification causes no

   resources to be allocated. It is ignored by the LRMs. Its contents

   are never updated. Stream setup takes place in the usual way leading

   to successful stream establishment, but no resources are actually

   reserved.

   The purpose of the Null FlowSpec is that of facilitating

   interoperability tests by allowing streams to be built without

   actually allocating the corresponding amount of resources. The Null

   FlowSpec may also be used for testing and debugging purposes.

   The Null FlowSpec comprises the 4-byte FlowSpec parameter only, see

   Section 10.3.1. The third byte (Version field) must be set to 0.

9.2 FlowSpec Version #7 - ST2+ FlowSpec

   The flow specification identified by a #7 value of the Version field

   is the ST2+ FlowSpec, to be used by all ST2+ implementations. It

   allows the user applications to express their real-time requirements

   in the form of a QoS class, precedence, and three basic QoS

   parameters:

   o   message size,

   o   message rate,

   o   end-to-end delay.

   The QoS class indicates what kind of QoS guarantees are expected by

   the application, e.g., strict guarantees or predictive, see Section

   9.2.1. QoS parameters are expressed via a set of values:

o   the "desired" values indicate the QoS desired by the application.

    These values are assigned by the application and never modified by

    the LRM.

o   the "limit" values indicate the lowest QoS the application is

    willing to accept. These values are also assigned by the application

    and never modified by the LRM.

o   the "actual" values indicate the QoS that the system is able to

    provide. They are updated by the LRM at each node. The "actual"

    values are always bounded by the "limit" and "desired" values.

9.2.1 QoS Classes

   Two QoS classes are defined:

   1 - QOS_PREDICTIVE      /* QoSClass field value = 0x01, must be

                              supported*/

   2 - QOS_GUARANTEED      /* QoSClass field value = 0x10, optional */

o   The QOS_PREDICTIVE class implies that the negotiated QoS may be

    violated for short time intervals during the data transfer. An

    application has to provide values that take into account the

    "normal" case, e.g., the "desired" message rate is the allocated rate

    for the transmission. Reservations are done for the "normal" case as

    opposed to the peak case required by the QOS_GUARANTEED service

    class. This QoS class must be supported by all implementations.

o   The QOS_GUARANTEED class implies that the negotiated QoS for the

    stream is never violated during the data transfer. An application

    has to provide values that take into account the worst possible

    case, e.g., the "desired" message rate is the peak rate for the

    transmission. As a result, sufficient resources to handle the peak

    rate are reserved. This strategy may lead to overbooking of

    resources, but it provides strict real-time guarantees. Support of

    this QoS class is optional.

   If an LRM that doesn't support class QOS_GUARANTEED receives a

   FlowSpec containing QOS_GUARANTEED class, it informs the local ST

   agent. The ST agent may try different paths or delete the

   corresponding portion of the stream as described in Section 5.5.3,

   i.e., ReasonCode (FlowSpecError).

9.2.2 Precedence

   Precedence is the importance of the connection being established.

   Zero represents the lowest precedence. The lowest level is expected

   to be used by default. In general, the distinction between precedence

   and priority is that precedence specifies streams that are permitted

   to take previously committed resources from another stream, while

   priority identifies those PDUs that a stream is most willing to have

   dropped.

9.2.3 Maximum Data Size

   This parameter is expressed in bytes. It represents the maximum

   amount of data, excluding ST and other headers, allowed to be sent in

   a message as part of the stream. The LRM first checks whether it is

   possible to get the value desired by the application (DesMaxSize). If

   not, it updates the actual value (ActMaxSize) with the available size

   unless this value is less than the minimum allowed by the

   application (LimitMaxSize), in which case it informs the local ST

   agent that it is not possible to build the stream along this path.

9.2.4 Message Rate

   This parameter is expressed in messages/second. It represents the

   transmission rate for the stream. The LRM first checks whether it is

   possible to get the value desired by the application (DesRate). If

   not, it updates the actual value (ActRate) with the available rate

   unless this value is less than the minimum allowed by the

   application (LimitRate), in which case it informs the local ST agent

   that it is not possible to build the stream along this path.
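   Sections 9.2.3 and 9.2.4 apply the same rule to a quantity where
   larger is better. A C sketch of one LRM step follows. It assumes
   that the origin initializes the actual value to the desired one and
   that each hop may only lower it to what that hop can provide; the
   function name and result codes are hypothetical.

```c
#include <stdint.h>

typedef enum { LRM_OK = 0, LRM_CANT_BUILD = 1 } LrmResult;

/* One hop's update of ActMaxSize or ActRate. *actual arrives holding the
   value left by the previous hop and is lowered to `available` if needed.
   If this hop cannot even reach the application's limit value, the stream
   cannot be built along this path. */
LrmResult lrm_update_at_least(uint32_t limit, uint32_t available,
                              uint32_t *actual) {
    if (available >= *actual)
        return LRM_OK;              /* this hop can carry the current value */
    if (available < limit)
        return LRM_CANT_BUILD;      /* inform the local ST agent */
    *actual = available;
    return LRM_OK;
}
```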

9.2.5 Delay and Delay Jitter

   The delay parameter is expressed in milliseconds. It represents the

   maximum end-to-end delay for the stream. The LRM first checks whether

   it is possible to get the value desired by the application

   (DesMaxDelay). If not, it updates the actual value (ActMaxDelay) with

   the available delay unless this value is greater than the maximum

   delay allowed by the application (LimitMaxDelay), in which case it

   informs the local ST agent that it is not possible to build the

   stream along this path.

   The LRM also updates at each node the MinDelay field by incrementing

   it by the minimum possible delay to the next-hop. Information on the

   minimum possible delay allows the calculation of the maximum end-to-end

   delay range, i.e., the time interval in which a data packet can be

   received. This interval should not exceed the DesMaxDelayRange value

   indicated by the application. The maximum end-to-end delay range is

   an upper bound of the delay jitter.
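   Delay works the other way round: smaller is better and, as described
   for ActMaxDelay and ActMinDelay in Section 9.2.6, each agent adds its
   own contribution. The C sketch below accumulates both fields hop by
   hop, enforces the 1 millisecond minimum per agent, refuses once
   LimitMaxDelay would be exceeded, and derives the delay range that
   bounds jitter. The DelayState type and function names are
   illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t act_max_delay;   /* ms, accumulated along the path so far */
    uint16_t act_min_delay;   /* ms, accumulated along the path so far */
} DelayState;

/* Adds one agent's maximum and minimum delay (including transmission to the
   next hop). Returns false, leaving *s unchanged, if the accumulated maximum
   would exceed LimitMaxDelay. */
bool lrm_add_hop_delay(DelayState *s, uint16_t limit_max_delay,
                       uint16_t hop_max_ms, uint16_t hop_min_ms) {
    if (hop_min_ms < 1)
        hop_min_ms = 1;                     /* at least 1 ms per agent */
    uint32_t max = (uint32_t)s->act_max_delay + hop_max_ms;
    uint32_t min = (uint32_t)s->act_min_delay + hop_min_ms;
    if (max > limit_max_delay)
        return false;
    s->act_max_delay = (uint16_t)max;
    s->act_min_delay = (uint16_t)(min > 0xFFFFu ? 0xFFFFu : min);
    return true;
}

/* End-to-end delay range, an upper bound on delay jitter. */
uint16_t delay_range(const DelayState *s) {
    return s->act_max_delay > s->act_min_delay
               ? (uint16_t)(s->act_max_delay - s->act_min_delay)
               : 0;
}

/* The range should not exceed the application's DesMaxDelayRange. */
bool delay_range_ok(const DelayState *s, uint16_t des_max_delay_range) {
    return delay_range(s) <= des_max_delay_range;
}
```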

9.2.6 ST2+ FlowSpec Format

   The ST2+ FlowSpec has the following format:

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |    QosClass   | Precedence   |            0(unused)          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                             DesRate                           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                            LimitRate                          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                             ActRate                           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            DesMaxSize         |           LimitMaxSize        |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            ActMaxSize         |           DesMaxDelay         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            LimitMaxDelay      |           ActMaxDelay         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            DesMaxDelayRange   |           ActMinDelay         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 9: The ST2+ FlowSpec.

   The LRM modifies only "actual" fields, i.e., those beginning with

   "Act". The user application assigns values to all other fields.

o   QoSClass indicates which of the two defined classes of service

    applies. The two classes are: QOS_PREDICTIVE (QoSClass = 1) and

    QOS_GUARANTEED (QoSClass = 2).

o   Precedence indicates the stream's precedence. Zero represents the

    lowest precedence, and should be the default value.

o   DesRate is the desired transmission rate for the stream in messages/

    second. This field is set by the origin and is not modified by

    intermediate agents.

o   LimitRate is the minimum acceptable transmission rate in messages/

    second. This field is set by the origin and is not modified by

    intermediate agents.

o   ActRate is the actual transmission rate allocated for the stream in

    messages/second. Each agent updates this field with the available

    rate unless this value is less than LimitRate, in which case a

    REFUSE is generated.

o   DesMaxSize is the desired maximum data size in bytes that will be

    sent in a message in the stream. This field is set by the origin.

o   LimitMaxSize is the minimum acceptable data size in bytes. This

    field is set by the origin.

o   ActMaxSize is the actual maximum data size that may be sent in a

    message in the stream. This field is updated by each agent based on

    MTU and available resources. If the available maximum size is less than

    LimitMaxSize, the connection must be refused with ReasonCode

    (CantGetResrc).

o   DesMaxDelay is the desired maximum end-to-end delay for the stream

    in milliseconds. This field is set by the origin.

o   LimitMaxDelay is the upper-bound of acceptable end-to-end delay for

    the stream in milliseconds. This field is set by the origin.

o   ActMaxDelay is the maximum end-to-end delay that will be seen by

    data in the stream. Each ST agent adds to this field the maximum

    delay that will be introduced by the agent, including transmission

    time to the next-hop ST agent. If the actual maximum exceeds

    LimitMaxDelay, then the connection is refused with ReasonCode

    (CantGetResrc).

o   DesMaxDelayRange is the desired maximum delay range that may be

    encountered end-to-end by stream data in milliseconds. This value is

    set by the application at the origin.

o   ActMinDelay is the actual minimum end-to-end delay that will be

    encountered by stream data in milliseconds. Each ST agent adds to

    this field the minimum delay that will be introduced by the agent,

    including transmission time to the next-hop ST agent. Each agent

    must add at least 1 millisecond. The delay range for the stream can

    be calculated from the actual maximum and minimum delay fields. It

    is expected that the range will be important to some applications.
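   The 32-byte layout of Figure 9 maps directly onto big-endian field
   accesses. The C sketch below encodes and decodes it; the St2FlowSpec
   structure and helper names are illustrative, and the QoSClass values
   follow the list above (1 and 2).

```c
#include <stdint.h>

typedef struct {
    uint8_t  qos_class;     /* 1 = QOS_PREDICTIVE, 2 = QOS_GUARANTEED */
    uint8_t  precedence;    /* 0 = lowest, the default */
    uint32_t des_rate, limit_rate, act_rate;                   /* msgs/s */
    uint16_t des_max_size, limit_max_size, act_max_size;       /* bytes */
    uint16_t des_max_delay, limit_max_delay, act_max_delay;    /* ms */
    uint16_t des_max_delay_range, act_min_delay;               /* ms */
} St2FlowSpec;

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v >> 16)); put16(p + 2, (uint16_t)v); }
static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t get32(const uint8_t *p) { return ((uint32_t)get16(p) << 16) | get16(p + 2); }

void flowspec_encode(const St2FlowSpec *f, uint8_t out[32]) {
    out[0] = f->qos_class;
    out[1] = f->precedence;
    out[2] = 0;                                  /* unused */
    out[3] = 0;
    put32(out + 4, f->des_rate);
    put32(out + 8, f->limit_rate);
    put32(out + 12, f->act_rate);
    put16(out + 16, f->des_max_size);
    put16(out + 18, f->limit_max_size);
    put16(out + 20, f->act_max_size);
    put16(out + 22, f->des_max_delay);
    put16(out + 24, f->limit_max_delay);
    put16(out + 26, f->act_max_delay);
    put16(out + 28, f->des_max_delay_range);
    put16(out + 30, f->act_min_delay);
}

void flowspec_decode(const uint8_t in[32], St2FlowSpec *f) {
    f->qos_class = in[0];
    f->precedence = in[1];
    f->des_rate = get32(in + 4);
    f->limit_rate = get32(in + 8);
    f->act_rate = get32(in + 12);
    f->des_max_size = get16(in + 16);
    f->limit_max_size = get16(in + 18);
    f->act_max_size = get16(in + 20);
    f->des_max_delay = get16(in + 22);
    f->limit_max_delay = get16(in + 24);
    f->act_max_delay = get16(in + 26);
    f->des_max_delay_range = get16(in + 28);
    f->act_min_delay = get16(in + 30);
}
```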

10. ST2 Protocol Data Units Specification

10.1 Data PDU

   IP and ST packets can be distinguished by the IP Version Number

   field, i.e., the first four (4) bits of the packet; ST has been

   assigned the value 5 (see [RFC1700]). There is no requirement for

   compatibility between IP and ST packet headers beyond the first four

   bits. (IP uses value 4.)

   The ST PDUs sent between ST agents consist of an ST Header

   encapsulating either a higher layer PDU or an ST Control Message.

   Data packets are distinguished from control messages via the D-bit

   (bit 8) in the ST header.

   The ST Header also includes an ST Version Number, a total length

   field, a header checksum, a unique id, and the stream origin 32-bit

   IP address. The unique id and the stream origin 32-bit IP address

   form the stream id (SID). This is shown in Figure 10. Please refer to

   Section 10.6 for an explanation of the notation.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | ST=5 | Ver=3 |D| Pri |   0   |            TotalBytes         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |          HeaderChecksum       |            UniqueID           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         OriginIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            Figure 10: ST Header

o   ST is the IP Version Number assigned to identify ST packets. The

    value for ST is 5.

o   Ver is the ST Version Number. The value for the current ST2+ version

    is 3.

o   D (bit 8) is set to 1 in all ST data packets and to 0 in all SCMP

    control messages.

o   Pri (bits 9-11) is the packet-drop priority field with zero (0)

    being lowest priority and seven the highest. The field is to be used

    as described in Section 3.2.2.

o   TotalBytes is the length, in bytes, of the entire ST packet; it

    includes the ST Header but does not include any local network

    headers or trailers. In general, all length fields in the ST

    Protocol are in units of bytes.

o   HeaderChecksum covers only the ST Header (12 bytes). The ST Protocol

    uses 16-bit checksums here in the ST Header and in each Control

    Message. For checksum computation, see Section 8.3.

o   UniqueID is the first element of the stream ID (SID). It is locally

    unique at the stream origin, see Section 8.1.

o   OriginIPAddress is the second element of the SID. It is the 32-bit

    IP address of the stream origin, see Section 8.1.

   Bits 12-15 must be set to zero (0) when using the flow specifications

   defined in this document, see Section 9. They may be set accordingly

   when other flow specifications are used, e.g., as described in

   [WoHD95].
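   Packing and checking the 12-byte ST header of Figure 10 can be
   sketched in C as below. The bit numbering follows the figure: the D
   bit is the most significant bit of the second byte, Pri occupies the
   next three bits, and bits 12-15 are left at zero as required for the
   flow specifications of Section 9. The StHeader type, the function
   names, and the negative return codes are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     data;          /* D bit: 1 = ST data packet, 0 = SCMP message */
    uint8_t  pri;           /* packet-drop priority, 0 (lowest) .. 7 */
    uint16_t total_bytes;   /* whole ST packet, including this header */
    uint16_t unique_id;     /* first element of the SID */
    uint32_t origin_ip;     /* second element of the SID */
} StHeader;

static uint16_t inet_csum(const uint8_t *p, size_t len) {
    uint32_t sum = 0;
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)((p[0] << 8) | p[1]);
    if (len)
        sum += (uint32_t)(p[0] << 8);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

void st_header_encode(const StHeader *h, uint8_t out[12]) {
    out[0] = (uint8_t)((5u << 4) | 3u);          /* ST = 5, Ver = 3 */
    out[1] = (uint8_t)((h->data ? 0x80u : 0u) | ((h->pri & 7u) << 4));
    put16(out + 2, h->total_bytes);
    put16(out + 4, 0);                           /* checksum placeholder */
    put16(out + 6, h->unique_id);
    put16(out + 8, (uint16_t)(h->origin_ip >> 16));
    put16(out + 10, (uint16_t)h->origin_ip);
    put16(out + 4, inet_csum(out, 12));          /* covers the header only */
}

/* Returns 0 on success, -1 if not ST (IPv4 has 4 here), -2 for another
   ST version, -3 on a header checksum error. */
int st_header_decode(const uint8_t in[12], StHeader *h) {
    if ((in[0] >> 4) != 5)
        return -1;
    if ((in[0] & 0x0Fu) != 3)
        return -2;
    if (inet_csum(in, 12) != 0)
        return -3;
    h->data = (in[1] & 0x80u) != 0;
    h->pri = (uint8_t)((in[1] >> 4) & 7u);
    h->total_bytes = get16(in + 2);
    h->unique_id = get16(in + 6);
    h->origin_ip = ((uint32_t)get16(in + 8) << 16) | get16(in + 10);
    return 0;
}
```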

10.1.1 ST Data Packets

   ST packets whose D-bit is non-zero are data packets. Their

   interpretation is a matter for the higher layer protocols and

   consequently is not specified here. The data packets are not

   protected by an ST checksum and will be delivered to the higher layer

   protocol even with errors. ST agents will not pass data packets over

   a new hop whose setup is not complete.

10.2 Control PDUs

   SCMP control messages are exchanged between neighbor ST agents using

   a D-bit of zero (0). The control protocol follows a request-response

   model with all requests expecting responses. Retransmission after

   timeout (see Section 4.3) is used to allow for lost or ignored

   messages. Control messages do not extend across packet boundaries; if

   a control message is too large for the MTU of a hop, its information

   is partitioned and a control message per partition is sent (see

   Section 5.1.2). All control messages have the following format:

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode       |     Options   |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |          Reference            |          LnkReference         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |            ReasonCode         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                      OpCodeSpecificData                       :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 11: ST Control Message Format

o   OpCode identifies the type of control message.

o   Options is used to convey OpCode-specific variations for a control

    message.

o   TotalBytes is the length of the control message, in bytes, including

    all OpCode specific fields and optional parameters. The value is

    always divisible by four (4).

o   Reference is a transaction number. Each sender of a request control

    message assigns a Reference number to the message that is unique

    with respect to the stream. The Reference number is used by the

    receiver to detect and discard duplicates. Each acknowledgment

    carries the Reference number of the request being acknowledged.

    Reference zero (0) is never used, and Reference numbers are assumed

    to be monotonically increasing with wraparound so that the older-

    than and more-recent-than relations are well defined.

o   LnkReference contains the Reference field of the request control

    message that caused this request control message to be created. It

    is used in situations where a single request leads to multiple

    responses from the same ST agent. Examples are CONNECT and CHANGE

    messages that are first acknowledged hop-by-hop and then lead to an

    ACCEPT or REFUSE response from each target.

o   SenderIPAddress is the 32-bit IP address of the network interface

    that the ST agent used to send the control message. This value

    changes each time the packet is forwarded by an ST agent (hop-by-

    hop).

o   Checksum is the checksum of the control message. Because the control

    messages are sent in packets that may be delivered with bits in

    error, each control message must be checked to be error free before

    it is acted upon.

o   ReasonCode is set to zero (0 = NoError) in most SCMP messages.

    Otherwise, it can be set to an appropriate value to indicate an

    error situation as defined in Section 10.5.3.

o   OpCodeSpecificData contains any additional information that is

    associated with the control message. It depends on the specific

    control message and is explained further below. In some response

    control messages, fields of zero (0) are included to allow the

    format to match that of the corresponding request message. The

    OpCodeSpecificData may also contain optional parameters. The

    specifics of OpCodeSpecificData are defined in Section 10.3.
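   The fixed part of Figure 11 occupies 16 bytes. The C sketch below
   parses it with the checks implied above (TotalBytes a multiple of
   four and within the received buffer, checksum verified before the
   message is acted upon) and shows one way to compare 16-bit Reference
   numbers under wraparound, in the style of serial number arithmetic.
   It assumes that the Checksum covers all TotalBytes bytes of the
   message; the ScmpHeader type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  opcode, options;
    uint16_t total_bytes, reference, lnk_reference;
    uint32_t sender_ip;
    uint16_t checksum, reason_code;
} ScmpHeader;

static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint16_t inet_csum(const uint8_t *p, size_t len) {
    uint32_t sum = 0;
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)((p[0] << 8) | p[1]);
    if (len)
        sum += (uint32_t)(p[0] << 8);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Zeroes the Checksum field, then stores the checksum of TotalBytes bytes. */
void scmp_set_checksum(uint8_t *msg) {
    msg[12] = msg[13] = 0;
    uint16_t c = inet_csum(msg, get16(msg + 2));
    msg[12] = (uint8_t)(c >> 8);
    msg[13] = (uint8_t)c;
}

/* Returns 0 on success, -1 for a malformed length, -2 for a bad checksum. */
int scmp_parse(const uint8_t *buf, size_t len, ScmpHeader *h) {
    if (len < 16)
        return -1;
    uint16_t total = get16(buf + 2);
    if (total < 16 || total % 4 != 0 || total > len)
        return -1;
    if (inet_csum(buf, total) != 0)
        return -2;                               /* do not act on it */
    h->opcode = buf[0];
    h->options = buf[1];
    h->total_bytes = total;
    h->reference = get16(buf + 4);
    h->lnk_reference = get16(buf + 6);
    h->sender_ip = ((uint32_t)get16(buf + 8) << 16) | get16(buf + 10);
    h->checksum = get16(buf + 12);
    h->reason_code = get16(buf + 14);
    return 0;
}

/* True if Reference a is more recent than b, allowing for wraparound. */
bool ref_more_recent(uint16_t a, uint16_t b) {
    uint16_t d = (uint16_t)(a - b);
    return d != 0 && d < 0x8000u;
}

/* Next Reference number; zero is never used. */
uint16_t ref_next(uint16_t r) {
    r = (uint16_t)(r + 1u);
    return r ? r : 1;
}
```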

10.3 Common SCMP Elements

   Several fields and parameters (referred to generically as elements)

   are common to two or more PDUs. They are described in detail here

   instead of repeating their description several times. In many cases,

   the presence of a parameter is optional. To permit the parameters to

   be easily defined and parsed, each is identified with a PCode byte

   that is followed by a PBytes byte indicating the length of the

   parameter in bytes (including the PCode, PBytes, and any padding

   bytes). If the length of the information is not a multiple of four

   (4) bytes, the parameter is padded with one to three zero (0) bytes.

   PBytes is thus always a multiple of four (4). Parameters can be

   present in any order.
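   The padding rule reduces to rounding the parameter length up to a multiple of four. The following C sketch is illustrative, not normative; st_param_pbytes and st_param_count are hypothetical names. It assumes PBytes is the one-byte field shown in the parameter figures, so no parameter can exceed 252 bytes (the largest multiple of four that fits in a byte).

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest multiple of four that holds the PCode and PBytes bytes plus
 * info_len bytes of content. Returns 0 if the result would not fit in
 * the one-byte PBytes field (assumed maximum: 252). */
unsigned st_param_pbytes(size_t info_len)
{
    if (info_len > 250)
        return 0;
    return (unsigned)((2 + info_len + 3) & ~(size_t)3);
}

/* Validates a run of parameters in buf[0..len): each PBytes must be a
 * non-zero multiple of four that stays inside the buffer. Parameters
 * may appear in any order, so only lengths are checked here.
 * Returns the number of parameters, or -1 if the run is malformed. */
int st_param_count(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        if (len - off < 2)
            return -1;                 /* no room for PCode and PBytes */
        uint8_t pbytes = buf[off + 1];
        if (pbytes == 0 || pbytes % 4 != 0 || pbytes > len - off)
            return -1;
        off += pbytes;
        count++;
    }
    return count;
}
```

   For example, a FlowSpec parameter carrying Version, a zero byte, and a 32-byte detail has 34 bytes after PCode and PBytes, so st_param_pbytes(34) yields 36, the PBytes value given for the ST2+ FlowSpec in Section 10.3.1.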

10.3.1 FlowSpec

   The FlowSpec parameter (PCode = 1) is used in several SCMP messages

   to convey the ST2 flow specification. The FlowSpec parameter has the

   following format:

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |   PCode = 1   |    PBytes     |   Version     |       0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                        FlowSpec detail                        :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 12: FlowSpec Parameter

o   The Version field contains the FlowSpec version.

o   The FlowSpec detail field contains the flow specification and is

    transparent to the ST agent. It is the data structure to be passed

    to the LRM. It must be 4-byte aligned.

   The Null FlowSpec, see Section 9.1, has no FlowSpec detail field.

   PBytes is set to four (4), and Version is set to zero (0). The ST2+

   FlowSpec, see Section 9.2, is a 32-byte data structure. PBytes is set

   to 36, and Version is set to seven (7).
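   A hypothetical encoder for the FlowSpec parameter shows how the PBytes and Version values above follow from the layout in Figure 12. The function name st_put_flowspec is an assumption of this sketch. The FlowSpec detail is copied without interpretation, since it is transparent to the ST agent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ST_PCODE_FLOWSPEC 1

/* Writes a FlowSpec parameter (Figure 12) into out, which must hold
 * 4 + detail_len bytes. The detail is opaque to the ST agent and is
 * copied as-is; it must be a multiple of four bytes long.
 * Returns PBytes, or 0 if detail_len is not acceptable. */
unsigned st_put_flowspec(uint8_t *out, uint8_t version,
                         const uint8_t *detail, size_t detail_len)
{
    if (detail_len % 4 != 0 || detail_len > 248)
        return 0;
    out[0] = ST_PCODE_FLOWSPEC;
    out[1] = (uint8_t)(4 + detail_len);  /* PCode, PBytes, Version, 0 */
    out[2] = version;                    /* 0 = Null FlowSpec, 7 = ST2+ */
    out[3] = 0;
    if (detail_len > 0)
        memcpy(out + 4, detail, detail_len);
    return out[1];
}
```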

10.3.2 Group

   The Group parameter (PCode = 2) is an optional argument used to

   indicate that the stream is a member in the specified group.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | PCode = 2    |   PBytes = 16 |           GroupUniqueID       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                        GroupCreationTime                      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                     GroupInitiatorIPAddress                   |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Relationship       |                 N             |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 13: Group Parameter

o   GroupUniqueID, GroupInitiatorIPAddress, and GroupCreationTime

    together form the GroupName field. They are allocated by the group

    name generator function, see Section 8.2. GroupUniqueID and

    GroupCreationTime are implementation specific and have only local

    definitions.

o   Relationship has the following format:

                        0                   1

                        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5

                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       |    0 (unused)         |S|P|F|B|

                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 14: Relationship Field

   The B, F, P, S bits correspond to Bandwidth, Fate, Path, and Subnet

   resources sharing, see Section 7. A value of 1 indicates that the

   relationship exists for this group. All combinations of the four bits

   are allowed. Bits 0-11 of the Relationship field are reserved for

   future use and must be set to 0.

o   N contains a legal value only if the B-bit is set. It is the value

    of the N parameter to be used as explained in Section 7.1.1.
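   The Relationship bits follow the convention of the figures, in which bit 0 is the most significant bit, so B (bit 15) is the least significant bit of the 16-bit field. The C sketch below decodes a Group parameter under that reading and under the assumption of network byte order. st_parse_group and its structure are hypothetical, and rejecting a parameter with a reserved bit set is a strictness choice of this sketch rather than a requirement stated above.

```c
#include <stdint.h>

/* Relationship bits, numbering bit 0 as the most significant bit. */
#define ST_REL_B        0x0001u  /* bit 15: Bandwidth */
#define ST_REL_F        0x0002u  /* bit 14: Fate */
#define ST_REL_P        0x0004u  /* bit 13: Path */
#define ST_REL_S        0x0008u  /* bit 12: Subnet resources */
#define ST_REL_RESERVED 0xFFF0u  /* bits 0-11, must be zero */

struct st_group {
    uint16_t unique_id;
    uint32_t creation_time;
    uint32_t initiator_ip;
    uint16_t relationship;
    uint16_t n;        /* only meaningful when has_n is set */
    int      has_n;
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parses a 16-byte Group parameter (Figure 13), assuming network byte
 * order. Returns 0 on success, -1 on a wrong PCode/PBytes or if a
 * reserved Relationship bit is set (a strict choice of this sketch). */
int st_parse_group(const uint8_t *p, struct st_group *g)
{
    if (p[0] != 2 || p[1] != 16)
        return -1;
    g->unique_id     = rd16(p + 2);
    g->creation_time = rd32(p + 4);
    g->initiator_ip  = rd32(p + 8);
    g->relationship  = rd16(p + 12);
    if (g->relationship & ST_REL_RESERVED)
        return -1;
    g->has_n = (g->relationship & ST_REL_B) != 0;
    g->n     = g->has_n ? rd16(p + 14) : 0;  /* N is legal only with B */
    return 0;
}
```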

10.3.3 MulticastAddress

   The MulticastAddress parameter (PCode = 3) is an optional parameter

   used with IP encapsulation when an IP multicast group is set up.

   This parameter is used to communicate the desired IP

   multicast address to next-hop ST agents that should become members of

   the group, see Section 8.8.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | PCode = 3    |   PBytes = 8 |                0              |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                        IPMulticastAddress                     |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 15: MulticastAddress

o   IPMulticastAddress is the 32-bit IP multicast address to be used to

    receive data packets for the stream.

10.3.4 Origin

   The Origin parameter (PCode = 4) is used to identify the next higher

   protocol, and the SAP being used in conjunction with that protocol.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | PCode = 4    |   PBytes      | NextPcol      |OriginSAPBytes |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                OriginSAP                      :     Padding   |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                             Figure 16: Origin

o   NextPcol is an 8-bit field used in demultiplexing operations to

    identify the protocol to be used above ST. The values of NextPcol

    are in the same number space as the IP header's Protocol field and

    are consequently defined in the Assigned Numbers RFC [RFC1700].

o   OriginSAPBytes specifies the length of the OriginSAP, exclusive of

    any padding required to maintain 32-bit alignment.

o   OriginSAP identifies the origin's SAP associated with the NextPcol

    protocol.

   Note that the 32-bit IP address of the stream origin is not included

   in this parameter because it is always available as part of the ST

   header.
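   For illustration, an Origin parameter could be built as in the following C sketch. It assumes the general parameter rules of Section 10.3 (PBytes rounded up to a multiple of four, padding bytes set to zero) and a one-byte PBytes field; st_put_origin is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes an Origin parameter (Figure 16) into out, which must hold the
 * padded length. OriginSAPBytes excludes padding; PBytes includes it.
 * Returns PBytes, or 0 if the result would not fit in one byte. */
unsigned st_put_origin(uint8_t *out, uint8_t next_pcol,
                       const uint8_t *sap, uint8_t sap_len)
{
    size_t pbytes = (4 + (size_t)sap_len + 3) & ~(size_t)3;

    if (pbytes > 252)
        return 0;
    memset(out, 0, pbytes);        /* padding bytes are zero */
    out[0] = 4;                    /* PCode = 4 (Origin) */
    out[1] = (uint8_t)pbytes;
    out[2] = next_pcol;            /* same number space as IP Protocol */
    out[3] = sap_len;              /* OriginSAPBytes */
    if (sap_len > 0)
        memcpy(out + 4, sap, sap_len);
    return (unsigned)pbytes;
}
```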

10.3.5 RecordRoute

   The RecordRoute parameter (PCode = 5) is used to request that the

   route between the origin and a target be recorded and delivered to

   the user application. The ST agent at the origin (or target)

   that includes this parameter has to determine the parameter's length,

   indicated by the PBytes field. ST agents processing messages

   containing this parameter add their receiving IP address in the

   position indicated by the FreeOffset field, space permitting. If no

   space is available, the parameter is passed unchanged. When included

   by the origin, all agents between the origin and the target add their

   IP addresses and this information is made available to the

   application at the target. When included by the target, all agents

   between the target and the origin, inclusive, add their IP addresses

   and this information is made available to the application at the

   origin.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |   PCode = 5   |     PBytes    |       0       | FreeOffset   |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                          IP Address 1                         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                              ...                              :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                          IP Address N                         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Figure 17: RecordRoute

o   PBytes is the length of the parameter in bytes. Length is determined

    by the agent (target or origin) that first introduces the parameter.

    Once set, the length of the parameter remains unchanged.

o   FreeOffset indicates the offset, relative to the start of the

    parameter, for the next IP address to be recorded. When

    FreeOffset is greater than or equal to PBytes, the RecordRoute

    parameter is full.

o   IP Address is filled in, space permitting, by each ST agent

    processing this parameter.
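   A possible in-place update of a RecordRoute parameter by a processing agent is sketched below in C. It assumes that FreeOffset initially points at IP Address 1, that is, offset 4 from the start of the parameter, and that addresses are written in network byte order; st_record_route_add is a hypothetical name. When the parameter is full, it is left untouched so that it can be passed on unchanged.

```c
#include <stdint.h>

/* Adds addr (host byte order) to a RecordRoute parameter in place.
 * Assumes FreeOffset (byte 3) starts at 4, the offset of IP Address 1,
 * and advances by four; addresses are written in network byte order.
 * Returns 1 if the address was recorded, 0 if there is no room (or
 * FreeOffset is malformed), in which case nothing is changed. */
int st_record_route_add(uint8_t *param, uint32_t addr)
{
    unsigned pbytes   = param[1];
    unsigned free_off = param[3];

    if (free_off < 4 || free_off >= pbytes || pbytes - free_off < 4)
        return 0;
    param[free_off]     = (uint8_t)(addr >> 24);
    param[free_off + 1] = (uint8_t)(addr >> 16);
    param[free_off + 2] = (uint8_t)(addr >> 8);
    param[free_off + 3] = (uint8_t)addr;
    param[3] = (uint8_t)(free_off + 4);
    return 1;
}
```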

10.3.6 Target and TargetList

   Several control messages use a parameter called TargetList (PCode =

   6), which contains information about the targets to which the message

   pertains. For each Target in the TargetList, the information includes

   the 32-bit IP address of the target, the SAP applicable to the next

   higher layer protocol, and the length of the SAP (SAPBytes).

   Consequently, a Target structure can be of variable length. Each

   entry has the format shown in Figure 18.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                        Target IP Address                      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | TargetBytes | SAPBytes     |     SAP       :    Padding    |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                             Figure 18: Target

o   TargetIPAddress is the 32-bit IP Address of the Target.

o   TargetBytes is the length of the Target structure, beginning with

    the TargetIPAddress.

o   SAPBytes is the length of the SAP, excluding any padding required to

    maintain 32-bit alignment.

o   SAP may be longer than 2 bytes and includes padding when required.

    No padding is required for SAPs with lengths of 2, 6, 10, etc.,

    bytes, because the 6 bytes preceding the SAP already bring such a

    Target structure to a multiple of four bytes.
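   The padding rule for a Target can be checked with the following C sketch, in which st_put_target is a hypothetical name. Because TargetIPAddress, TargetBytes, and SAPBytes together take 6 bytes, TargetBytes is 6 + SAPBytes rounded up to a multiple of four: 8 for a 2-byte SAP and 12 for a 6-byte SAP. Network byte order and zero-valued padding are assumptions of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes one Target entry: TargetIPAddress (4 bytes), TargetBytes,
 * SAPBytes, then the SAP and zero padding. TargetBytes counts from the
 * start of TargetIPAddress and is rounded up to a multiple of four.
 * Returns TargetBytes, or 0 if it would not fit in one byte. */
size_t st_put_target(uint8_t *out, uint32_t ip,
                     const uint8_t *sap, uint8_t sap_len)
{
    size_t tbytes = (6 + (size_t)sap_len + 3) & ~(size_t)3;

    if (tbytes > 252)
        return 0;
    memset(out, 0, tbytes);            /* padding bytes are zero */
    out[0] = (uint8_t)(ip >> 24);      /* network byte order (assumed) */
    out[1] = (uint8_t)(ip >> 16);
    out[2] = (uint8_t)(ip >> 8);
    out[3] = (uint8_t)ip;
    out[4] = (uint8_t)tbytes;          /* TargetBytes */
    out[5] = sap_len;                  /* SAPBytes, excluding padding */
    if (sap_len > 0)
        memcpy(out + 6, sap, sap_len);
    return tbytes;
}
```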

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | PCode = 6    |   PBytes      |           TargetCount = N     |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                           Target 1                            |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                               :                               :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                           Target N                            |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           Figure 19: TargetList
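   A receiver might validate and iterate a TargetList as in the C sketch below. It is not part of this specification: st_walk_targetlist and the callback type are hypothetical, and the checks (each TargetBytes a multiple of four that is large enough for its SAP, and the entries exactly filling PBytes) are consistency checks chosen for the sketch. Network byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback invoked for each Target; sap points at sap_len SAP bytes. */
typedef void (*st_target_fn)(uint32_t ip, const uint8_t *sap,
                             uint8_t sap_len, void *ctx);

/* Walks a TargetList parameter (PCode 6). Returns TargetCount if the
 * entries exactly fill PBytes, or -1 if the structure is inconsistent.
 * visit may be NULL when only validation is wanted. */
int st_walk_targetlist(const uint8_t *p, st_target_fn visit, void *ctx)
{
    size_t pbytes = p[1];
    unsigned count = ((unsigned)p[2] << 8) | p[3];
    size_t off = 4;                    /* first Target follows the header */

    if (p[0] != 6 || pbytes < 4)
        return -1;
    for (unsigned i = 0; i < count; i++) {
        if (pbytes - off < 8)          /* smallest padded Target is 8 bytes */
            return -1;
        const uint8_t *t = p + off;
        uint8_t tbytes  = t[4];
        uint8_t sap_len = t[5];
        if (tbytes % 4 != 0 || tbytes < 6 + sap_len || tbytes > pbytes - off)
            return -1;
        if (visit)
            visit(((uint32_t)t[0] << 24) | ((uint32_t)t[1] << 16) |
                  ((uint32_t)t[2] << 8) | (uint32_t)t[3],
                  t + 6, sap_len, ctx);
        off += tbytes;
    }
    return off == pbytes ? (int)count : -1;
}
```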

10.3.7 UserData

   The UserData parameter (PCode = 7) is an optional parameter that may

   be used by the next higher protocol or an application to convey

   arbitrary information to its peers. This parameter is propagated in

   some control messages and its contents have no significance to ST

   agents. Note that since the size of control messages is limited by

   the smallest MTU in the path to the targets, the maximum size of this

   parameter cannot be specified a priori. If the size of this parameter

   causes a message to exceed the network MTU, an ST agent behaves as

   described in Section 5.1.2. The parameter must be padded to a

   multiple of 32 bits.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | PCode = 7    |   PBytes      |           UserBytes           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                      UserInfo                 :   Padding     |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            Figure 20: UserData

o   UserBytes specifies the number of valid UserInfo bytes.

o   UserInfo is arbitrary data meaningful to the next higher protocol

    layer or application.

10.3.8 Handling of Undefined Parameters

   An ST agent must be able to handle all parameters listed above. To

   support possible future uses, parameters with unknown PCodes must

   also be supported. If an agent receives a message containing a

   parameter with an unknown PCode value, the agent should handle the

   parameter as if it were a UserData parameter. That is, the contents

   of the parameter should be ignored, and the parameter should be

   propagated, as appropriate, along with the related control message.
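   One way to structure this rule is a dispatch on PCode whose default branch treats the parameter like UserData. The C sketch below only classifies parameters instead of acting on them; st_classify_params and its counters are hypothetical, and the PCode values are those of Sections 10.3.1 through 10.3.7.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    ST_PC_FLOWSPEC = 1, ST_PC_GROUP, ST_PC_MULTICASTADDRESS, ST_PC_ORIGIN,
    ST_PC_RECORDROUTE, ST_PC_TARGETLIST, ST_PC_USERDATA
};

struct st_param_stats {
    int known;   /* parameters the agent would interpret */
    int opaque;  /* UserData and unknown PCodes: carried, not read */
};

/* Classifies each parameter in buf[0..len). Returns 0, or -1 if a
 * PBytes value is malformed. */
int st_classify_params(const uint8_t *buf, size_t len,
                       struct st_param_stats *st)
{
    size_t off = 0;

    st->known = st->opaque = 0;
    while (off < len) {
        if (len - off < 4)
            return -1;
        uint8_t pcode  = buf[off];
        uint8_t pbytes = buf[off + 1];
        if (pbytes < 4 || pbytes % 4 != 0 || pbytes > len - off)
            return -1;
        switch (pcode) {
        case ST_PC_FLOWSPEC:
        case ST_PC_GROUP:
        case ST_PC_MULTICASTADDRESS:
        case ST_PC_ORIGIN:
        case ST_PC_RECORDROUTE:
        case ST_PC_TARGETLIST:
            st->known++;
            break;
        case ST_PC_USERDATA:
        default:
            /* Unknown PCode: ignore the contents, but keep the parameter
             * so it can be propagated with the control message. */
            st->opaque++;
            break;
        }
        off += pbytes;
    }
    return 0;
}
```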

10.4 ST Control Message PDUs

   ST control messages are described in the following sections. Please

   refer to Section 10.6 for an explanation of the notation.

10.4.1 ACCEPT

   ACCEPT (OpCode = 1) is issued by a target as a positive response to a

   CONNECT message. It implies that the target is prepared to accept

   data from the origin along the stream that was established by the

   CONNECT. ACCEPT is also issued as a positive response to a CHANGE

   message. It implies that the target accepts the proposed stream

   modification.

   ACCEPT is relayed by the ST agents from the target to the origin

   along the path established by CONNECT (or CHANGE) but in the reverse

   direction. ACCEPT must be acknowledged with ACK at each hop.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 1   |      0        |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |         LnkReference          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |          MaxMsgSize           |          RecoveryTimeout      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                      StreamCreationTime                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |   IPHops      |                        0                      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           FlowSpec                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           TargetList                          :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           RecordRoute                         :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           UserData                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 21: ACCEPT Control Message

o   Reference contains a number assigned by the ST agent sending ACCEPT

    for use in the acknowledging ACK.

o   LnkReference is the Reference number from the corresponding CONNECT

    (or CHANGE) message.

o   MaxMsgSize indicates the smallest MTU along the path traversed by

    the stream. This field is only set when responding to a CONNECT

    request.

o   RecoveryTimeout reflects the nominal number of milliseconds that the

    application is willing to wait for a failed system component to be

    detected and any corrective action to be taken. This field

    represents what can actually be supported by each participating

    agent, and is only set when responding to a CONNECT request.

o   StreamCreationTime is the 32-bit system-dependent timestamp copied

    from the corresponding CONNECT request.

o   IPHops is the number of IP encapsulated hops traversed by the

    stream. This field is set to zero by the origin, and is incremented

    at each IP encapsulating agent.

10.4.2 ACK

   ACK (OpCode = 2) is used to acknowledge a request. The ACK message is

   not propagated beyond the previous-hop or next-hop ST agent.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 2   |     0         |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |       Reference               |           LnkReference = 0    |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |       Checksum                |           ReasonCode          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 22: ACK Control Message

o   Reference is the Reference number of the control message being

    acknowledged.

o   ReasonCode is usually NoError, but other possibilities exist, e.g.,

    DuplicateIgn.

10.4.3 CHANGE

   CHANGE (OpCode = 3) is used to change the FlowSpec of an established

   stream. The CHANGE message is processed similarly to CONNECT, except

   that it travels along the path of an established stream. CHANGE must

   be propagated until it reaches the related stream's targets. CHANGE

   must be acknowledged with ACK at each hop.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 3   |G|I|     0     |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |           Reference           |          LnkReference = 0     |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                        SenderIPAddress                        |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                            FlowSpec                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           TargetList                          :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           RecordRoute                         :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                            UserData                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 23: CHANGE Control Message

o   G (bit 8) is used to request a global, stream-wide change; the

    TargetList parameter should be omitted when the G bit is specified.

o   I (bit 9) is used to indicate that the LRM is permitted to interrupt

    and, if needed, break the stream in the process of trying to satisfy

    the requested change.

o   Reference contains a number assigned by the ST agent sending CHANGE

    for use in the acknowledging ACK.

10.4.4 CONNECT

   CONNECT (OpCode = 4) requests the setup of a new stream or an

   addition to or recovery of an existing stream. Only the origin can

   issue the initial set of CONNECTs to set up a stream, and the first

   CONNECT to each next-hop is used to convey the SID.

   The next-hop initially responds with an ACK, which implies that the

   CONNECT was valid and is being processed. The next-hop will later

   relay back either an ACCEPT or REFUSE from each target. An

   intermediate ST agent that receives a CONNECT behaves as explained in

   Section 4.5.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 4   |J N|S|    0    |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |           Reference           |          LnkReference = 0     |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |           Checksum            |          ReasonCode = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |           MaxMsgSize          |          RecoveryTimeout      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                        StreamCreationTime                     |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |   IPHops      |                        0                      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                             Origin                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           FlowSpec                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                          TargetList                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                          RecordRoute                          :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                             Group                             :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                        MulticastAddress                       :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                            UserData                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 24: CONNECT Control Message

o   JN (bits 8 and 9) indicate the join authorization level for the

    stream, see Section 4.4.2.

o   S (bit 10) indicates the NoRecovery option (Section 4.4.1). When the

    S-bit is set (1), the NoRecovery option is specified for the stream.

o   Reference contains a number assigned by the ST agent sending CONNECT

    for use in the acknowledging ACK.

o   MaxMsgSize indicates the smallest MTU along the path traversed by

    the stream. This field is initially set to the network MTU of the

    agent that issues the CONNECT.

o   RecoveryTimeout is the nominal number of milliseconds that the

    application is willing to wait for a failed system component to be

    detected and any corrective action to be taken.

o   StreamCreationTime is the 32-bit system-dependent timestamp

    generated by the ST agent issuing the CONNECT.

o   IPHops is the number of IP encapsulated hops traversed by the

    stream. This field is set to zero by the origin, and is incremented

    at each IP encapsulating agent.
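   The following C sketch illustrates two CONNECT details: decoding the J, N, and S bits from the byte that follows OpCode, and updating MaxMsgSize and IPHops before a CONNECT is forwarded. It assumes that bit 8 is the most significant bit of that byte, that J is the high-order bit of the 2-bit join authorization value, and that each agent compares MaxMsgSize with the MTU of the outgoing network. The meaning of each join level is defined in Section 4.4.2 and is not interpreted here. All names are hypothetical.

```c
#include <stdint.h>

struct st_connect_opts {
    unsigned join_level;   /* J (bit 8) and N (bit 9) as a 2-bit value */
    int      no_recovery;  /* S (bit 10): NoRecovery option */
};

/* flags is the byte after OpCode, i.e. bits 8-15 of the first word,
 * with bit 8 assumed to be its most significant bit. */
struct st_connect_opts st_connect_opts_decode(uint8_t flags)
{
    struct st_connect_opts o;

    o.join_level  = (flags >> 6) & 0x3u;
    o.no_recovery = (flags >> 5) & 0x1u;
    return o;
}

/* Per-hop update before forwarding a CONNECT: MaxMsgSize keeps the
 * smallest MTU seen so far, and IPHops counts IP-encapsulating hops
 * (saturating at 255 in this sketch). */
void st_connect_forward(uint16_t *max_msg_size, uint8_t *ip_hops,
                        uint16_t out_mtu, int ip_encapsulating)
{
    if (out_mtu < *max_msg_size)
        *max_msg_size = out_mtu;
    if (ip_encapsulating && *ip_hops < 255)
        (*ip_hops)++;
}
```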

10.4.5 DISCONNECT

   DISCONNECT (OpCode = 5) is used by an origin to tear down an

   established stream or part of a stream, or by an intermediate ST

   agent that detects a failure between itself and its previous-hop, as

   distinguished by the ReasonCode. The DISCONNECT message specifies the

   list of targets that are to be disconnected. An ACK is required in

   response to a DISCONNECT message. The DISCONNECT message is

   propagated all the way to the specified targets. The targets are

   expected to terminate their participation in the stream.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 5   |G|    0        |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |     LnkReference = 0          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                      GeneratorIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           TargetList                          :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                            UserData                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 25: DISCONNECT Control Message

o   G (bit 8) is used to request a DISCONNECT of all the stream's

    targets. TargetList should be omitted when the G-bit is set (1). If

    TargetList is present, it is ignored.

o   Reference contains a number assigned by the ST agent sending

    DISCONNECT for use in the acknowledging ACK.

o   ReasonCode reflects the event that initiated the message.

o   GeneratorIPAddress is the 32-bit IP address of the host that first

    generated the DISCONNECT message.

10.4.6 ERROR

   ERROR (OpCode = 6) is sent in acknowledgment to a request in which an

   error is detected. No action is taken on the erroneous request. No

   ACK is expected. The ERROR message is not propagated beyond the

   previous-hop or next-hop ST agent. An ERROR is never sent in response

   to another ERROR. The receiver of an ERROR is encouraged to try again

   without waiting for a retransmission timeout.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 6   |       0       |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |     LnkReference = 0          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |        ReasonCode             |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           PDUInError                          :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 26: ERROR Control Message

o   Reference is the Reference number of the erroneous request.

o   ReasonCode indicates the error that triggered the message.

o   PDUInError is the PDU in error, beginning with the ST Header. This

    parameter is optional. Its length is limited by the network MTU, and may

    be truncated when too long.

10.4.7 HELLO

   HELLO (OpCode = 7) is used as part of the ST failure detection

   mechanism, see Section 6.1.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 7   |R|    0        |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |       Reference = 0           |        LnkReference = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |         Checksum              |          ReasonCode = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                          HelloTimer                           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 27: HELLO Control Message

o   R (bit 8) is the Restarted bit.

o   HelloTimer represents the time in milliseconds since the agent was

    restarted, modulo the precision of the field. It is used to detect

    duplicate or delayed HELLO messages.
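   The exact failure detection procedure is given in Section 6.1. Purely as an illustration of how HelloTimer can expose duplicate or delayed HELLOs, the C sketch below accepts a HELLO only if its HelloTimer is ahead of the last accepted value from the same neighbor, using a wraparound-aware comparison, and always accepts a HELLO with the R bit set. This acceptance rule, the treatment of R as the most significant bit of the byte after OpCode, and all names are assumptions of the sketch.

```c
#include <stdint.h>

struct st_hello_state {
    int      valid;       /* set once a HELLO has been accepted */
    uint32_t last_timer;  /* HelloTimer of the last accepted HELLO */
};

/* Returns 1 if a HELLO from this neighbor looks fresh, 0 if it looks
 * duplicated or delayed. A HELLO is fresh when its HelloTimer is ahead
 * of the last accepted value by less than half the 32-bit range, so
 * the comparison survives wraparound. The R bit (assumed to be 0x80 in
 * the byte after OpCode) signals a restart and resets the history. */
int st_hello_is_fresh(struct st_hello_state *s, uint8_t flags,
                      uint32_t hello_timer)
{
    uint32_t diff = hello_timer - s->last_timer;
    int restarted = (flags & 0x80) != 0;

    if (!s->valid || restarted || (diff != 0 && diff < 0x80000000u)) {
        s->valid = 1;
        s->last_timer = hello_timer;
        return 1;
    }
    return 0;
}
```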

10.4.8 JOIN

   JOIN (OpCode = 8) is used as part of the ST stream joining mechanism,

   see Section 4.6.3.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 8   |      0        |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |         LnkReference = 0      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                      GeneratorIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                          TargetList                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 28: JOIN Control Message

o   Reference contains a number assigned by the ST agent sending JOIN

    for use in the acknowledging ACK.

o   GeneratorIPAddress is the 32-bit IP address of the host that

    generated the JOIN message.

o   TargetList is the information associated with the target to be added

    to the stream.

10.4.9 JOIN-REJECT

   JOIN-REJECT (OpCode = 9) is used as part of the ST stream joining

   mechanism, see Section 4.6.3.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 9   |      0        |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |          LnkReference         |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                      GeneratorIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 29: JOIN-REJECT Control Message

o   Reference contains a number assigned by the ST agent sending the

    JOIN-REJECT for use in the acknowledging ACK.

o   LnkReference is the Reference number from the corresponding JOIN

    message.

o   ReasonCode reflects the reason why the JOIN request was rejected.

o   GeneratorIPAddress is the 32-bit IP address of the host that first

    generated the JOIN-REJECT message.
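   A receiving agent might decode the fixed part of Figure 29 along the
   lines of the sketch below. The helper `parse_join_reject` is
   hypothetical; it reads ReasonCode from the 16-bit slot shown in the
   diagram, assumes IPv4 addresses, and checks TotalBytes against the
   received length but does not verify the Checksum.

```python
import ipaddress
import struct
from dataclasses import dataclass

# Fixed fields common to the SCMP diagrams, big-endian:
# OpCode Options TotalBytes Reference LnkReference SenderIPAddress
# Checksum ReasonCode
SCMP_HEADER = struct.Struct("!BBHHHIHH")
OP_JOIN_REJECT = 9


@dataclass
class JoinReject:
    reference: int       # to be used in the acknowledging ACK
    lnk_reference: int   # Reference of the JOIN being rejected
    sender: str
    reason_code: int     # why the JOIN was rejected
    generator: str       # host that first generated the JOIN-REJECT


def parse_join_reject(msg: bytes) -> JoinReject:
    """Decode a JOIN-REJECT (hypothetical helper; no checksum verification)."""
    if len(msg) < SCMP_HEADER.size + 4:
        raise ValueError("shorter than a JOIN-REJECT")  # cf. TruncatedCtl
    (opcode, _options, total_bytes, reference, lnk_reference,
     sender, _checksum, reason_code) = SCMP_HEADER.unpack_from(msg, 0)
    if opcode != OP_JOIN_REJECT:
        raise ValueError(f"not a JOIN-REJECT: OpCode {opcode}")
    if total_bytes != len(msg):
        raise ValueError("TotalBytes mismatch")  # cf. InvalidTotByt
    (generator,) = struct.unpack_from("!I", msg, SCMP_HEADER.size)
    return JoinReject(
        reference=reference,
        lnk_reference=lnk_reference,
        sender=str(ipaddress.IPv4Address(sender)),
        reason_code=reason_code,
        generator=str(ipaddress.IPv4Address(generator)),
    )
```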

10.4.10 NOTIFY

   NOTIFY (OpCode = 10) is issued by an ST agent to inform other ST

   agents of events that may be significant. NOTIFY may be propagated

   beyond the previous-hop or next-hop ST agent depending on the

   ReasonCode (see Section 10.5.3). NOTIFY must be acknowledged with an

   ACK.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 10   |       0       |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |         LnkReference = 0      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                      DetectorIPAddress                        |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |          MaxMsgSize           |          RecoveryTimeout      |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           FlowSpec                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           TargetList                          :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           UserData                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 30: NOTIFY Control Message

o   Reference contains a number assigned by the ST agent sending the

    NOTIFY for use in the acknowledging ACK.

o   ReasonCode identifies the reason for the notification.

o   DetectorIPAddress is the 32-bit IP address of the ST agent that

    detects the event.

o   MaxMsgSize is set when the MTU of the listed targets has changed

    (e.g., due to recovery), or when the notification is generated after

    a successful JOIN. Otherwise it is set to zero (0).

o   RecoveryTimeout is set when the notification is generated after a

    successful JOIN. Otherwise it is set to zero (0).

o   FlowSpec is present when the notification is generated after a

    successful JOIN.

o   TargetList is present when the notification is related to one or

    more targets, or when MaxMsgSize is set.

o   UserData is present if the notification is generated after a

    successful JOIN and the UserData parameter was set in the ACCEPT

    message.
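   The rules above for which NOTIFY fields are set or present can be
   condensed into a small decision function. This is a sketch under
   stated assumptions: `notify_fields` and `NotifyFields` are
   hypothetical, the caller supplies MaxMsgSize and RecoveryTimeout,
   and only presence is decided, not encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotifyFields:
    max_msg_size: int = 0          # zero unless set by the rules below
    recovery_timeout: int = 0      # zero unless the NOTIFY follows a JOIN
    include_flowspec: bool = False
    include_target_list: bool = False
    include_user_data: bool = False


def notify_fields(after_join: bool, max_msg_size: Optional[int],
                  concerns_targets: bool, accept_had_user_data: bool,
                  recovery_timeout: int) -> NotifyFields:
    """Decide which NOTIFY fields are set or present (hypothetical helper).

    max_msg_size is None unless the MTU changed or the NOTIFY follows a
    successful JOIN; recovery_timeout is only used in the JOIN case.
    """
    fields = NotifyFields()
    if max_msg_size is not None:
        fields.max_msg_size = max_msg_size
    if after_join:
        if max_msg_size is None:
            raise ValueError("MaxMsgSize is expected after a successful JOIN")
        fields.recovery_timeout = recovery_timeout
        fields.include_flowspec = True
        fields.include_user_data = accept_had_user_data
    # TargetList accompanies target-related events and any MaxMsgSize change.
    fields.include_target_list = concerns_targets or fields.max_msg_size != 0
    return fields
```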

10.4.11 REFUSE

   REFUSE (OpCode = 11) is issued by a target that either does not wish

   to accept a CONNECT message or wishes to remove itself from an

   established stream. It might also be issued by an intermediate ST

   agent in response to a CONNECT or CHANGE either to terminate a

   routing loop, or when a satisfactory next-hop to a target cannot be

   found. It may also be a separate command when an existing stream has

   been preempted by a higher precedence stream or an ST agent detects

   the failure of a previous-hop, next-hop, or the network between them.

   In all cases, the TargetList specifies the targets that are affected

   by the condition. Each REFUSE must be acknowledged by an ACK.

   The REFUSE is relayed back by the ST agents to the origin (or

   intermediate ST agent that created the CONNECT or CHANGE) along the

   path traced by the CONNECT. The ST agent receiving the REFUSE will

   process it differently depending on the condition that caused it, as

   specified in the ReasonCode field. No special effort is made to

   combine multiple REFUSE messages since it is considered most unlikely

   that separate REFUSEs will happen to both pass through an ST agent at

   the same time and be easily combined, e.g., have identical

   ReasonCodes and parameters.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 11   |G|E|N|    0    |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |         LnkReference          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode           |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                       DetectorIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                       ValidTargetIPAddress                    |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                          TargetList                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                         RecordRoute                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                            UserData                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 31: REFUSE Control Message

o   G (bit 8) is used to indicate that all targets downstream from the

    sender are refusing. It is expected that this will be set most

    commonly due to network failures. The TargetList parameter is

    ignored or not present when this bit is set, and must be included

    when not set.

o   E (bit 9) is set by an ST agent to indicate that the request failed

    and that the pre-change stream attributes, including resources, and

    the stream itself still exist.

o   N (bit 10) is used to indicate that no further attempts to recover

    the stream should be made. This bit must be set when stream recovery

    should not be attempted, even in the case where the target

    application has shut down normally (ApplDisconnect).

o   Reference contains a number assigned by the ST agent sending the

    REFUSE for use in the acknowledging ACK.

o   LnkReference is either the Reference number from the corresponding

    CONNECT or CHANGE, if it is the result of such a message, or zero

    when the REFUSE was originated as a separate command.

o   DetectorIPAddress is the 32-bit IP address of the host that first

    generated the REFUSE message.

o   ValidTargetIPAddress is the 32-bit IP address of a host that is

    properly connected as part of the stream. This parameter is only

    used when recovering from stream convergence, otherwise it is set to

    zero (0).
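   The G, E, and N flags occupy bits 8 to 10, i.e., the three most
   significant bits of the second octet, so they can be set as in the
   sketch below. The helper names are hypothetical, and the sketch
   reads the ApplDisconnect sentence as requiring N for a normal
   application shutdown.

```python
# Option bits in the second octet of the first word (diagram bits 8-10).
G_BIT = 0x80  # bit 8: all downstream targets are refusing
E_BIT = 0x40  # bit 9: request failed; pre-change stream still exists
N_BIT = 0x20  # bit 10: no further recovery attempts

APPL_DISCONNECT = 6  # ReasonCode for a normal application shutdown


def refuse_options(all_refusing: bool, old_stream_intact: bool,
                   no_recovery: bool, reason_code: int) -> int:
    """Build the REFUSE option octet (hypothetical helper)."""
    if reason_code == APPL_DISCONNECT:
        no_recovery = True  # N is set for a normal application shutdown too
    options = 0
    if all_refusing:
        options |= G_BIT
    if old_stream_intact:
        options |= E_BIT
    if no_recovery:
        options |= N_BIT
    return options


def target_list_ok(options: int, has_target_list: bool) -> bool:
    """TargetList may be omitted only when G is set; otherwise it is required."""
    return bool(options & G_BIT) or has_target_list
```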

10.4.12 STATUS

   STATUS (OpCode = 12) is used to inquire about the existence of a

   particular stream identified by the SID. Use of STATUS is intended

   for collecting information from a neighbor ST agent, including

   general and specific stream information, and round-trip time

   estimation. The use of this message type is described in Section 8.4.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 12   |       0       |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |       LnkReference = 0        |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |          ReasonCode = 0       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                          TargetList                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 32: STATUS Control Message

o   Reference contains a number assigned by the ST agent sending STATUS

    for use in the replying STATUS-RESPONSE.

o   TargetList is an optional parameter that, when present, indicates that

    only information related to the specific targets should be relayed

    in the STATUS-RESPONSE.
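   Since the Reference of a STATUS comes back in the STATUS-RESPONSE,
   an agent can pair the two to take round-trip time samples. The
   sketch below is hypothetical: `StatusProbe` takes clock readings in
   milliseconds from the caller, skips Reference 0 by assumption, and
   does not handle retransmitted STATUS messages.

```python
from typing import Dict, Optional


class StatusProbe:
    """Track outstanding STATUS messages by Reference to take RTT samples."""

    def __init__(self) -> None:
        self._next_ref = 1
        self._sent_at: Dict[int, int] = {}

    def send(self, now_ms: int) -> int:
        """Allocate a Reference for a new STATUS and record its send time."""
        ref = self._next_ref
        # Wrap within 16 bits and skip 0 (an assumption of this sketch).
        self._next_ref = (self._next_ref % 0xFFFF) + 1
        self._sent_at[ref] = now_ms
        return ref

    def on_response(self, reference: int, now_ms: int) -> Optional[int]:
        """Return an RTT sample for a matching STATUS-RESPONSE, else None."""
        sent = self._sent_at.pop(reference, None)
        if sent is None:
            return None  # unknown or already answered Reference
        return now_ms - sent
```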

10.4.13 STATUS-RESPONSE

   STATUS-RESPONSE (OpCode = 13) is the reply to a STATUS message. If

   the stream specified in the STATUS message is not known, the STATUS-

   RESPONSE will contain the specified SID but no other parameters. It

   will otherwise contain the current SID, FlowSpec, TargetList, and

   possibly Groups of the stream. If the full target list cannot fit in

   a single message, only those targets that can be included in one

   message will be included. As mentioned in Section 10.4.12, it is

   possible to request information on a specific target.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       | OpCode = 13   |       0       |           TotalBytes          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |      Reference                |       LnkReference = 0        |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |                         SenderIPAddress                       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |            Checksum           |       ReasonCode = 0          |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           FlowSpec                            :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                           Groups                              :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       :                          TargetList                           :

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 33: STATUS-RESPONSE Control Message

o   Reference contains a number assigned by the ST agent sending the

    STATUS.
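   When the full TargetList does not fit, a responder has to choose
   which targets to include. One simple policy, sketched below with a
   hypothetical helper, keeps targets in order until the next one would
   exceed the message size limit; `header_len` is assumed to cover all
   other octets, including the TargetList parameter's own header.

```python
from typing import List, Sequence


def targets_that_fit(encoded_targets: Sequence[bytes], header_len: int,
                     max_msg_size: int) -> List[bytes]:
    """Return the longest prefix of targets that fits within max_msg_size.

    Hypothetical helper: each element is one already-encoded target entry.
    """
    chosen: List[bytes] = []
    used = header_len
    for target in encoded_targets:
        if used + len(target) > max_msg_size:
            break  # stop at the first target that would overflow the message
        chosen.append(target)
        used += len(target)
    return chosen
```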

10.5 Suggested Protocol Constants

   The ST Protocol uses several fields that must have specific values

   for the protocol to work, and also several values that an

   implementation must select. This section specifies the required

   values and suggests initial values for others. It is recommended that

   the latter be implemented as variables so that they may be easily

   changed when experience indicates better values. Eventually, they

   should be managed via the normal network management facilities.

   ST uses IP Version Number 5.

   When encapsulated in IP, ST uses IP Protocol Number 5.
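   Because ST reuses the IP version field, a receiver can separate ST
   packets from IP packets by the first four bits, and can spot ST
   encapsulated in IPv4 by the Protocol field. The sketch below assumes
   an IPv4 outer header in the encapsulated case and performs no other
   validation; `classify_packet` is a hypothetical name.

```python
IP_VERSION_ST = 5  # ST occupies IP version number 5
IPPROTO_ST = 5     # IP Protocol Number used when ST is encapsulated in IP


def classify_packet(packet: bytes) -> str:
    """Return "st", "st-in-ip", "other", or "empty" (hypothetical helper)."""
    if not packet:
        return "empty"
    version = packet[0] >> 4  # high-order four bits of the first octet
    if version == IP_VERSION_ST:
        return "st"
    # IPv4: the Protocol field is the tenth octet (offset 9) of the header.
    if version == 4 and len(packet) >= 20 and packet[9] == IPPROTO_ST:
        return "st-in-ip"
    return "other"
```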

10.5.1 SCMP Messages

   1)      ACCEPT

   2)      ACK

   3)      CHANGE

   4)      CONNECT

   5)      DISCONNECT

   6)      ERROR

   7)      HELLO

   8)      JOIN

   9)      JOIN-REJECT

   10)     NOTIFY

   11)     REFUSE

   12)     STATUS

   13)     STATUS-RESPONSE
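   Reading the list numbers as OpCode values, which matches the OpCode
   shown in each control message figure, an implementation might keep
   them in an enumeration. The names below are illustrative; JOIN-REJECT
   and STATUS-RESPONSE become JOIN_REJECT and STATUS_RESPONSE because
   hyphens are not valid in Python identifiers.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """SCMP message OpCodes, numbered as in the list above."""
    ACCEPT = 1
    ACK = 2
    CHANGE = 3
    CONNECT = 4
    DISCONNECT = 5
    ERROR = 6
    HELLO = 7
    JOIN = 8
    JOIN_REJECT = 9
    NOTIFY = 10
    REFUSE = 11
    STATUS = 12
    STATUS_RESPONSE = 13


def opcode_name(value: int) -> str:
    """Map a received OpCode to its SCMP name, or "unknown"."""
    try:
        return OpCode(value).name.replace("_", "-")
    except ValueError:
        return "unknown"  # likely reported with OpCodeUnknown (31)
```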

10.5.2 SCMP Parameters

   1)      FlowSpec

   2)      Group

   3)      MulticastAddress

   4)      Origin

   5)      RecordRoute

   6)      TargetList

   7)      UserData
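   Similarly, assuming the list numbers are the PCode values, a parser
   could flag parameters it does not recognize, which would be reported
   with ReasonCode PCodeUnknown (32). `PCode` and `unknown_pcodes` are
   hypothetical names.

```python
from enum import IntEnum
from typing import Iterable, List


class PCode(IntEnum):
    """SCMP parameter codes, assuming the list numbers are the PCode values."""
    FLOWSPEC = 1
    GROUP = 2
    MULTICAST_ADDRESS = 3
    ORIGIN = 4
    RECORD_ROUTE = 5
    TARGET_LIST = 6
    USER_DATA = 7


PCODE_UNKNOWN = 32  # ReasonCode for a parameter with an invalid PCode


def unknown_pcodes(pcodes: Iterable[int]) -> List[int]:
    """Return the PCodes not defined above, in order of appearance."""
    defined = {p.value for p in PCode}
    return [p for p in pcodes if p not in defined]
```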

10.5.3 ReasonCode

   Several errors may occur during protocol processing. All ST error

   codes are taken from a single number space. The currently defined

   values and their meanings are presented in the list below. Note that

   new error codes may be defined from time to time. All implementations

   are expected to handle new codes in a graceful manner. If an unknown

   ReasonCode is encountered, it should be assumed to be fatal. The

   ReasonCode is an 8-bit field. The following values are defined:

1       NoError         No error has occurred.

2       ErrorUnknown    An error not contained in this list has been

                        detected.

3       AccessDenied    Access denied.

4       AckUnexpected   An unexpected ACK was received.

5       ApplAbort       The application aborted the stream abnormally.

6       ApplDisconnect  The application closed the stream normally.

7       ApplRefused     Applications refused requested connection or

                        change.

8       AuthentFailed   The authentication function failed.

9       BadMcastAddress IP Multicast address is unacceptable in CONNECT.

10      CantGetResrc    Unable to acquire (additional) resources.

11      CantRelResrc    Unable to release excess resources.

12      CantRecover     Unable to recover failed stream.

13      CksumBadCtl     Control PDU has a bad message checksum.

14      CksumBadST      PDU has a bad ST Header checksum.

15      DuplicateIgn    Control PDU is a duplicate and is being

                        acknowledged.

16      DuplicateTarget Control PDU contains a duplicate target, or an

                        attempt to add an existing target.

17      FlowSpecMismatch FlowSpec in request does not match

                        existing FlowSpec.

18      FlowSpecError   An error occurred while processing the FlowSpec.

19      FlowVerUnknown  Control PDU has a FlowSpec Version Number that

                        is not supported.

20      GroupUnknown    Control PDU contains an unknown Group Name.

21      InconsistGroup  An inconsistency has been detected with the

                        streams forming a group.

22      IntfcFailure    A network interface failure has been detected.

23      InvalidSender   Control PDU has an invalid SenderIPAddress

                        field.

24      InvalidTotByt   Control PDU has an invalid TotalBytes field.

25      JoinAuthFailure Join failed due to stream authorization level.

26      LnkRefUnknown   Control PDU contains an unknown LnkReference.

27      NetworkFailure  A network failure has been detected.

28      NoRouteToAgent  Cannot find a route to an ST agent.

29      NoRouteToHost   Cannot find a route to a host.

30      NoRouteToNet    Cannot find a route to a network.

31      OpCodeUnknown   Control PDU has an invalid OpCode field.

32      PCodeUnknown    Control PDU has a parameter with an invalid

                        PCode.

33      ParmValueBad    Control PDU contains an invalid parameter value.

34      PathConvergence Two branches of the stream join during the

                        CONNECT setup.

35      ProtocolUnknown Control PDU contains an unknown next-higher

                        layer protocol identifier.

36      RecordRouteSize RecordRoute parameter is too long to permit the

                        message to fit into a network's MTU.

37      RefUnknown      Control PDU contains an unknown Reference.

38      ResponseTimeout Control message has been acknowledged but not

                        answered by an appropriate control message.

39      RestartLocal    The local ST agent has recently restarted.

40      RestartRemote   The remote ST agent has recently restarted.

41      RetransTimeout  An acknowledgment has not been received after

                        several retransmissions.

42      RouteBack       The route to the next-hop leads through the same

                        interface as the previous-hop, and the next-hop

                        is not the previous-hop.

43      RouteInconsist  A routing inconsistency has been detected.

44      RouteLoop       A routing loop has been detected.

45      SAPUnknown      Control PDU contains an unknown next-higher

                        layer SAP (port).

46      SIDUnknown      Control PDU contains an unknown SID.

47      STAgentFailure  An ST agent failure has been detected.

48      STVer3Bad       A received PDU is not ST Version 3.

49      StreamExists    A stream with the given SID already exists.

50      StreamPreempted The stream has been preempted by one with a

                        higher precedence.

51      TargetExists    A CONNECT was received that specified an

                        existing target.

52      TargetUnknown   A target is not a member of the specified

                        stream.

53      TargetMissing   A target parameter was expected and is not

                        included, or is empty.

54      TruncatedCtl    Control PDU is shorter than expected.

55      TruncatedPDU    A received ST PDU is shorter than the ST Header

                        indicates.

56      UserDataSize    UserData parameter is too large to permit a

                        message to fit into a network's MTU.
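   The table can be turned into a lookup that also applies the rule
   that unknown ReasonCodes are assumed to be fatal. This is a sketch:
   `REASON_CODES`, `reason_name`, and `must_assume_fatal` are
   hypothetical, and the sketch does not decide whether a known code is
   fatal, since that depends on code-specific processing.

```python
from typing import Dict, Optional

# ReasonCode names in numeric order, starting at 1 (NoError).
_NAMES = """
NoError ErrorUnknown AccessDenied AckUnexpected ApplAbort ApplDisconnect
ApplRefused AuthentFailed BadMcastAddress CantGetResrc CantRelResrc
CantRecover CksumBadCtl CksumBadST DuplicateIgn DuplicateTarget
FlowSpecMismatch FlowSpecError FlowVerUnknown GroupUnknown InconsistGroup
IntfcFailure InvalidSender InvalidTotByt JoinAuthFailure LnkRefUnknown
NetworkFailure NoRouteToAgent NoRouteToHost NoRouteToNet OpCodeUnknown
PCodeUnknown ParmValueBad PathConvergence ProtocolUnknown RecordRouteSize
RefUnknown ResponseTimeout RestartLocal RestartRemote RetransTimeout
RouteBack RouteInconsist RouteLoop SAPUnknown SIDUnknown STAgentFailure
STVer3Bad StreamExists StreamPreempted TargetExists TargetUnknown
TargetMissing TruncatedCtl TruncatedPDU UserDataSize
""".split()

REASON_CODES: Dict[int, str] = {
    code: name for code, name in enumerate(_NAMES, start=1)
}


def reason_name(code: int) -> Optional[str]:
    """Return the symbolic name of a ReasonCode, or None if not defined."""
    return REASON_CODES.get(code)


def must_assume_fatal(code: int) -> bool:
    """True for codes unknown to this table, which must be treated as fatal.

    False only means the code is known; whether a known code is fatal
    depends on code-specific processing not modeled here.
    """
    return code not in REASON_CODES
```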

10.5.4 Timeouts and Other Constants

   SCMP uses retransmission to effect reliability and thus has several

   "retransmission timers". Each "timer" is modeled by an initial time

   interval (ToXxx), which may get updated dynamically through

   measurement of control traffic, and a number of times (NXxx) to

   retransmit a message before declaring a failure. All time intervals

   are in units of milliseconds. Note that the variables are described

   for reference purposes only; different implementations may not

   include identical variables.

Value   Timeout Name    Meaning

------------------------------------------------------------------------

500     ToAccept        Initial hop-by-hop timeout for acknowledgment of

                        ACCEPT

3       NAccept         ACCEPT retries before failure

500     ToChange        Initial hop-by-hop timeout for acknowledgment of

                        CHANGE

3       NChange         CHANGE retries before failure

5000    ToChangeResp    End-to-End CHANGE timeout for receipt of ACCEPT

                        or REFUSE

500     ToConnect       Initial hop-by-hop timeout for acknowledgment of

                        CONNECT

5       NConnect        CONNECT retries before failure

5000    ToConnectResp   End-to-End CONNECT timeout for receipt of ACCEPT

                        or REFUSE from targets by origin

500     ToDisconnect    Initial hop-by-hop timeout for acknowledgment of

                        DISCONNECT

3       NDisconnect     DISCONNECT retries before failure

500     ToJoin          Initial hop-by-hop timeout for acknowledgment of

                        JOIN

3       NJoin           JOIN retries before failure

500     ToJoinReject    Initial hop-by-hop timeout for acknowledgment of

                        JOIN-REJECT

3       NJoinReject     JOIN-REJECT retries before failure

5000    ToJoinResp      Timeout for receipt of CONNECT or JOIN-REJECT

                        from origin or intermediate hop

500     ToNotify        Initial hop-by-hop timeout for acknowledgment of

                        NOTIFY

3       NNotify         NOTIFY retries before failure

500     ToRefuse        Initial hop-by-hop timeout for acknowledgment of

                        REFUSE

3       NRefuse         REFUSE retries before failure

500     ToRetryRoute    Timeout for receipt of ACCEPT or REFUSE from

                        targets during failure recovery

5       NRetryRoute     CONNECT retries before failure

1000    ToStatusResp    Timeout for receipt of STATUS-RESPONSE

3       NStatus         STATUS retries before failure

10000   HelloTimerHoldDown      Interval that Restarted bit must be set

                                after ST restart

5       HelloLossFactor         Number of consecutively missed HELLO

                                messages before declaring link failure

2000    DefaultRecoveryTimeout  Interval between successive HELLOs

                                to/from active neighbors
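   The ToXxx/NXxx model above can be sketched as a loop that transmits
   a message and then retransmits it up to NXxx times. Two assumptions
   are made: NXxx counts retransmissions after the first transmission,
   and the timeout stays at its initial value instead of being updated
   from measured traffic. `send_reliably`, `transmit_and_wait`, and
   `TIMERS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class RetransTimer:
    initial_timeout_ms: int  # ToXxx
    retries: int             # NXxx


# A few of the suggested values from the table above.
TIMERS = {
    "ACCEPT": RetransTimer(500, 3),
    "CONNECT": RetransTimer(500, 5),
    "NOTIFY": RetransTimer(500, 3),
    "REFUSE": RetransTimer(500, 3),
}


def send_reliably(timer: RetransTimer,
                  transmit_and_wait: Callable[[int], bool]) -> Tuple[bool, int]:
    """Transmit, then retransmit up to timer.retries times.

    transmit_and_wait(timeout_ms) sends the message once and returns True
    if the ACK arrived within timeout_ms. Returns (acknowledged, transmissions).
    """
    transmissions = 0
    for _ in range(1 + timer.retries):
        transmissions += 1
        if transmit_and_wait(timer.initial_timeout_ms):
            return True, transmissions
    return False, transmissions  # a caller might then report RetransTimeout (41)
```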

10.6 Data Notations

   The convention in the documentation of Internet Protocols is to

   express numbers in decimal and to picture data with the most

   significant octet on the left and the least significant octet on the

   right.

   The order of transmission of the header and data described in this

   document is resolved to the octet level. Whenever a diagram shows a

   group of octets, the order of transmission of those octets is the

   normal order in which they are read in English. For example, in the

   following diagram the octets are transmitted in the order they are

   numbered.

        0                   1                   2                   3

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |       1       |       2       |       3       |       4       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |       5       |       6       |       7       |       8       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       |       9       |      10       |      11       |      12       |

       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 34: Transmission Order of Bytes

   Whenever an octet represents a numeric quantity, the leftmost bit in

   the diagram is the high order or most significant bit. That is, the

   bit labeled 0 is the most significant bit. For example, the following

   diagram represents the value 170 (decimal).

                                0 1 2 3 4 5 6 7

                               +-+-+-+-+-+-+-+-+

                               |1 0 1 0 1 0 1 0|

                               +-+-+-+-+-+-+-+-+

                      Figure 35: Significance of Bits

   Similarly, whenever a multi-octet field represents a numeric quantity,

   the leftmost bit of the whole field is the most significant bit.

   When a multi-octet quantity is transmitted, the most significant octet

   is transmitted first.

   Fields whose length is fixed and fully illustrated are shown with a

   vertical bar (|) at the end; fixed fields whose contents are

   abbreviated are shown with an exclamation point (!); variable fields

   are shown with colons (:). Optional parameters are separated from

   control messages with a blank line. The order of parameters is not

   meaningful.
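   The bit and octet conventions above correspond to network byte
   order. The short Python sketch below checks them against Figure 35
   and against the REFUSE G flag (bit 8); `bit` is a hypothetical
   helper.

```python
import struct


def bit(octets: bytes, n: int) -> int:
    """Return bit n of a field, numbering from 0 at the leftmost (most
    significant) bit of the first octet, as in the diagrams."""
    return (octets[n // 8] >> (7 - n % 8)) & 1


# Figure 35: bits 1 0 1 0 1 0 1 0 form the value 170.
assert [bit(bytes([170]), i) for i in range(8)] == [1, 0, 1, 0, 1, 0, 1, 0]

# Multi-octet quantities go most significant octet first ("!" = network order).
assert struct.pack("!H", 0x0102) == b"\x01\x02"
assert struct.pack("!I", 0x0A000001) == bytes([10, 0, 0, 1])

# The REFUSE G flag is bit 8 of the first word: the top bit of octet 1.
first_word = bytes([11, 0x80, 0, 0])
assert bit(first_word, 8) == 1
```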

11. References

[RFC1071]       Braden, R., Borman, D., and C. Partridge,

                "Computing the Internet Checksum", RFC 1071,

                USC/Information Sciences Institute,

                Cray Research, BBN Laboratories, September 1988.

[RFC1112]       Deering, S., "Host Extensions for IP Multicasting",

                STD 5, RFC 1112, Stanford University, August 1989.

RFC 1819              ST2+ Protocol Specification            August 1995

[WoHD95]        L. Wolf, R. G. Herrtwich, L. Delgrossi: Filtering

                Multimedia Data in Reservation-based Networks,

                Kommunikation in Verteilten Systemen 1995 (KiVS),

                Chemnitz-Zwickau, Germany, February 1995.

[RFC1122]       Braden, R., "Requirements for Internet Hosts --

                Communication Layers", STD 3, RFC 1122,

                USC/Information Sciences Institute, October 1989.

[Jaco88]        Jacobson, V.: Congestion Avoidance and Control, ACM

                SIGCOMM-88, August 1988.

[KaPa87]        Karn, P. and C. Partridge: Round Trip Time Estimation,

                ACM SIGCOMM-87, August 1987.

[RFC1141]       Mallory, T., and A. Kullberg, "Incremental Updating

                of the Internet Checksum", RFC 1141, BBN, January 1990.

[RFC1363]       Partridge, C., "A Proposed Flow Specification",

                RFC 1363, BBN, September 1992.

[RFC791]        Postel, J., "Internet Protocol", STD 5, RFC 791,

                DARPA, September 1981.

[RFC1700]       Reynolds, J., and J. Postel, "Assigned Numbers",

                STD 2, RFC 1700, USC/Information Sciences Institute,

                October 1994.

[RFC1190]       Topolcic, C., "Internet Stream Protocol Version 2

                (ST-II)", RFC 1190, CIP Working Group, October 1990.

[RFC1633]       Braden, R., Clark, D., and S. Shenker, "Integrated

                Services in the Internet Architecture: an Overview",

                RFC 1633, USC/Information Sciences Institute,

                MIT, Xerox PARC, June 1994.

[VoHN93]        C. Vogt, R. G. Herrtwich, R. Nagarajan: HeiRAT: the

                Heidelberg Resource Administration Technique - Design

                Philosophy and Goals, Kommunikation In Verteilten

                Systemen, Munich, Informatik Aktuell, Springer-Verlag,

                Heidelberg, 1993.

[Cohe81]        D. Cohen: A Network Voice Protocol NVP-II, University of

                Southern California, Los Angeles, 1981.

[Cole81]        R. Cole: PVP - A Packet Video Protocol, University of

                Southern California, Los Angeles, 1981.

[DeAl92]        L. Delgrossi (Ed.): The BERKOM-II Multimedia Transport

                System, Version 1, BERKOM Working Document, October

                1992.

[DHHS92]        L. Delgrossi, C. Halstrick, R. G. Herrtwich, H.

                Stuettgen: HeiTP: a Transport Protocol for ST-II,

                GLOBECOM'92, Orlando (Florida), December 1992.

[Schu94]        H. Schulzrinne: RTP: A Transport Protocol for Real-Time

                Applications. Work in Progress, 1994.

12. Security Considerations

   Security issues are not discussed in this memo.

13. Acknowledgments and Authors' Addresses

   Many individuals have contributed to the work described in this memo.

   We thank the participants in the ST Working Group for their input,

   review, and constructive comments. We also thank the George Mason

   University C3I Center for hosting an interim meeting, and Murali

   Rajagopal for his efforts on the ST2+ state machines. Special thanks

   are due to Steve DeJarnett, who served as working group co-chair

   until summer 1993.

   We would also like to acknowledge the authors of [RFC1190]. All

   authors of [RFC1190] should be considered authors of this document

   since this document contains much of their text and ideas.

   Louis Berger

   BBN Systems and Technologies

   1300 North 17th Street, Suite 1200

   Arlington, VA 22209

   Phone: 703-284-4651

   EMail: lberger@bbn.com

   Luca Delgrossi

   Andersen Consulting Technology Park

   449, Route des Cretes

   06902 Sophia Antipolis, France

   Phone: +33.92.94.80.92

   EMail: luca@andersen.fr

   Dat Duong

   BBN Systems and Technologies

   1300 North 17th Street, Suite 1200

   Arlington, VA 22209

   Phone: 703-284-4760

   EMail: dat@bbn.com

   Steve Jackowski

   Syzygy Communications Incorporated

   269 Mt. Hermon Road

   Scotts Valley, CA 95066

   Phone: 408-439-6834

   EMail: stevej@syzygycomm.com

   Sibylle Schaller

   IBM ENC

   Broadband Multimedia Communications

   Vangerowstr. 18

   D69020 Heidelberg, Germany

   Phone: +49-6221-5944553

   EMail: schaller@heidelbg.ibm.com

The post RFC 1819 – Internet Stream Protocol Version 2 (ST2) Protocol Specification appeared first on IPv6.net.