0018-8670/98/$5.00 (C) 1998 IBM A security architecture for the Internet Protocol by P.-C. Cheng, J. A. Garay, A. Herzberg, and H. Krawczyk In this paper we present the design, rationale, and implementation of a security architecture for protecting the secrecy and integrity of Internet traffic at the Internet Protocol (IP) layer. The design includes three components: (1) a security policy for determining when, where, and how security measures are to be applied; (2) a modular key management protocol, called MKMP, for establishing shared secrets between communicating parties and meta-information prescribed by the security policy; and (3) the IP Security Protocol, as it is being standardized by the Internet Engineering Task Force, for applying security measures using information provided through the key management protocol. Effectively, these three components together allow for the establishment of a secure channel between any two communicating systems over the Internet. This technology is a component of IBM's firewall product and is now being ported to other IBM computer platforms. As the Internet evolves from an academic and research network into a commercial network, more and more organizations and individuals are connecting their internal networks and computers to it. The secrecy and integrity of the data transmitted over the Internet have become a primary concern, and cryptographic data encryption and authentication constitute the tools to address this concern. We subscribe to the view that the Internet Protocol (IP) layer1 is a good place to secure the data being communicated. Reasons include: (1) The IP layer is at the choke point of Internet communication; it can capture all packets sent from the higher-layer protocols and applications and all packets received by the lower-layer network protocols. (2) By the very definition of IP, security provided at this layer is independent of lower-layer protocols. (3) Security provided at this layer can be made transparent to the higher-layer protocols and applications. Many application environments can benefit from security provided at the IP layer. Figure 1 depicts some of them, including mobile-to-base communication, telecommuting, and site-to-site or system-to-system communication. Three components are needed in order to provide IP layer security through cryptographic means: 1. A security policy to define the characteristics of the desired security. Such a policy specifies how the packets between two communicating systems must be protected. 2. A key management protocol to establish and maintain the necessary information as prescribed by the security policy. Such information usually includes cryptographic algorithms, secret keys, and other parameters shared between two communicating systems. 3. A protocol for security at the IP layer to protect IP packets as prescribed by the security policy, using information provided by the key management protocol. In our design, this protocol is the emerging Internet standard IP Security (IPSEC) Protocol.2 Also needed is a concept to thread the three components together; this is the secure tunnel concept discussed in the next section. In a nutshell, a secure tunnel is a secure passage for the IP packets transmitted between two communicating systems. The characteristics of a secure tunnel are defined by the security policy; the tunnel is established and maintained by the key management protocol, and the protection of IP packets is carried out through the IPSEC protocol. This paper3 presents the protocols and system architecture that implement the above three components and the secure tunnel concept. At the heart of our security architecture are the Modular Key Management Protocol (MKMP) and the IP Security Protocol. MKMP provides the necessary information to establish a secure tunnel, and IPSEC uses this information to provide data security by cryptographic means. MKMP is a set of protocols that we have designed for the management of the cryptographic keys used by secure tunnels. It provides secure mechanisms for the derivation and periodic refreshment of the cryptographic keys as required by the multiple cryptographic functions used within a secure tunnel. IPSEC is an Internet standard from the Internet Engineering Task Force (IETF) IPSEC Working Group. (Originally described in References 2, 4, and 5, the IPSEC specifications are now under revision. See References 2, 4, and 5 and follow-ups.) IPSEC is essentially an encapsulation protocol, namely, one that defines the syntax and semantics of placing one packet inside another. First, IPSEC protocol-specific operations are performed on an IP packet to protect its secrecy or integrity, or both, by cryptographic algorithms; then the output of the operations is appended to an IPSEC protocol header to form an IPSEC packet; finally, the IPSEC packet is placed inside an IP packet so it can be routed through the Internet. An early prototype of our design became part of IBM's firewall product. In addition, our key management approach and techniques have had an impact on the Internet standard proposed by the IETF IPSEC Working Group.6 An early proposal for providing IP layer security can be found in swIPe.7,8 However, our work differs from swIPe in several aspects, in particular, in that: (1) the key management protocol is implemented and linked with network layer security protocol, and (2) the network layer security functionality is placed inside the kernel IP module and not in a network device driver. In the remainder of the paper, the next two sections describe the notion of an IP Secure Tunnel and the MKMP, respectively. The fourth section outlines the format of the IPSEC protocols. The fifth section presents the architecture and provides the details of our implementation, and the sixth section discusses its performance. In concluding the paper we mention some related ongoing activities and directions for future work. A frequently asked question is, why are both an IP layer security mechanism and a session layer security mechanism--such as the Secure Socket Layer (SSL)9 and Transport Layer Security (TLS)10--needed. In the Appendix we present a comparison of features for IP and session layer security. IP Secure Tunnel In this section we define the concept of a secure tunnel. We first make some clarifications on how the word "tunnel" is used in both the IPSEC arena and in this paper. We then discuss the concept of a security association on which the secure tunnel concept is based. We finally discuss the secure tunnel concept itself. On the word "tunnel." The word tunnel is widely used in the IPSEC arena. However, depending on the context, it could refer to several different but related concepts: 1. Conceptually, it refers to a secure passage (or channel) between two systems across the insecure Internet. This passage is a realization of the security policies of two systems. In the context of IPSEC, a security policy establishes the specific requirements and meta-characteristics of a secure passage between two given systems. The meta-characteristics of a passage usually include the identities or addresses of its two endpoints, the encapsulation mode, the cryptographic algorithms to be used, parameters for the algorithms (such as key lifetime and key size[s]), etc. A security policy may also demand more than one secure passage between the two systems, each for a specific type of communication. 2. Implementation-wise, the word tunnel refers to a set of items of information shared between the endpoints of a secure passage. This set enables the realization of a secure passage; it includes in particular the meta-characteristics and secret keys used by the cryptographic algorithms. In the IPSEC terminology, such a set is called a security association. The next subsection elaborates on SAs in more detail. However, as the subsection subsequent to that explains, an SA is not a secure tunnel but an incarnation of a secure tunnel during a particular time interval. An SA is usually created and maintained by a key management engine. 3. Finally, in the standard terminology of IPSEC, tunnel refers to one of the two encapsulation modes defined by the IPSEC standard: tunnel mode and transport mode. Both modes can be used to construct a secure passage, although they provide slightly different protection. The fourth section presents a more detailed discussion on the IPSEC standard. From now on, unless otherwise specified, we use the words "tunnel" or "secure tunnel" to denote the secure passage concept or an instance of it. Security association. A security association, or SA, is a set of items of information that, when shared between two communicating parties, enables the two parties to protect the communication in a desired way. An IPSEC SA2 includes the following meta-characteristics: o Destination ID/IP address: the intended receiver of IPSEC packets o Security protocol: the kind of security--integrity or secrecy or both--provided by the SA on the IP packets. Under the security protocol, a set of cryptographic algorithms (called transforms in IPSEC) and its parameters, such as key lifetime and key size, are specified. o Secret keys: the keys to be used by the cryptographic transforms o Encapsulation mode: indicating which part(s) of the IP packet will be protected by the SA o Security Parameter Index (SPI): the identifier of the SA. On a given system, the SPI should be unique with respect to the destination address of the SA so that the pair (destination address, SPI) uniquely identifies an SA. An IPSEC packet constructed according to an SA carries the SPI of the SA so that the destination will know how to process the packet. The encapsulation mode of the SA, the security protocol, and the cryptographic transforms define the operations to be performed on a packet. These operations are discussed in more detail later. An IPSEC SA is unidirectional in the sense that the information in it should only be used to construct and process IPSEC packets intended for the destination address in the SA. However, a secure tunnel is bidirectional. Thus, a pair of IPSEC SAs, one for inbound traffic and another for outbound traffic, are needed for each of the endpoints of a secure tunnel. Both MKMP and ISAKMP/OAKLEY, the new standard for IPSEC key management,6,11 always generate such pairs. The two SAs in a pair share the same meta-characteristics but have different keys. Our design treats such a pair as a bidirectional SA. IPSEC also includes the notion of an SA bundle. An SA bundle is a pair of IPSEC SAs with different security protocols and transforms combined together to provide the desired security. For example, if one SA protects the integrity of packets and the other provides secrecy, then the two SAs can be combined to protect both integrity and secrecy. Note that the two SAs in a bundle are shared between the same two systems and that the destination addresses of the two SAs must be the same. In the fourth section we discuss when an SA bundle may be needed. Like an IPSEC SA, an IPSEC SA bundle is unidirectional, so a pair of SA bundles is needed for bidirectional security. Both MKMP and ISAKMP/OAKLEY can generate pairs of SA bundles. Our design treats such pairs as a bidirectional SA bundle. From now on, unless otherwise specified, we will use the words "security association" or "SA" to refer to a bidirectional SA or SA bundle that provides the desired security. We will use the words "IPSEC security association" or "IPSEC SA" to denote a unidirectional SA as specified in Reference 2. Secure tunnel. This subsection explains what we mean by a "secure tunnel" in this paper and how it differs from a security association. We arrive at the concept through an example of the series of steps that are needed to establish secure communication between two systems. Consider the case of two sites or systems A and B that are connected to the Internet through two systems X and Y, respectively, where A and X, and B and Y may or may not be the same. As a first step toward having secure communication, A and B negotiate a (hypothetical) set of rules as the policy for communication between them (see Reference 12 for a clear explanation of a good, concise security policy). Policy 1: Rule 1--Packets of type 1 must go through a secure passage between X and Y, and the meta-characteristics of this passage are in set 1. Rule 2--Packets of type 2 must go through a secure passage between X and Y, and the meta-characteristics of this passage are in set 2. Here the type of packet is usually defined by the source and destination addresses of the packet, the transport layer protocol (e.g., Transmission Control Protocol [TCP] or User Datagram Protocol [UDP]), and the port numbers or types, etc. Although at first sight Policy 1 seems reasonable, another piece of information is needed, namely, the keys to be used with the cryptographic algorithms in sets 1 and 2 in order to enforce the policy. In other words, the references to sets 1 and 2 must be translated into references to SAs. Rewriting Policy 1 yields the following. Policy 2: Rule 1--Packets of type 1 must go through a secure passage between X and Y using SA 1. Rule 2--Packets of type 2 must go through a secure passage between X and Y using SA 2. Policy 2 is already usable but still not satisfactory from a cryptographic standpoint. The reason is that the keys in a security association should be changed frequently in order to defeat cryptanalysis or brute-force attacks. One first option could be changing the keys in an SA frequently but keeping everything else the same. It turns out that this solution is not good either. In such a solution, the same reference to an SA must be reused for a key and its replacement. In practice, the life of a key and its replacement must have some time overlap in order to avoid a disruption of communication caused by expiration of the key. Using the same reference for different keys will cause false security alarms because the wrong key is used to check the integrity of a message or a significant processing overhead, or both, because both the old key and the new key have to be tried. Therefore, in our design, a security association expires as its keys expire. This design decision makes Policy 2 very impractical because a stable security policy cannot be defined in terms of short-lived SAs. To resolve the conflict between the need for a stable, relatively long-term security policy and the need for changing keys frequently, we took a further look at a security association and concluded that its meta-characteristics (i.e., the SA minus its keys) are relatively stable and long term. In fact, the strength of protection usually does not depend on the exact values of the keys, but rather on their sizes, lifetime, the cryptographic algorithms, etc. With this conclusion, we are now ready to introduce the secure tunnel concept: A secure tunnel is a secure passage between two systems across the insecure Internet. It has a set of meta-characteristics determined by the security policies of the two systems. Such a secure tunnel is realized by a series of SAs that change with time; these SAs share the same meta-characteristics but have different cryptographic keys. Each of these SAs is t-he replacement of its immediate predecessor and can be considered an incarnation of the secure tunnel during the lifetime of the SA. With the secure tunnel concept, the final form of the security policy becomes the following. Policy 3: Rule 1--Packets of type 1 must go through secure tunnel 1 between X and Y. Rule 2--Packets of type 2 must go through secure tunnel 2 between X and Y. Our implementation assigns each secure tunnel an identifier (ID). Each SA will include the ID of the secure tunnel to which it belongs; this inclusion of the secure tunnel ID provides a two-way linkage between a secure tunnel and its current incarnation. In the fifth section more details are given on these aspects of the implementation. From now on, if a packet is encapsulated according to the information in a secure tunnel, we say that the packet goes or comes through the tunnel. Modular Key Management Protocol The basic goal of a key management scheme is to provide the two communicating parties with a common, "freshly" shared cryptographic key that is known to these parties only. In general, a typical key management scheme will achieve that in two phases: one in which a "master key" is shared between the parties, and the other in which the already-shared master key is used for the derivation, sharing, and refreshment of additional "session keys." The term "session key" is used here to denote short-lived keys (say, in the range of seconds or minutes); it does not imply or require a session-based communication model. The frequent change in the values of these keys usually requires a fast way to generate and share the keys so as to reduce the key exchange overhead in terms of computation and communication. The term "master key" is used to denote keys with a longer life period than a session key (say, a range of hours), and then they may allow for more time-consuming procedures for their generation and sharing. The split into these two phases is not mandatory, and in fact there are systems that do not establish (at least explicitly) this separation. However, we argue here that for most scenarios this explicit separation has a significant methodological and design value. In particular, we advocate the separation of these functions into two modules: one for the sharing of the master key, and one for the key management "below" the shared master key. Thus, our approach to key management is hierarchical--namely, session keys are derived from the shared master keys and, in turn, the master keys are derived using any of the well-established key exchange methods: public key exchange, key distribution centers (e.g., Kerberos13), and manual key installation. (See Reference 14 for a further elaboration of these various scenarios and trust models.) Our hierarchical approach is illustrated in Figure 2. In this paper we concentrate on the basic mechanisms for the derivation and management of the session keys (i.e., the "lower" module). Our protocol can be extended to support the derivation of master keys from public keys; however, the description of these particular mechanisms is beyond the scope of this paper. We refer to SKEME14 for a design of such extensions as well as for a more comprehensive treatment of security requirements and mechanisms for key management. We now turn to the description of the session key protocol. Session key negotiation protocol. The basic goal of a key negotiation protocol is to provide both intervening IP nodes with shared session keys. The keys are then used for data authentication and encryption, thus allowing the establishment of a secure tunnel between the two parties. We now list some of the goals and requirements that our session key protocol is required to support. Security handshakes--A basic assumption that we make in this paper is that parties are able to establish periodic two-way (or interactive) communication, as opposed to just sending information in a one-way mode. This interactive mode of key refreshment allows for the parties to have periodic handshakes whereby cryptographic information is refreshed and verified by both parties simultaneously. This mode has a significant impact on the security aspects of the protocol, as pointed out below. Secrecy and authenticity--For the intervening parties the protocol needs to guarantee that only the intended party learns the key exchanged and that this key is "fresh," random, and unique. Secrecy and authenticity of the exchanged key need to be protected against passive (eavesdroppers) and active (man-in-the-middle) attackers, and these properties must be guaranteed for as long as the underlying cryptographic functions in use are secure against such adversaries. Efficiency--Another important goal of a key negotiation protocol is efficiency, namely, to keep both the number of messages exchanged between the parties and the computational overhead (e.g., the number of expensive public key operations) to a minimum. It is worth noting that by having a highly efficient method for session key renewal, the need for frequently updating the master key, which is usually computationally intensive, is alleviated. Forward secrecy--Another consideration is the level of security provided by the protocol. One desired property of the session keys is that of forward secrecy:15 Even if an attacker is eventually able to derive the key for one session, past and future session (and, of course, master) keys are not compromised. Our protocol achieves this level of security for session keys. We remark that achieving such security even in the case of the compromise of a master key requires relatively costly mechanisms such as a Diffie-Hellman exchange for the sharing of the master keys. However, this cost is usually compensated by the long-lived nature of these keys. Simplicity--Finally, simplicity and being amenable to analysis and proof are important features of any cryptographic protocol. Figure 3 shows the "cryptographic skeleton" of the session key negotiation protocol, including only the relevant (i.e., related to authentication) information. In the full implementation of the protocol, further information, such as cryptographic parameters, negotiation attributes, key identifiers, etc., is transmitted and authenticated between the parties, using the same two communication flows. There are two parties, a sender S and a receiver R. S is the party that initiates the protocol. In the presentation of the protocol we use the following terminology: IDX = The identity of party X. NX = A nonce (i.e., a random number) chosen by party X. K = The currently shared master key. In our case it represents a pair of keys K1, K2. MACk = A message authentication code (or integrity check function) applied to a piece of information for authentication using a secret key k. Examples include block ciphers (e.g., Data Encryption Standard, or DES, in cipher block chain-message authentication code, or CBC-MAC, mode16), and keyed cryptographic hash functions (e.g., keyed-MD5,17 HMAC,18 etc.). fk = A pseudorandom function with key k. Roughly speaking, pseudorandom functions19 are characterized by the pseudorandomness of their output; namely, each bit in the output of the function is unpredictable if k is unknown. We use pseudorandom functions primarily for the derivation or expansion of key material given an initial key k. The examples of MAC functions given above are also commonly believed to be pseudorandom functions. SK = The session key, outcome of the protocol. As Figure 3 shows, the protocol consists of two messages (or flows). It is assumed that S and R already share two master keys, K1 and K2, as well as the nonce NR, exchanged in a previous run of the protocol. New nonces are exchanged and authenticated in each run of the protocol under the MAC function keyed with the master key K1. There is no explicit transmission of a cryptographic key from one party to the other; instead, the shared session key is computed by both parties on the basis of the master key K2 and the fresh information exchanged in the protocol. Upon (successful) termination of the protocol, parties S and R have exchanged a session key SK, besides mutually authenticating each other. (We note that the same protocol allows for periodic key refreshments within a session.) We now turn to argue how the protocol satisfies the requirements listed above. Although we omit from this paper any formal proofs, we note that the protocol presented here is structured, simple, and thus easier to analyze. Hence, rigorous methods similar to those of References 20 and 21 can be used to establish the desired security properties of the protocol. The protocol is essentially a "cryptographic handshake" that allows the parties to directly authenticate one another. The challenge provided by the nonce guarantees the freshness of the authentication and avoids the so-called replay attacks. The binding between master key, nonces, and identities provides security even in the case where several such protocols are run in parallel, and thus it prevents interleaving attacks.20 We note that keeping a shared nonce (NR) between runs of the protocol is not essential; it can be replaced by the use of time stamps (at the expense of requiring a secure clock synchronization) or by adding an extra flow to the protocol (at the expense of performance). The nonce also serves the purpose of alleviating the effect of the so-called clogging, or "denial of service" attacks, as it allows for rapid detection and dumping of maliciously replayed traffic. In any case, the nonces do not require any secrecy; that is, they can be transmitted in the clear. Also notice, as mentioned before, that the session key SK is not explicitly transmitted. This avoids the need to encrypt this key as well as the need to explicitly authenticate it. The authenticity and freshness of SK are derived from the authenticity and freshness of the expression T. Even if an adversary succeeds in replaying an old message from S to R (for example, in case a time stamp is used instead of the nonce NS), the freshness of SK is guaranteed by the incorporation of NR, chosen by R, into the MAC expression T from which SK is derived. Thus, the session keys that are derived are fresh and independent from the past session keys (the only existing dependence is to the current shared master key). Regarding efficiency, the protocol requires only two message flows and no expensive operations (e.g., exponentiations) at all, given that the parties already share a master key; only efficient symmetric-key techniques are used. The forward secrecy of the session key SK follows from the unpredictability and one-wayness properties of pseudorandom functions and our particular application of these functions to derive SK. We stress that usually more than one key is required for a single security association. For example, one needs different keys for the encryption and for the authentication of information; in some cases the keys are used unidirectionally; etc. The derivation of more than one key using the protocol of Figure 3 is straightforward: Instead of a single application of fK2(T), one can apply fK2 ("transform-id," T) for each required key, where "transform-id" is a unique identifier that identifies the algorithm for which the key is to be used (e.g., DES-CBC), the key length, the direction of the communication (e.g., for message authentication from S to R only), etc. Finally, we note that one can replace the use of the message authentication code (MAC) function in the above protocol with the pseudorandom function f. The latter would provide the authentication properties of a message authentication function. (These properties follow from the strong unpredictability requirements of a pseudorandom function.) In this case, one could use only one master key, K, to key both the uses of fK as a MAC as well as for its uses for key derivation. The IPSEC protocols This section discusses the Internet IPSEC protocols presented in References 2, 4, and 5. These protocols are not in their final form yet, though they are converging toward a stable specification. These documents will eventually become standard RFCs (requests for comments). We refer the reader to them for more details.22 We previously discussed the concept of an IPSEC security association, which is essential to the protocols. Now we discuss the syntax and semantics of an IPSEC packet by explaining how an IP packet carrying an IPSEC packet is constructed. An IPSEC packet is constructed according to the information in an IPSEC SA, and its format depends on the security protocol in that SA. The security protocol can be either Encapsulating Security Payload (ESP)4 or Authentication Header (AH).5 ESP protects the secrecy and, optionally, the integrity of a packet, whereas AH protects only integrity. Secrecy is protected through encryption, and integrity is protected by a MAC function. The specific algorithms and keys (and related parameters) for computing these cryptographic functions are specified in the IPSEC SA. The default IPSEC encryption algorithm is DES in CBC mode; the default algorithm for IPSEC message authentication code is HMAC-MD5.18,23,24 The high-level layout of an ESP packet is shown in Figure 4, and the high-level layout of an AH packet is shown in Figure 5. An IPSEC packet is either an ESP packet or an AH packet. Although the ESP protocol allows for the application of encryption without integrity protection, this mode of operation is strongly discouraged. Except for very rare exceptions, a packet requiring secrecy will also require integrity protection. Moreover, as demonstrated in Reference 25, the lack of integrity protection may lead to a loss of confidentiality even if a secure encryption function is applied. Another measure for integrity protection is providing an anti-replay defense. This type of defense prevents an attacker from re-injecting previously authenticated IP packets into the communication stream. More precisely, anti-replay measures make it possible to detect such "replayed" packets. To implement these measures, the AH and ESP protocols add a sequence number field to their headers. This field is meaningful only when the packet is authenticated via a MAC function. The value of the sequence number is maintained in an IPSEC SA. A sequence number is 32 bits long and always starts from 1. It is incremented by 1 each time an IPSEC packet is constructed according to the SA, and it is not allowed to wrap around. The receiver of an IPSEC packet (the "destination" in an IPSEC SA) uses a locally defined sliding window to keep track of which sequence numbers have already been received. It rejects those packets that fall outside that window or that carry an already-seen sequence number. Although the security protocol in an IPSEC SA determines the format of the packet, the encapsulation mode determines which part of an IP packet is protected. There are two encapsulation modes: 1. Tunnel mode: This mode protects the entire IP packet. The entire IP packet is encapsulated in an IPSEC packet, and a new IP header is constructed and attached at the beginning of the IPSEC packet to form a new IP packet. The source and destination addresses may or may not be the same as those in the encapsulated IP packets. This mode is typically used for a secure tunnel between two firewalls, or between a firewall and a remote system, i.e., whenever either of the two communicating systems is not an endpoint of the tunnel.26 The source and destination addresses in the new IP header are the addresses of the endpoints of the tunnel. 2. Transport mode: This mode only protects the transport-layer packet (such as a TCP or a UDP packet) inside an IP packet. In this mode the IP header is first separated from the transport-layer packet, then the transport-layer packet is encapsulated in an IPSEC packet, then the IP header is attached to the IPSEC packet to form a new IP packet, and finally, the length, protocol, and header checksum fields in the IP header are modified accordingly. The source and destination addresses in the IP header remain unchanged. This mode is used when the endpoints of the secure tunnel are the two communicating systems. There are two exceptions to the above description of the two modes if the security protocol is AH: 1. In the case of AH, the integrity protection extends to part of the "prepended" IP header. More details are given below when describing the procedure to construct an AH packet. 2. An ESP packet can be encapsulated in an AH packet. This may happen when an SA bundle is used to protect secrecy and integrity. Conceptually, this case can be thought of as first constructing an IP packet with an ESP packet inside, and then using this IP packet as the input to the construction of an AH packet. More details are given toward the end of this section when the use of AH and SA bundles are discussed. Constructing the IPSEC packet. With the knowledge of the security protocol, the transform, and the encapsulation mode, one can determine how to assemble an IPSEC packet from the IP packet. We first describe the procedure to construct an ESP packet: 1. Construct an ESP protocol header; this includes the SPI of the IPSEC SA and the sequence number. 2. Append the to-be-encapsulated packet to the ESP header. 3. Construct an ESP trailer and append it to the to-be-encapsulated packet. The trailer serves the following purposes: - It contains the protocol number of the to-be-encapsulated packet, such as the TCP number (6) or the IP-IN-IP number (4, if tunnel mode is used). - If a block cipher (e.g., DES) is used for encryption, then the trailer contains some padding bytes so that the size of the encapsulated packet, the padding bytes, and the protocol number (1 byte) will be an integral multiple of the block size of the cipher. 4. Encrypt the to-be-encapsulated packet and the trailer and append the ciphertext (namely the output of the encryption) to the ESP header. 5. If, as recommended, integrity protection is provided, compute the message authentication code over the ESP header and the ciphertext and append the output of the MAC computation to the ciphertext. This output is indicated by the "MAC" field in Figure 4. The ESP header, the ciphertext, and, optionally, the MAC field form the ESP packet. The procedure to construct an AH packet is as follows: 1. Construct an AH protocol header. This header includes the SPI of the IPSEC SA, the sequence number, the protocol number of the to-be-encapsulated packet, and zero-filled bytes to hold the output of the message authentication function. 2. Append the to-be-encapsulated packet to the AH header. 3. Prepend the IP header to the AH header. 4. Compute the message authentication code over the IP header, AH header, and the to-be-encapsulated packet; place the output in the AH header. The output is indicated by the word "MAC" inside the "AH header" box in Figure 5. The AH header and the packet attached to it form an AH packet. One special feature of the AH protocol is that the computation of the message authentication code includes the IP header of the IP packet that carries the AH packet. However, not all fields in the IP header are covered. Prior to the computation, a field in the IP header is zeroed if its content may change en route. Such fields are called mutable in Reference 5. After the computation, the values of these mutable fields must be restored. The efficient construction of an AH packet and restoration of the mutable fields is a tricky implementation issue not discussed here. Although ESP can provide both secrecy and integrity protection, AH is still needed for the following reasons: o If secrecy protection is not needed or prohibited by law, AH can provide integrity protection without the cost of encryption. o The integrity protection of AH covers part of the IP header, whereas that of ESP does not. Whether this difference in coverage is needed depends on the specific operational security requirements. (For example, a military application may want to protect the sensitivity labels included in the option fields in an IP header.) If both the difference in coverage and secrecy protection are needed, an IPSEC SA bundle (see earlier subsection on security association) can be used, and the result is shown in Figure 6. In this figure an ESP packet is encapsulated inside an AH packet, and the AH packet is encapsulated inside an IP packet. This layout implies that the ESP packet is constructed first. The sequence number field in the AH header enables the receiver to detect a replay attack early, therefore saving the decryption operation. System architecture Figure 7 shows the system architecture of our implementation on IBM's Advanced Interactive Executive (AIX*) operating system. It consists of three parts. On the right of the figure is the key management part, including the MKMP engine, the tunnel interface, and the tunnel cache. On the left is the IPSEC policy part, including the policy engine, the policy interface, and the policy cache. In the middle is the IPSEC encapsulation part, including the IPSEC engine and the crypto engine. There is also an IPSEC-enabled IP engine. The MKMP engine establishes and manages the secure tunnels and stores the security associations of these tunnels in the tunnel cache through the tunnel interface. The policy engine allows an administrator to define and to manage IPSEC policies; it translates a human-readable policy into binary form and stores the binary form in the policy cache through the policy interface. For each inbound and outbound IP packet, the IP engine queries the policy cache on what actions to take on the packet. For an inbound packet, the action may be either to permit or deny the packet, a decision that may be partially based on what tunnel the packet has gone through. For an outbound packet, the action may also be to encapsulate, meaning that the packet should go through a secure tunnel designated by the policy. The IPSEC encapsulation part is invoked by the IP engine to perform IPSEC encapsulation and decapsulation on IPSEC packets. We stress that an important principle followed by our design is to decouple the IPSEC encapsulation part from the key management part; doing so allows us to link different key management protocols--and their underlying distributed security infrastructures--with the IPSEC encapsulation protocol. The linkages between the two parts are the IDs of secure tunnels. The key management part generates SAs with their tunnel IDs inside. The IPSEC encapsulation part can use a secure tunnel ID to find which SA is the current incarnation of the tunnel. It can also use the destination address and SPI in a received packet to find the corresponding SA and therefore the secure tunnel to which the SA belongs. Another important design principle is to decouple the policy part from the key management part and the IPSEC encapsulation part. We believe that it should be up to the policy part to decide what to do with a packet, whereas the other two parts should implement the decision regardless of how the decision is made. Doing so allows us to accommodate different policy models and switch to better decision-making mechanisms in the future. The linkages between the policy part and the other parts are also ID's secure tunnels; the following subsection provides more details. Figures 8 and 9 show the flowcharts of IP/IPSEC output and input processing, respectively. IPSEC processing was introduced into the so-called "vanilla IP" processing by adding function call hooks into the original AIX IP code. The blocks outlined in dashes indicate IPSEC processing. On output processing, a packet passes through the output packet filter12,27 implemented by the policy cache. The outcome of output packet filtering can be: (1) permit: proceed with vanilla IP processing, (2) deny: throw away the packet and (optionally) log the event, or (3) encapsulate: perform IPSEC encapsulation on the packet based on the specification of a policy-designated secure tunnel, and then transmit the resultant new IP packet. On input processing, a packet first passes through the input packet filter implemented by the policy cache. The result of input packet filtering can be: (1) permit: proceed with vanilla IP processing, or (2) deny: throw away the packet and (optionally) log the event. If the result is "permit," then after some vanilla IP processing, the "protocol" field in the IP header is examined. If its value is either ESP or AH, which indicates that the payload is IPSEC-encapsulated, the packet is sent through the IPSEC decapsulation process implemented by the IPSEC encapsulation part; otherwise, the packet is passed directly up to the transport layer protocol. In turn, the result of IPSEC decapsulation consists of two parts: (1) the IP packet that was encapsulated, and (2) the ID of the secure tunnel through which the packet has come. This ID indicates in what way the packet has been protected. This result is fed back to the beginning of IP input processing and passes through the input filter again. Now the filter can make decisions based on the packet and the way in which it was protected. For example, the policy can state "accept the packet only if it came through a certain tunnel." We now describe the parts of the architecture in more detail. The IPSEC policy part. Conceptually, an IPSEC policy consists of packet-filtering rules. These rules determine whether an IP packet should be received (i.e., passed up to the transport layer protocol) or transmitted. Basically, the rules are put in a list, and the first rule that matches the relevant information of the IP packet determines the decision. A rule is composed of several fields, including the action to be taken (permit, deny, or encapsulate), the source address and address mask, the destination address and address mask, the transport protocol, the port number or message type (if applicable), the direction (inbound or outbound), and the secure tunnel ID. For an outbound packet, a nonzero tunnel ID specifies that the packet should be encapsulated by the IPSEC encapsulation defined by the tunnel--i.e., the packet should go through the tunnel. For an inbound packet, a nonzero tunnel ID says that the packet should be protected by the IPSEC encapsulation designated by the tunnel ID--the packet should have come through the tunnel. We have realized the policy part with the following components: o Administration tools--These tools are applications running in the user space that allow the administration to create and modify IPSEC policies. One important function of the tools is to translate human-friendly policy statements into binary form and to put the binary form into the kernel policy cache through the policy interface. o Policy interface--This interface is a pseudodevice driver acting as the interface between the administration tools and the policy cache. It allows the administration to change or view the kernel policy cache. o Policy cache--This cache is a repository for IPSEC policies in the kernel, used by the IP module. The policy cache exports an IP packet filter interface to the IP module. The interface is divided into input filtering and output filtering. The input to the interface is an IP packet. For input filtering, the output is a simple permit or deny answer. For output filtering, the output is a permit, deny, or encapsulate answer. In the case of encapsulate, the packet is encapsulated according to the current SA in the corresponding tunnel. In our implementation, the policy cache can only be changed as a whole. If the administration tool wants to add one rule to the policy, it will read the whole cache into a copy in the user space, add the rule to the copy, and write the modified copy back to the policy interface. The policy interface will lock the policy cache, do a pointer-swap operation to make the cache point to the new copy, and then unlock and delete the old copy. The reason for this wholesale manner of operation is performance. Since the processing of every packet involves searching and matching the policy cache, a per-rule lock would definitely degrade the performance. In fact, our experience shows that the number of locking operations on the policy cache should be kept to a minimum. The key management part. A brief description of the key management part is given at the beginning of this section. This subsection discusses the MKMP engine and tunnel administration tools shown in Figure 7. More details can be found in Reference 3. The MKMP engine. The MKMP engine establishes and manages secure tunnels by creating and refreshing SAs within the tunnels. It also caches the SAs of a secure tunnel in the tunnel cache through the tunnel interface. The engine is divided into two modules: the session key engine and the master key engine, which we now describe in detail. The master key engine negotiates the master keys, the first shared nonce, and the meta-characteristics of a secure tunnel, and passes the information to the session key engine in a data structure called master key context. The master key is actually a pair of keys: one key is used to authenticate the messages from the session key protocol, and the other is used as an input to the pseudorandom function in order to derive session keys (see Figure 3). The master key engine can be instantiated in several ways: It can be a simple user-level command that implements the manual distribution of master keys, or it may use a Key Distribution Center-based protocol, such as Kerberos or NetSP,28 or it may be a process that derives master keys using public key cryptography, such as ISAKMP/OAKLEY.6,11 At present, we have implemented two master key engines: (1) manual parameter negotiation and key distribution, and (2) manual parameter negotiation and Diffie-Hellman key exchange,29 authenticated by a preshared secret. We are currently in the process of building a prototype of the ISAKMP/OAKLEY protocol. The session key engine implements the session key protocol described earlier. It accepts a master key context from the master key engine and uses this context to establish and maintain a secure tunnel with the session key engine on the other end of the tunnel. A run of the session key protocol derives two session keys, one for IP packet encryption and the other for authentication. The IDs of the encryption and authentication algorithms are used to index the pseudorandom functions to generate different keys. The session key engine refreshes a key before it expires to ensure uninterrupted secure communication. The length of the overlapping period between the old and new keys is in the master key context. For each key refreshment, the session key engine creates new SAs with new SPIs and keys, but keeps all other information unchanged. Figure 10 shows the architecture of the session key engine and the master key engine and their relations to other parts of the operating system. The design goals of this part of the architecture are modularity, flexibility, and portability. Since we envision the possibility of the session key engine interoperating with many master key engines of different types, the session key engine is an independent process separated from the master key engines. A session key context is sent in a UDP message from the master key engine to the session key engine. In order to prevent an adversary from sending a bogus master key context to the session key engine, the content of the UDP message is authenticated by a secret key that is preshared between the session key engine and the master key engine. In order to provide portability across different platforms, we have implemented an OS (operating system) dependency library. This library provides OS-independent application programming interfaces (APIs), which in turn provide the following services: o Secure communication for network communication, encryption, and authentication of messages from the master key engine to the session key engine o Timer/alarm for retransmission and key refreshment or deletion o Asynchronous wait for capturing asynchronous events (e.g, time-out events and receipts of messages) o Tunnel caching for the caching of security associations The tunnel administration tools. Briefly, the tunnel administration tools are a set of applications to examine and delete security associations in the tunnel cache. They also provide an interface for inserting manually created security associations into the tunnel cache. These tools have proved to be invaluable for the debugging of IPSEC protocols and for diagnosing secure tunnel configurations. The IPSEC encapsulation part. As a follow-on to the discussion in the previous section, the IPSEC encapsulation process can be divided into two main steps: (1) packaging that handles all of the IP headers, IPSEC headers, and ESP trailers, and invokes the necessary crypto operations, and (2) cryptographic operation that performs the actual cryptographic computation. Accordingly, the IPSEC encapsulation part is divided into the IPSEC engine and the crypto engine. This division of functionality has the following advantages: o Implementations of cryptographic algorithms are decoupled from the details of IPSEC syntax and semantics. Thus, new algorithms and improved implementations can be easily introduced. o It is much easier to keep up with the changes in IPSEC syntax and semantics. We have actually updated the IPSEC engine a few times as the IPSEC standards evolved; the changes were all within the IPSEC engine and no other parts were affected. We now describe the two engines in detail. The IPSEC engine. The IPSEC engine is responsible for the packaging part of the IPSEC encapsulation and decapsulation processes. For encapsulation, the inputs of the engine are an IP packet from the IP engine and a secure tunnel ID from the policy cache. The IPSEC engine will find the current SA of the tunnel through the tunnel ID and encapsulate the input packet using the information in the SA. The encapsulation procedure was discussed previously. The output of the procedure is an IPSEC packet inside an IP packet. For decapsulation, the input of the engine is a received IP packet with an IPSEC packet inside. This IP packet should be the output of its sender's encapsulation procedure. The IPSEC engine finds the SA corresponding to the SPI in the IPSEC packet and the destination address of the IP packet. The decapsulation procedure is the inverse of the encapsulation procedure. If the SA indicates that the integrity of the message should be protected, the decapsulation procedure first checks on whether the received IP packet is a replay and then verifies the MAC inside the received IPSEC packet. The received packet is rejected if it is a replay or if the MAC verification fails. Other error conditions, such as a failure of decryption, may also happen during decapsulation. All error conditions cause the received packet to be rejected. The crypto engine. The crypto engine is divided into a lower and an upper layer. The lower-layer module implements and exports a specific cryptographic algorithm. For example, one module implements DES-CBC and another module implements HMAC-MD5. The lower-layer module is identified by an implementation-dependent integer ID of the specific cryptographic algorithm it implements and exports one of two generic interfaces, depending on whether it implements encryption or message authentication. The upper layer is also a framework that holds different lower-layer modules. However, the crypto upper layer does not hide the interfaces exported by the lower layer. A reference to a lower-layer crypto module can be acquired through a search function provided by the upper layer, with the ID of the crypto transform as the search key. All the lower-layer modules in the IPSEC engine and the crypto engine are optimized to use mbuf chains30 as I/O buffers. Performance The performance of our implementation is very similar to that reported in our earlier work.3 Although the code has been improved and modified to fit the new standard, the dominant factor of performance, namely, the cost of the cryptographic operations, remains unchanged. Since the same cryptographic operations are still used by the new standard, the performance stays about the same. Refer to Reference 3 for more details. Currently, work is underway to produce handcrafted, high-performance cryptographic code. We are also investigating the use of cryptographic hardware. Ongoing and future work We are currently implementing a key management engine based on the emerging Internet standard key management protocols, ISAKMP11 and OAKLEY.6,31 OAKLEY has been strongly influenced by the SKEME protocol,14 which in turn is a natural extension of the MKMP protocol presented here. As IPSEC and other security protocols such as TLS10 become ubiquitous, we believe it is important to refine the policy part in order to provide a unified, easy-to-use administration and user interface so that different security protocols can be combined in a correct and desirable way to achieve a security goal. Acknowledgments We wish to thank the following colleagues for their useful technical advice and logistic support without whose help this work would have been impossible: Erol Basturk, Mihir Bellare, Maria A. Butrico, Chee-Seng Chow, Mark C. Davis, Edie E. Gunter, Donald B. Johnson, Dilip D. Kandlur, Jed Kaplan, Arvind Krishna, Mark H. Linehan, Charles C. Palmer, Ed Pring, Stephen E. Smith, and Moti M. Yung. We also thank the anonymous referees for their many comments, which helped improve the presentation of the paper. Appendix: IP layer security vs session layer security A frequently asked question is why are both an IP layer security mechanism and a session layer security mechanism--such as SSL9 and TLS10--needed. Table 1 presents a comparison of features for IPSEC and SSL. In summary, we feel that IPSEC and SSL are largely complementary technologies. When the traffic is restricted to Web or HTTP (HyperText Transfer Protocol) type, then SSL is more suitable, assuming that all the browsers and Web servers have SSL already built in. When traffic cannot be restricted to only Web or HTTP, but must include other popular Internet applications such as Telnet, FTP, network file system (NFS), directory services, database applications, real-time audio and video, as well as many other legacy applications, then IPSEC is simply more suitable. *Trademark or registered trademark of International Business Machines Corporation. Cited references and notes 1. J. Postel, Internet Protocol, Internet RFC 791 (September 1981). 2. S. Kent and R. Atkinson, Security Architecture for the Internet Protocol, IETF (March 1997). 3. A preliminary version of this paper was presented in Salt Lake City by the authors: P.-C. Cheng, J. A. Garay, A. Herzberg, and H. Krawczyk, "Design and Implementation of Modular Key Management Protocol and IP Secure Tunnel on AIX," Proceedings of the 5th USENIX UNIX Security Symposium (June 1995), pp. 41-54. 4. S. Kent and R. Atkinson, IP Encapsulating Security Payload (ESP), IETF (July 1997). 5. S. Kent and R. Atkinson, IP Authentication Header, IETF (July 1997). 6. D. Harkins and D. Carrel, The Resolution of ISAKMP with Oakley, IETF (July 1997). 7. J. Ioannidis and M. Blaze, "The Architecture and Implementation of Network-Layer Security under UNIX," Proceedings of the 4th USENIX UNIX Security Symposium (1993), pp. 29-39. 8. J. Ioannidis and M. Blaze, The swIPe IP Security Protocol, IETF (June 1994). 9. A. O. Freier, P. Karlton, and P. C. Kocher, The SSL Protocol Version 3.0, IETF (November 1996). 10. T. Dierks and C. Allen, The TLS Protocol Version 1.0, IETF (March 1997). 11. D. Maughan, M. Schertler, M. Schneide, and J. Turner, Internet Security Association and Key Management Protocol (ISAKMP), IETF (July 1997). 12. W. R. Cheswick and S. M. Bellovin, Firewalls and Internet Security, Repelling the Wily Hacker, Addison-Wesley Publishing Co., Reading, MA (1994). 13. J. Kohl and B. C. Neuman, The Kerberos Network Authentication Service (V5), Internet RFC 1510 (September 1993). 14. H. Krawczyk, "SKEME: A Versatile Secure Key Exchange Mechanism for Internet," Proceedings of the 1996 Internet Society Symposium on Network and Distributed Systems Security (February 1996), pp. 114-127. 15. W. Diffie, P. van Oorschot, and M. Wiener, "Authentication and Authenticated Key Exchanges," Designs, Codes and Cryptography 2, 107-125 (1992). 16. American Bankers Association, American National Standard for Financial Institution Message Authentication (Wholesale), ANSI X9.9 (1981, revised 1986). 17. G. Tsudik, "Message Authentication with One-Way Hash Functions," Proceedings of Infocom 92 (1992), pp. 2055-2059. 18. M. Bellare, R. Canetti, and H. Krawczyk, "Keyed Hash Functions and Message Authentication," Advances in Cryptology--Crypto '96, N. Koblitz, Editor, Lecture Notes in Computer Science No. 1109, Springer-Verlag, (1996), pp. 1-15. 19. O. Goldreich, S. Goldwasser, and S. Micali, "How to Construct Random Functions," Journal of the ACM 33, No. 4, 210-217 (1986). 20. R. Bird, I. Gopal, A. Herzberg, P. A. Janson, S. Kutten, R. Molva, and M. Yung, "Systematic Design of a Family of Attack-Resistant Authentication Protocols," IEEE Journal on Selected Areas in Communications 11, No. 5, 679-693 (June 1993). 21. M. Bellare and P. Rogaway, "Entity Authentication and Key Distribution," Advances in Cryptography, Springer-Verlag, New York (August 1993), pp. 232-249. 22. Information on the development of this standard can be found in the IPSEC home page, http://www.ietf.org/html.charters/ ipsec-charter.html and the IPSEC mailing list ipsec@tis.com. 23. M. Oehler and R. Glenn, HMAC-MD5-96 IP Authentication with Replay Prevention, IETF (March 1997). 24. H. Krawczyk, M. Bellare, and R. Canetti, HMAC: Keyed-Hashing for Message Authentication, Internet RFC 2104 (February 1997). 25. S. M. Bellovin, "Problem Areas for the IP Security Protocols," Proceedings of the 6th USENIX UNIX Security Symposium (July 1996), pp. 205-214. 26. In other words, in the scenario in the section about the secure tunnel, either A and X or B and Y are not the same. 27. D. B. Chapman, "Network (In)Security Through IP Packet Filtering," UNIX Security Symposium III Proceedings (1992), pp. 63-76. 28. R. Bird, I. Gopal, A. Herzberg, P. Janson, S. Kutten, R. Molva, and M. Yung, "The KryptoKnight Family of Light-Weight Protocols for Authentication and Key Distribution," IEEE/ACM Transactions on Networking 3, No. 1, 31-41 (February 1995). 29. W. Diffie and M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory IT-22, No. 6, 644-654 (November 1976). 30. S. J. Leffler, W. N. Joy, R. S. Farby, and M. J. Karel, "Networking Implementation Notes, 4.3BSD Edition," UNIX System Manager's Manual, 4.3 Berkeley Software Distribution, Virtual VAX-11 Edition, USENIX Association (April 1986). 31. H. Orman, The Oakley Key Determination Protocol, IETF (July 1997). Accepted for publication September 22, 1997. Pau-Chen Cheng IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: pau@watson.ibm.com). Dr. Cheng is a research staff member. He received his Ph.D. in electrical engineering from the University of Maryland in 1990. He joined the Computing Systems Department of the Watson Research Center in 1990 to work on the development of security functions of AIX. In 1994 he joined the Network Security group to work on IP Security technology. He is the principal developer of the IPSEC technology on the AIX operating system. His areas of interest are in the system aspects of computer and network security. He has been involved in the design, analysis, and implementation of solutions for data encryption and authentication, key management, user authentication, and Internet security. Juan A. Garay IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: garay@watson.ibm.com). Dr. Garay received his Ph.D. in computer science from the Pennsylvania State University in 1989. He also holds a degree in electrical engineering from the Universidad Nacional de Rosario in Argentina, and a master's degree in electronic engineering from the Eindhoven International Institute of the Eindhoven University of Technology in the Netherlands. He has been with IBM Research since 1990. In 1992 he was a postdoctoral Fellow at The Weizmann Institute of Science in Israel, and in 1996 a visiting scientist at the Centrum voor Wiskunde en Informatica (CWI) of the Stichting Mathematish Centrum in the Netherlands. Dr. Garay has published extensively in the areas of algorithms, distributed computing, fault tolerance, and cryptographic protocols. Amir Herzberg IBM Research Division, Haifa Research Laboratory at Tel Aviv, IBM Building, 2 Weizmann Street, Tel Aviv 61336, Israel (electronic mail: amir@haifa.vnet.ibm.com). Dr. Herzberg received the B.Sc. in computer engineering, the M.Sc. in electrical engineering, and the D.Sc. in computer science from the Technion-Israel Institute of Technology, in 1982, 1986, and 1991, respectively. In 1991, he joined the IBM Research Division where he now manages the Network Computing and Security group. He established this group, as a Tel-Aviv annex of the Haifa Research Laboratory, in January 1996. His previous assignment was manager of the Network Security group in the IBM Thomas J. Watson Research Center. His research areas include network security, applied cryptography, electronic commerce, communication protocols, and fault tolerant distributed algorithms. Dr. Herzberg is the author of numerous publications and patents in these areas. Hugo Krawczyk Department of Electrical Engineering, Technion, Haifa 32000, Israel (electronic mail: hugo@ee.technion.ac.il). Dr. Krawczyk is a senior lecturer in the Department of Electrical Engineering at the Technion and a visiting scientist at the IBM Watson Research Center. He received his Ph.D. in computer science from the Technion in 1990. In 1990 and 1991 he spent a year in the Computer Science Department of Princeton University under a Weizmann postdoctoral fellowship. From 1991 to 1997 he was a research staff member in the Cryptography and Network Security group at the IBM Watson Research Center. His areas of interest span applied and theoretical aspects of cryptography with particular emphasis on applications to network security. He has been involved in the design and implementation of solutions for data encryption and authentication, key management, public key cryptography, Internet security, electronic commerce, payment systems, and security of mobile and wireless systems. Reprint Order No. G321-5662. Table 1 IP layer security vs session layer security Features IPSEC SSL Hardware-independence Yes Yes Code No modifications to Modifications to applications. May need applications. May need access to TCP/IP stack new DLL or access to source code. application source code. Protection Entire IP packet. Only application layer. Includes protection for higher-layer protocols. Packet filtering Based on authenticated Based on content and headers, source or higher-level semantics. destination addresses, More intelligent and etc. Simpler and lower more complex. But also cost. Suitable for desirable. routers. Performance Less context-switching More context-switching and data movement. and data movement. Larger data units may help speed up cryptographic operations and provide better compression. Platform Any systems, routers Mainly end systems included. (clients/servers), also on firewalls. Firewall/VPN All traffic is protected. Only application-level traffic is protected. ICMP, RSVP, QoS, etc., may not be protected. Transparency To users and applications. To users only. Current deployment Emerging standard, Widely used by WWW supported by most firewall browsers; also used by vendors. some products to provide session layer secure tunnel.