DIDS (Distributed Intrusion Detection System) [5] monitors all TCP connections and logins within the supervised network and keeps track of the movements and current state of every user. A host monitor resides on each host in the network and gathers audit information about that host; this information is transmitted to the central DIDS director, where the network behavior is accounted for.
CIS (Caller Identification System) [1] is a system that authenticates the origin of a user when the user attempts to log into a host at the end of a connection chain. When a user tries to log into the $n$th host, the $n$th host queries the $(n-1)$th host for a list of its predecessor hosts: $n-2, n-3, \ldots, 1$. The $n$th host then queries each of these predecessor hosts for a list of its own predecessor hosts. The $n$th host accepts the user's login only if those lists of predecessor hosts are consistent.
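The consistency check described above can be sketched as follows. This is only an illustration under stated assumptions, not the protocol of [1]: the text does not define what "consistent" means precisely, so the sketch assumes that each host reports its predecessors nearest first, and that a list is consistent when every predecessor reports exactly the remainder of the list. The names `cis_accept` and `reports` are hypothetical.

```python
from typing import Dict, List, Optional


def cis_accept(prev_host: str, reports: Dict[str, List[str]]) -> bool:
    """Decide whether the last host in a chain should accept a login.

    prev_host -- the host the login comes from (the (n-1)th host)
    reports   -- hypothetical stand-in for the network queries: maps a host
                 to the predecessor list it reports, nearest host first
    """
    claimed: Optional[List[str]] = reports.get(prev_host)
    if claimed is None:
        # The (n-1)th host did not answer, so nothing can be verified.
        return False

    # Assumed rule: the i-th predecessor must report exactly the hosts
    # that follow it in the list claimed by the (n-1)th host.
    for i, host in enumerate(claimed):
        if reports.get(host) != claimed[i + 1:]:
            return False
    return True


# A chain h1 -> h2 -> h3 -> (login target), with honest reports.
honest = {"h3": ["h2", "h1"], "h2": ["h1"], "h1": []}

# The same chain, but h2 reports a different origin.
tampered = {"h3": ["h2", "h1"], "h2": ["hX"], "h1": []}
```

With these inputs, `cis_accept("h3", honest)` accepts the login and `cis_accept("h3", tampered)` rejects it. The sketch also shows the weakness discussed for host-based methods: the decision depends entirely on answers supplied by the hosts themselves.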
Caller ID [10] is a technique the United States Air Force employed to trace intruders. It breaks into the hosts in the chain in the same way as the intruder did to reach the target, working backwards up the chain towards the intruder. It does this while the intruder is active, using the same knowledge and methods as the intruder. However, it is often difficult or impossible to break into a host if the intruder closed the security hole after compromising it. Moreover, breaking into someone else's computer remains illegal, even in response to the intruder's illegal act.
Generally, tracing methods can be categorized into two types: `host-based' and `network-based'. While host-based methods set up the components for tracing at each host, network-based methods set up components in the network infrastructure. Examples of host-based systems are [1,5,9]. The major drawback of these host-based systems is that if the tracing system is not used on a particular host, or is modified by an intruder, the whole system cannot function reliably once the intruder goes through that host. In the Internet environment, it is difficult to require that all administrative domains employ a particular tracing system on all hosts, every one of which must be kept secure from an intruder's attacks. Therefore, we believe that a host-based system is not feasible on the Internet.
Thumbprinting [6] is a network-based method based on the fact that the content of the data in a connection is invariant at all points on a connection chain, after taking into account the details of the protocols. A `thumbprint' is a small signature which effectively summarizes a certain section of a connection: it uniquely distinguishes a given connection from all unrelated connections, but has a similar value for any two connections in the same connection chain. These thumbprints can be routinely stored at many points in the network. When an intrusion is detected at some host, the thumbprint of that connection during the intrusion can later be compared to thumbprints collected all over the network during the same period to find the other connections in the chain.
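The idea of storing and comparing thumbprints can be illustrated with a minimal sketch. The summary function here is a toy stand-in chosen for brevity (a normalized histogram of byte values folded into a few buckets); it is not the thumbprint function of [6]. The names `thumbprint`, `distance`, `find_related`, and the threshold value are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def thumbprint(payload: bytes, buckets: int = 16) -> List[float]:
    """Toy summary of a connection's content in one time period.

    Byte values are folded into `buckets` bins and normalized, so the
    result has a small fixed size regardless of how much data was seen.
    """
    counts = Counter(b % buckets for b in payload)
    total = len(payload) or 1  # avoid division by zero on empty input
    return [counts.get(i, 0) / total for i in range(buckets)]


def distance(a: List[float], b: List[float]) -> float:
    """L1 distance between two thumbprints (0.0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def find_related(target: List[float],
                 stored: Dict[str, List[float]],
                 threshold: float = 0.1) -> List[str]:
    """Return ids of stored connections whose thumbprint is close to target."""
    return sorted(cid for cid, tp in stored.items()
                  if distance(target, tp) <= threshold)


# Thumbprints recorded elsewhere in the network for the same period.
stored = {
    "conn-A": thumbprint(b"aaaaaaaa"),
    "conn-B": thumbprint(b"bbbbbbbb"),
}

# Thumbprint of the connection on which the intrusion was detected.
matches = find_related(thumbprint(b"aaaaaaaa"), stored)
```

Here `matches` contains only `"conn-A"`, since identical content yields identical summaries while the unrelated connection differs in every occupied bucket. A real thumbprint must also tolerate protocol-level differences between links, which this sketch ignores.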
The advantage of a network-based approach is that it is useful even if only part of the Internet employs it. That is, the links of a connection chain need not all be found in sequence; instead, subsets of the links can be found separately at the network locations covered by the system. Although there is still a chance that a tracing system in the network will be compromised by an intruder, a network-based system requires fewer components than a host-based system, and these components can be special boxes which only passively monitor the traffic and have no other functions. We believe these `traffic log boxes' can be made very secure.
The advantage of thumbprinting is that it requires very little disk space to store thumbprints. However, special software must be installed on all hosts at traffic points to compute the thumbprints, and the saved thumbprints cannot be used for other purposes such as traffic analysis or intrusion detection. A thumbprint is a summary of the contents of a connection for a certain fixed range of time. Because of clock synchronization errors or propagation delays, if one connection stays within a single range of time but another connection in the same chain crosses a boundary of the range, the two thumbprints might be quite different. Our method, by contrast, requires a relatively large amount of disk space to store packet header data, but this data can be collected by packet capture software already installed on many hosts. The saved data can be used for other purposes, and timing errors do not affect the result of our method.
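The boundary effect described above can be made concrete with a small worked example. It assumes, purely for illustration, a 60-second window and a per-window byte count as the summary; neither value comes from [6], and `per_window_bytes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def per_window_bytes(packets: List[Tuple[float, int]],
                     window: float = 60.0,
                     clock_offset: float = 0.0) -> Dict[int, int]:
    """Sum packet sizes per fixed time window.

    packets      -- (timestamp in seconds, size in bytes) pairs
    clock_offset -- models clock skew plus propagation delay at the
                    observation point
    """
    totals: Dict[int, int] = defaultdict(int)
    for ts, size in packets:
        totals[int((ts + clock_offset) // window)] += size
    return dict(totals)


# The same two packets observed on two links of one chain.
packets = [(58.0, 100), (59.5, 40)]

upstream = per_window_bytes(packets)                      # {0: 140}
downstream = per_window_bytes(packets, clock_offset=1.0)  # {0: 100, 1: 40}
```

A one-second offset moves the second packet across the window boundary, so the two links summarize different data in window 0 even though they carry the same traffic.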