next up previous
Next: Deviation for Packet Streams Up: Finding Connections in the Previous: Conditions

Basic Idea

Figure 2 (left) plots a packet stream on a connection, with the sequence numbers of the packets on the Y-axis and the time stamps at which the packets were captured on the X-axis. A data point would move down and to the right when a retransmission occurs, but because we take the upper bound of the sequence numbers at each time stamp, the graph is monotonically increasing. We do not assume that an intruder runs a script on a host so that commands are executed automatically within a short time. Instead, we assume that the intruder types commands by hand and operates a host interactively over a longer period, so the graphs of the packet streams of those connections should show patterns characteristic of each intrusion. We can therefore expect the graphs of packet streams on different connections in the same chain to be similar when the proper parts of the graphs are compared with each other. We introduce the `deviation' of one packet stream from another as a metric of this similarity. If the value is small, the two streams are likely to belong to the same chain; otherwise they are probably unrelated.
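As an illustration of how taking the upper bound removes the downward steps caused by retransmissions, the following is a minimal sketch in Python. It assumes that the upper bound can be read as a running maximum over packets in capture order, and that each packet is given as a (time stamp, sequence number) pair; the function name `upper_envelope` and this input layout are hypothetical, and sequence number wraparound is ignored.

```python
from typing import Iterable, List, Tuple

Packet = Tuple[float, int]  # (capture time stamp, sequence number); assumed layout


def upper_envelope(packets: Iterable[Packet]) -> List[Packet]:
    """Return the curve of the highest sequence number seen so far.

    A retransmitted packet carries a lower sequence number than one
    already seen, so it leaves the running maximum unchanged and the
    resulting curve never decreases.
    Assumption: sequence numbers do not wrap around within the capture.
    """
    envelope: List[Packet] = []
    highest = None
    # Process packets in time order (sorted() is stable for equal time stamps).
    for ts, seq in sorted(packets, key=lambda p: p[0]):
        if highest is None or seq > highest:
            highest = seq
        envelope.append((ts, highest))
    return envelope


if __name__ == "__main__":
    sample = [(0.0, 100), (0.1, 200), (0.2, 150), (0.3, 300)]
    print(upper_envelope(sample))
```

In the sample, the packet captured at 0.2 is a retransmission, and the curve stays at 200 for that time stamp instead of dropping to 150.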

Figure 2: Sample graph of a packet stream $A$ (left) and the position of a graph of a packet stream $B$ (right) where the average gap from $A$ on the X-axis is the smallest.

Next we discuss which features remain unchanged and which change between graphs of packet streams on different connections in the same connection chain. First, while telnet or rlogin is used in the normal way, the same TCP data bytes flow on every connection in the chain, once flow control and retransmissions of packets are taken into account. Therefore, the height of the part of a graph that shows the increase in sequence numbers (that is, the number of data bytes transmitted) should be equal to the corresponding height for the other connections in the same chain. However, because timing errors prevent us from determining exactly which part of one graph corresponds to the other, we have to try every starting position of the graph when comparing it with the other.
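The search over starting positions can be sketched as follows. This is only a simplified illustration, not the deviation metric itself: it assumes both streams are given as non-decreasing curves of (time stamp, highest sequence number so far), slides $B$ along the Y-axis so that its first point lines up with each point of $A$ in turn, and measures the mean absolute gap along the X-axis. The names `time_reaching`, `average_gap`, and `best_start` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Curve = List[Tuple[float, int]]  # (time stamp, highest sequence number so far)


def time_reaching(curve: Curve, height: int) -> Optional[float]:
    """Earliest time at which a non-decreasing curve reaches `height`."""
    heights = [seq for _, seq in curve]
    i = bisect_left(heights, height)
    return curve[i][0] if i < len(curve) else None


def average_gap(a: Curve, b: Curve, y_offset: int) -> Optional[float]:
    """Mean absolute X-axis gap between B, shifted up by y_offset, and A."""
    gaps = []
    for ts_b, seq_b in b:
        ts_a = time_reaching(a, seq_b + y_offset)
        if ts_a is None:
            return None  # shifted B extends above A; position not comparable
        gaps.append(abs(ts_b - ts_a))
    return sum(gaps) / len(gaps) if gaps else None


def best_start(a: Curve, b: Curve) -> Optional[Tuple[int, float]]:
    """Try every starting position of B on A; return (y_offset, smallest gap)."""
    if not b:
        return None
    best: Optional[Tuple[int, float]] = None
    for _, seq_a in a:
        y_offset = seq_a - b[0][1]  # align B's first point with this point of A
        gap = average_gap(a, b, y_offset)
        if gap is not None and (best is None or gap < best[1]):
            best = (y_offset, gap)
    return best


if __name__ == "__main__":
    a = [(0.0, 100), (1.0, 110), (2.0, 120), (3.0, 130), (4.0, 140)]
    b = [(2.1, 5), (3.1, 15), (4.1, 25)]
    print(best_start(a, b))
```

Trying every point of $A$ as a starting position is quadratic in the number of packets, which is acceptable for a sketch but would likely need pruning on long captures.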

We use the upper bound of the sequence numbers, and when a packet is lost and a retransmission occurs, the data bytes following the lost data are not forwarded to the next connection in the chain until the lost data bytes have been retransmitted and acknowledged. Therefore, the propagation delay includes the retransmission time. Hence, if the clocks used by the packet capture software are accurate, a data byte on a downstream connection is observed earlier than the same data byte on an upstream connection if the packet is travelling upstream, and later if the packet is travelling downstream, as expected. However, the propagation delays may have large variances. If a graph is repositioned along the Y-axis so as to match the proper part of the other graph, that part of the graph may be distorted by being stretched along the X-axis. Because we assume that an intruder is operating a host interactively, we also assume that the average propagation delay of a packet between the first upstream connection and the last downstream connection is usually several hundred milliseconds and at most a few seconds. It would be too inefficient for an intruder to operate a host through a connection chain with a few seconds of delay each way.


Yoda 2000-11-20