Next: Performance in Computing Deviations Up: Experiments Previous: Experiments


Distribution of Deviations

A small deviation might, by chance, be computed from a packet stream that is unrelated to the given one. In this section we therefore examine the distribution of deviations experimentally, using real-life data.

We have implemented software which computes deviations for packet streams as defined by Def. 4. The program is written in C and runs under Linux (Red Hat 6.1) using libpcap[*] to read packet data recorded by tcpdump[*].

The first dataset is traffic recorded by tcpdump for one hour at several Internet backbone network locations. It contains about 2.4 million TCP packets, 5.6% of which belong to telnet or rlogin. We kept only the packets of telnet or rlogin connections that last at least one minute and carry at least 60 bytes of data in total. We then computed the deviation of each packet stream against all other packet streams (18,733 deviations in total). Figure 3 shows the distribution of the deviations computed on this dataset.
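The selection rule above can be sketched as follows. This is a minimal illustration in Python, not the C program described in the text; the `Stream` record, its field names, and the sample values are hypothetical, and enumerating the comparisons as ordered pairs of distinct streams is an assumption.

```python
from dataclasses import dataclass
from itertools import permutations

MIN_DURATION_S = 60.0   # connection must last at least one minute
MIN_DATA_BYTES = 60     # total data must be at least 60 bytes


@dataclass(frozen=True)
class Stream:
    # Hypothetical per-connection summary; field names are illustrative only.
    name: str
    first_ts: float     # timestamp of the first packet, in seconds
    last_ts: float      # timestamp of the last packet, in seconds
    data_bytes: int     # total bytes of data carried by the stream


def select_streams(streams):
    """Keep the streams that satisfy both thresholds stated in the text."""
    return [s for s in streams
            if s.last_ts - s.first_ts >= MIN_DURATION_S
            and s.data_bytes >= MIN_DATA_BYTES]


def stream_pairs(streams):
    """Return (given, other) pairs of distinct streams (assumed to be ordered pairs)."""
    return permutations(streams, 2)


# Made-up sample input, not taken from the dataset.
streams = [
    Stream("a", 0.0, 120.0, 500),
    Stream("b", 10.0, 40.0, 900),   # rejected: lasts only 30 s
    Stream("c", 5.0, 300.0, 12),    # rejected: only 12 bytes of data
    Stream("d", 0.0, 60.0, 60),     # kept: exactly on both thresholds
]
kept = select_streams(streams)
pairs = list(stream_pairs(kept))
```

Each resulting pair would then be handed to the deviation computation of Def. 4, which is not reproduced here.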

Figure 3: Distribution of deviations computed on a dataset in the range [0,80) with a grid width of two seconds (left) and a closer look over the range [0,12) with a finer grid width of 0.5 seconds (right)
\includegraphics[scale=.46]{distribution80} \includegraphics[scale=.46]{distribution12}
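The binning used for the two panels can be sketched as below. This is an illustrative Python fragment, assuming the deviations are already available as a list of values in seconds; the function name and the sample values are made up and are not the paper's data.

```python
import math


def histogram(deviations, lo, hi, width):
    """Count deviations per bin of the given width over the half-open range [lo, hi)."""
    nbins = math.ceil((hi - lo) / width)
    counts = [0] * nbins
    for d in deviations:
        if lo <= d < hi:
            # Clamp the index to guard against floating-point edge cases near hi.
            index = min(int((d - lo) // width), nbins - 1)
            counts[index] += 1
    return counts


# Made-up sample deviations in seconds.
sample = [0.4, 0.7, 3.1, 5.9, 11.5, 12.0, 47.2, 79.9, 80.0]

coarse = histogram(sample, 0.0, 80.0, 2.0)   # left panel: [0, 80), 2 s grid
fine = histogram(sample, 0.0, 12.0, 0.5)     # right panel: [0, 12), 0.5 s grid
```

Values equal to the upper end of the range (80.0 and 12.0 in the sample) fall outside the half-open interval and are not counted.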

We can see from Fig. 3 (right) that deviations of less than three seconds are extremely rare. This indicates that if the deviation of a packet stream falls in this range, the packet stream is highly likely to be in the same connection chain as the given one. We also notice that exactly two deviations are below one second. Examining the headers of the packets used to derive these deviations, we found that they do come from packet streams of adjacent connections in a connection chain; the two deviations correspond to the two directions of packets in the connection. Therefore, in this dataset, we can find a packet stream on a connection in the same connection chain as the given one by looking for connections for which the average propagation delay minus the minimum propagation delay, between the beginning and the end of the chain, is at most three seconds. In general, this upper bound on the average propagation delay minus the minimum propagation delay of a connection chain grows as the given connection lasts longer and more data bytes become available.
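The resulting decision rule amounts to a simple threshold test, sketched here in Python. The three-second bound is the value observed for this dataset, not a universal constant; the function name, the stream labels, and the deviation values are hypothetical.

```python
THRESHOLD_S = 3.0  # bound observed for this dataset; other data may need another value


def chain_candidates(deviations, threshold=THRESHOLD_S):
    """Return (stream, deviation) pairs below the threshold, smallest deviation first.

    `deviations` maps a stream label to its deviation (in seconds) from the given stream.
    """
    hits = [(name, d) for name, d in deviations.items() if d < threshold]
    return sorted(hits, key=lambda item: item[1])


# Made-up deviations of four streams against one given stream.
deviations = {"s1": 0.42, "s2": 17.8, "s3": 0.61, "s4": 9.3}
candidates = chain_candidates(deviations)
```

Sorting by deviation puts the most likely members of the connection chain first, which is convenient when the candidates are then checked by hand against packet headers.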

Next we used the dataset of NLANR network traffic traces[*]. We chose the traffic data whose file names begin with AIX, ANL, APN, MRT, NCA, NCL, ODU, OSU, SDC, TAU, or TXS under directory 20000115/, and performed the same analysis as for the first dataset. A total of 40,433 deviations were computed. Figure 4 shows the result.

Figure 4: Distribution of deviations computed on a dataset of NLANR network traffic traces in the range [0,52) with a grid width of two seconds (left) and a closer look over the range [0,12) with a finer grid width of 0.5 seconds (right)
\includegraphics[scale=.46]{nlanr52} \includegraphics[scale=.46]{nlanr12}

We can see from Fig. 4 (right) that, just as in Fig. 3 (right) for the first dataset, the frequency gradually decreases to zero as the deviation decreases toward about three seconds, except in the range [1.0, 3.5). Almost all of the deviations in this range involve the same packet stream of one particular connection, so we regard them as an error or an exception.
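One way to check whether the deviations in a range are dominated by a single packet stream is to count how often each stream appears among them. The Python sketch below illustrates the idea under the assumption that each result is stored as a (given, other, deviation) tuple; the function name and the sample results are hypothetical, and this is not necessarily how the check was carried out for Fig. 4.

```python
from collections import Counter


def dominant_stream(results, lo=1.0, hi=3.5):
    """Find the stream involved in the most deviations within [lo, hi).

    `results` is an iterable of (given, other, deviation) tuples.
    Returns (stream, share of in-range deviations), or None if the range is empty.
    """
    in_range = [(g, o) for g, o, d in results if lo <= d < hi]
    if not in_range:
        return None
    counts = Counter()
    for g, o in in_range:
        counts.update({g, o})   # count each stream once per deviation
    stream, n = counts.most_common(1)[0]
    return stream, n / len(in_range)


# Made-up results: stream "x" takes part in three of the four in-range deviations.
results = [
    ("x", "a", 1.2),
    ("b", "x", 2.0),
    ("x", "c", 3.4),
    ("d", "e", 2.7),
    ("a", "b", 9.0),   # outside [1.0, 3.5)
]
outlier = dominant_stream(results)
```

A share close to 1 suggests that a single stream accounts for the anomaly, as described in the text.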


Yoda 2000-11-20