next up previous
Next: Related Work Up: Finding a Connection Chain Previous: Finding a Connection Chain

Introduction

In recent years, unauthorized access to computer systems has been increasing as more and more commercial activities and services take place on the Internet. One characteristic of network break-ins is that it is very hard to trace an intruder back to the origin after the incident has occurred. To hide their identities, intruders usually keep several computers under their control, called step-through hosts, from which they access other computers. Because there are many vulnerable hosts on the Internet, and scanning tools that locate them are widely available and easy to use, intruders are constantly adding to their collection of step-through hosts. Intruders do not log in directly to their targets from their own computers. Instead, they first log into a step-through host, then from there into another, and repeat this several times to build a chain of hosts before breaking into their targets. They usually erase the logs on these step-through hosts. Even if logs remain on a particular host, we can use them to trace back only one link in the chain. Thus, to get to the origin we have to examine the hosts one at a time, following the chain from each host to its predecessor. Because the step-through hosts may be in different countries and operated by administrators who pay little attention to their systems, it takes a lot of time and effort to contact these administrators and investigate the chain step by step. Often we end up at a host where no logs remain and the investigation cannot continue [8]. Intruders know this and take advantage of these features of the Internet to preserve their anonymity.

When a user logs into a computer via a network, from there logs into another computer, and then another and so on, a TCP connection is established between each consecutive pair of computers. We want to find this kind of `connection chain'. (We will give the formal definition of the connection chain in Sect. 3.) Our approach to tracing considers the following problem: Given a stream of packets on a connection $C^I$ that an intruder used at some step-through host, and a very large number of connections $C=\{C_1, C_2, \ldots\}$ observed at various traffic points on the Internet, find $C' \subset C$ such that every $X\in C'$ is in the same connection chain as $C^I$. We are particularly interested in the case where the connections in $C'$ are closer to the origin than $C^I$. Our approach does not require tracing the links in the chain one by one, but the connection chain it finds will probably be partial. Even so, it may contain a host that is the origin or is closer to the origin.

In this paper we provide a method to find connections similar to a given one in very large traffic data. To cope with real-life traffic data, the errors and variations in packet data observed at different connections on the same chain must be taken into account. These include propagation delays through the chain, packetization variations caused by TCP flow control, clock synchronization errors in time stamps, and others. We focus on telnet [4] and rlogin [2] as the interactive applications whose packets are transmitted through the connection chain. We define the `deviation' of one stream of packets on a connection from another as the difference between the average propagation delay and the minimum propagation delay between the two connections. Experiments show that the deviation for streams of packets on the same chain is much smaller than that for a pair of unrelated streams.
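As an informal illustration of the deviation described above, the following Python sketch computes "average delay minus minimum delay" for two lists of time stamps. It is a simplification made for illustration only, not the formal definition given in Sect. 3: it assumes the packets of the two streams have already been matched one to one, which sidesteps the packetization variations mentioned above. The function name `deviation` and all sample time stamps are hypothetical.

```python
from statistics import mean


def deviation(times_a, times_b):
    """Average delay minus minimum delay between two matched streams.

    Assumption (not from the paper): times_a[i] and times_b[i] are the
    time stamps, in seconds, at which the same data was observed on
    connection A and on connection B.
    """
    if not times_a or len(times_a) != len(times_b):
        raise ValueError("streams must be non-empty and of equal length")
    # Per-packet delay from connection A to connection B.
    delays = [tb - ta for ta, tb in zip(times_a, times_b)]
    # A constant clock offset between the two observation points shifts
    # the mean and the minimum equally, so it cancels in the difference.
    return mean(delays) - min(delays)


# Hypothetical time stamps, invented for this example.
upstream = [0.00, 0.50, 1.20, 2.00]
same_chain = [0.03, 0.54, 1.23, 2.05]   # follows upstream closely
unrelated = [0.40, 0.45, 3.10, 3.20]    # independent timing

print(deviation(upstream, same_chain))
print(deviation(upstream, unrelated))
```

With these invented values, the stream that follows the given one closely yields a much smaller value than the unrelated stream, which is the property the experiments rely on. Subtracting the minimum delay also makes the measure insensitive to a fixed clock offset between the two observation points.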

The rest of the paper is organized as follows. Section 2 provides a survey of related work. We present our definition of deviation and describe our method in Sect. 3. We show some experimental results in Sect. 4. Finally, Sect. 5 concludes the paper and discusses future work.


Yoda 2000-11-20