Internet Traffic Analysis for Intrusion Detection and Active Queue Management

We view Internet traffic analysis as a technique through which simulated and real traffic data are measured, analyzed, and dynamically modeled, with the objective of understanding discrepancies between the traffic observed and the traffic predicted by the model. A discrepancy between the traffic predicted by a model identified under "normal" conditions and the traffic observed under unknown network status would indicate an outage or, more seriously, an intrusion or some illegal activity. From a broader perspective, even when the traffic is "normal," there will still be a discrepancy between the predicted data and the observed data, because no model is perfect. It is therefore important to understand how predictable "normal" traffic is, especially since Active Queue Management (AQM) techniques such as Random Early Detection (RED) rely on anticipating when a queue will overflow and taking appropriate action to avoid such overflow.

If the network traffic were to follow a Poisson or Markovian arrival process, it would have a characteristic burst length that would tend to be smoothed out by averaging over a long enough time scale. On the contrary, measurements made on real networks indicate that significant traffic burstiness is still present over a wide range of time scales. In fact, traffic in a packet-based network can be characterized by its bursts of activity. These bursts can be idealized as existing at every time scale, from milliseconds to days, and as looking similar regardless of the time scale, i.e., the traffic can be idealized as self-similar. One characteristic of self-similar traffic is long-range dependence (LRD).
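The smoothing argument above can be checked numerically with the variance-time method, sketched here under stated assumptions. Per-slot counts are averaged over blocks of size m, and the variance of the block means is plotted against m on log-log axes. For independent counts the slope is close to -1 (Hurst parameter H near 0.5), whereas self-similar traffic decays more slowly, with slope 2H - 2 and H > 0.5. The function names and the synthetic independent counts are illustrative and not part of the project.

```python
import math
import random

def aggregated_variance(counts, m):
    """Variance of the block means when counts are averaged over blocks of size m."""
    n_blocks = len(counts) // m
    means = [sum(counts[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    mu = sum(means) / n_blocks
    return sum((x - mu) ** 2 for x in means) / n_blocks

def hurst_variance_time(counts, scales):
    """Estimate H from the least-squares slope of log Var vs. log m (slope = 2H - 2)."""
    xs = [math.log(m) for m in scales]
    ys = [math.log(aggregated_variance(counts, m)) for m in scales]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return 1.0 + slope / 2.0

# Assumed test input: independent per-slot packet counts (binomial draws),
# standing in for traffic without long-range dependence.
rng = random.Random(0)
iid_counts = [sum(1 for _ in range(20) if rng.random() < 0.25)
              for _ in range(20000)]
h_iid = hurst_variance_time(iid_counts, [1, 2, 4, 8, 16, 32, 64])
print("estimated H for independent counts:", round(h_iid, 2))
```

Applied to a measured or simulated packet-count trace, an estimate well above 0.5 would be consistent with the persistent burstiness described above.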

Based on a preliminary analysis of individual sources on an Ethernet, Willinger observed that individual sources can be represented by the familiar ON/OFF abstraction: the source is either transmitting at a peak rate when it is in the ON state, or completely idle when it is in the OFF state. This observation motivates the investigation of ON/OFF traffic in which the distribution of times in one of the two states (ON) is heavy tailed, meaning the variance is infinite. One approach to modeling this type of traffic is to have ON periods uniformly distributed, with the number of sources and the shape parameter Pareto distributed, so as to adjust the amount of synthesized workload.
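As a rough illustration of the ON/OFF abstraction, the following sketch superposes several sources whose ON durations are drawn from a Pareto distribution. It is not the Network Simulator configuration used in the project. The exponential OFF durations, the shape value 1.5 (which gives a finite mean but infinite variance), and all function and parameter names are assumptions made for the example.

```python
import random

def on_off_source(n_slots, alpha, mean_off, rng):
    """Return a 0/1 activity list for one source over n_slots time slots."""
    activity = []
    on = rng.random() < 0.5  # random initial state
    while len(activity) < n_slots:
        if on:
            duration = rng.paretovariate(alpha)         # heavy-tailed ON period
        else:
            duration = rng.expovariate(1.0 / mean_off)  # assumed exponential OFF period
        # Round to whole slots, at least one, never past the end of the trace.
        slots = min(max(1, int(round(duration))), n_slots - len(activity))
        activity.extend([1 if on else 0] * slots)
        on = not on
    return activity

def aggregate_traffic(n_sources, n_slots, alpha=1.5, mean_off=3.0, seed=1):
    """Number of sources transmitting in each slot (peak rate normalized to 1)."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for t, a in enumerate(on_off_source(n_slots, alpha, mean_off, rng)):
            total[t] += a
    return total

traffic = aggregate_traffic(n_sources=20, n_slots=5000)
print("mean load:", sum(traffic) / len(traffic), "peak load:", max(traffic))
```

Varying the number of sources and the shape parameter changes the offered load and the heaviness of the bursts, which is the kind of adjustment the paragraph above refers to.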

The above-mentioned procedure was followed to generate synthesized workload with Network Simulator and Topogen. As expected, the traffic exhibited self-similar properties. The simulations were done on Constant Bit Rate (CBR) traffic, FTP traffic, and WWW traffic, on the dumbbell and parking lot topologies, and on hundred-node random topologies reflecting the real Internet.

The major tool used in the analysis of the traffic, in terms of throughput, link utilization, and packet drops, is Canonical Correlation Analysis (CCA). CCA is a technique for assessing, in terms of the canonical correlation coefficients and the Akaike mutual information, the amount of interdependence between the past and the future of the traffic signal. If the mutual information is large enough, a reliable state space model of the traffic, also referred to as an innovation model, can be constructed. A related method, the Alternating Conditional Expectation (ACE), was also used to construct nonlinear autoregressive (AR) models.
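The past/future computation can be sketched in a few lines for a deliberately small case. The sketch assumes a past window and a future window of two samples each (an illustrative choice, not the project's setting) and uses the Gaussian expression for the mutual information, I = -1/2 * sum(log(1 - rho_i^2)), where rho_i are the canonical correlations. The squared canonical correlations are obtained as the eigenvalues of Spp^-1 Spf Sff^-1 Sfp, built from the sample covariances of the past (p) and future (f) vectors. All function names are hypothetical.

```python
import math
import random

def cov2(a, b):
    """2x2 sample cross-covariance between two lists of 2-vectors."""
    n = len(a)
    ma = [sum(v[i] for v in a) / n for i in range(2)]
    mb = [sum(v[i] for v in b) / n for i in range(2)]
    return [[sum((u[i] - ma[i]) * (v[j] - mb[j]) for u, v in zip(a, b)) / n
             for j in range(2)] for i in range(2)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def mul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def past_future_cca(x):
    """Canonical correlations and Gaussian mutual information of past vs. future."""
    idx = range(2, len(x) - 1)
    past = [(x[t - 1], x[t - 2]) for t in idx]
    future = [(x[t], x[t + 1]) for t in idx]
    spp, sff, spf = cov2(past, past), cov2(future, future), cov2(past, future)
    sfp = [[spf[j][i] for j in range(2)] for i in range(2)]  # transpose of spf
    m = mul2(mul2(inv2(spp), spf), mul2(inv2(sff), sfp))
    # Eigenvalues of the 2x2 matrix m are the squared canonical correlations.
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    rho_sq = [min(max(v, 0.0), 1.0 - 1e-12)
              for v in ((tr + disc) / 2.0, (tr - disc) / 2.0)]
    info = -0.5 * sum(math.log(1.0 - v) for v in rho_sq)
    return [math.sqrt(v) for v in rho_sq], info

# Assumed test signals: white noise and an AR(1) process with coefficient 0.8.
rng = random.Random(0)
white = [rng.gauss(0, 1) for _ in range(5000)]
ar1 = [0.0]
for _ in range(4999):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0, 1))
rhos_white, info_white = past_future_cca(white)
rhos_ar1, info_ar1 = past_future_cca(ar1)
print("white noise:", rhos_white, info_white)
print("AR(1):", rhos_ar1, info_ar1)
```

For the AR(1) signal the leading canonical correlation should be close to the AR coefficient and the mutual information clearly positive, while for white noise both should be near zero, which is the sense in which the mutual information measures how much of the future is predictable from the past.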

Regarding the specific intrusion detection application, we observed that an intrusion (CBR attack, UDP flooding) is almost always accompanied by a change in the Akaike mutual information relative to the "normal" traffic. However, under an intrusion, the mutual information could increase or decrease, depending on the nature of the attack and the status of the traffic before the attack. A similar change in a traffic "signature" that could go either way under intrusion was also observed by the group of Stephen F. Bush at General Electric, using the Kolmogorov complexity instead of the Akaike mutual information as the "signature" of the traffic.

Still within the intrusion detection application, we consistently observed a deterioration of the ability of models identified under normal conditions to predict the traffic under attack conditions. This leads to yet another intrusion detection scheme.
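A minimal sketch of such a prediction-error detector is given below, under loudly stated assumptions: a linear AR(1) predictor stands in for the state space and ACE models used in the project, the traffic is synthetic, the attack is modeled as a constant-rate flood added to the normal signal, and the alarm threshold of 2 is arbitrary. All names are hypothetical.

```python
import random

def fit_ar1(x):
    """Least-squares AR(1) fit x[t] ~ c + a * x[t-1]; returns (a, c)."""
    prev, cur = x[:-1], x[1:]
    mp = sum(prev) / len(prev)
    mc = sum(cur) / len(cur)
    a = (sum((p - mp) * (q - mc) for p, q in zip(prev, cur))
         / sum((p - mp) ** 2 for p in prev))
    return a, mc - a * mp

def prediction_mse(x, model):
    """Mean squared one-step-ahead prediction error of the model on trace x."""
    a, c = model
    errs = [(x[t] - (c + a * x[t - 1])) ** 2 for t in range(1, len(x))]
    return sum(errs) / len(errs)

def synth(n, rng, flood=0.0):
    """Hypothetical throughput trace: AR(1) fluctuation around 100, plus a flood rate."""
    x, level = [], 0.0
    for _ in range(n):
        level = 0.7 * level + rng.gauss(0, 5)
        x.append(100.0 + level + flood)
    return x

rng = random.Random(2)
normal_train = synth(2000, rng)
normal_test = synth(2000, rng)
attack = synth(2000, rng, flood=40.0)

model = fit_ar1(normal_train)                 # identified under normal conditions
baseline = prediction_mse(normal_test, model)
ratio = prediction_mse(attack, model) / baseline
alarm = ratio > 2.0                           # assumed threshold
print("error ratio under attack:", round(ratio, 2), "alarm:", alarm)
```

The design choice here is to compare the error against a baseline measured on held-out normal traffic rather than against a fixed absolute value, so the detector reacts to a deterioration in predictability rather than to the traffic level itself.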

Regarding predictability of the traffic under normal network status with a view toward Active Queue Management (AQM), we reached the conclusion that traffic is not predictable well enough to afford AQM at low time scales, while this can be done at higher time scales. With a slightly varying drop probability imposed as control action, the TCP dynamics from drop probability to packet arrival can be linearized. As the time scale decreases, we observed a progressive deterioration of the ability to control the open-loop dynamics, as revealed by the pole/zero configuration, which goes from a well-damped, stable, minimum-phase configuration at large time scales all the way to a lightly damped, nearly unstable, nonminimum-phase configuration at low time scales. The pole/zero configuration passes through a dramatic transition when the time scale is of the same order as the Round Trip Time (RTT). Interestingly, the open-loop dynamics at low time scales reveal oscillatory poles, the periods of which appeared remarkably consistent with the RTT. This suggests that the identification was done accurately.
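The link between an oscillatory pole and a period in seconds can be made concrete with a small sketch. A discrete-time pole at z = r * exp(j * theta), identified from data sampled every T seconds, corresponds to an oscillation of period 2 * pi * T / theta, and a modulus r close to 1 means light damping. The second-order model, the 10 ms sampling interval, the modulus 0.95, and the 100 ms RTT below are assumed values for illustration only, not results from the project.

```python
import cmath
import math

def ar2_poles(a1, a2):
    """Poles of x[t] = a1*x[t-1] + a2*x[t-2] + input: roots of z^2 - a1*z - a2."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0

def oscillation_period(pole, sample_interval):
    """Oscillation period in seconds associated with a discrete-time pole."""
    theta = abs(cmath.phase(pole))
    return math.inf if theta == 0.0 else 2.0 * math.pi * sample_interval / theta

# Assumed example: build a lightly damped pole pair whose period equals an
# assumed RTT, then recover that period from the model coefficients alone.
sample_interval = 0.010   # seconds per sample (assumption)
rtt = 0.100               # seconds (assumption)
r = 0.95                  # pole modulus, close to the unit circle
theta = 2.0 * math.pi * sample_interval / rtt
a1, a2 = 2.0 * r * math.cos(theta), -r * r

pole, _ = ar2_poles(a1, a2)
period = oscillation_period(pole, sample_interval)
print("pole modulus:", abs(pole), "period (s):", period)
```

In this constructed case the recovered period matches the assumed RTT up to rounding. With an identified model, the same calculation is how one would compare the periods of the oscillatory poles with the measured RTT.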

On a slightly different note, the discrepancy between the pole/zero configuration at low time scales and at large time scales indicates that traffic is not quite self-similar at all time scales, as has been claimed.


For more information about this project, contact Khushboo Shah, Ph.D., at khushboo@usc.edu or at khushboo.shah@nevisnetworks.com. You may also contact Prof. S. Bohacek at bohacek@eecis.udel.edu.

