Frequency Estimation of Internet Packet Streams with Limited Space. E. D. Demaine, A. López-Ortiz and J. I. Munro. To appear in Proceedings of the 10th Annual European Symposium on Algorithms (ESA 2002), Rome, Italy, September 17-21, 2002. PostScript file.
Abstract.
We consider a router on the Internet analyzing the statistical
properties of a TCP/IP packet stream. A fundamental difficulty with
measuring traffic behavior on the Internet is that there is simply too
much data to be recorded for later analysis, on the order of gigabytes
a second. As a result, network routers can collect only relatively few
statistics about the data. The central problem addressed here is to use
the limited memory of routers to determine essential features of the
network traffic stream. A particularly difficult and representative
subproblem is to determine the top k categories to which the most
packets belong, for a desired value of k and for a given notion of
categorization such as the destination IP address.
We present an
algorithm that deterministically finds (in particular) all categories
having a frequency above 1/(m+1) using m counters, which we prove is
best possible in the worst case. We also present a sampling-based
algorithm for the case that packet categories follow an arbitrary
distribution, but their order over time is permuted uniformly at
random. Under this model, our algorithm identifies flows above a
frequency threshold of roughly 1/sqrt(n m) with high probability, where
m is the number of counters and n is the number of packets observed.
This guarantee is not far off from the ideal of identifying all flows
(probability 1/n), and we prove that it is best possible up to a
logarithmic factor. We show that the algorithm ranks the identified
flows according to frequency within any desired constant factor of
accuracy.