TY - GEN
T1 - Algorithms and estimators for accurate summarization of internet traffic
AU - Cohen, Edith
AU - Duffield, Nick
AU - Kaplan, Haim
AU - Lund, Carsten
AU - Thorup, Mikkel
PY - 2007
Y1 - 2007
N2 - Statistical summaries of traffic in IP networks are at the heart of network operation and are used to recover information on arbitrary subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. Cisco's sampled NetFlow, based on aggregating a sampled packet stream into flows, is the most widely deployed such system. We observe two sources of inefficiency in current methods. Firstly, a single parameter (the sampling rate) is used to control utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Secondly, the unbiased estimators are applicable to summaries that in effect are collected through uneven use of resources during the measurement period (information from the earlier part of the measurement period is either not collected at all and fewer counter are utilized or discarded when performing a sampling rate adaptation). We develop algorithms that collect more informative summaries through an even and more efficient use of available resources. The heart of our approach is a novel derivation of unbiased estimators that use these more informative counts. We show how to efficiently compute these estimators and prove analytically that they are superior (have smaller variance on all packet streams and subpopulations) to previous approaches. Simulations on Pareto distributions and IP flow data show that the new summaries provide significantly more accurate estimates. We provide an implementation design that can be efficiently deployed at routers.
AB - Statistical summaries of traffic in IP networks are at the heart of network operation and are used to recover information on arbitrary subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. Cisco's sampled NetFlow, based on aggregating a sampled packet stream into flows, is the most widely deployed such system. We observe two sources of inefficiency in current methods. Firstly, a single parameter (the sampling rate) is used to control utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Secondly, the unbiased estimators are applicable to summaries that in effect are collected through uneven use of resources during the measurement period (information from the earlier part of the measurement period is either not collected at all and fewer counter are utilized or discarded when performing a sampling rate adaptation). We develop algorithms that collect more informative summaries through an even and more efficient use of available resources. The heart of our approach is a novel derivation of unbiased estimators that use these more informative counts. We show how to efficiently compute these estimators and prove analytically that they are superior (have smaller variance on all packet streams and subpopulations) to previous approaches. Simulations on Pareto distributions and IP flow data show that the new summaries provide significantly more accurate estimates. We provide an implementation design that can be efficiently deployed at routers.
KW - Data streams
KW - IP flows
KW - NetFlow
KW - Network management
KW - Sketches
KW - Subpopulation queries
UR - http://www.scopus.com/inward/record.url?scp=42149174887&partnerID=8YFLogxK
U2 - 10.1145/1298306.1298344
DO - 10.1145/1298306.1298344
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:42149174887
SN - 9781595939081
T3 - Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC
SP - 265
EP - 278
BT - IMC'07
T2 - IMC'07: 2007 7th ACM SIGCOMM Internet Measurement Conference
Y2 - 24 October 2007 through 26 October 2007
ER -