M. Zhang, C. Zhang, V. Pai, L. Peterson, and R. Wang.
Proc. Sixth Symposium on Operating Systems Design and
Implementation.
December 2004.
Detecting network path anomalies generally requires examining large
volumes of traffic data to find misbehavior. We observe that wide-area
services, such as peer-to-peer systems and content distribution
networks, exhibit large traffic volumes, spread over large numbers of
geographically-dispersed endpoints. This makes them ideal candidates
for observing wide-area network behavior. Specifically, we can combine
passive monitoring of wide-area traffic, which detects anomalous network
behavior, with active probes from multiple nodes, which quantify and
characterize the scope of these anomalies.
This approach provides several advantages over other techniques: (1) we obtain more complete and finer-grained views of failures, since the wide-area nodes already provide geographically diverse vantage points; (2) we incur limited additional measurement cost, since most active probing is initiated only when passive monitoring detects oddities; and (3) we detect failures at a much higher rate than other researchers have reported, since the services provide large volumes of traffic to sample. This paper shows how to exploit this combination of wide-area traffic, passive monitoring, and active probing, both to understand path anomalies and to provide optimization opportunities for the host service.
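The split between cheap passive detection and on-demand active probing can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the passive monitor sees, per destination, a stream of events saying whether an observed flow timed out. The names `PathMonitor`, `observe`, and `TIMEOUT_THRESHOLD`, and the choice of three consecutive timeouts as the trigger, are hypothetical. Probes are only recorded as (vantage point, destination) pairs rather than actually sent.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical trigger: flag a path after this many consecutive timeouts.
TIMEOUT_THRESHOLD = 3


@dataclass
class PathMonitor:
    """Hypothetical sketch: passive detection that triggers active probes."""

    vantage_points: list[str]
    threshold: int = TIMEOUT_THRESHOLD
    consecutive_timeouts: dict = field(default_factory=lambda: defaultdict(int))
    probe_log: list = field(default_factory=list)

    def observe(self, dest: str, timed_out: bool) -> list[tuple[str, str]]:
        """Record one passively observed flow event toward `dest`.

        Returns the (vantage point, destination) probes triggered by this
        event, or an empty list if no probing is needed.
        """
        if not timed_out:
            # Normal traffic resets the streak, so no active probing is done.
            self.consecutive_timeouts[dest] = 0
            return []
        self.consecutive_timeouts[dest] += 1
        # Fire exactly once per streak, when the count reaches the threshold.
        if self.consecutive_timeouts[dest] == self.threshold:
            probes = [(vp, dest) for vp in self.vantage_points]
            self.probe_log.extend(probes)
            return probes
        return []


if __name__ == "__main__":
    monitor = PathMonitor(vantage_points=["nodeA", "nodeB", "nodeC"])
    for timed_out in [True, True, False, True, True, True]:
        triggered = monitor.observe("203.0.113.7", timed_out)
        if triggered:
            print("probe from:", [vp for vp, _ in triggered])
```

Because probes fire only when the passive streak crosses the threshold, the number of active measurements scales with the number of suspected anomalies rather than with total traffic volume, which is the cost argument made in point (2).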