UNDERSTANDING INTERNET ROUTING
ANOMALIES AND BUILDING ROBUST
TRANSPORT LAYER PROTOCOLS
MING ZHANG
A DISSERTATION
PRESENTED TO THE FACULTY
OF PRINCETON UNIVERSITY
IN CANDIDACY FOR THE DEGREE
OF DOCTOR OF PHILOSOPHY
RECOMMENDED FOR ACCEPTANCE
BY THE DEPARTMENT OF
COMPUTER SCIENCE
SEPTEMBER 2005
© Copyright by Ming Zhang, 2005. All rights reserved.
Abstract
As the Internet grows and routing complexity increases, network-level instabilities are
becoming more and more common. End-to-end communications are especially susceptible
to service disruptions, while diagnosing and mitigating these disruptions are extremely
challenging. In this dissertation, we design and build systems for diagnosing routing
anomalies and improving robustness of end-to-end communications.
The first piece of this work describes PlanetSeer, a novel distributed system for di-
agnosing routing anomalies. PlanetSeer passively monitors traffic in wide-area services,
such as Content Distribution Networks (CDNs) or Peer-to-Peer (P2P) systems, to detect
anomalous behavior. It then coordinates active probes from multiple vantage points to
confirm the anomaly, characterize it, and determine its scope. There are several advan-
tages of this approach: first, we obtain more complete and finer-grained views of routing
anomalies since the wide-area nodes provide geographically-diverse vantage points. Sec-
ond, we incur limited additional measurement cost since most active probes are initiated
when passive monitoring detects oddities. Third, we detect anomalies at a much higher
rate than other researchers have reported since the wide-area services provide large vol-
umes of traffic to sample. Through extensive experimental study in the wide-area net-
work, we demonstrate that PlanetSeer is an effective system for both gaining a better
understanding about routing anomalies and for providing optimization opportunities for
the host service.
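The passive-then-active workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PlanetSeer's implementation: the `PassiveMonitor` class, the `plan_probes` helper, the trigger of three consecutive timeouts, and the node names are all hypothetical choices made for this example.

```python
from collections import defaultdict


class PassiveMonitor:
    """Hypothetical passive trigger: flags a destination after repeated timeouts."""

    def __init__(self, timeout_threshold=3):
        # Assumed threshold; the abstract does not specify a trigger condition.
        self.timeout_threshold = timeout_threshold
        self.consecutive_timeouts = defaultdict(int)

    def observe(self, destination, timed_out):
        """Record one passive observation; return True when probing should start."""
        if not timed_out:
            # Any successful exchange resets the counter for that destination.
            self.consecutive_timeouts[destination] = 0
            return False
        self.consecutive_timeouts[destination] += 1
        # Fire exactly once, when the counter first reaches the threshold.
        return self.consecutive_timeouts[destination] == self.timeout_threshold


def plan_probes(destination, vantage_points):
    """Return one (vantage_point, destination) probe request per vantage point."""
    return [(vantage_point, destination) for vantage_point in vantage_points]


monitor = PassiveMonitor(timeout_threshold=3)
vantage_points = ["node-a", "node-b", "node-c"]  # hypothetical node names
events = [
    ("10.0.0.1", True),
    ("10.0.0.1", True),
    ("10.0.0.2", False),
    ("10.0.0.1", True),
]

probes = []
for destination, timed_out in events:
    if monitor.observe(destination, timed_out):
        probes.extend(plan_probes(destination, vantage_points))
```

Because probes are planned only when the passive trigger fires, the active measurement cost in this sketch grows with the number of suspected anomalies rather than with the volume of monitored traffic.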
To improve the robustness of end-to-end communications during performance
anomalies, we design mTCP, a novel transport layer protocol that can minimize the impact of
anomalies using redundant paths. mTCP separates the congestion control for each path
so that it can not only obtain higher throughput but also be more robust to path failures.
mTCP can quickly react to failures, and the recovery process normally takes only several
seconds. We integrate a shared congestion detection mechanism into mTCP that allows
us to suppress paths with shared congestion. This helps alleviate the aggressiveness of
mTCP. We also propose a heuristic to find disjoint paths between pairs of nodes. This can
minimize the chance of concurrent failures and shared congestion. We implement mTCP
on top of an overlay network and evaluate it using both emulations and experiments in
the wide-area network.
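The idea of keeping congestion control separate for each path can be sketched as follows. This is a simplified illustration, not mTCP's actual algorithm: the `PathState` and `MultiPathSender` classes, the textbook additive-increase/multiplicative-decrease rules, and the path names are assumptions introduced for this example.

```python
class PathState:
    """Hypothetical per-path congestion state (window measured in segments)."""

    def __init__(self, name, cwnd=1.0):
        self.name = name
        self.cwnd = cwnd
        self.alive = True

    def on_ack(self):
        # Additive increase: roughly one segment per round trip.
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self):
        # Multiplicative decrease, never dropping below one segment.
        self.cwnd = max(1.0, self.cwnd / 2.0)


class MultiPathSender:
    """Keeps an independent congestion window for every path."""

    def __init__(self, path_names):
        self.paths = {name: PathState(name) for name in path_names}

    def active_paths(self):
        return [path for path in self.paths.values() if path.alive]

    def mark_failed(self, name):
        # A failed path stops contributing; the other paths are untouched.
        self.paths[name].alive = False

    def total_window(self):
        return sum(path.cwnd for path in self.active_paths())


sender = MultiPathSender(["path-a", "path-b"])  # hypothetical path names
for path in sender.paths.values():
    path.cwnd = 8.0

sender.paths["path-a"].on_loss()       # only path-a halves its window
window_after_loss = sender.total_window()

sender.mark_failed("path-a")           # path-b keeps sending at its own rate
window_after_failure = sender.total_window()
```

In this sketch a loss or failure on one path leaves the window of the other path unchanged, which is the property the abstract attributes to separating congestion control per path.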
Acknowledgments
I have been incredibly fortunate to have had three mentors during the course of my PhD
study. The first one is Professor Randy Wang. I would like to thank him for his guid-
ance, support, and help throughout the years. I consider myself very lucky to have the
chance to work and learn from him. He provided the enthusiasm and encouragement that
I needed to complete this work. The second one is Professor Larry Peterson. He made
himself available for numerous discussions, often started by my dropping by his office
unexpectedly. I always left with a deeper and clearer understanding about those research
problems than I’d had when I arrived. I learned from him that research requires a
combination of dedication, confidence, and truly long-term thinking. I am sincerely grateful for
his high standard for research, kindness, and patience. The third one is Professor Vivek
Pai. He provided me invaluable guidance and frequent advice on the PlanetSeer project.
His vigorous approach both to research and to life has greatly shaped and enriched my
view of networking and systems research. I have to thank him for letting me steal an
enormous amount of time and wisdom during the last two years of my PhD study.
I am fortunate to have collaborated with Chi Zhang on much of the work presented in this
thesis. Chi is my friend, lab-mate, as well as apartment-mate. I drew immense inspiration
from him both inside and outside work. He is the best collaborator one could ask for. I
am also grateful to Junwen Lai. The mTCP project would not have been possible without
his help on the user-level TCP implementation.
The second part of my thesis was inspired by my work at ICIR, starting in the summer
of 2001. I thank Dr. Brad Karp for making my visit possible. Later, Brad gave me the
chance to continue collaborating with him at Intel Research Pittsburgh in the summer of
2003. I benefited enormously from the two summers I spent working with him. I also
thank Dr. Sally Floyd for teaching me a great deal about TCP-related problems while I was at ICIR. It was a
great honor to work with Professor Arvind Krishnamurthy, who provided many vigilant
comments on various algorithms in my work. I am especially grateful to Professor Jen-
nifer Rexford. She always patiently listened to my incoherent thoughts and provided me
amazingly insightful and detailed feedback. I learned a tremendous amount from her on
doing research as well as on writing and presentation.
I am grateful to the PlanetLab staff for their help with deploying the PlanetSeer
system. Andy Bavier answered many of my questions about safe raw sockets. Marc Fiuczynski
shared with me his extensive experience in vserver. I would like to thank Scott Karlin,
Mark Huang, Aaron Klingaman, Martin Makowiecki, and Steve Muir for their support
and patience. I also thank KyoungSoo Park for his effort in keeping CoDeeN operational
during my experiment.
I would like to thank Professors David Walker and Moses Charikar for serving as
non-readers on my dissertation committee. They gave many valuable comments and
suggestions on my work.
My work was supported in part by NSF grants CNS-0335214 and CNS-0435087, and
DARPA contract F30602-00-2-0561.
I greatly enjoyed my life at Princeton because of the many close friends I had there.
I thank Ding Liu, Chi Zhang, Yaoping Ruan, Fengzhou Zheng, Ting Liu, Wen Xu, Gang
Tan, and Fengyun Cao for their support and encouragement throughout the years. I also
thank my non-Princeton friends, especially Xuehua Shen and Ningning Hu. They made
my life lots of fun.
This thesis is dedicated to my parents. They always gave me love, trust, and pride.
They played the most important role in directing me into pursuing a research career.
Contents
Abstract
1 Introduction
1.1 Why Do Performance Anomalies Occur on the Internet?
1.2 Difficulties in Anomaly Diagnosis
1.3 Difficulties in Anomaly Mitigation
1.4 Overview of the Thesis
2 Background and Related Work
2.1 Network Testbeds
2.2 Intradomain Routing Anomalies
2.3 Interdomain Routing Anomalies
2.4 Traffic Anomalies
2.5 End-to-End Failure Measurement
2.6 Link-Layer and Application-Layer Striping
2.7 Transport-Layer Striping
2.8 Summary
3 PlanetSeer: Internet Path Failure Monitoring and Characterization
3.1 Introduction
3.2 PlanetSeer Operation
3.2.1 Components
3.2.2 MonD Mechanics
3.2.3 MonD Behavior
3.2.4 MonD Flow/Path Statistics
3.2.5 ProbeD Operation
3.2.6 ProbeD Mechanics
3.2.7 Path Diversity
3.3 Confirming Anomalies
3.3.1 Massaging Traceroute Data
3.3.2 Final Confirmation
3.4 Loop-Based Anomalies
3.4.1 Scope
3.4.2 Distribution
3.4.3 End-to-End Effects
3.5 Building a Reference Path
3.6 Classifying Non-loop Anomalies
3.6.1 Path Changes
3.6.2 Path Outage
3.7 Discussion
3.7.1 Bypassing Anomalies
3.7.2 Reducing Measurement Overhead
3.8 Summary
4 mTCP: Robust Transport Layer Protocol Using Redundant Paths
4.1 Introduction
4.2 Design
4.2.1 Transport Layer Protocol
4.2.2 Shared Congestion Detection
4.2.3 Path Selection
4.2.4 Path Management
4.2.5 Path Failure Detection and Recovery
4.3 Implementation
4.4 Evaluation
4.4.1 Methodology
4.4.2 Utilizing Multiple Independent Paths
4.4.3 Recovering from Partial Path Failures
4.4.4 Detecting Shared Congestion
4.4.5 Alleviating Aggressiveness with Path Suppression
4.4.6 Suppressing Bad Paths
4.4.7 Comparing with Single-Path Flows
4.5 Summary
5 Conclusion and Future Work
5.1 Summary of the Dissertation
5.1.1 Internet Path Failure Monitoring and Characterization
5.1.2 Robust Transport Layer Protocol Using Redundant Paths
5.2 Future Work
5.2.1 Debugging Routing Anomalies
5.2.2 Debugging Non-Routing Anomalies
5.2.3 Internet Weather Service
[...] studying anomalies in the Internet and designing robust network protocols. We will focus on those that are most relevant and discuss their difference from our approaches. We first briefly introduce the network testbeds used for our experiments and evaluations. We then turn to the recent studies on network anomalies, which include interdomain and intradomain routing anomalies, traffic anomalies, and end-to-end [...]

[...] causes of performance degradation on the Internet. We first look at intradomain routing anomalies and defer the discussion about interdomain routing anomalies to the next section. Nowadays, the most commonly used intradomain routing protocols are OSPF and IS-IS. Researchers have been using routing updates collected in individual ISPs to study routing anomalies. Labovitz and Ahuja used the OSPF messages gathered [...]

[...] the Internet. Based on their methodologies, we classify them into intra- and inter-domain routing anomalies, traffic anomalies, and end-to-end measurements. At the end of Chapter 2, we will discuss the research efforts that improve the end-to-end performance using striping at the link-layer, application-layer and transport-layer. Chapter 3 focuses on PlanetSeer, a large-scale distributed system for routing [...]

[...] recovery, and path selection.

Chapter 3
PlanetSeer: Internet Path Failure Monitoring and Characterization

As we have explained in Section 1.1, performance degradations are often caused by routing anomalies on today’s Internet. Understanding routing anomalies is crucial for improving the overall stability of the Internet. In this chapter, we introduce PlanetSeer, a large-scale distributed system for routing [...]
[...] overhead and can easily scale to a large number of nodes.
• It provides a finer-grained and more complete view on routing anomaly by correlating the probing from multiple vantage points.

In the past, a series of proposals have been made to enhance network performance using striping techniques at the link-layer, transport-layer, and application-layer. We are the first to implement and evaluate a transport-layer [...]

[...] Routing instability is one of the major sources of performance anomalies. Routing protocols are responsible for discovering the paths to reach any destination on the Internet. Routing protocols can be classified into interdomain and intradomain protocols. Intradomain protocols (IGP), such as OSPF [44] or IS-IS [20], are responsible for disseminating reachability information within an AS. Interdomain protocols [...]

[...] use striping techniques to improve performance and robustness. Based on the network layer where the striping techniques are applied, we classify them into link-layer, transport-layer, and application-layer striping.

2.1 Network Testbeds

We evaluate our systems with both emulations and real-world deployment. The emulations are conducted on Emulab [24], a time- and space-shared network emulator. It consists [...]

[...] 3, our work complements these two approaches by studying routing anomalies from an end-to-end perspective. We will also quantify the impact of anomalies on end-to-end performance, such as loss rate and RTT.

2.4 Traffic Anomalies

Parallel to routing anomalies, several research efforts have focused on traffic anomalies which are defined as unusual and significant changes in network traffic. These efforts examined [...]
[...] for confirming the routing anomalies, classifying them, and characterizing their scopes, locations, and end-to-end effects. In the end, we quantify the effectiveness of overlay routing in bypassing path failures. Chapter 4 presents mTCP, a novel transport layer protocol that is robust to performance anomalies. mTCP differs from traditional transport layer protocols in that it can use more than one path in [...]

[...] violating these agreements. On the other hand, knowing why the anomalies occur will help the network operators to fix the problems quickly and to prevent similar problems from occurring in the future. Although understanding the characteristics and origins of performance anomalies can help us improve the long-term stability of the Internet, we are still going to encounter anomalies frequently in the foreseeable [...]