Transcript
Netmanias Technical Document: IP Network Failure Recovery Techniques [4] IPFRR (IP Fast-Reroute)
August 2, 2009
NMC Consulting Group(tech@netmanias.com)
IP Fast-Reroute
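- Unlike techniques tied to a device's internal architecture, such as NSF or non-stop routing, IPFRR aims to enhance the IP routing protocol itself so that it can reconverge quickly on a standard router implementation.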
- IP Fast Reroute Framework (draft-ietf-rtgwg-ipfrr-framework-03.txt, M. Shand & S. Bryant, Cisco Systems)
- Basic Specification for IP Fast-Reroute: Loop-free Alternates (draft-ietf-rtgwg-ipfrr-spec-base-03.txt, A. Atlas (Avici), lead author; co-authored with contributors from AT&T, Verizon, Lightcore, Nortel, and others)
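- When a failure occurs and the primary next hop can no longer be used, traffic is diverted onto a precomputed secondary path and forwarded along it. Once routing protocol reconvergence completes, traffic is forwarded to the new primary next hop.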
- IP Fast Reroute Framework
1) Fast Failure Detection: introduces methods for detecting a neighbor failure more quickly.
   1) Fast Hello
   2) Bidirectional Forwarding Detection (BFD)
2) Preparing Repair paths: prepares alternate paths in advance so that traffic can be rerouted immediately when a failure occurs.
   1) ECMP
   2) Loop-free alternate paths
   3) Multi-hop repair paths
3) Micro-loop prevention: specifies how to avoid the routing loops that arise during reconvergence.
- IP-FRR does not yet take multicast into account.
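The repair behavior described on this slide (forward on the primary next hop, switch to a precomputed alternate on failure, then move to the new primary after reconvergence) can be sketched as follows. This is a minimal illustration, not how any particular router implements it: the `FibEntry` class, its method names, and the next-hop labels `E`, `N_1`, `N_2` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    """One forwarding entry with a precomputed repair next hop (hypothetical layout)."""
    primary: str
    alternate: Optional[str] = None  # computed before any failure happens
    primary_up: bool = True

    def next_hop(self) -> Optional[str]:
        # Forward on the primary while it is usable, otherwise on the repair path.
        return self.primary if self.primary_up else self.alternate

    def on_failure_detected(self) -> None:
        # Local repair: no route computation here, only a flag flip.
        self.primary_up = False

    def on_reconverged(self, new_primary: str,
                       new_alternate: Optional[str] = None) -> None:
        # After reconvergence the routing protocol installs the new primary.
        self.primary = new_primary
        self.alternate = new_alternate
        self.primary_up = True


entry = FibEntry(primary="E", alternate="N_1")
before = entry.next_hop()           # primary in use
entry.on_failure_detected()
during = entry.next_hop()           # precomputed alternate in use
entry.on_reconverged(new_primary="N_2")
after = entry.next_hop()            # new primary in use
print(before, during, after)
```

The point of the design is that the failure handler does no path computation at all; everything expensive happens before the failure or after reconvergence.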
IPFRR : Fast Failure Detection
Waiting for the neighbor timeout to expire before a node failure is detected is the most time-consuming step in recovering from a node failure, so a way to shorten this time is needed.
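1) Fast Hello
   1) Implementation of Cisco Systems
   2) Router-Dead-Interval is set to 1 s, and the number of Hellos to send per second is configured.
   3) The Hello Interval field carried in the Hello PDU is set to 0 (not compatible with standard implementations!).
   4) The Hello message is required to carry the Hello Interval and Router Dead Interval values, and both are expressed in seconds. If these values do not match between neighbors, the adjacency cannot form and the routers cannot interoperate. A value below 1 second therefore cannot be carried in a standard OSPF Hello message.
   5) If no Hello is received from a neighbor for 1 second, the neighbor is considered Down.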
2) BFD (Bidirectional Forwarding Detection)
   1) A BFD working group has been formed within the IETF Routing Area and is active.
   2) Until now, each routing protocol defined and used its own method of checking neighbor liveness.
   3) BFD defines a separate protocol for neighbor liveness checks and uses short, millisecond-scale timeouts so that a neighbor failure can be detected quickly.
   4) The protocol was written to be as simple as possible so that it is easy to implement in hardware. It is implemented in hardware on the data plane (the line card) rather than as a software process on the control plane (the route processor). Node failure detection within 50 ms is possible!
   - Bidirectional Forwarding Detection: draft-ietf-bfd-base-02.txt
   - BFD for IPv4 and IPv6 (Single Hop): draft-ietf-bfd-v4v6-1hop-02.txt
   - BFD For MPLS LSPs: draft-ietf-bfd-mpls-01.txt
   - Implemented and in use on commercial routers (Cisco IOS® Software Releases 12.0S and 12.2S: the Cisco 7600 Series Router, the Cisco 12000 Series Internet Router, and so on)
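The Fast Hello timing described above can be sketched as follows: the dead interval is fixed at 1 s, several Hellos are sent within it, and the neighbor is declared Down when none arrives for a full second. The names `hello_send_interval` and `NeighborMonitor` and the choice of 4 Hellos per second are assumptions made for illustration; this is not vendor code.

```python
from typing import Optional

DEAD_INTERVAL_S = 1.0  # Router-Dead-Interval fixed at 1 s, as on the slide


def hello_send_interval(hellos_per_second: int) -> float:
    """Spacing between Hellos when several are sent within the 1 s dead interval."""
    if hellos_per_second < 1:
        raise ValueError("need at least one Hello per second")
    return DEAD_INTERVAL_S / hellos_per_second


class NeighborMonitor:
    """Tracks the last Hello and reports Down once the dead interval has elapsed."""

    def __init__(self, dead_interval_s: float = DEAD_INTERVAL_S) -> None:
        self.dead_interval_s = dead_interval_s
        self.last_hello_s: Optional[float] = None

    def hello_received(self, now_s: float) -> None:
        self.last_hello_s = now_s

    def is_down(self, now_s: float) -> bool:
        if self.last_hello_s is None:
            return True  # nothing heard from this neighbor yet
        return now_s - self.last_hello_s >= self.dead_interval_s


interval = hello_send_interval(4)       # hypothetical setting: 4 Hellos per second
mon = NeighborMonitor()
mon.hello_received(now_s=10.0)
state_at_10_75 = mon.is_down(10.75)     # still within the dead interval
state_at_11_00 = mon.is_down(11.0)      # one full second of silence
print(interval, state_at_10_75, state_at_11_00)
```

With this scheme the worst-case detection time stays at about 1 second regardless of how many Hellos are sent, which is why the slide turns to BFD for millisecond-scale detection.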
IPFRR : BFD (1)
[Figure: BFD payload layout; three fields are annotated "in milliseconds" on the slide]
BFD Control Packet Payload
Detect Mult
Detect time multiplier. The negotiated transmit interval, multiplied by this value, provides the
detection time for the transmitting system.
Desired Min TX Interval
This is the minimum interval, in microseconds, that the local system would like to use when
transmitting BFD Control packets.
Required Min RX Interval
This is the minimum interval, in microseconds, between received BFD Control packets that this
system is capable of supporting.
Required Min Echo RX Interval
This is the minimum interval, in microseconds, between received BFD Echo packets that this system
is capable of supporting. If this value is zero, the transmitting system does not support the receipt of
BFD Echo packets.
(The first phase of the Cisco BFD implementation does not support the use of Echo packets.)
IPFRR : BFD (2)
BFD Timer Negotiation
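The diagram for this slide did not survive, so the following is only a sketch of how the three timer fields defined on the previous slide combine. It assumes the rule used in the published BFD specification for asynchronous mode: a system transmits no faster than the larger of its own Desired Min TX Interval and the peer's Required Min RX Interval, and the sender's Detect Mult multiplied by that interval gives the detection time applied to the sender's packets. Jitter and Echo mode are ignored, and the `BfdTimers` type and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BfdTimers:
    """Timer fields one system advertises in its BFD Control packets (microseconds)."""
    desired_min_tx_us: int
    required_min_rx_us: int
    detect_mult: int


def negotiated_tx_interval_us(sender: BfdTimers, receiver: BfdTimers) -> int:
    # The sender may not transmit faster than the receiver can accept,
    # so the slower (larger) of the two values wins.
    return max(sender.desired_min_tx_us, receiver.required_min_rx_us)


def detection_time_us(sender: BfdTimers, receiver: BfdTimers) -> int:
    # Silence from `sender` lasting this long makes `receiver` declare a failure.
    return sender.detect_mult * negotiated_tx_interval_us(sender, receiver)


# Hypothetical example values for two systems A and B.
a = BfdTimers(desired_min_tx_us=50_000, required_min_rx_us=100_000, detect_mult=3)
b = BfdTimers(desired_min_tx_us=20_000, required_min_rx_us=50_000, detect_mult=3)

a_to_b_tx = negotiated_tx_interval_us(a, b)   # A wants 50 ms, B accepts 50 ms
b_to_a_tx = negotiated_tx_interval_us(b, a)   # B wants 20 ms, A only accepts 100 ms
a_to_b_detect = detection_time_us(a, b)
b_to_a_detect = detection_time_us(b, a)
print(a_to_b_tx, b_to_a_tx, a_to_b_detect, b_to_a_detect)
```

Note that the two directions are negotiated independently, so the detection time can differ per direction, as it does in this example.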
IPFRR : Preparing Repair paths
Alternate paths to be used when a failure occurs are computed and prepared in advance.
1) Equal Cost multi-paths (ECMP)
2) Loop free alternate paths
“Such a path exists when a direct neighbor of the router adjacent to the failure has a path to the destination which can be guaranteed not to traverse the failure.” (draft-ietf-rtgwg-ipfrr-framework-03.txt)
3) Multi-hop repair paths
“a router which is more than one hop away from the router adjacent to the failure, from which traffic will be forwarded to the destination without traversing the failure”
“1) and 2) alone can handle roughly 80% of failures.” - draft-ietf-rtgwg-ipfrr-framework-03.txt
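[Figure: three example topologies with link costs, one per repair path type]
Loop-free alternate path: S: a router that is the source of a repair; D: the destination router; E: the primary nexthop from S to D; N_i: i-th neighbor of S. Shows the primary path and the alternate path.
Equal cost alternate path: E_i: i-th primary nexthop in the ECMP set from S to D; N: neighbor of S. Shows the primary path and the equal cost alternate path.
Multi-hop repair path: S, N_1, N_2, E, D and a router R. Shows the primary path and the multi-hop repair path.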
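To make the loop-free alternate idea concrete, here is a small sketch that checks each neighbor N of S against the basic loop-free condition from the Loop-free Alternates specification cited earlier: dist(N, D) < dist(N, S) + dist(S, D). The topology reuses the slide's node labels, but the link costs are hypothetical (the figure's costs could not be recovered), and the helper names are assumptions for illustration.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]


def add_link(g: Graph, a: str, b: str, cost: int) -> None:
    """Add a bidirectional link with the same cost in both directions."""
    g.setdefault(a, {})[b] = cost
    g.setdefault(b, {})[a] = cost


def shortest_distances(g: Graph, src: str) -> Dict[str, int]:
    """Plain Dijkstra: cost of the shortest path from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def loop_free_alternates(g: Graph, s: str, d: str, primary: str) -> List[str]:
    """Neighbors of s (other than the primary) that will not send traffic back via s."""
    dist_s = shortest_distances(g, s)
    result = []
    for n in g[s]:
        if n == primary:
            continue
        dist_n = shortest_distances(g, n)
        if dist_n[d] < dist_n[s] + dist_s[d]:
            result.append(n)
    return sorted(result)


g: Graph = {}
add_link(g, "S", "E", 5)      # primary path S -> E -> D, total cost 10
add_link(g, "E", "D", 5)
add_link(g, "S", "N_1", 5)
add_link(g, "N_1", "D", 8)    # N_1 reaches D directly: 8 < 5 + 10
add_link(g, "S", "N_2", 8)
add_link(g, "N_2", "D", 30)   # N_2's best path to D goes back through S

lfas = loop_free_alternates(g, "S", "D", primary="E")
print(lfas)
```

In this example N_2 has its own link to D, yet it is rejected because its shortest path to D (cost 18) runs back through S. Having a physical path is not enough; the neighbor's shortest path must avoid S.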
IPFRR : Micro-loop Prevention
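LFA : Loop free alternate
① Link-protecting alternate nexthop; an alternate path prepared against the loss of a specific link
② Node-protecting alternate nexthop; an alternate path prepared against the failure of a specific node
※ Shared Risk Link Group (SRLG)
; A set of links that are likely to fail together, such as several logical links (802.1Q tunnel, PVCs) carried on the same physical link, or several links attached to a single line card.
; It is configured by the operator. When computing an alternate path, the router does not pick an alternate from links that belong to the same SRLG as the protected link.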
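When traffic is diverted onto a repair path, it is difficult to perform CAC in advance for the Premium Traffic that will use that path. (Unless the alternate path is over-provisioned in anticipation of premium traffic arriving, it is close to impossible. This remains an open problem.)
[Figure: topology S, N_1, E, D showing the primary path and a link-protecting alternate path; topology S, N, E, D showing the primary path and a node-protecting alternate path]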
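The distinction between link-protecting and node-protecting alternates, together with the SRLG rule, can be sketched as below. The node-protecting test used here, dist(N, D) < dist(N, E) + dist(E, D), is the condition from the Loop-free Alternates specification; it means N's shortest path to D does not pass through the primary next hop E. The distance table, the SRLG names, and the function names are all hypothetical values chosen for illustration, and point-to-point links are assumed.

```python
from typing import Dict, List, Optional, Set, Tuple

Dist = Dict[Tuple[str, str], int]
Srlg = Dict[Tuple[str, str], Set[str]]


def classify_alternate(dist: Dist, s: str, e: str, d: str, n: str) -> Optional[str]:
    """Return the protection type neighbor n offers for the path s -> e -> d."""
    if not dist[(n, d)] < dist[(n, s)] + dist[(s, d)]:
        return None  # n may send the traffic back through s
    if dist[(n, d)] < dist[(n, e)] + dist[(e, d)]:
        return "node-protecting"  # n's path to d also avoids e
    return "link-protecting"      # survives loss of link s-e only


def usable_alternates(dist: Dist, s: str, e: str, d: str,
                      neighbors: List[str], srlg_of: Srlg) -> Dict[str, str]:
    """Classify neighbors, skipping links that share an SRLG with the primary link."""
    primary_groups = srlg_of.get((s, e), set())
    out: Dict[str, str] = {}
    for n in neighbors:
        if primary_groups & srlg_of.get((s, n), set()):
            continue  # likely to fail together with the primary link
        kind = classify_alternate(dist, s, e, d, n)
        if kind is not None:
            out[n] = kind
    return out


# Hypothetical shortest-path distances (not taken from the slide's figure).
dist: Dist = {
    ("S", "D"): 10, ("E", "D"): 5,
    ("N_1", "D"): 8, ("N_1", "S"): 5, ("N_1", "E"): 10,
    ("N_2", "D"): 9, ("N_2", "S"): 5, ("N_2", "E"): 4,
    ("N_3", "D"): 7, ("N_3", "S"): 6, ("N_3", "E"): 9,
}
# Hypothetical operator configuration: S-E and S-N_3 share one line card.
srlg_of: Srlg = {("S", "E"): {"linecard-1"}, ("S", "N_3"): {"linecard-1"}}

alternates = usable_alternates(dist, "S", "E", "D", ["N_1", "N_2", "N_3"], srlg_of)
print(alternates)
```

Here N_3 would pass both distance tests but is skipped because its link shares an SRLG with the primary link, which is the behavior the SRLG note describes.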
Conclusion
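- Service interruption caused by node failure can be minimized with the following methods.
  - Separating the FIB between the Route Processor and the Line Card
  - Non-stop Forwarding
  - Graceful Restart
  - Non-stop Routing
- The following techniques have been introduced to reduce service interruption time caused by link failures, failures of an I/O module within a line card, and similar events.
  - Fast Hello, BFD, …
  - MPLS Fast-Reroute
  - MPLS Protection
  - IP Fast-reroute
- However, Fast Reroute for multicast paths has not yet been standardized.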