Transcript
Netmanias Technical Document: Network Recovery Technologies for IPTV Services
March 24, 2007
NMC Consulting Group(tech@netmanias.com)
OXC Network Recovery
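Network Impairment

Category | Case | Description | Impact: Control plane | Impact: Data plane
Link Failure | Fiber Cut | | O | X
Link Failure | Optical Component Failure | | O | X
Link Failure | SONET/SDH Equipment Failure | | O | X
Link Failure | Router Interface Failure | The port on a router line card (GBIC failure, RJ45 connector failure, etc.) | O | X
Node Failure | Router Power Supply Outage | Becomes a node failure when there is no redundant power supply. | X | X
Node Failure | Facility Power Supply Failure | Power outage of the building itself, etc. | X | X
Node Failure | Line Card Failure | The line signal is alive but forwarding does not work, for example because of a line card firmware error or a problem in the forwarding engine block. | O | X
Node Failure | Route Processor (RP) Failure: Hardware Failure | Power is on, but the RP cannot operate because of a hardware cause (CPU halt, etc.). | X | O 1)
Node Failure | Route Processor (RP) Failure: Software Failure | Failure of an individual S/W process. On a well-modularized OS that allows per-process recovery and restart, only the faulty process needs to be recovered. | △ | O 1)
Node Failure | Route Processor (RP) Failure: OS Crash | All RP functions stop. | X | O 1)
Node Failure | Planned Node Failure | Part or all of the equipment's service is interrupted for a S/W or H/W upgrade. | △ 2) | △ 2)

1) Once the Holdtime (Dead_Interval) has elapsed with the RP still failed, the adjacency with the neighbors is lost and the neighbors carry out IGP rerouting.
2) When a single RP is replaced, or the OS is upgraded and restarted, redundant RPs can be upgraded one board at a time, so the upgrade does not affect traffic forwarding. Even when the whole node must be replaced, the whole system must be restarted, or a single line card is replaced or upgraded, the work can be done without service interruption if the interface cost is first raised to a large value so that the neighbors reroute before the work begins.

From a neighbor's point of view, a loss of link signal is defined as a link failure. If the link signal is alive but Hellos (OSPF/IS-IS/RSVP Hello, BFD control packets, etc.) stop arriving, it is defined as a node failure.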
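The neighbor-side definition above can be written as a small decision function. This is only an illustrative sketch: the names `Failure` and `classify` are made up for this example and do not come from the slides or from any router software.

```python
from enum import Enum


class Failure(Enum):
    NONE = "no failure"
    LINK = "link failure"
    NODE = "node failure"


def classify(link_signal_up: bool, hello_received: bool) -> Failure:
    """Classify what a neighbor observes, following the definition above."""
    if not link_signal_up:
        return Failure.LINK  # loss of link signal
    if not hello_received:
        return Failure.NODE  # signal is alive, but Hellos stopped arriving
    return Failure.NONE


for signal, hello in [(True, True), (False, False), (True, False)]:
    print(signal, hello, classify(signal, hello).value)
```

The link-signal check comes first because, once the signal is lost, the absence of Hellos carries no extra information.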
OXC-based Recovery
(1) Build an OXC mesh network (10 Gbps on each link).
(2) Configure a virtual ring network (1 Gbps on each link).
(3) Only the multi-ring network then needs to be considered.
[Figure: headend and apartment sites connected through OXC nodes at Daejeon, Gubong, and Hongseong]

Multi-ring based delivery structure
G.842, 6.2.3 Ring interworking with an SNCP ring, 6.2.3.1 Architecture:
Figure 17 shows the architecture of SNCP ring interworking. For each direction of transmission, the signal is dual-fed from the source node around both sides of the ring. When each of the dual-fed signals hits an interconnection node, it is dropped at that node and continued onto the other interconnection node using the drop-and-continue feature. Thus, each interconnection node can select from two signals sent on a different way around the ring. The output of the selector in each interconnection node is then transmitted to the second ring. Each of the interconnection nodes in the second ring takes its respective signal and transmits it towards the sink node, away from the other interconnection node. Finally, the sink node makes the selection between the two signals from the two directions around the ring.
[Figure 17/G.842: SNCP ring interworking architecture (SNC protection rings, path selectors)]
Applying "Drop and Continue" at Every Edge OXC
[Figure: delivery from the headend (Daejeon) to the apartment sites across rings through Gubong and Hongseong. Every edge OXC performs Drop and Continue; the Drop, Continue, and Selector functions are marked at the interconnection nodes. Annotation: may be regarded as a single piece of equipment.]
[Figure 17/G.842: SNCP ring interworking architecture]
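OXC Recovery Procedure
- Normal state.
- Failure (1): even if fiber cuts occur in different rings, the receiving side still receives the data normally.
- Failure (2): if an inter-ring interconnection link and an intra-ring link are cut at the same time, the receiving side does not receive the data.
[Figures: headend (Daejeon) to apartment delivery in the normal state, failure (1), and failure (2)]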
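OXC Recovery Procedure
- Normal state.
- Failure (2), with Drop and Continue applied at the two OXCs of Ring 2: even if an inter-ring interconnection link and an intra-ring link are cut at the same time, the receiving side still receives the data normally.
[Figures: Ring 1 and Ring 2 between Daejeon and Hyehwa with GE hand-offs, in the normal state and in failure (2)]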
MPLS Network Recovery

MPLS Recovery
- Recovery Model
  - Rerouting
    - When a fault occurs, the alternate path and its bandwidth are set up again on demand.
    - Re-optimization
  - Protection Switching
    - The alternate path is computed and its bandwidth reserved in advance.
    - Protection switching is performed when a fault occurs.
    - Subtypes of Protection Switching
      - 1+1 (one plus one)
      - 1:1 (one for one)
- Scope of Recovery
  - Local Recovery: the node immediately upstream of the faulty node/link initiates the recovery procedure.
    - Link Recovery/Restoration
    - Node Recovery/Restoration
  - Global Recovery
- Working path = Primary path = Active path
- Recovery path = Back-up path = Alternative path = Protection path

4 MPLS Recovery Schemes
A. MPLS TE LSP Rerouting (Global Restoration)
B. MPLS TE Path Protection (Global Protection)
C. MPLS TE Fast Reroute Link Protection (Local Protection)
D. MPLS TE Fast Reroute Node Protection (Local Protection)
Reference: Framework for MPLS-based Recovery, RFC 3469
A. MPLS TE LSP Rerouting (Global Restoration)
[Figures, normal state and failure state: PE1 and PE2 (SER/IMS) connected through P1 to P7 over Metro DWDM and LH DWDM, with the primary LSP and the secondary LSP]
- When the LSP between the two PE routers is created, no secondary LSP is considered in advance. When the primary LSP fails, the PE router in Changwon recomputes a path to create a secondary LSP, reserves the bandwidth of the computed secondary LSP through signaling (CR-LDP or RSVP-TE), and then switches the traffic.
- The ingress PE becomes the PSL (Path Switch LSR).
- Restoration starts only after the ingress PE detects the network failure. The ingress PE then computes the path of the secondary LSP and sets it up through signaling, and the FIB (Forwarding Information Base) must be installed on the line cards of the P routers along the secondary LSP. Recovery therefore takes a long time, on the order of several seconds.
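The path recomputation at the ingress PE described above can be sketched as a constrained shortest-path search: links without enough free bandwidth, and links touching a failed node, are pruned first, then a plain Dijkstra search runs on what is left. This is a minimal sketch, not the algorithm of any particular router. The node names follow the slides, but the links, costs, and bandwidth figures in `LINKS` are invented for illustration.

```python
import heapq


def cspf(links, src, dst, bandwidth, failed=frozenset()):
    """Shortest path by cost over links that have enough free bandwidth.

    links: {(a, b): (cost, free_bw)}, treated as bidirectional.
    failed: set of node names to exclude.
    Returns the node list of the path, or None when no path qualifies.
    """
    graph = {}
    for (a, b), (cost, free_bw) in links.items():
        if free_bw < bandwidth or a in failed or b in failed:
            continue  # constraint pruning step
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None


# Hypothetical topology: links, costs, and bandwidths are assumptions.
LINKS = {
    ("PE1", "P1"): (1, 1000), ("P1", "P2"): (1, 1000), ("P2", "P3"): (1, 1000),
    ("P3", "P4"): (1, 1000), ("P4", "PE2"): (1, 1000),
    ("PE1", "P5"): (2, 1000), ("P5", "P6"): (2, 1000), ("P6", "P7"): (2, 1000),
    ("P7", "PE2"): (2, 1000),
}

primary = cspf(LINKS, "PE1", "PE2", bandwidth=100)
secondary = cspf(LINKS, "PE1", "PE2", bandwidth=100, failed={"P4"})
print(primary)
print(secondary)
```

In rerouting, the second call happens only after the failure is known at the ingress PE, which is one reason this scheme is the slowest of the four.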
Recovery Procedure and Elapsed Time
1. Normal Operation: traffic follows the primary LSP.
2. Network Impairment (Node Failure)
3. Failure Detection (RSVP Hello): if no Hello is received for 200~300 msec, the node treats this as a fault and enters restoration mode (Hold on time = 0).
4. Fault Notification: PathErr and ResvTear are unicast to the ingress LSR (PSL). OSPF link status info. (P4 fail)
5. PE1 (PSL): topology DB update
6. PE1: Path computation
7. PE1: Signaling (RSVP-PATH, RSVP-RESV; BW reservation and label allocation)
   - BW Reservation: Reserve (RSVP-TE)
8. Secondary LSP setup
9. Restoration: PE1 (PSL) switch-over to the secondary LSP
[Figures: topology of PE1, PE2, and P1 to P7 at each step, with RSVP Hellos exchanged between neighbors]

Rerouting procedure and recovery time (timeline)
- Impairment (P4 down, node failure)
- Fault detect: P3 (PLR) detects the failure
- Fault notification to PE1 (PSL) (PathErr); OSPF LSA (P4 fail) / IS-IS LSP (P4 fail)
- PE1: path computation (run CSPF)
- PE1: RSVP signaling (PE1, P5, P6, P7, PE2)
- PE1: data plane switch-over
- Recovery
Time labels on the timeline: 800 msec (RSVP Hello interval 200 msec); 50 msec (node processing 10 msec/node); 10s of msec; routing and forwarding table update.
Restoration Time = 6~10 sec

PSL: Path Switch LSR
LSA: Link State Advertisements (OSPF)
LSP: Link State PDU (IS-IS)
B. MPLS TE Path Protection (Global Protection)
[Figures, normal state and failure state: PE1 and PE2 (SER/IMS) connected through P1 to P7 over Metro DWDM and LH DWDM, with the primary LSP and the secondary LSP]
- MPLS TE Path Protection is a global repair mechanism that uses protection switching.
- The back-up LSP is set up together with the primary LSP (back-up LSP: pre-computed AND pre-signaled).
- When a network failure occurs, the POR (Point of Repair) detects it and notifies the head-end (PSL, ingress PE), which immediately forwards the traffic onto the back-up LSP.
- Compared with rerouting, there is no SPF computation time or RSVP signaling time, so the recovery time is shorter.
- The ingress PE becomes the PSL (Path Switch LSR).
- In normal operation the link bandwidth of the secondary LSP may carry best-effort (BE) traffic, which is an advantage in bandwidth efficiency.
- Global protection: fault detection and notification (notification latency in particular) can grow with the size of the network.
- Because a secondary LSP is always set up in advance for each primary LSP, the number of LSP states to be managed in the network doubles (label allocation, RSVP messaging (RSVP Hello, Path, Resv, ...)).
Recovery Procedure and Elapsed Time
Global Recovery / Protection Switching / 1:1 Protection
1. Pre-computed/pre-signaled backup LSP
   - BW Reservation: Pre-Reserved (RSVP-TE)
2. Normal Operation: primary LSP in use, secondary LSP standing by.
3. Network Impairment
4. Failure Detection (RSVP Hello)
5. Fault Notification: PathErr and ResvTear are unicast to the ingress LSR (PSL).
6. Protection Switching: the ingress LSR (PSL) switches the traffic to the secondary LSP.
[Figures: topology of the PEs and P routers at each step, with RSVP Hellos exchanged between neighbors]

Rerouting procedure and recovery time (timeline)
- Impairment (P4 down, node failure)
- Fault detect: P3 (PLR) detects the failure
- Fault notification to PE1 (PSL) (PathErr); OSPF LSA (P4 fail) / IS-IS LSP (P4 fail)
- PE1: path computation (run CSPF)
- PE1: RSVP signaling (PE1, P5, P6, P7, PE2)
- PE1: data plane switch-over
- Recovery
Time labels on the timeline: 800 msec (RSVP Hello interval 200 msec); 50 msec (node processing 10 msec/node); 10s of msec; routing and forwarding table update.
Restoration Time = 100s of msec

PSL: Path Switch LSR
LSA: Link State Advertisements (OSPF)
LSP: Link State PDU (IS-IS)
C. MPLS TE Fast Reroute Link Protection (Local Protection)
[Figures, normal state and failure state: PE1 and PE2 (SER/IMS) connected through P1 to P7 over Metro DWDM and LH DWDM; protected LSP (VoIP, 100M); bypass tunnel (backup LSP) set up in advance]
Recovery Procedure and Elapsed Time
1. Bypass tunnel (backup LSP) pre-setup
   Bypass tunnel:
   1) Manual explicit setup in advance (CLI)
   2) Backup path pre-computed/pre-signaled with a provisioning tool (ISC: IP Solution Center)
   RSVP label exchange along the protected LSP (labels 35, 37, 14, 5, 6 in the figure).
2. Failure Detection
3. FIB Update

   Incoming label | Out i/f | Outgoing label
   37 | PoS1 | 14
   37 | PoS2 | stack 27
   10 | PoS1 | 20
   10 | PoS2 | stack 27

   * Label values are unique regardless of the interface: Cisco
4. Protection (data plane)
   - Push the bypass label, rewrite inner label (27|14).
   - Pop outer label (|14).
   - Incoming label appears to be from upstream LSR, forward as normal.
5. Fault Notification (control plane): RSVP PathErr
6. Re-optimization

PLR: Point of Local Repair
MP: Merge point
[Figures: head-end, PEs, and P1 to P7 with the PLR and MP marked at each step; router architecture with the control plane (route processing, RIB), the data plane (line cards with FIB and I/O), and the switching fabric]

Rerouting procedure and recovery time (timeline): impairment (link fail), PLR detects failure, FIB update, data plane protection switching, fault notification (PathErr), re-optimization complete.
Recovery Time = 50msec
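The label operations of link protection can be traced with a small simulation. This is a sketch under assumptions: the label values 37, 14, 27, and 5 and the interface names PoS1/PoS2 come from the slide, but the function names, the dictionary layout, and the 14-to-5 swap placed at the MP are illustrative guesses at what the figure shows. The label stack is a Python list with the top of the stack at index 0.

```python
# FIB entry of the PLR for incoming label 37 (values from the slide).
PLR_FIB = {
    37: {"primary": ("PoS1", 14), "backup": ("PoS2", 14, 27)},
}

# Assumed swap at the merge point (MP): 14 in, 5 out.
MP_FIB = {14: 5}


def plr_forward(stack, primary_link_up):
    """PLR: swap the top label; on failure also push the bypass label."""
    top, rest = stack[0], stack[1:]
    entry = PLR_FIB[top]
    if primary_link_up:
        out_if, out_label = entry["primary"]
        return out_if, [out_label] + rest
    out_if, inner, bypass = entry["backup"]
    return out_if, [bypass, inner] + rest  # push bypass label, rewrite inner label


def bypass_hop_forward(stack):
    """Last hop of the bypass tunnel before the MP: pop the outer label."""
    return stack[1:]


def mp_forward(stack):
    """MP: the label looks as if it came from the upstream LSR, so swap as usual."""
    return [MP_FIB[stack[0]]] + stack[1:]


_, normal = plr_forward([37], primary_link_up=True)
_, protected = plr_forward([37], primary_link_up=False)
print(mp_forward(normal))
print(mp_forward(bypass_hop_forward(protected)))
```

Both traces leave the MP with the same label, which is why the MP needs no per-failure state: only the PLR changes its forwarding entry.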
D. MPLS TE Fast Reroute Node Protection (Local Protection)
[Figures, normal state and failure state: P1 to P7 over Metro DWDM and LH DWDM; protected LSP (VoIP, 100M); bypass tunnel (backup LSP) set up in advance]
Recovery Procedure and Elapsed Time
1. Bypass tunnel (backup LSP) pre-setup: NNHOP = MP
   Bypass tunnel:
   1) Manual explicit setup in advance (CLI)
   2) Backup path pre-computed/pre-signaled with a provisioning tool (ISC: IP Solution Center)
2. Failure Detection (RSVP Hello)
3. Protection (data plane)
5. Fault Notification (control plane): PathErr
6. Re-optimization

PLR: Point of Local Repair
MP: Merge point
[Figures: head-end, PE1, PE2, and P1 to P7 with the PLR and MP marked at each step; labels 35, 37, 14, 5, 6, and 20 along the LSP, and 27|5 on the bypass tunnel; one panel is labeled "Link Failure"]

Rerouting procedure and recovery time (timeline): impairment, PLR detects failure, data plane protection switching, fault notification (PathErr), re-optimization complete.
Recovery Time = 200~300msec
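IP Network Recovery

IP Recovery Summary

Category | Case | Recovery time: Protocol Re-convergence | Recovery time: NSF+GR | Recovery time: NSR | Recovery time: BFD + P.R.C. | Impact: Control plane | Impact: Data plane
Link Failure | Fiber Cut | 6.56 sec | - | - | - | O | X
Link Failure | Optical Component Failure | 6.56 sec | - | - | - | O | X
Link Failure | SONET/SDH Equipment Failure | 6.56 sec | - | - | - | O | X
Link Failure | Router Interface Failure | 6.56 sec | - | - | - | O | X
Node Failure | Router Power Supply Outage | 6.56 sec | - | - | - | X | X
Node Failure | Facility Power Supply Failure | 6.56 sec | - | - | - | X | X
Node Failure | Line Card Failure | 36.55 sec | - | - | 6.59 sec = 36.55 s - 30 s + 40 msec | O | X
Node Failure | Route Processor (RP) Failure: Hardware Failure | 36.55 sec | No outage | No outage | - | X | O 1)
Node Failure | Route Processor (RP) Failure: Software Failure (failure of an individual S/W process) | 36.55 sec | No outage | No outage | - | △ | O 1)
Node Failure | Route Processor (RP) Failure: OS Crash | 36.55 sec | No outage | No outage | - | X | O 1)
Node Failure | Planned Node Failure | | | | | △ 2) | △ 2)

1) Once the Holdtime (Dead_Interval) has elapsed with the RP still failed, the adjacency with the neighbors is lost and the neighbors carry out IGP rerouting.
2) When a single RP is replaced, or the OS is upgraded and restarted, redundant RPs can be upgraded one board at a time, so the upgrade does not affect traffic forwarding. Even when the whole node must be replaced, the whole system must be restarted, or a single line card is replaced or upgraded, the work can be done without service interruption if the interface cost is first raised to a large value so that the neighbors reroute before the work begins.

From a neighbor's point of view, a loss of link signal is defined as a link failure. If the link signal is alive but Hellos (OSPF/IS-IS/RSVP Hello, BFD control packets, etc.) stop arriving, it is defined as a node failure.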
Restoration from Link Failure: (1) IP Unicast Routing Reconvergence
1. Detection of link failure
   - The routers attached to the failed link detect the change in link state.
2. IS-IS LSP Flooding
   - A Link State PDU (LSP) is flooded.
   - A topology change is detected either by reception of a new LSP or by being notified of loss of link signal.
3. Scheduling SPF calculation
   On reception of an LSP:
   1) LSP flooding
   2) Check if an update is needed and schedule the SPF calculation. How many seconds or milliseconds later it is scheduled is implementation dependent. (Cisco "spf-interval" default value = 5.5 sec)
   Why the SPF calculation is scheduled:
   - To prevent SPF from being recomputed too often because of LSPs triggered by unstable links or nodes.
   - To handle two or more simultaneous topology changes in a single computation.
   - When a node reboots, the newly booted router and its neighboring routers each generate LSPs for the new links.
4. SPF calculation & Updating RIB
   - RIB: NHOP (210.10.1/24) = R6
- Multi-Path Routing (IRR): when ECMP is implemented, local repair is possible.
[Figures: ER1 (Hyehwa), ER2, and R1 to R7 across Dunsan, Guro, and Daegu; link costs of 1, with two links of cost 3]
Total: 5.5 sec + α = 6 sec or more
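The batching effect of SPF scheduling can be illustrated with a deliberately simplified model. It assumes a fixed delay and treats every LSP that arrives while a run is pending as absorbed by that run; real implementations are more elaborate, and this is not a description of how Cisco's "spf-interval" timer behaves. The function name `schedule_spf` and the arrival times are made up for the example.

```python
def schedule_spf(lsp_arrival_times, spf_delay=5.5):
    """Return the times at which SPF runs when each run is delayed by spf_delay.

    An LSP that arrives while an SPF run is already pending is absorbed
    by that run instead of triggering a new one.
    """
    runs = []
    pending_until = None
    for t in sorted(lsp_arrival_times):
        if pending_until is not None and t <= pending_until:
            continue  # batched into the pending SPF run
        pending_until = t + spf_delay
        runs.append(pending_until)
    return runs


# A burst of LSPs after a failure, a straggler, and a later unrelated change.
print(schedule_spf([0.0, 0.01, 0.02, 3.0, 9.0]))
```

Five LSPs cause only two SPF runs in this model, at the price of delaying the first run by the full 5.5 seconds, which dominates the "6 sec or more" total above.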
Restoration from Link Failure: (2) IP Multicast Routing Reconvergence
5. The PIM process detects the unicast topology change
   ■ Polling: before Cisco IOS 12.0(22)S
   - The RPF is updated by polling every 5 seconds.
   ■ Trigger: Cisco IOS 12.0(22)S or higher
   - The RPF is updated within 500 ms ("Sub-second convergence").
   - Supported on Cisco 10000/12000/7200/7500.
   When the RIB is updated, PIM consults it to update the RPF (Reverse Path Forwarding) path, so that the PIM Join is sent through the interface closest to the multicast source.
   ① The RIB change is reported to the PIM process.
   ② The RPF update is performed 500 ms later.
   ③ If the upstream neighbor has changed, a PIM Join is sent to the new upstream neighbor.
   ④ If the upstream neighbor of a group has changed, the change is reflected in the TIB.
   ⑤ The changed TIB contents are reflected in the MFIB.
   [Figure: route processor with IS-IS, RIB, PIM-SM, and TIB; line card with FIB and MFIB; steps 1 to 5]
6. Send PIM Join to new upstream router
   ① R3 sends a PIM Join to its new upstream neighbor R6. R4 also sends a PIM Join to its new upstream neighbor R6.
   ② R6 sends a PIM Join to the RP (R2).
   [Figure: ER1, ER2, and R1 to R7 with the RP and the listener; {Src=10.10.10.1, Group=225.1.1.1}]
7. Multicast stream flows again (duplicated, though)
   - The multicast stream is duplicated at R4.
8. Unnecessary stream is pruned (by sending PIM Prune)
   - R4 forwards only the multicast stream received on the interface that lies on the shortest path from the source, and discards the stream sent by R3.
In the end, the IS-IS + PIM convergence time is 7 sec or more (6 sec + 500 ms + some PIM PDU propagation delay).
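The RPF update that follows a RIB change (steps ① to ⑤ on this slide) can be sketched as a table walk. This is a simplified illustration, not PIM-SM as specified: it keys entries on (source, group) only, ignores the shared tree toward the RP, and every name in it (`rpf_update`, the `"R-old"` placeholder neighbor) is hypothetical. Only the addresses and the new upstream neighbor R6 come from the slides.

```python
def rpf_update(rib_next_hop, mroute_table):
    """Recompute the RPF neighbor of each (source, group) entry from the RIB.

    rib_next_hop: {source_address: next-hop router toward that source}
    mroute_table: {(source, group): current RPF neighbor}; updated in place.
    Returns the PIM messages to send as (message, neighbor, source, group).
    """
    messages = []
    for (source, group), old_neighbor in mroute_table.items():
        new_neighbor = rib_next_hop.get(source)
        if new_neighbor is None or new_neighbor == old_neighbor:
            continue  # no route, or upstream neighbor unchanged
        messages.append(("JOIN", new_neighbor, source, group))
        mroute_table[(source, group)] = new_neighbor  # reflect the change in the table
    return messages


# State on R3 before the failure; "R-old" is a placeholder for the previous upstream.
mroutes = {("10.10.10.1", "225.1.1.1"): "R-old"}
rib_after_spf = {"10.10.10.1": "R6"}

print(rpf_update(rib_after_spf, mroutes))
print(mroutes)
```

Running the function a second time with the same RIB returns no messages, which matches the idea that a Join is sent only when the upstream neighbor actually changes.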
Link Failure: Protocol Re-Convergence
0. Network Impairment: Link Failure
1. Detection of link failure
2. IS-IS LSP Flooding
3. Schedule SPF Calculation
4. SPF Calculation (RIB Updated)
4.1 Update FIB
   - Once the FIB update has completed on all line cards (L/C), unicast communication becomes possible.
5. PIM process: Topology Change Detection (500 msec)
6. PIM Join
7. PIM Multicast Table Update
   - Forwarding is not possible during the update.
8. Multicast Stream
[Figures: at each step, the topology of E1, E2, and R1 to R7 with the multicast source (MS), shown for unicast routing/forwarding (RP (RIB) and forwarding path (FIB)) and for multicast routing/forwarding (RP (multicast table) and forwarding path); link costs of 1, with two links of cost 3. Panel captions: SPF toward the MS as computed by every router; SPT: shortest path to the MS; multicast packet forwarding; only the unicast traffic between the subscribers below E1, R1, R2, R3, R5, R4, R7, E2 and the M/F server is considered. Time labels: after 10s of msec; 100s of msec ~ 10s of sec.]
Restoration from Node Failure: (1) IP Unicast Routing Reconvergence
1. Detection of node failure: by expiration of the "Holding time"
   - If no Hello is received within the holding time (Holdtime), the adjacency formed between the neighbors is torn down.
   - Default IS-IS holding time = 30 sec (Cisco): Hello interval = 10 sec, Hello multiplier = 3, so 10 sec x 3 = 30 sec.
   - Juniper: Hello interval = 9 sec, holding time = 27 sec (point-to-point).
   - Worst case of a node failure: the CPU of the RP halts and, at the same time, the line cards stop forwarding data.
2. LSP Flooding
3. Schedule SPF calculation
   On reception of an LSP:
   1) LSP flooding
   2) Check if an update is needed and schedule the SPF calculation. How many seconds or milliseconds later it is scheduled is implementation dependent. (Cisco "spf-interval" default value = 5.5 sec)
   Why the SPF calculation is scheduled:
   - The timer prevents SPF from being recomputed too often because of LSPs triggered by unstable links or nodes.
   - To handle two or more simultaneous topology changes in a single computation.
   - When a node reboots, the newly booted router and its neighboring routers each generate LSPs for the new links.
4. SPF calculation & Updating RIB
   - RIB: NHOP (210.10.1/24) = R6
[Figures: ER1, ER2, and R1 to R7 exchanging Hellos and LSPs; link costs of 1, with two links of cost 3]
Total: 36 sec or more
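The arithmetic behind the node-failure figures on this slide is short enough to check directly. The helper names below are made up for the example. The multiplier of 3 for the Juniper case is derived from the stated 9 sec interval and 27 sec holding time rather than quoted from the slide.

```python
def holding_time(hello_interval_s, hello_multiplier):
    """Holding time = Hello interval x Hello multiplier."""
    return hello_interval_s * hello_multiplier


def unicast_reconvergence_floor(hold_s, spf_delay_s=5.5):
    """Lower bound: detection by hold-timer expiry plus the SPF delay."""
    return hold_s + spf_delay_s


cisco = holding_time(10, 3)    # 30 sec, as stated on the slide
juniper = holding_time(9, 3)   # 27 sec; the multiplier 3 is derived, not quoted
print(cisco, unicast_reconvergence_floor(cisco))
print(juniper, unicast_reconvergence_floor(juniper))
```

With the Cisco defaults the floor is 35.5 sec before flooding, SPF computation, and FIB update are added, which is consistent with the slide's "36 sec or more".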
Restoration from Node Failure: (2) IP Multicast Routing Reconvergence
5. The PIM process detects the unicast topology change
   ■ Polling: before Cisco IOS 12.0(22)S
   - The RPF is updated by polling every 5 seconds.
   ■ Trigger: Cisco IOS 12.0(22)S or higher
   - The RPF is updated within 500 ms ("Sub-second convergence").
   - Supported on Cisco 10000/12000/7200/7500.
6. Send PIM Join to new upstream router
   When the RIB is updated, PIM consults it to update the RPF (Reverse Path Forwarding) path, so that the PIM Join is sent through the interface closest to the multicast source.
   ① The RIB change is reported to the PIM process.
   ② The RPF update is performed 500 ms later.
   ③ If the upstream neighbor has changed, a PIM Join is sent to the new upstream neighbor.
   ④ If the upstream neighbor of a group has changed, the change is reflected in the TIB.
   ⑤ The changed TIB contents are reflected in the MFIB.
   [Figure: route processor with IS-IS, RIB, PIM-SM, and TIB; line card with FIB and MFIB; steps 1 to 5]
7. Multicast stream flows again
[Figures: ER1, ER2, and R1 to R7 with the RP and the listener; {Src=10.10.10.1, Group=225.1.1.1}]
In the end, the OSPF + PIM convergence time is 37 sec or more (36 sec + 500 ms + some PIM PDU propagation delay).
Node Failure: Protocol Re-Convergence
0. Network Impairment: Node Failure
1. Detection of Node Failure (By Expiration of the Hold Timer)
2. IS-IS LSP Flooding
3. Schedule SPF Calculation
4. SPF Calculation (RIB Updated)
   - Update FIB: once the FIB update has completed on all line cards (L/C), unicast communication becomes possible.
5. PIM process: Topology Change Detection (500 msec)
6. PIM Join
7. PIM Multicast Table Update
   - Update FIB: multicast communication becomes possible.
8. Multicast Stream
[Figures: at each step, the topology of E1, E2, and R1 to R7 with the multicast source (MS), shown for unicast routing/forwarding (RP (RIB) and forwarding path (FIB)) and for multicast routing/forwarding (RP (multicast table) and forwarding path); link costs of 1, with two links of cost 3. Panel captions: SPF toward the MS as computed by every router; SPT: shortest path to the MS; multicast packet forwarding; only the unicast traffic between the subscribers below E1, R1, R2, R3, R5, R4, R7, E2 and the M/F server is considered. Time labels: after 10s of msec; after 100s of msec ~ 10s of sec.]
Traditional IP Multicast Restoration: Restoration Time for a Single Failure
For IS-IS, the time components of a standard protocol re-convergence:

Name of time component | Description | Typical value: Link Failure | Typical value: Node Failure | Comment
① Failure Detection time | Time elapsed to detect the failure (detection of link failure + report of the failure to the route processor) | a few ms | 30 sec (Hold Time) |
② LSP Flooding time | Process LSP, bundle LSPs, and flooding | N x 10 ms (roughly 10 ms per node) | N x 10 ms (roughly 10 ms per node) |
③ SPF Delay | Minimum time between LSP reception and start of SPF calculation (implementation dependent) | Default value 5.5 sec * | Default value 5.5 sec * | Minimum value 1 msec *
④ SPF Calculation time | Time taken by the SPF calculation (depends on the topology and the number of routing entries) | 100~400 msec ** | 100~400 msec ** | 10s of msec (NANOG); 10s of usec (claimed by Cisco)
⑤ FIB Update time | From end of LSA processing to end of new routes installation | 20 entries/ms | 20 entries/ms | 20 entries/ms (based on Cisco GSR) **
⑥ RPF Update delay | Delay between the update of the unicast RIB and the RPF check of the mroute table | Default value 500 msec * | Default value 500 msec * |
⑦ RPF Update time | Time taken to update the RPF neighbors in the mroute table (not expected to be large with about 200 multicast channels) | a few ms ~ tens of ms | a few ms ~ tens of ms |
⑧ PIM PDU processing delay | Delay in processing and forwarding PIM PDUs | 10s of ms per node | 10s of ms per node |
Total | | 6.56 sec (= 10m + 100m + 5.5s + 400m + 500m + 20m + 30m) | 36.55 sec (= 30s + 100m + 5.5s + 400m + 500m + 20m + 30m) |

* See Cisco IOS 12.4
** See Sprint [Supratik2004]
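The two totals can be re-derived from the terms listed in the slide's own sums. In the sketch below the millisecond values are taken from those sums; the names attached to the 20 ms and 30 ms terms are an assumption based on their order, since the slide does not label them. The BFD figure uses the 40 msec value given in the IP Recovery Summary table. All function and variable names are made up for the example.

```python
# Terms shared by both cases, in milliseconds (values from the slide's sums).
COMMON_MS = {
    "lsp_flooding": 100,
    "spf_delay": 5500,
    "spf_calculation": 400,
    "rpf_update_delay": 500,
    "rpf_update": 20,          # assumed label for the 20m term
    "pim_pdu_processing": 30,  # assumed label for the 30m term
}

# Failure detection differs: a few ms for a link, the hold time for a node.
DETECTION_MS = {"link_failure": 10, "node_failure": 30000}


def restoration_time_s(case):
    """Sum the detection time and the common terms; return seconds."""
    return (DETECTION_MS[case] + sum(COMMON_MS.values())) / 1000


def with_bfd_s(case, hold_ms=30000, bfd_detect_ms=40):
    """Replace hold-timer detection with BFD detection (36.55 s - 30 s + 40 msec)."""
    total_ms = DETECTION_MS[case] + sum(COMMON_MS.values())
    return (total_ms - hold_ms + bfd_detect_ms) / 1000


print(restoration_time_s("link_failure"))
print(restoration_time_s("node_failure"))
print(with_bfd_s("node_failure"))
```

The sums make the dominant terms obvious: the 5.5 sec SPF delay in both cases, and the 30 sec hold time in the node-failure case, account for nearly all of the restoration time.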
End of Document